Kubernetes Node Affinity and Taints: Controlling Pod Placement

The Kubernetes Scheduler automatically picks a node for each Pod based on available resources. Sometimes you need more control — you want GPU-intensive Pods on nodes that have GPUs, or you want to keep production workloads off shared testing nodes. Node Affinity and Taints give you precise control over where Pods land.

Three Mechanisms for Placement Control

Mechanism              | Direction             | Effect
nodeSelector           | Pod → Node (simple)   | Pod requires a specific node label
Node Affinity          | Pod → Node (advanced) | Pod prefers or requires certain node labels
Taints and Tolerations | Node → Pod            | Node repels Pods unless they tolerate the taint

nodeSelector: The Simple Approach

nodeSelector is the easiest way to pin Pods to specific nodes. Label a node, then tell your Pod to only run on nodes with that label.

# Label a node
kubectl label node gpu-node-1 hardware=gpu

# In the Pod spec
spec:
  nodeSelector:
    hardware: gpu
  containers:
  - name: ml-job
    image: tensorflow/tensorflow:latest

The Pod only schedules on nodes with the label hardware=gpu. If no such node exists or all matching nodes are full, the Pod stays Pending.
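For completeness, a full manifest might look like the sketch below. It assumes the nodes carry a second, hypothetical label disk-type=ssd in addition to hardware=gpu; when nodeSelector lists several labels, a node must carry all of them. The Pod name and the image are illustrative choices, not requirements.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-job            # hypothetical Pod name
spec:
  nodeSelector:
    hardware: gpu         # label applied with kubectl label above
    disk-type: ssd        # hypothetical second label; both must match
  containers:
  - name: ml-job
    image: tensorflow/tensorflow:latest
```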

Node Affinity: More Expressive Rules

Node Affinity builds on nodeSelector with a richer set of operators (In, NotIn, Exists, DoesNotExist, Gt, and Lt) and two modes: required and preferred.

Required Affinity (Hard Rule)

The Pod must schedule on a matching node. If no matching node exists, the Pod waits in Pending status.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
            - us-east-1b

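This Pod only runs on nodes whose zone label is us-east-1a or us-east-1b. The IgnoredDuringExecution suffix means the rule is checked only at scheduling time: if a node's labels change later, Pods already running there are not evicted.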
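As a rough illustration of the other operators, the excerpt below combines three expressions in one term. The label keys (gpu, environment, cpu-cores) are hypothetical. Exists takes no values, and Gt compares the label's value as an integer against a single value written as a string.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu              # hypothetical label; any value is accepted
            operator: Exists
          - key: environment      # hypothetical label
            operator: NotIn
            values:
            - testing
          - key: cpu-cores        # hypothetical label holding an integer
            operator: Gt
            values:
            - "8"                 # node must have cpu-cores greater than 8
```

All expressions inside one matchExpressions list must hold for a node to match. Separate entries under nodeSelectorTerms are alternatives, and matching any one of them is enough.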

Preferred Affinity (Soft Rule)

Kubernetes tries to place the Pod on a matching node but schedules it elsewhere if no match is available. You assign a weight (1–100) to indicate preference strength.

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
      - weight: 20
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - hdd

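The Scheduler adds the weight of each matching term to a node's score, so SSD nodes (weight 80) rank ahead of HDD nodes (weight 20). Because the rule is soft, a node with neither label can still be chosen if nothing better fits.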
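Hard and soft rules can sit in the same Pod spec. The sketch below reuses the zone and disk-type labels from the examples above: it restricts the Pod to two zones and then ranks SSD nodes higher within them.

```yaml
spec:
  affinity:
    nodeAffinity:
      # Hard rule: filters out every node outside the two zones
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
            - us-east-1b
      # Soft rule: among the remaining nodes, SSD nodes score higher
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
```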

Taints and Tolerations: Node Repels Pods

A taint is applied to a node. It says: "No regular Pods may run here." A toleration is applied to a Pod. It says: "I accept this node's taint — schedule me here anyway."

Think of a taint like a "No Unauthorized Personnel" sign on a server room door. A toleration is the access badge that lets authorized staff (specific Pods) enter.

# Apply a taint to a node
kubectl taint node gpu-node-1 dedicated=gpu-only:NoSchedule

# Effect: NoSchedule means no new Pods without a matching toleration will schedule here

Taint Effects

Effect           | Behavior
NoSchedule       | New Pods without a toleration cannot schedule on this node
PreferNoSchedule | Scheduler avoids this node for Pods without toleration but will use it if necessary
NoExecute        | Existing Pods without toleration are evicted; new ones cannot schedule
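The kubectl taint command stores the taint in the node's spec.taints list. A sketch of what the relevant part of the Node object might look like afterwards, trimmed to the fields of interest (a real Node object carries many more):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
spec:
  taints:
  - key: dedicated
    value: gpu-only
    effect: NoSchedule   # one of NoSchedule, PreferNoSchedule, NoExecute
```

A node can carry several taints at once. A Pod must tolerate every NoSchedule and NoExecute taint on the node to be scheduled there.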

Adding a Toleration to a Pod

spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu-only"
    effect: "NoSchedule"
  containers:
  - name: ml-workload
    image: my-ml-app

This Pod tolerates the dedicated=gpu-only:NoSchedule taint, so it can schedule on gpu-node-1 even though other Pods cannot. A toleration only permits placement; it does not force the Pod onto that node.
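Two variations are worth knowing. With operator Exists, the toleration matches any value of the key, so no value field is given. For NoExecute taints, the optional tolerationSeconds field lets a Pod stay on the node for a limited time after the taint appears before it is evicted. The maintenance key below is hypothetical.

```yaml
spec:
  tolerations:
  # Matches a "dedicated" taint with any value
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"
  # Tolerates a hypothetical NoExecute taint for 5 minutes, then the Pod is evicted
  - key: "maintenance"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
```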

Combining Taints and Node Affinity

Use taints to repel unwanted Pods from specialized nodes, and use Node Affinity to attract the right Pods to those nodes. Together, they ensure that only the intended workloads run on dedicated hardware.

GPU Node Setup:
  Taint: dedicated=gpu:NoSchedule    ← Repels regular Pods
  Label: hardware=gpu                ← Identifies the node

ML Job Pod Setup:
  Toleration: dedicated=gpu:NoSchedule  ← Can schedule on GPU node
  NodeAffinity: hardware=gpu (required) ← Must schedule on GPU node
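
Written out as a manifest, the ML job side of this setup might look like the following sketch. The Pod name is hypothetical, and the container reuses the my-ml-app image name from the toleration example. The node is assumed to have been tainted with dedicated=gpu:NoSchedule and labeled hardware=gpu, as in the outline above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-job                 # hypothetical name
spec:
  tolerations:                 # lets the Pod onto the tainted GPU node
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:              # keeps the Pod off every non-GPU node
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values:
            - gpu
  containers:
  - name: ml-workload
    image: my-ml-app
```

Without the affinity rule, the toleration alone would only permit the GPU node, and the Pod could still land on any untainted node.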

Pod Affinity and Anti-Affinity

Pod Affinity and Anti-Affinity control placement relative to other Pods — not nodes.

Pod Affinity

Schedule this Pod on the same node (or zone) as Pods with a certain label. Use this when low latency between two services matters:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: cache
      topologyKey: kubernetes.io/hostname

This Pod must run on a node that already hosts at least one Pod labeled app=cache. If no such Pod is running, this Pod stays Pending.
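
To co-locate at the zone level instead, change the topologyKey. The sketch below assumes the nodes carry the well-known topology.kubernetes.io/zone label, which cloud providers usually set, and uses the soft form so the Pod can still schedule when no cache Pod is running.

```yaml
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: topology.kubernetes.io/zone   # same zone, not necessarily same node
```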

Pod Anti-Affinity

Spread Pods across different nodes for high availability. If all replicas of your app land on the same node and that node fails, everything goes down. Anti-affinity prevents that:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-frontend
      topologyKey: kubernetes.io/hostname

No two Pods labeled app=web-frontend run on the same node. Each replica is on a different node, so any single node failure takes down at most one replica. Because the rule is required, replicas beyond the number of eligible nodes stay Pending.
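
Where a hard rule is too strict, a soft rule is a common alternative. The sketch below is a hypothetical Deployment (the name, replica count, and image are placeholders) that asks the Scheduler to spread replicas across nodes but still allows two on one node when necessary.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend        # the label the anti-affinity rule selects on
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-frontend
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx:1.25        # placeholder image
```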

Key Points

  • nodeSelector is the simplest placement control — a Pod requires specific node labels.
  • Node Affinity offers required (hard) and preferred (soft) rules with flexible operators.
  • Taints repel Pods from nodes. Tolerations let specific Pods override that repulsion.
  • Use taints + node affinity together for dedicated workloads like GPU jobs or database nodes.
  • Pod Anti-Affinity spreads replicas across nodes to avoid single-node failure taking down all instances.
