Kubernetes Resource Requests and Limits: CPU and Memory Management

Every container running in Kubernetes competes for the same pool of CPU and memory on the nodes. Without controls, a single misbehaving container can consume all available resources and starve every other application on the node. Resource requests and limits give each container its fair share and protect neighbors from each other.

The Shared Restaurant Analogy

Think of a node as a restaurant kitchen with limited stove burners. Each container is a chef who needs burners to cook. A request is the chef's reservation — "I need at least 2 burners." A limit is the maximum they can ever use — "You can use at most 4 burners, even if more are free." Kubernetes uses requests to schedule Pods and limits to enforce boundaries at runtime.

Node: 8 CPUs, 16 GB RAM

Pod A: request 1 CPU, limit 2 CPU
Pod B: request 2 CPU, limit 4 CPU
Pod C: request 1 CPU, limit 2 CPU

Scheduler checks: 1+2+1 = 4 CPUs requested ≤ 8 CPUs available → OK
Runtime: each Pod can burst above its request but never above its limit
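
The scheduler check above can be sketched in a few lines of Python. This is a simplified model, not the scheduler's actual code: it assumes the node's 8 CPUs are fully allocatable, and the Pod names and the fits helper are hypothetical. Note that only requests enter the check; limits are ignored at scheduling time.

```python
# Simplified model of the scheduler's CPU fit check (values in millicores).
NODE_ALLOCATABLE_CPU_M = 8000  # assumption: all 8 CPUs are allocatable

# (request, limit) per Pod, taken from the example above
pods = {
    "pod-a": (1000, 2000),
    "pod-b": (2000, 4000),
    "pod-c": (1000, 2000),
}


def fits(pods, allocatable_m):
    """Return True when the sum of CPU requests fits on the node."""
    return sum(request for request, _limit in pods.values()) <= allocatable_m


total_requests = sum(request for request, _limit in pods.values())
total_limits = sum(limit for _request, limit in pods.values())

print(fits(pods, NODE_ALLOCATABLE_CPU_M))  # True
print(total_requests, total_limits)        # 4000 8000
```

The limits add up to the full 8 CPUs even though only 4 are requested, which is why all three Pods can burst at once only if nothing else runs on the node.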

CPU Units

Kubernetes measures CPU in millicores. 1 CPU = 1000 millicores (m). 500m = half a CPU. You can also write it as a decimal: 0.5 = 500m.

Value          Meaning
100m           10% of one CPU core
500m           Half a CPU core
1 or 1000m     One full CPU core
2              Two full CPU cores
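
To make the notation concrete, here is a minimal sketch that converts CPU quantities to millicores. The cpu_to_millicores helper is hypothetical and handles only the forms shown in the table; Kubernetes' own quantity parser accepts more.

```python
def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as "500m", "0.5" or "2" to millicores."""
    value = value.strip()
    if value.endswith("m"):
        return int(value[:-1])
    # Plain numbers are whole or fractional cores.
    return round(float(value) * 1000)


for quantity in ("100m", "0.5", "1", "2"):
    print(quantity, "->", cpu_to_millicores(quantity))
```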

Memory Units

Memory is measured in bytes. Kubernetes accepts both decimal suffixes (M, G) and binary suffixes (Mi, Gi); the binary ones, mebibytes and gibibytes, are the usual choice:

Value     Approximate Size
128Mi     128 mebibytes (~134 MB)
256Mi     256 mebibytes (~268 MB)
1Gi       1 gibibyte (~1.07 GB)
4Gi       4 gibibytes (~4.3 GB)
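
The approximate sizes come from multiplying by powers of 1024. The sketch below shows the arithmetic; memory_to_bytes is a hypothetical helper that covers only a few suffixes. It also handles the decimal suffixes k, M and G, which Kubernetes accepts as well and which are easy to confuse with Ki, Mi and Gi.

```python
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3}


def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as "128Mi" or "128M" to bytes."""
    # Binary suffixes are listed first; no decimal suffix ends in "i",
    # so the two groups cannot be mistaken for each other.
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: plain bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("128M"))   # 128000000
print(memory_to_bytes("1Gi"))    # 1073741824
```

Writing 128M where 128Mi was intended gives the container about 5% less memory than expected.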

Setting Requests and Limits in a Pod Spec

spec:
  containers:
  - name: web-app
    image: my-app:v1
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"

The scheduler reserves 200m CPU and 256Mi memory for this container. It can use up to 500m CPU and 512Mi memory when the node has spare capacity, but never more: CPU use above 500m is throttled, and memory use above 512Mi gets the container killed.

What Happens When a Container Exceeds Its Limits

CPU Limit Exceeded

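If a container tries to use more CPU than its limit, it is throttled: once the container has used up its CPU quota, the Linux kernel pauses its processes for the rest of the current scheduling period. The container keeps running but responds more slowly. Throttling does not appear in container logs; you detect it with metrics, for example cAdvisor's container_cpu_cfs_throttled_periods_total counter scraped by Prometheus.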
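
The arithmetic behind throttling can be sketched as follows, assuming Linux nodes that enforce CPU limits through CFS bandwidth control with the default 100 ms period. The cfs_quota_us helper is hypothetical and only illustrates the calculation.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms, in microseconds


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may use in each period."""
    return limit_millicores * period_us // 1000


print(cfs_quota_us(500))   # 50000
print(cfs_quota_us(2000))  # 200000
```

With a 500m limit the container gets 50 ms of CPU time in every 100 ms window. If it uses that up in the first 50 ms, it waits out the remaining 50 ms, which shows up as added latency rather than as an error.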

Memory Limit Exceeded

Memory is handled differently, because it cannot be throttled. If a container uses more memory than its limit, the kernel's OOM killer terminates it, Kubernetes records the reason as OOMKilled (Out Of Memory Killed), and the kubelet restarts the container according to the Pod's restart policy. You see this in the container status and the restart count:

kubectl describe pod my-pod
# Last State: Terminated, Reason: OOMKilled (exit code 137)
kubectl get pod my-pod
# RESTARTS column increases every time the container is OOM killed

QoS Classes: How Kubernetes Prioritizes Pods Under Pressure

Kubernetes assigns each Pod a Quality of Service (QoS) class based on how its resources are defined. When a node runs low on memory, the kubelet evicts Pods, generally starting with the lowest QoS class.

QoS Class      Condition                                                  Eviction Priority
BestEffort     No requests or limits set                                  Evicted first
Burstable      At least one request or limit set, but not Guaranteed      Evicted second
Guaranteed     Requests = limits for CPU and memory in every container    Evicted last

For Guaranteed QoS:
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"      # Same as request
    memory: "512Mi"  # Same as request
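
The classification rules can be sketched as a small function. This is a simplified model rather than the kubelet's code: qos_class is a hypothetical helper, and it compares quantity strings literally, so "500m" and "0.5" would count as different values. When only limits are set, Kubernetes copies them into the requests, so such a Pod is classified as Guaranteed.

```python
def qos_class(containers):
    """Classify a Pod from a list of container resource dicts.

    Each item looks like {"requests": {...}, "limits": {...}};
    either key may be missing.
    """
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        # A missing request defaults to the limit for that resource.
        for resource, limit in limits.items():
            requests.setdefault(resource, limit)
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


web_app = {
    "requests": {"cpu": "200m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
database = {
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}

print(qos_class([{}]))        # BestEffort
print(qos_class([web_app]))   # Burstable
print(qos_class([database]))  # Guaranteed
```

A Pod with several containers is Guaranteed only if every container qualifies; one Burstable sidecar makes the whole Pod Burstable.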

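Set requests equal to limits for critical production workloads like databases. This gives them Guaranteed QoS, so they are evicted last under node memory pressure and the resources reserved for them are not taken by neighbors. They are still throttled if they try to exceed their own CPU limit.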

Viewing Current Resource Usage

kubectl top pods                        # CPU and memory usage per pod
kubectl top pods -n production          # Specific namespace
kubectl top nodes                       # CPU and memory per node
kubectl describe node my-node           # Full node resource allocation

The kubectl top commands need the Metrics Server add-on installed in the cluster. The describe node output shows two important sections: Capacity (total hardware resources) and Allocatable (resources available for Pods after system components take their share).

Finding the Right Values for Your App

Do not guess resource values. Run your application under realistic load and measure actual usage with kubectl top pods or Prometheus. Set requests slightly above average usage and limits at your acceptable maximum. Start conservatively and tune based on real data.

Measured average: 150m CPU, 200Mi memory
Recommended request: 200m CPU, 256Mi memory  (20-30% buffer)
Recommended limit:   500m CPU, 512Mi memory  (2-3x the request)
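
The sizing rule above can be expressed as a small calculation. This is a sketch under stated assumptions, not a standard formula: the recommend helper, the 25% buffer, the rounding steps (50m for CPU, 64Mi for memory) and the limit factors are illustrative choices that happen to reproduce the figures above.

```python
def round_up(value: int, step: int) -> int:
    """Round value up to the next multiple of step."""
    return -(-value // step) * step


def recommend(average: int, buffer_percent: int, step: int, limit_factor: float):
    """Return (request, limit) derived from measured average usage."""
    # Integer ceiling of average * (1 + buffer), then round to a tidy value.
    with_buffer = -(-average * (100 + buffer_percent) // 100)
    request = round_up(with_buffer, step)
    limit = int(request * limit_factor)
    return request, limit


cpu_request, cpu_limit = recommend(150, buffer_percent=25, step=50, limit_factor=2.5)
mem_request, mem_limit = recommend(200, buffer_percent=25, step=64, limit_factor=2.0)

print(f"cpu: request {cpu_request}m, limit {cpu_limit}m")         # 200m, 500m
print(f"memory: request {mem_request}Mi, limit {mem_limit}Mi")    # 256Mi, 512Mi
```

Treat the result as a starting point and revisit it once you have usage data from production traffic.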

LimitRange: Namespace-Level Defaults

If developers forget to set resource values, their containers run with BestEffort QoS and risk eviction. A LimitRange sets defaults for the namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    default:
      cpu: "300m"
      memory: "256Mi"
    max:
      cpu: "2"
      memory: "4Gi"

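Containers created without resource settings get these defaults. Pods with a container whose limit exceeds the max are rejected at admission. A LimitRange applies only when Pods are created, so existing Pods keep their current values.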
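
The effect of this LimitRange on a new container can be sketched as follows. This models the documented outcome for CPU only, in millicores; the admit helper is hypothetical and is not how the admission controller is implemented.

```python
# CPU values from the LimitRange above, in millicores
LIMIT_RANGE = {"defaultRequest": 100, "default": 300, "max": 2000}


def admit(request=None, limit=None, limit_range=LIMIT_RANGE):
    """Return the effective (request, limit), or raise ValueError if rejected."""
    if request is None:
        # With an explicit limit, the request follows the limit;
        # otherwise the namespace default request applies.
        request = limit_range["defaultRequest"] if limit is None else limit
    if limit is None:
        limit = limit_range["default"]
    if limit > limit_range["max"]:
        raise ValueError(f"limit {limit}m exceeds max {limit_range['max']}m")
    if request > limit:
        raise ValueError(f"request {request}m exceeds limit {limit}m")
    return request, limit


print(admit())             # (100, 300)
print(admit(limit=500))    # (500, 500)
print(admit(request=200))  # (200, 300)
```

One consequence worth knowing: a container that sets only a request above the default limit, for example admit(request=500), is rejected, because the default limit of 300m is applied and ends up below the request.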

Key Points

  • Requests tell the Scheduler how much resource a container needs — they affect Pod placement.
  • Limits enforce the maximum a container can consume — exceeding CPU limits throttles the container; exceeding memory limits kills it.
  • QoS class (BestEffort, Burstable, Guaranteed) determines eviction order under memory pressure.
  • Set requests equal to limits for critical workloads to achieve Guaranteed QoS.
  • Use LimitRange to apply namespace-level resource defaults so containers without explicit values get reasonable settings automatically.
