Kubernetes Resource Requests and Limits: CPU and Memory Management
Every container running in Kubernetes competes for the same pool of CPU and memory on the nodes. Without controls, a single misbehaving container can consume all available resources and starve every other application on the node. Resource requests and limits give each container its fair share and protect neighbors from each other.
The Shared Restaurant Analogy
Think of a node as a restaurant kitchen with limited stove burners. Each container is a chef who needs burners to cook. A request is the chef's reservation — "I need at least 2 burners." A limit is the maximum they can ever use — "You can use at most 4 burners, even if more are free." Kubernetes uses requests to schedule Pods and limits to enforce boundaries at runtime.
    Node:  8 CPUs, 16 GB RAM
    Pod A: request 1 CPU, limit 2 CPU
    Pod B: request 2 CPU, limit 4 CPU
    Pod C: request 1 CPU, limit 2 CPU

    Scheduler checks: 1 + 2 + 1 = 4 CPUs requested ≤ 8 CPUs available → OK
    Runtime: each Pod can burst above its request but never above its limit
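The scheduler's check in the example can be sketched as a sum of requests compared against what the node offers. This is a simplified illustration, not the scheduler's actual code: the function name `fits_on_node` is hypothetical, and a real scheduler compares against the node's allocatable resources and also considers memory and other constraints.

```python
def fits_on_node(node_cpu_millicores, pod_requests_millicores):
    """Return True if the summed CPU requests fit on the node.

    Simplified: considers CPU only and ignores limits, because
    placement decisions are based on requests.
    """
    return sum(pod_requests_millicores) <= node_cpu_millicores


# Pods A, B, and C from the example request 1, 2, and 1 CPU.
requests = [1000, 2000, 1000]
print(fits_on_node(8000, requests))           # True: 4 CPUs requested, 8 available
print(fits_on_node(8000, requests + [5000]))  # False: 9 CPUs requested
```

Limits do not appear in the function at all, which mirrors the point of the example: limits only matter once the Pod is running.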
CPU Units
Kubernetes measures CPU in millicores. 1 CPU = 1000 millicores (m). 500m = half a CPU. You can also write it as a decimal: 0.5 = 500m.
| Value | Meaning |
|---|---|
| 100m | 10% of one CPU core |
| 500m | Half a CPU core |
| 1 or 1000m | One full CPU core |
| 2 | Two full CPU cores |
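To make the equivalence between the two notations concrete, here is a minimal sketch that converts a CPU quantity to millicores. The helper name `cpu_to_millicores` is an assumption for illustration and handles only the forms shown in the table.

```python
from decimal import Decimal


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "500m", "0.5", or "2" to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Decimal avoids float rounding surprises for values like "0.1".
    return int(Decimal(quantity) * 1000)


for value in ["100m", "500m", "0.5", "1", "2"]:
    print(value, "->", cpu_to_millicores(value), "millicores")
```

Both "500m" and "0.5" come out as 500, which is why the two spellings are interchangeable in a Pod spec.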
Memory Units
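Memory is measured in bytes. The binary suffixes Mi (mebibytes) and Gi (gibibytes) are the standard in Kubernetes. Decimal suffixes such as M and G are also accepted, but they denote slightly smaller amounts, so 128M is not the same as 128Mi: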
| Value | Approximate Size |
|---|---|
| 128Mi | 128 mebibytes (~134 MB) |
| 256Mi | 256 mebibytes (~268 MB) |
| 1Gi | 1 gibibyte (~1.07 GB) |
| 4Gi | 4 gibibytes (~4.3 GB) |
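The approximate sizes in the table follow from the suffix multipliers. The sketch below, with a hypothetical helper `memory_to_bytes`, covers only a handful of suffixes and whole-number values; it is meant to show the arithmetic, not to be a full quantity parser.

```python
# Multipliers for the binary and decimal suffixes covered in this sketch.
SUFFIXES = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "k": 1000,
    "M": 1000 ** 2,
    "G": 1000 ** 3,
}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "128Mi" or "128M" to bytes."""
    for suffix, multiplier in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # a plain number is already in bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("128M"))   # 128000000
print(memory_to_bytes("1Gi"))    # 1073741824
```

The gap between "128Mi" and "128M" is about 6 MB, which is small here but grows with larger values.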
Setting Requests and Limits in a Pod Spec
    spec:
      containers:
        - name: web-app
          image: my-app:v1
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
This container is guaranteed 200m CPU and 256Mi memory. It can burst up to 500m CPU and 512Mi memory when resources are available, but never beyond those limits.
What Happens When a Container Exceeds Its Limits
CPU Limit Exceeded
If a container tries to use more CPU than its limit, it is throttled: the kernel pauses its processes for the remainder of each scheduling period. The container stays running but performs more slowly. CPU throttling does not appear in application logs; you detect it with metrics tools like Prometheus, for example through the cAdvisor metric container_cpu_cfs_throttled_periods_total.
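On Linux nodes, a CPU limit is commonly enforced as a quota of CPU time per scheduling period, with 100 ms as the usual default period. The sketch below shows that arithmetic under this assumption. Both function names are hypothetical, and `throttled_us` is a deliberately simple model of demand versus quota rather than a description of how the kernel accounts for time.

```python
def cfs_quota_us(limit_millicores, period_us=100_000):
    """CPU time in microseconds the container may use in each period."""
    return limit_millicores * period_us // 1000


def throttled_us(demand_millicores, limit_millicores, period_us=100_000):
    """CPU time per period the container wanted but did not get."""
    wanted = demand_millicores * period_us // 1000
    return max(0, wanted - cfs_quota_us(limit_millicores, period_us))


print(cfs_quota_us(500))       # 50000: a 500m limit allows 50 ms of CPU per 100 ms
print(throttled_us(800, 500))  # 30000: demand of 800m loses 30 ms per period
print(throttled_us(300, 500))  # 0: below the limit, nothing is throttled
```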
Memory Limit Exceeded
Memory is handled differently. If a container uses more memory than its limit, the kernel's OOM killer terminates it, and Kubernetes restarts it according to the Pod's restart policy. The container status records the reason as OOMKilled (Out Of Memory Killed), which you can see when you describe the Pod:
    kubectl describe pod my-pod   # Last State: Terminated, Reason: OOMKilled
    kubectl get pod my-pod        # RESTARTS column increases every time it is OOM killed
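The same information is available in machine-readable form under the Pod's container statuses. The following sketch scans that structure for OOM kills. The sample JSON is a hypothetical, trimmed-down stand-in for what `kubectl get pod my-pod -o json` returns, and the function name is an assumption.

```python
import json

# Hypothetical, trimmed-down Pod status used only for illustration.
SAMPLE_POD_JSON = """
{
  "status": {
    "containerStatuses": [
      {
        "name": "web-app",
        "restartCount": 3,
        "lastState": {
          "terminated": {"reason": "OOMKilled", "exitCode": 137}
        }
      }
    ]
  }
}
"""


def oom_killed_containers(pod):
    """Return names of containers whose last termination was an OOM kill."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


print(oom_killed_containers(json.loads(SAMPLE_POD_JSON)))  # ['web-app']
```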
QoS Classes: How Kubernetes Prioritizes Pods Under Pressure
Kubernetes assigns each Pod a Quality of Service (QoS) class based on how its resources are defined. When a node comes under memory pressure, Pods in lower QoS classes are generally evicted first.
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| BestEffort | No requests or limits set | Evicted first |
| Burstable | At least one request or limit is set, but the Pod does not meet the Guaranteed criteria | Evicted second |
| Guaranteed | Every container has CPU and memory requests equal to its limits | Evicted last |
For Guaranteed QoS:
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"      # Same as request
        memory: "512Mi"  # Same as request
Set requests equal to limits for critical production workloads like databases. This gives them Guaranteed QoS, so they are the last to be evicted under node memory pressure. They are still throttled if they reach their own CPU limit.
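The classification rules in the table can be expressed as a short function. This is a sketch under simplifying assumptions: containers are plain dicts, the name `qos_class` is hypothetical, and values are compared as strings, so "500m" and "0.5" would not count as equal here even though Kubernetes treats them as the same quantity.

```python
def qos_class(containers):
    """Classify a Pod from its containers' resources.

    Each container is a dict with optional "requests" and "limits" dicts,
    e.g. {"requests": {"cpu": "500m"}, "limits": {"cpu": "500m"}}.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An omitted request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


web_app = {"requests": {"cpu": "200m", "memory": "256Mi"},
           "limits": {"cpu": "500m", "memory": "512Mi"}}
database = {"requests": {"cpu": "500m", "memory": "512Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"}}

print(qos_class([{}]))       # BestEffort
print(qos_class([web_app]))  # Burstable
print(qos_class([database])) # Guaranteed
```

A Pod with several containers is Guaranteed only if every one of them qualifies, so `qos_class([database, web_app])` returns "Burstable".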
Viewing Current Resource Usage
    kubectl top pods                 # CPU and memory usage per pod
    kubectl top pods -n production   # Specific namespace
    kubectl top nodes                # CPU and memory per node
    kubectl describe node my-node    # Full node resource allocation
The kubectl top commands need the Metrics Server running in the cluster. The describe node output shows two important sections: Capacity (total hardware resources) and Allocatable (resources available for Pods after system components take their share).
Finding the Right Values for Your App
Do not guess resource values. Run your application under realistic load and measure actual usage with kubectl top pods or Prometheus. Set requests slightly above average usage and limits at your acceptable maximum. Start conservatively and tune based on real data.
    Measured average:     150m CPU, 200Mi memory
    Recommended request:  200m CPU, 256Mi memory (20-30% buffer)
    Recommended limit:    500m CPU, 512Mi memory (2-3x the request)
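One way to turn the measurements into values is to add a buffer, round up to a convenient step, and multiply for the limit. The sketch below reproduces the numbers above. The function name `suggest`, the rounding steps, and the exact buffer and factor values are assumptions chosen for illustration, not a Kubernetes rule.

```python
import math


def suggest(average, buffer_pct, limit_factor, step):
    """Suggest (request, limit) from a measured average.

    The request is the average plus a buffer, rounded up to a multiple
    of `step`; the limit is the request times `limit_factor`.
    """
    request = math.ceil(average * (100 + buffer_pct) / 100 / step) * step
    limit = math.ceil(request * limit_factor)
    return request, limit


print(suggest(150, buffer_pct=30, limit_factor=2.5, step=50))  # CPU in millicores: (200, 500)
print(suggest(200, buffer_pct=25, limit_factor=2, step=64))    # memory in Mi: (256, 512)
```

Treat the output as a starting point and revisit it once real usage data accumulates.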
LimitRange: Namespace-Level Defaults
If developers forget to set resource values, their containers run with BestEffort QoS and risk eviction. A LimitRange sets defaults for the namespace:
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-resources
      namespace: production
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: "100m"
            memory: "128Mi"
          default:
            cpu: "300m"
            memory: "256Mi"
          max:
            cpu: "2"
            memory: "4Gi"
Containers without resource settings get the defaults. A Pod is rejected when it is created if any of its containers asks for more than the max.
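The defaulting and the max check can be sketched as follows, using the values from the LimitRange above with CPU in millicores and memory in Mi. This is a simplified model with a hypothetical function name; the real admission logic handles more fields, such as min and ratio constraints.

```python
# Values from the LimitRange above: CPU in millicores, memory in Mi.
DEFAULT_REQUEST = {"cpu": 100, "memory": 128}
DEFAULT_LIMIT = {"cpu": 300, "memory": 256}
MAX = {"cpu": 2000, "memory": 4096}


def apply_limit_range(requests=None, limits=None):
    """Fill in missing values, then reject anything above the maximum."""
    requests = dict(requests or {})
    limits = dict(limits or {})
    for resource, maximum in MAX.items():
        if resource not in requests:
            # An unset request follows the container's own limit if it
            # set one; otherwise it takes defaultRequest.
            requests[resource] = limits.get(resource, DEFAULT_REQUEST[resource])
        limits.setdefault(resource, DEFAULT_LIMIT[resource])
        if limits[resource] > maximum:
            raise ValueError(f"{resource} limit {limits[resource]} exceeds max {maximum}")
        if requests[resource] > limits[resource]:
            raise ValueError(f"{resource} request exceeds its limit")
    return requests, limits


print(apply_limit_range())
# ({'cpu': 100, 'memory': 128}, {'cpu': 300, 'memory': 256})
```

Calling `apply_limit_range(limits={"cpu": 3000})` raises a `ValueError`, which corresponds to the Pod being rejected.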
Key Points
- Requests tell the Scheduler how much resource a container needs — they affect Pod placement.
- Limits enforce the maximum a container can consume — exceeding CPU limits throttles the container; exceeding memory limits kills it.
- QoS class (BestEffort, Burstable, Guaranteed) determines eviction order under memory pressure.
- Set requests equal to limits for critical workloads to achieve Guaranteed QoS.
- Use LimitRange to apply namespace-level resource defaults so containers without explicit values get reasonable settings automatically.
