GCP GKE
Google Kubernetes Engine (GKE) is GCP's managed Kubernetes service. Kubernetes is an open-source system that automates the deployment, scaling, and management of containerized applications across a cluster of machines. GKE handles the setup and management of the Kubernetes control plane, so the focus stays on deploying applications rather than managing cluster infrastructure.
Imagine running a fleet of delivery trucks. Kubernetes is the logistics system — it decides which truck carries which cargo, monitors truck health, replaces broken trucks automatically, and scales the fleet up during busy seasons. GKE provides this system, already configured and ready to use.
Core Kubernetes Concepts
Cluster
A cluster is a group of machines (nodes) that run containerized applications managed by Kubernetes. A GKE cluster has two parts:
```
GKE Cluster
├── Control Plane (managed by Google)
│   ├── API Server          ← Accepts commands (kubectl, Console)
│   ├── Scheduler           ← Decides which node runs which pod
│   ├── Controller Manager  ← Monitors and maintains desired state
│   └── etcd                ← Stores all cluster state and config
│
└── Worker Nodes (your VMs)
    ├── Node 1 (e2-standard-4)
    ├── Node 2 (e2-standard-4)
    └── Node 3 (e2-standard-4)
```
Pod
A pod is the smallest deployable unit in Kubernetes. Each pod runs one or more containers and shares the same network and storage. Think of a pod as a single instance of an application.
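For illustration, a minimal standalone Pod manifest might look like the sketch below. The names and the `nginx` image are placeholders for this example only; they are not part of the GKE deployment built later in this section.

```yaml
# pod.yaml — a single Pod, the smallest deployable unit
apiVersion: v1
kind: Pod
metadata:
  name: my-single-pod
spec:
  containers:
    - name: web
      image: nginx:1.25   # placeholder container image
      ports:
        - containerPort: 80
```

In practice, pods are rarely created directly like this; a Deployment (next) creates and manages them for you.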
Deployment
A deployment manages a set of identical pods. It ensures the desired number of pods is always running. If a pod crashes, the deployment replaces it automatically.
Service
A Kubernetes Service exposes pods to network traffic. Since pod IP addresses change when pods restart, a Service provides a stable endpoint (IP address or DNS name) that always points to the correct pods.
```
Deployment: my-app
┌──────────────────────────────────────────────┐
│  Pod 1 (10.0.0.5) ──┐                        │
│  Pod 2 (10.0.0.6) ──┤──▶ Service: my-app     │
│  Pod 3 (10.0.0.7) ──┘    (Stable IP/DNS)     │
└──────────────────────────────────────────────┘
                     │
                     ▼
              External Traffic
```
Creating a GKE Cluster
```bash
# Create a standard cluster with 3 nodes
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type e2-standard-4

# Configure kubectl to connect to the cluster
gcloud container clusters get-credentials my-cluster \
  --zone us-central1-a
```
GKE also offers an Autopilot mode in which Google manages node provisioning, scaling, and maintenance automatically; you pay for the resources your pods request rather than per node.
```bash
# Create an Autopilot cluster (fully managed by Google)
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1
```
Deploying an Application to GKE
Step 1 – Write the Kubernetes Deployment YAML
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
        - name: web
          image: gcr.io/my-project/my-web-app:v1
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```
Step 2 – Write the Kubernetes Service YAML
```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-web-app-service
spec:
  type: LoadBalancer
  selector:
    app: my-web-app
  ports:
    - port: 80
      targetPort: 8080
```
Step 3 – Apply to the Cluster
```bash
# Apply the deployment
kubectl apply -f deployment.yaml

# Apply the service
kubectl apply -f service.yaml

# Check that the pods are running
kubectl get pods

# Get the external IP address of the service
kubectl get service my-web-app-service
```
Kubernetes Scaling
Manual Scaling
```bash
# Scale a deployment to 5 replicas
kubectl scale deployment my-web-app --replicas=5
```
Horizontal Pod Autoscaler (HPA)
HPA automatically adjusts the number of pods based on CPU or memory usage.
```bash
# Scale between 2 and 10 pods based on CPU usage
kubectl autoscale deployment my-web-app \
  --min=2 --max=10 --cpu-percent=70
```

Or define the same autoscaler in YAML:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Rolling Updates and Rollbacks
Kubernetes updates applications without downtime using rolling updates. It replaces pods one at a time, ensuring some pods always handle traffic during the update.
```
Before Update:
  Pod v1 ✓   Pod v1 ✓   Pod v1 ✓

During Update (rolling):
  Pod v2 ✓   Pod v1 ✓   Pod v1 ✓   ← v2 serves traffic while others update
  Pod v2 ✓   Pod v2 ✓   Pod v1 ✓
  Pod v2 ✓   Pod v2 ✓   Pod v2 ✓

After Update:
  Pod v2 ✓   Pod v2 ✓   Pod v2 ✓
```
```bash
# Update the container image
kubectl set image deployment/my-web-app web=gcr.io/my-project/my-web-app:v2

# Roll back to the previous version if issues arise
kubectl rollout undo deployment/my-web-app
```
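The pace of a rolling update can be tuned in the Deployment spec. A sketch of a `RollingUpdate` strategy block is shown below; the values are illustrative, not part of the example above:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra pod above the desired count during the update
      maxUnavailable: 0  # never drop below the desired count while updating
```

With `maxUnavailable: 0`, Kubernetes only removes an old pod once its replacement is ready, trading slightly slower rollouts for zero capacity loss.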
GKE Standard vs Autopilot
| Feature | GKE Standard | GKE Autopilot |
|---|---|---|
| Node management | Manual — specify node count and type | Fully managed by Google |
| Billing | Per node (VM) | Per pod resource usage |
| Flexibility | Full control over node config | Limited — Google sets node specs |
| Best for | Teams needing full Kubernetes control | Teams wanting simplicity and lower ops burden |
Key Takeaways
- GKE is Google's managed Kubernetes service for running containers at scale.
- A cluster has a Control Plane (managed by Google) and Worker Nodes (VMs).
- Pods are the smallest deployable unit; Deployments manage groups of pods.
- Services provide stable endpoints for accessing pods.
- Horizontal Pod Autoscaler scales pods automatically based on CPU/memory.
- Rolling updates deploy new versions without downtime.
- Autopilot mode removes all node management responsibility.
