Real World Kafka Use Cases and Architecture Patterns
Apache Kafka powers some of the largest data systems on the planet. LinkedIn built Kafka to handle its own activity feed. Uber uses it to track every ride in real time. Netflix relies on it to stream pipeline events. Walmart uses it to track inventory across thousands of stores simultaneously.
These companies did not adopt Kafka for its own sake. They adopted it because specific problems — moving high volumes of data reliably between many systems in real time — had no better solution. This guide explains those problems and the architecture patterns Kafka enables to solve them.
The Core Problem Kafka Solves
Imagine a company with ten systems that all need to share data: an e-commerce platform, a payments service, a fraud detection engine, an analytics platform, a recommendation engine, a search index, a notification system, a CRM, a data warehouse, and a mobile app backend.
WITHOUT Kafka (Point-to-Point Integration):
────────────────────────────────────────────
Every system connects directly to every other system that needs its data.
E-commerce ──▶ Payments
E-commerce ──▶ Fraud Detection
E-commerce ──▶ Analytics
E-commerce ──▶ Search Index
Payments ──▶ Fraud Detection
Payments ──▶ Analytics
Payments ──▶ CRM
...
10 systems = up to 90 direct connections
Each connection = custom integration code + separate monitoring
One system change = update all connected systems
Result: Brittle, expensive, impossible to extend
WITH Kafka (Hub-and-Spoke):
──────────────────────────────
E-commerce ──▶ Kafka ──▶ Payments
──▶ Fraud Detection
──▶ Analytics
──▶ Search Index
──▶ Recommendation Engine
──▶ Data Warehouse
10 systems = 10 connections to Kafka (one each)
New system added = one new Kafka consumer, zero changes to producers
One system change = only that system changes
Result: Loosely coupled, scalable, easy to extend
Architecture Pattern 1: Event Streaming Pipeline
This is the most fundamental Kafka pattern. Systems emit events as they happen, and downstream systems consume those events to do their own work independently.
Real-World Example: E-Commerce Order Lifecycle
Order Placed ──▶ Kafka Topic: orders.created
│
┌───────────┼────────────┬────────────────┐
▼ ▼ ▼ ▼
[Payment Service] [Inventory] [Notifications] [Analytics]
Charges card Reserves Sends email Records sale
────────────── stock to customer metrics
Emits event: ──────────
payments.processed
│
▼
Kafka Topic: payments.processed
│
┌───────┴────────┐
▼ ▼
[Fulfillment] [Fraud Detection]
Creates Scores transaction
shipment for fraud risk
Full timeline for one order:
00:00:000 Customer clicks "Buy"
00:00:012 Order event arrives in Kafka
00:00:018 Payment service starts charging card
00:00:019 Inventory service reserves stock
00:00:020 Notification service queues email
00:00:245 Payment confirmed, event emitted
00:00:251 Fulfillment service creates shipment
00:00:252 Fraud engine scores the transaction
Each service works independently. The payment service does not wait for the inventory service to finish. The fraud engine does not block the fulfillment service. Every service reads from Kafka at its own pace and publishes its own results back to Kafka. Adding a new service — say, a loyalty points service — requires zero changes to existing services.
Architecture Pattern 2: Change Data Capture (CDC)
Many businesses store their data in relational databases. Kafka's CDC pattern streams every change made to a database table — inserts, updates, deletes — into Kafka topics in real time. Downstream systems consume these changes and update their own stores.
Traditional Problem:
──────────────────────────────
Database → Nightly export → Data warehouse
Result: Data warehouse is always 24 hours out of date
CDC with Kafka (using Debezium connector):
──────────────────────────────────────────
PostgreSQL database
│ (Debezium reads transaction log)
▼
Kafka Topic: dbserver1.public.orders
│
├──▶ Elasticsearch (search index updated in seconds)
├──▶ Redis cache (cache invalidated immediately)
├──▶ Data warehouse (receives changes in near real-time)
└──▶ Analytics (dashboards reflect current state)
What a CDC event looks like:
{
"op": "u", ← operation: u=update, c=create, d=delete
"before": {
"order_id": 12345,
"status": "pending"
},
"after": {
"order_id": 12345,
"status": "shipped" ← what changed
},
"ts_ms": 1718012400000 ← when it changed
}
This pattern eliminates dual writes (writing to both the database and a message queue separately) because Debezium reads directly from the database's replication log. If the database recorded the change, Kafka will receive it — guaranteed.
Architecture Pattern 3: CQRS with Kafka
CQRS stands for Command Query Responsibility Segregation. It separates the model that writes data from the model that reads data. Kafka acts as the communication backbone between the write side and multiple read-optimized stores.
┌──────────────────────────────────────────────────────────┐ │ CQRS Architecture │ │ │ │ Commands (writes) Queries (reads) │ │ ───────────────── ─────────────── │ │ Create Order What are my orders? │ │ Update Status Show orders by date │ │ Cancel Order Filter by status │ │ │ │ ┌───────────┐ ┌────────────────────────┐ │ │ │ Command │ │ Read Models │ │ │ │ Handler │──▶ Kafka ──▶│ │ │ │ │ │ │ PostgreSQL (history) │ │ │ └───────────┘ │ Elasticsearch (search) │ │ │ │ │ Redis (recent orders) │ │ │ ▼ └────────────────────────┘ │ │ Write Model DB │ │ (optimized for writes) │ └──────────────────────────────────────────────────────────┘ Why this works: Write model: normalized, ACID transactions, optimized for writes Read model: denormalized, cached, optimized for specific query patterns Kafka: the single source of truth connecting both sides
Architecture Pattern 4: Real-Time Stream Processing
Kafka stores data, but it does not transform data on its own. Kafka Streams and Apache Flink sit on top of Kafka to perform real-time computations — aggregations, joins, enrichments, and alerts — on data as it flows through.
Real-World Example: Fraud Detection
Raw transaction events flow into Kafka: transactions topic → [Kafka Streams app] → fraud-alerts topic Kafka Streams does this computation in real time: ───────────────────────────────────────────────────── For each transaction, check: 1. Has this card made >5 transactions in the last 60 seconds? 2. Is the transaction amount 3× the user's 30-day average? 3. Is the merchant country different from the user's home country? 4. Has the same card appeared in two distant locations within 10 minutes? If 2+ rules trigger → emit fraud alert to fraud-alerts topic All of this happens in milliseconds, not minutes. Processing Timeline: Transaction occurs → t=0ms Arrives in Kafka → t=12ms Kafka Streams reads → t=15ms Rules evaluated → t=17ms Fraud alert emitted → t=18ms Payment blocked → t=25ms Compare to batch approach: Transaction occurs → t=0 Overnight batch runs → t=12 hours Fraud detected → t=12 hours Damage already done ← too late
Windowed Aggregations
Tumbling Window — fixed, non-overlapping time slots:
─────────────────────────────────────────────────────
[00:00–00:01] Count orders = 120
[00:01–00:02] Count orders = 145
[00:02–00:03] Count orders = 98
Each window independent — no overlap
Sliding Window — moves forward continuously:
─────────────────────────────────────────────────────
At 00:00:30 → look at last 60 seconds: [23:59:30–00:00:30] = 130 orders
At 00:00:31 → look at last 60 seconds: [23:59:31–00:00:31] = 133 orders
At 00:00:32 → look at last 60 seconds: [23:59:32–00:00:32] = 131 orders
Window slides every second — ideal for rate limiting and fraud detection
Session Window — groups events by user activity gaps:
─────────────────────────────────────────────────────
User 123: click → click → click → (30 min gap) → click → click
├──── Session 1 ────┤ ├─ Session 2 ─┤
Useful for: user behavior analysis, session-level aggregations
Architecture Pattern 5: Lambda Architecture
Lambda Architecture handles both real-time data and historical data using two separate processing paths that merge at the serving layer.
┌─────────────────────────────────────────────────────────────┐ │ LAMBDA ARCHITECTURE │ │ │ │ All new data │ │ │ │ │ ▼ │ │ KAFKA TOPICS │ │ │ │ │ ┌────┴────┐ │ │ │ │ │ │ ▼ ▼ │ │ Speed Batch │ │ Layer Layer │ │ │ │ │ │ │ Apache │ Apache Spark │ │ │ Flink │ (processes │ │ │ (real- │ all historical │ │ │ time) │ data nightly) │ │ │ │ │ │ ▼ ▼ │ │ Real-time Batch │ │ Views + Views ──▶ Serving Layer ──▶ User queries │ │ (merges both) │ └─────────────────────────────────────────────────────────────┘ Trade-off: ✅ Accurate historical results (batch layer) ✅ Low-latency recent results (speed layer) ❌ Maintains two codebases (same logic written twice) ❌ Complex to synchronize and debug Modern alternative: Kappa Architecture Use only Kafka + stream processing for everything Replay historical data through the same stream processor One codebase, simpler operations
Use Case: Ride-Sharing Platform (Uber-Style)
Real-time driver and rider tracking at scale:
Events flowing through Kafka every second:
driver-location → 500,000 events/sec (GPS pings)
ride-requests → 50,000 events/sec
ride-status-updates → 200,000 events/sec
surge-pricing → 10,000 events/sec
Architecture:
──────────────────────────────────────────────────────
[Driver App]──GPS ping──▶ Kafka: driver.location
│
┌──────────┘
│
[Flink Job: Geo-Aggregator]
│ Groups drivers by H3 hexagon grid
│ Counts drivers per hexagon every 10 sec
▼
Kafka: driver.density
│
┌─────┴──────┐
▼ ▼
[Matching [Surge Pricing
Service] Engine]
Finds nearest Calculates multiplier
driver based on supply/demand
│ │
▼ ▼
Kafka: matched.rides Kafka: surge.updates
│
[Rider App]
Shows driver ETA
and price
Use Case: Financial Trading Platform
Market data distribution at microsecond precision: Requirements: • Distribute price feeds to 10,000 traders simultaneously • Process order executions with guaranteed ordering • Detect market anomalies in real time • Audit every trade permanently Kafka Setup: ────────────────────────────────────────────────────── Topic: market.prices Partitions: 200 (one per trading symbol group) Replication: 3 Retention: 30 days (regulatory requirement) Compression: LZ4 Topic: orders.submitted Partitions: 500 (high parallelism for order intake) Key: account_id (orders from same account go to same partition → ordered) Replication: 3 min.insync.replicas: 2 Topic: trades.executed Partitions: 100 Retention: 7 years (legal compliance) Cleanup policy: delete (retain all records forever practically) Order Consistency Guarantee: Account 12345 places 3 orders: Order A (buy 100 shares) ──▶ partition 45 (offset 1001) Order B (sell 50 shares) ──▶ partition 45 (offset 1002) Order C (buy 200 shares) ──▶ partition 45 (offset 1003) Same account → same partition → guaranteed sequence Order B never processed before Order A
Use Case: IoT Data Ingestion
Smart factory with 50,000 sensors:
Each sensor sends readings every 100 milliseconds:
50,000 sensors × 10 readings/sec = 500,000 messages/sec
Kafka handles this with partition sharding:
──────────────────────────────────────────────────────
Topic: sensor.readings
Partitions: 100
Key: sensor_id (consistent hashing → each sensor always same partition)
Retention: 48 hours (raw data)
Compression: Snappy (good for numeric data)
Downstream Processing:
sensors 1–500 → partition 1 → [Anomaly Detector]
sensors 501–1000 → partition 2 → [Anomaly Detector]
... │
sensors 49,500–50,000 → partition 100 ▼
[Alert System]
[SCADA integration]
[Maintenance scheduling]
[Time-series database]
Alert Example:
Temperature sensor #34,122 sends 87°C reading
Normal range: 20–75°C
Flink job detects anomaly in 50ms
Alert dispatched to maintenance team
Machine shutdown command sent in 200ms total
Prevented equipment damage
Use Case: Microservices Event-Driven Architecture
Replacing synchronous REST calls with async events:
Before (synchronous, brittle):
──────────────────────────────
User Service ──REST──▶ Notification Service
(What if notification service is down?)
(User service blocks waiting for response)
(One slow service slows everything)
After (asynchronous, resilient):
──────────────────────────────────────────────────────
User Service ──▶ Kafka: user.registered
│
┌────────┼────────┬────────────────┐
▼ ▼ ▼ ▼
[Welcome [Profile [Onboarding [Analytics
Email] Setup] Flow] Dashboard]
User Service sends event and moves on immediately
Each downstream service processes in its own time
If Notification Service is down → event waits in Kafka
When it comes back up → processes all queued events
Zero data loss, zero user impact
Saga Pattern for Distributed Transactions:
──────────────────────────────────────────
Create Order ──▶ Reserve Inventory ──▶ Charge Payment ──▶ Ship Order
Each step emits a "success" or "failure" event to Kafka
If payment fails:
Payment Service emits: payment.failed
Inventory Service consumes and releases reserved stock
Order Service consumes and cancels order
Notification Service sends "order failed" email
No distributed transactions needed
Compensating actions handle rollbacks through events
Architecture Pattern: Exactly-Once Processing
Three delivery guarantees explained: At-Most-Once: Producer sends message → Broker confirms or not → Producer moves on Risk: Message lost if broker fails after receiving but before confirming Use when: Logging, metrics (losing one reading is acceptable) At-Least-Once: Producer sends message → Broker confirms On timeout → Producer retries Risk: Message arrives twice if broker confirmed but confirmation got lost Use when: Most event processing (deduplication handles duplicates) Exactly-Once: Producer sends message with sequence number Broker deduplicates based on sequence number Consumer processes and commits in atomic transaction Result: Each message processed exactly one time, guaranteed Kafka Configuration for Exactly-Once: ───────────────────────────────────── Producer: enable.idempotence=true, transactional.id=unique-per-producer Consumer: isolation.level=read_committed Code: producer.initTransactions() producer.beginTransaction() producer.send(record1) producer.send(record2) producer.sendOffsetsToTransaction(offsets, consumerGroupId) producer.commitTransaction() ← all or nothing
Choosing the Right Kafka Architecture
Decision Tree for Kafka Architecture Selection:
──────────────────────────────────────────────────────────────────
Need real-time data?
│
YES ──▶ Need transformation/computation?
│ │
│ YES ──▶ Need joins across streams?
│ │ │
│ │ YES ──▶ Apache Flink (complex stream joins)
│ │ │
│ │ NO ──▶ Kafka Streams (simple aggregations)
│ │
│ NO ──▶ Pure Kafka (store and forward only)
│
NO ──▶ Need historical replay?
│
YES ──▶ Kafka with long retention + batch processing
│
NO ──▶ Traditional message queue (RabbitMQ, SQS)
Throughput guide:
< 1,000 msg/sec → Any system works
1,000–100,000 → Kafka starts showing advantages
100,000–1M+ → Kafka is the natural choice
1M+ → Kafka with careful partitioning and hardware
Key Points — Kafka Use Cases and Architecture Summary
- Kafka's hub-and-spoke model replaces point-to-point integrations and reduces connection complexity from O(n²) to O(n).
- CDC with Debezium captures every database change and streams it to Kafka without modifying the source application.
- The event streaming pipeline pattern decouples producers and consumers so each service evolves independently.
- Stream processing with Kafka Streams or Apache Flink enables millisecond-latency fraud detection, anomaly alerts, and real-time aggregations.
- Windowed aggregations — tumbling, sliding, and session windows — power rate-based computations like surge pricing and velocity checks.
- The Saga pattern uses Kafka events to coordinate distributed transactions across microservices without requiring a distributed transaction manager.
- Exactly-once semantics require idempotent producers, transactional producers, and read_committed consumers working together.
- IoT and financial trading use cases succeed because Kafka scales to millions of events per second while preserving per-key ordering guarantees.
