Real World Kafka Use Cases and Architecture Patterns

Apache Kafka powers some of the largest data systems on the planet. LinkedIn built Kafka to handle its own activity feed. Uber uses it to track every ride in real time. Netflix relies on it to stream pipeline events. Walmart uses it to track inventory across thousands of stores simultaneously.

These companies did not adopt Kafka for its own sake. They adopted it because specific problems — moving high volumes of data reliably between many systems in real time — had no better solution. This guide explains those problems and the architecture patterns Kafka enables to solve them.

The Core Problem Kafka Solves

Imagine a company with ten systems that all need to share data: an e-commerce platform, a payments service, a fraud detection engine, an analytics platform, a recommendation engine, a search index, a notification system, a CRM, a data warehouse, and a mobile app backend.

WITHOUT Kafka (Point-to-Point Integration):
────────────────────────────────────────────
Every system connects directly to every other system that needs its data.

  E-commerce ──▶ Payments
  E-commerce ──▶ Fraud Detection
  E-commerce ──▶ Analytics
  E-commerce ──▶ Search Index
  Payments   ──▶ Fraud Detection
  Payments   ──▶ Analytics
  Payments   ──▶ CRM
  ...

10 systems = up to 90 direct connections
Each connection = custom integration code + separate monitoring
One system change = update all connected systems

Result: Brittle, expensive, impossible to extend

WITH Kafka (Hub-and-Spoke):
──────────────────────────────
  E-commerce ──▶ Kafka ──▶ Payments
                       ──▶ Fraud Detection
                       ──▶ Analytics
                       ──▶ Search Index
                       ──▶ Recommendation Engine
                       ──▶ Data Warehouse

10 systems = 10 connections to Kafka (one each)
New system added = one new Kafka consumer, zero changes to producers
One system change = only that system changes

Result: Loosely coupled, scalable, easy to extend

Architecture Pattern 1: Event Streaming Pipeline

This is the most fundamental Kafka pattern. Systems emit events as they happen, and downstream systems consume those events to do their own work independently.

Real-World Example: E-Commerce Order Lifecycle

Order Placed ──▶ Kafka Topic: orders.created
                      │
          ┌───────────┼────────────┬────────────────┐
          ▼           ▼            ▼                ▼
  [Payment Service] [Inventory] [Notifications] [Analytics]
  Charges card      Reserves    Sends email     Records sale
  ──────────────    stock       to customer     metrics
  Emits event:      ──────────                  
  payments.processed             
          │                      
          ▼                      
  Kafka Topic: payments.processed
          │
  ┌───────┴────────┐
  ▼                ▼
[Fulfillment]  [Fraud Detection]
Creates        Scores transaction
shipment       for fraud risk

Full timeline for one order:
00:00:000  Customer clicks "Buy"
00:00:012  Order event arrives in Kafka
00:00:018  Payment service starts charging card
00:00:019  Inventory service reserves stock
00:00:020  Notification service queues email
00:00:245  Payment confirmed, event emitted
00:00:251  Fulfillment service creates shipment
00:00:252  Fraud engine scores the transaction

Each service works independently. The payment service does not wait for the inventory service to finish. The fraud engine does not block the fulfillment service. Every service reads from Kafka at its own pace and publishes its own results back to Kafka. Adding a new service — say, a loyalty points service — requires zero changes to existing services.

Architecture Pattern 2: Change Data Capture (CDC)

Many businesses store their data in relational databases. Kafka's CDC pattern streams every change made to a database table — inserts, updates, deletes — into Kafka topics in real time. Downstream systems consume these changes and update their own stores.

Traditional Problem:
──────────────────────────────
Database → Nightly export → Data warehouse
Result: Data warehouse is always 24 hours out of date

CDC with Kafka (using Debezium connector):
──────────────────────────────────────────
PostgreSQL database
  │  (Debezium reads transaction log)
  ▼
Kafka Topic: dbserver1.public.orders
  │
  ├──▶ Elasticsearch  (search index updated in seconds)
  ├──▶ Redis cache     (cache invalidated immediately)
  ├──▶ Data warehouse  (receives changes in near real-time)
  └──▶ Analytics       (dashboards reflect current state)

What a CDC event looks like:
{
  "op": "u",              ← operation: u=update, c=create, d=delete
  "before": {
    "order_id": 12345,
    "status": "pending"
  },
  "after": {
    "order_id": 12345,
    "status": "shipped"   ← what changed
  },
  "ts_ms": 1718012400000  ← when it changed
}

This pattern eliminates dual writes (writing to both the database and a message queue separately) because Debezium reads directly from the database's replication log. If the database recorded the change, Kafka will receive it — guaranteed.

Architecture Pattern 3: CQRS with Kafka

CQRS stands for Command Query Responsibility Segregation. It separates the model that writes data from the model that reads data. Kafka acts as the communication backbone between the write side and multiple read-optimized stores.

┌──────────────────────────────────────────────────────────┐
│                    CQRS Architecture                     │
│                                                          │
│  Commands (writes)          Queries (reads)              │
│  ─────────────────          ───────────────              │
│  Create Order               What are my orders?          │
│  Update Status              Show orders by date          │
│  Cancel Order               Filter by status             │
│                                                          │
│  ┌───────────┐              ┌────────────────────────┐   │
│  │ Command   │              │   Read Models          │   │
│  │ Handler   │──▶ Kafka  ──▶│                        │   │
│  │           │              │ PostgreSQL (history)   │   │
│  └───────────┘              │ Elasticsearch (search) │   │
│       │                     │ Redis (recent orders)  │   │
│       ▼                     └────────────────────────┘   │
│  Write Model DB                                          │
│  (optimized for writes)                                  │
└──────────────────────────────────────────────────────────┘

Why this works:
  Write model: normalized, ACID transactions, optimized for writes
  Read model: denormalized, cached, optimized for specific query patterns
  Kafka: the single source of truth connecting both sides

Architecture Pattern 4: Real-Time Stream Processing

Kafka stores data, but it does not transform data on its own. Kafka Streams and Apache Flink sit on top of Kafka to perform real-time computations — aggregations, joins, enrichments, and alerts — on data as it flows through.

Real-World Example: Fraud Detection

Raw transaction events flow into Kafka:
  transactions topic → [Kafka Streams app] → fraud-alerts topic

Kafka Streams does this computation in real time:
─────────────────────────────────────────────────────
For each transaction, check:
  1. Has this card made >5 transactions in the last 60 seconds?
  2. Is the transaction amount 3× the user's 30-day average?
  3. Is the merchant country different from the user's home country?
  4. Has the same card appeared in two distant locations within 10 minutes?

If 2+ rules trigger → emit fraud alert to fraud-alerts topic

All of this happens in milliseconds, not minutes.

Processing Timeline:
  Transaction occurs    → t=0ms
  Arrives in Kafka      → t=12ms
  Kafka Streams reads   → t=15ms
  Rules evaluated       → t=17ms
  Fraud alert emitted   → t=18ms
  Payment blocked       → t=25ms

Compare to batch approach:
  Transaction occurs    → t=0
  Overnight batch runs  → t=12 hours
  Fraud detected        → t=12 hours
  Damage already done   ← too late

Windowed Aggregations

Tumbling Window — fixed, non-overlapping time slots:
─────────────────────────────────────────────────────
  [00:00–00:01] Count orders = 120
  [00:01–00:02] Count orders = 145
  [00:02–00:03] Count orders = 98
  Each window independent — no overlap

Sliding Window — moves forward continuously:
─────────────────────────────────────────────────────
  At 00:00:30 → look at last 60 seconds: [23:59:30–00:00:30] = 130 orders
  At 00:00:31 → look at last 60 seconds: [23:59:31–00:00:31] = 133 orders
  At 00:00:32 → look at last 60 seconds: [23:59:32–00:00:32] = 131 orders
  Window slides every second — ideal for rate limiting and fraud detection

Session Window — groups events by user activity gaps:
─────────────────────────────────────────────────────
  User 123: click → click → click → (30 min gap) → click → click
                    ├──── Session 1 ────┤            ├─ Session 2 ─┤
  
  Useful for: user behavior analysis, session-level aggregations

Architecture Pattern 5: Lambda Architecture

Lambda Architecture handles both real-time data and historical data using two separate processing paths that merge at the serving layer.

┌─────────────────────────────────────────────────────────────┐
│                    LAMBDA ARCHITECTURE                      │
│                                                             │
│  All new data                                               │
│       │                                                     │
│       ▼                                                     │
│   KAFKA TOPICS                                              │
│       │                                                     │
│  ┌────┴────┐                                                │
│  │         │                                                │
│  ▼         ▼                                                │
│ Speed    Batch                                              │
│ Layer    Layer                                              │
│  │         │                                                │
│  │  Apache │ Apache Spark                                   │
│  │  Flink  │ (processes                                     │
│  │  (real- │ all historical                                 │
│  │   time) │ data nightly)                                  │
│  │         │                                                │
│  ▼         ▼                                                │
│ Real-time  Batch                                            │
│ Views   +  Views    ──▶  Serving Layer  ──▶  User queries   │
│                          (merges both)                      │
└─────────────────────────────────────────────────────────────┘

Trade-off:
  ✅ Accurate historical results (batch layer)
  ✅ Low-latency recent results (speed layer)
  ❌ Maintains two codebases (same logic written twice)
  ❌ Complex to synchronize and debug

Modern alternative: Kappa Architecture
  Use only Kafka + stream processing for everything
  Replay historical data through the same stream processor
  One codebase, simpler operations

Use Case: Ride-Sharing Platform (Uber-Style)

Real-time driver and rider tracking at scale:

Events flowing through Kafka every second:
  driver-location     → 500,000 events/sec (GPS pings)
  ride-requests       → 50,000 events/sec
  ride-status-updates → 200,000 events/sec
  surge-pricing       → 10,000 events/sec

Architecture:
──────────────────────────────────────────────────────
[Driver App]──GPS ping──▶ Kafka: driver.location
                                     │
                          ┌──────────┘
                          │
                    [Flink Job: Geo-Aggregator]
                          │ Groups drivers by H3 hexagon grid
                          │ Counts drivers per hexagon every 10 sec
                          ▼
                    Kafka: driver.density
                          │
                    ┌─────┴──────┐
                    ▼            ▼
             [Matching      [Surge Pricing
              Service]       Engine]
             Finds nearest  Calculates multiplier
             driver         based on supply/demand
                    │            │
                    ▼            ▼
              Kafka: matched.rides  Kafka: surge.updates
                    │
             [Rider App]
             Shows driver ETA
             and price

Use Case: Financial Trading Platform

Market data distribution at microsecond precision:

Requirements:
  • Distribute price feeds to 10,000 traders simultaneously
  • Process order executions with guaranteed ordering
  • Detect market anomalies in real time
  • Audit every trade permanently

Kafka Setup:
──────────────────────────────────────────────────────
Topic: market.prices
  Partitions: 200 (one per trading symbol group)
  Replication: 3
  Retention: 30 days (regulatory requirement)
  Compression: LZ4

Topic: orders.submitted
  Partitions: 500 (high parallelism for order intake)
  Key: account_id (orders from same account go to same partition → ordered)
  Replication: 3
  min.insync.replicas: 2

Topic: trades.executed
  Partitions: 100
  Retention: 7 years (legal compliance)
  Cleanup policy: delete (retain all records forever practically)

Order Consistency Guarantee:
  Account 12345 places 3 orders:
  Order A (buy 100 shares) ──▶ partition 45 (offset 1001)
  Order B (sell 50 shares) ──▶ partition 45 (offset 1002)
  Order C (buy 200 shares) ──▶ partition 45 (offset 1003)

  Same account → same partition → guaranteed sequence
  Order B never processed before Order A

Use Case: IoT Data Ingestion

Smart factory with 50,000 sensors:

Each sensor sends readings every 100 milliseconds:
  50,000 sensors × 10 readings/sec = 500,000 messages/sec

Kafka handles this with partition sharding:
──────────────────────────────────────────────────────
Topic: sensor.readings
  Partitions: 100
  Key: sensor_id (consistent hashing → each sensor always same partition)
  Retention: 48 hours (raw data)
  Compression: Snappy (good for numeric data)

Downstream Processing:
  sensors 1–500     → partition 1  → [Anomaly Detector]
  sensors 501–1000  → partition 2  → [Anomaly Detector]
  ...                                   │
  sensors 49,500–50,000 → partition 100   ▼
                                    [Alert System]
                                    [SCADA integration]
                                    [Maintenance scheduling]
                                    [Time-series database]

Alert Example:
  Temperature sensor #34,122 sends 87°C reading
  Normal range: 20–75°C
  Flink job detects anomaly in 50ms
  Alert dispatched to maintenance team
  Machine shutdown command sent in 200ms total
  Prevented equipment damage

Use Case: Microservices Event-Driven Architecture

Replacing synchronous REST calls with async events:

Before (synchronous, brittle):
──────────────────────────────
  User Service ──REST──▶ Notification Service
                          (What if notification service is down?)
                          (User service blocks waiting for response)
                          (One slow service slows everything)

After (asynchronous, resilient):
──────────────────────────────────────────────────────
  User Service ──▶ Kafka: user.registered
                      │
             ┌────────┼────────┬────────────────┐
             ▼        ▼        ▼                ▼
  [Welcome   [Profile [Onboarding [Analytics
   Email]     Setup]   Flow]       Dashboard]
  
  User Service sends event and moves on immediately
  Each downstream service processes in its own time
  If Notification Service is down → event waits in Kafka
  When it comes back up → processes all queued events
  Zero data loss, zero user impact

Saga Pattern for Distributed Transactions:
──────────────────────────────────────────
  Create Order ──▶ Reserve Inventory ──▶ Charge Payment ──▶ Ship Order
  
  Each step emits a "success" or "failure" event to Kafka
  
  If payment fails:
    Payment Service emits: payment.failed
    Inventory Service consumes and releases reserved stock
    Order Service consumes and cancels order
    Notification Service sends "order failed" email
  
  No distributed transactions needed
  Compensating actions handle rollbacks through events

Architecture Pattern: Exactly-Once Processing

Three delivery guarantees explained:

At-Most-Once:
  Producer sends message → Broker confirms or not → Producer moves on
  Risk: Message lost if broker fails after receiving but before confirming
  Use when: Logging, metrics (losing one reading is acceptable)

At-Least-Once:
  Producer sends message → Broker confirms
  On timeout → Producer retries
  Risk: Message arrives twice if broker confirmed but confirmation got lost
  Use when: Most event processing (deduplication handles duplicates)

Exactly-Once:
  Producer sends message with sequence number
  Broker deduplicates based on sequence number
  Consumer processes and commits in atomic transaction
  Result: Each message processed exactly one time, guaranteed
  
  Kafka Configuration for Exactly-Once:
  ─────────────────────────────────────
  Producer: enable.idempotence=true, transactional.id=unique-per-producer
  Consumer: isolation.level=read_committed
  
  Code:
  producer.initTransactions()
  producer.beginTransaction()
  producer.send(record1)
  producer.send(record2)
  producer.sendOffsetsToTransaction(offsets, consumerGroupId)
  producer.commitTransaction()   ← all or nothing

Choosing the Right Kafka Architecture

Decision Tree for Kafka Architecture Selection:
──────────────────────────────────────────────────────────────────

Need real-time data?
  │
  YES ──▶ Need transformation/computation?
  │              │
  │             YES ──▶ Need joins across streams?
  │              │              │
  │              │             YES ──▶ Apache Flink (complex stream joins)
  │              │              │
  │              │              NO ──▶ Kafka Streams (simple aggregations)
  │              │
  │              NO ──▶ Pure Kafka (store and forward only)
  │
  NO ──▶ Need historical replay?
               │
              YES ──▶ Kafka with long retention + batch processing
               │
               NO ──▶ Traditional message queue (RabbitMQ, SQS)

Throughput guide:
  < 1,000 msg/sec  → Any system works
  1,000–100,000    → Kafka starts showing advantages
  100,000–1M+      → Kafka is the natural choice
  1M+              → Kafka with careful partitioning and hardware

Key Points — Kafka Use Cases and Architecture Summary

Kafka's hub-and-spoke model replaces point-to-point integrations and reduces connection complexity from O(n²) to O(n).
CDC with Debezium captures every database change and streams it to Kafka without modifying the source application.
The event streaming pipeline pattern decouples producers and consumers so each service evolves independently.
Stream processing with Kafka Streams or Apache Flink enables millisecond-latency fraud detection, anomaly alerts, and real-time aggregations.
Windowed aggregations — tumbling, sliding, and session windows — power rate-based computations like surge pricing and velocity checks.
The Saga pattern uses Kafka events to coordinate distributed transactions across microservices without requiring a distributed transaction manager.
Exactly-once semantics require idempotent producers, transactional producers, and read_committed consumers working together.
IoT and financial trading use cases succeed because Kafka scales to millions of events per second while preserving per-key ordering guarantees.

Previous lesson

Back to course