Message Keys Partitioning and Ordering in Apache Kafka

Message keys are one of the most strategically important concepts in Kafka. They control where messages land within a topic, which directly determines what order consumers see events in and whether related events are always processed together. Misunderstanding keys leads to subtle bugs — events arriving out of order, hot partitions crushing throughput, or consumers processing the same entity's events in the wrong sequence.

What Is a Message Key

Every Kafka message can have an optional key. The key is a byte array — just like the value. It can be a string, a number, a JSON object, or any data you choose to serialize. The key serves two purposes: it determines which partition the message goes to, and it groups related messages together in that partition.

When no key is provided (null key), Kafka distributes messages across partitions using a round-robin strategy — partition 0, then partition 1, then partition 2, and so on. This maximizes throughput and load balancing but sacrifices ordering between messages.

How Keys Determine Partition Assignment

When a producer sends a message with a key, Kafka applies a hash function to the key and uses the result to pick a partition. The default partitioner in Kafka uses the MurmurHash2 algorithm:

partition = murmur2(key) % num_partitions

The same key always produces the same hash value, which always maps to the same partition. This is the critical guarantee that enables ordering.

EXAMPLE: 3-partition topic "orders"

Message key "user_123" → hash("user_123") % 3 = 1 → Partition 1
Message key "user_456" → hash("user_456") % 3 = 0 → Partition 0
Message key "user_789" → hash("user_789") % 3 = 2 → Partition 2
Message key "user_123" → hash("user_123") % 3 = 1 → Partition 1 (always!)

Result: ALL messages for user_123 go to Partition 1, forever.
Partition 1 sees all user_123 events in the exact order they arrived.

Why Keys Are Essential for Ordering

Kafka guarantees ordering only within a partition. If you need all events for a specific entity (a user, an order, a device) to be processed in the exact order they happened, those events must land in the same partition. Using the entity's identifier as the message key achieves exactly this.

The Bank Transaction Ordering Problem

Imagine a bank account with events: deposit $1000, withdraw $200, withdraw $500, withdraw $400. The correct balance after all four events is $1000 - $200 - $500 - $400 = -$100 (overdraft). But if events arrive out of order, the calculation changes — and some orderings might incorrectly approve a $400 withdrawal that should have been rejected.

WITHOUT KEYS (round-robin, no ordering guarantee):

Partition 0: [deposit $1000] [withdraw $400]
Partition 1: [withdraw $200] 
Partition 2: [withdraw $500]

Consumer reads from all 3 partitions concurrently.
Events might arrive in this order:
  Partition 0: deposit $1000  → balance: $1000
  Partition 2: withdraw $500  → balance: $500
  Partition 0: withdraw $400  → balance: $100  ← APPROVED (correct)
  Partition 1: withdraw $200  → balance: -$100 ← APPROVED (overdraft!)

OR in this order:
  Partition 1: withdraw $200  → balance: -$200 (wrong starting balance)
  ...

No guarantee. Results depend on processing timing.

WITH KEY = account_id (all events for same account → same partition):

Partition 1: [deposit $1000] [withdraw $200] [withdraw $500] [withdraw $400]
             
Consumer processes Partition 1 in order:
  deposit $1000  → balance: $1000
  withdraw $200  → balance: $800
  withdraw $500  → balance: $300
  withdraw $400  → balance: -$100 ← correct, overdraft detected

Choosing the Right Key

The key you choose has major consequences for partition balance, ordering, and performance. A good key satisfies two properties simultaneously: it keeps related events together, and it distributes events reasonably evenly across partitions.

Good Key Choices

User ID: All events for a specific user go to the same partition. Enables per-user ordering and stateful processing. Works well when users are roughly uniformly active.

Order ID: All events in an order's lifecycle (created, paid, shipped, delivered) go to the same partition. Essential for order state machines.

Device ID: All sensor readings from one device arrive in sequence. Critical for IoT applications where out-of-order readings produce garbage results.

Session ID: All events within a user session go to the same partition. Useful for session analytics, funnel tracking, and user behavior analysis.

Bad Key Choices and Their Consequences

Country or Region (high skew): If 60% of your users are in one country, 60% of messages go to one partition — a "hot partition." That partition's broker gets overloaded while others sit idle. Throughput tanks.

Boolean flags (extreme skew): Using true/false as a key gives you only 2 possible hash values, so only 2 partitions ever receive data — all others are empty. The topic can't parallelize.

Timestamp with second precision (too many unique keys): Every second gets its own partition assignment. Related events that happen seconds apart go to different partitions. No useful grouping is achieved.

PARTITION DISTRIBUTION ANALYSIS:

Topic: user-activity (6 partitions)

GOOD KEY (user_id, uniformly distributed):
  Partition 0: ████████████ 17% of traffic
  Partition 1: ███████████ 16% of traffic
  Partition 2: █████████████ 18% of traffic
  Partition 3: ████████████ 17% of traffic
  Partition 4: ██████████ 15% of traffic
  Partition 5: ████████████ 17% of traffic
  ✓ Balanced load across all brokers

BAD KEY (country_code, skewed data):
  Partition 0 (US): ████████████████████████████████████ 55% of traffic ← HOT
  Partition 1 (EU): ████████████████ 25% of traffic
  Partition 2 (APAC): ██████████ 15% of traffic
  Partition 3: ██ 3%
  Partition 4: █ 1.5%
  Partition 5: 0.5%
  ✗ One partition overwhelmed. Others idle. Terrible balance.

The Sticky Partitioner: Solving the Small-Message Problem

When messages have null keys (no key provided), the old round-robin partitioner sent each message to the next partition in sequence. This produced very small batches — each batch might have just one or two messages per partition. Small batches are inefficient because the overhead of network calls and disk writes dominates the actual data.

Kafka 2.4+ introduced the sticky partitioner for null-key messages. Instead of round-robining per message, the sticky partitioner fills up a batch for one partition before switching to the next. Larger batches mean higher throughput and lower overhead. After a batch is sent (or after a timeout), the partitioner "sticks" to a new random partition for the next batch.

ROUND-ROBIN PARTITIONER (old behavior):
  msg1 → Partition 0  (batch: 1 message, small)
  msg2 → Partition 1  (batch: 1 message, small)
  msg3 → Partition 2  (batch: 1 message, small)
  msg4 → Partition 0  (batch: 1 message, small)
  Lots of tiny network requests. High overhead.

STICKY PARTITIONER (Kafka 2.4+ default for null keys):
  msg1 → Partition 0  ←─ sticking to P0
  msg2 → Partition 0  ←─ sticking to P0
  msg3 → Partition 0  ←─ sticking to P0
  [batch of 3 messages sent to P0 → efficient!]
  
  msg4 → Partition 1  ←─ switched, now sticking to P1
  msg5 → Partition 1  ←─ sticking to P1
  [batch of 2 messages sent to P1 → efficient!]

Result: Fewer network calls. Better throughput. More efficient batching.

Custom Partitioners

The default hash-based partitioner covers most use cases, but sometimes you need custom routing logic. Kafka allows you to implement a custom partitioner by writing a class that implements the Partitioner interface.

Common reasons to use a custom partitioner:

Geographic routing: Route messages from specific regions to specific partitions to minimize cross-region data movement.

Priority lanes: Send high-priority messages to a dedicated low-occupancy partition for faster processing.

Business logic routing: Route messages differently based on content — for example, orders above $10,000 go to a dedicated partition for special handling.

CUSTOM PARTITIONER EXAMPLE (pseudocode):

class OrderPriorityPartitioner implements Partitioner {
  
  partition(topic, key, keyBytes, value, valueBytes, cluster):
    
    order = deserialize(value)
    
    if order.amount > 10000:
      return 0   // High-value orders always go to partition 0
    
    if order.isPriority:
      return 1   // Priority orders go to partition 1
    
    // Regular orders distributed across remaining partitions
    return hash(key) % (numPartitions - 2) + 2

}

RESULT TOPOLOGY:
  Partition 0: HIGH-VALUE orders (>$10k)
  Partition 1: PRIORITY orders
  Partition 2-N: Regular orders, balanced

The Partition Count Change Problem

This is a critical trap that catches many Kafka users. The hash-based partitioner computes:

partition = hash(key) % num_partitions

If you increase the number of partitions from 6 to 12, the formula changes. Keys that previously mapped to partition 0 might now map to partition 7. The same key no longer guarantees the same partition. Historical messages with that key are still in the old partition. New messages go to a different partition. Ordering for that key is broken across the partition change boundary.

PARTITION COUNT CHANGE BREAKS ORDERING:

Before change (6 partitions):
  key="order_100" → hash % 6 = 2 → Partition 2
  Messages 1-1000 for order_100 are in Partition 2

After increasing to 12 partitions:
  key="order_100" → hash % 12 = 8 → Partition 8  ← DIFFERENT!
  Messages 1001+ for order_100 go to Partition 8

Consumer processing order_100:
  Partition 2: [msg1][msg2]...[msg1000]
  Partition 8: [msg1001][msg1002]...

If consumers read these partitions independently (which they do),
the temporal ordering guarantee for order_100 is broken.

RULE: Plan your partition count carefully before going live.
      Never change partition counts on topics where key-ordering matters.

Compacted Topics and Keys

In compacted topics (introduced in Topic 6), the key takes on special importance. Log compaction keeps only the most recent message for each key. If you send multiple messages with the same key, older ones are eventually deleted, keeping only the latest value. This makes keys mandatory for compacted topics — without a key, compaction cannot determine which messages are "newer versions" of the same entity.

A null-value message with a key acts as a tombstone — it tells Kafka to delete all records for that key during compaction. This is how you delete entries from a compacted topic.

COMPACTED TOPIC WITH KEYS:

Initial state:
  [user_1: "name=Alice, email=alice@old.com"]
  [user_2: "name=Bob, email=bob@example.com"]
  [user_1: "name=Alice, email=alice@new.com"]  ← newer update
  [user_3: "name=Carol, email=carol@example.com"]
  [user_2: null]  ← tombstone: delete user_2

After compaction:
  [user_1: "name=Alice, email=alice@new.com"]   ← only latest kept
  [user_3: "name=Carol, email=carol@example.com"]
  (user_2 deleted via tombstone)
  
Reading the compacted topic now gives you current state, not history.

Key Points

  • Message keys are optional byte arrays that determine which partition a message goes to. The same key always maps to the same partition.
  • Kafka uses MurmurHash2 on the key, modulo the partition count, to assign partitions. This mapping is deterministic and consistent.
  • Without a key (null), Kafka uses the sticky partitioner (Kafka 2.4+) to fill batches for one partition before switching, maximizing batching efficiency.
  • Keys ensure ordering: all messages with the same key arrive in the same partition in arrival order. This is critical for bank transactions, device streams, and entity state machines.
  • Bad key choices (low cardinality, skewed distribution) create hot partitions. Choose keys with high cardinality and uniform distribution.
  • Increasing partition count breaks key-to-partition mapping for existing keys. Plan partition counts carefully — changes are destructive to key-ordering guarantees.
  • In compacted topics, keys identify which records to keep and which to compact away. Null-value messages with a key act as tombstones that delete records.

Leave a Comment