Kafka Producer Acknowledgements and Delivery Guarantees

Every distributed system that moves data between machines faces a fundamental challenge: how do you know the data actually arrived? Networks drop packets. Servers crash mid-write. Processes run out of memory. Kafka's acknowledgement system and delivery guarantees give you explicit control over the trade-off between speed and safety. Understanding these guarantees lets you design systems that behave correctly even when individual components fail.

The Three Delivery Semantics

All messaging systems — not just Kafka — can provide one of three delivery semantics. Each semantic makes a different promise about what happens to messages when failures occur.

At-Most-Once Delivery

Every message is delivered zero or one time. Messages might be lost, but they are never delivered twice. The producer sends the message and moves on without waiting for confirmation. If the broker crashes before storing it, the message is gone forever. Speed is maximized. Data integrity is not guaranteed.

AT-MOST-ONCE:

Producer:                     Broker:
send("order_100")  ──────→   receives order_100
[moves on immediately]         CRASH before storing!

Result: order_100 is LOST. Producer does not retry.
Never duplicated, but potentially lost.

Achieved with: acks=0 (no acknowledgement)
Use when: Occasional loss is acceptable. Speed is paramount.
Examples: Real-time analytics, click tracking, sensor telemetry.
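
The fire-and-forget behavior can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the function name, the list standing in for the broker log, and the crash flag are all hypothetical and not part of any Kafka API.

```python
def send_at_most_once(broker_log, record, broker_crashes_before_store):
    """Fire and forget: a single attempt, no ACK awaited, no retry."""
    if not broker_crashes_before_store:
        broker_log.append(record)
    # Nothing is returned: the producer never learns whether the write landed.


broker_log = []
send_at_most_once(broker_log, "order_99", broker_crashes_before_store=False)
send_at_most_once(broker_log, "order_100", broker_crashes_before_store=True)
send_at_most_once(broker_log, "order_101", broker_crashes_before_store=False)

print(broker_log)  # ['order_99', 'order_101']
```

Each record appears at most once, and order_100 is silently missing because nothing ever triggers a resend.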

At-Least-Once Delivery

Every message is delivered one or more times. No message is ever lost, but some messages might be delivered multiple times (duplicates). The producer waits for acknowledgement and retries if it doesn't arrive. If the broker stores the message but the ACK is lost in the network, the producer retries and the broker stores a second copy.

AT-LEAST-ONCE:

Producer:                     Broker:
send("order_100") ──────→    receives order_100
                              stores order_100 at offset 5
                              sends ACK ──────→ [ACK lost in network]
[timeout! No ACK received]
retries send("order_100") → receives order_100 again
                              stores DUPLICATE at offset 6
                              sends ACK ──────→ [received!]

Result: order_100 stored TWICE at offsets 5 and 6. Duplicate!
Never lost, but might be duplicated.

Achieved with: acks=all + retries > 0 (acks=1 also retries, but an acknowledged message can still be lost if the leader fails before replication)
Use when: You can handle or deduplicate duplicates downstream.
Examples: Most event streaming use cases with idempotent consumers.
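
The lost-ACK scenario above can be reproduced with a small simulation. It is a sketch under stated assumptions, not Kafka client code: the Broker class, send_at_least_once, and the ack_lost_on_attempts parameter are hypothetical names, and offsets start at 0 rather than 5.

```python
class Broker:
    """Toy broker: appends every record it receives to an in-memory log."""

    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return len(self.log) - 1  # offset of the stored record


def send_at_least_once(broker, record, ack_lost_on_attempts, max_retries=3):
    """Send `record`, retrying whenever the ACK does not come back.

    `ack_lost_on_attempts` holds the 0-based attempt numbers whose ACK is
    dropped by the simulated network after the broker has stored the record.
    """
    for attempt in range(max_retries + 1):
        offset = broker.append(record)   # the broker stores the record
        if attempt in ack_lost_on_attempts:
            continue                     # ACK lost: producer times out and retries
        return offset                    # ACK received
    raise TimeoutError("no ACK after retries")


broker = Broker()
acked_offset = send_at_least_once(broker, "order_100", ack_lost_on_attempts={0})

print(broker.log)    # ['order_100', 'order_100']
print(acked_offset)  # 1
```

The record is never lost, but the broker ends up with two copies because it has no way to recognize the retry.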

Exactly-Once Delivery

Every message is delivered exactly one time — never lost, never duplicated. This is the gold standard and the most complex to achieve. Kafka provides exactly-once semantics through a combination of idempotent producers and transactions.

EXACTLY-ONCE:

Producer:                     Broker:
send("order_100",             receives order_100
     PID=42, seq=100) ──→    stores at offset 5
                              sends ACK ──────→ [ACK lost]
[timeout! No ACK received]
retries send("order_100",
     PID=42, seq=100) ──→    "Already saw seq=100 from PID=42!"
                              DISCARDS duplicate
                              resends ACK for offset 5 ──→ [received!]

Result: order_100 stored ONCE at offset 5. Perfect.
Never lost. Never duplicated.

Achieved with: enable.idempotence=true + transactions (for cross-partition)
Use when: Data correctness is critical. Financial systems, order processing.

Acknowledgement Levels in Detail

The acks producer configuration is the primary lever for controlling the trade-off between throughput and durability. This section walks through what happens at each level.

acks=0: Fire and Forget

The producer writes the message to its local socket buffer and immediately considers it sent. The broker sends no response for the request, so the producer cannot learn whether the write succeeded, and the retries setting has no effect.

This delivers the absolute highest throughput possible — the producer's sending thread is never waiting for responses. It is appropriate only when occasional message loss is acceptable and speed is the primary concern.

acks=0 TIMELINE:

t=0ms:  Producer sends message → immediately moves to next message
t=1ms:  Producer sends next message
t=2ms:  Producer sends next message
...
Broker processes at its own pace. Producer never waits.

Throughput: Maximum possible (no waiting)
Loss risk:  High (network failure or broker crash = silent loss)
Duplicates: Never (no retries)

acks=1: Leader Acknowledgement

The leader broker confirms it has written the message to its local log. The producer waits for this ACK before considering the send successful. Replicas are not involved in the ACK — the leader confirms before replication completes.

The hidden risk: the leader writes the message and sends the ACK, but then crashes before any follower has copied the message. The newly elected leader (a former follower) has no record of the message, because it existed only in the original leader's log. The message is permanently lost even though the producer received a success ACK.

acks=1 FAILURE SCENARIO:

t=0ms:  Producer sends message
t=1ms:  Leader (Broker 2) writes to local log
t=2ms:  Leader sends ACK → Producer: "Success!"
t=3ms:  BROKER 2 CRASHES (before Broker 1 or 3 replicated the message)
t=4ms:  Broker 1 elected as new leader (its log doesn't have this message)

Result: Producer thinks message was delivered.
        New leader has no record of it. MESSAGE LOST.

Loss risk:  Low but non-zero (leader-before-replication crash)
Throughput: High (only one broker's confirmation needed)
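
The failure window can be illustrated with a toy model of a leader and one follower. This is a minimal sketch with hypothetical names (Replica, produce_acks_1, replicate); real followers fetch continuously in the background, which the explicit replicate call stands in for.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []


def produce_acks_1(leader, record):
    """acks=1: the leader appends locally and acknowledges at once."""
    leader.log.append(record)
    return "ACK"


def replicate(leader, follower):
    """Follower fetch: copy whatever the follower is missing from the leader."""
    follower.log.extend(leader.log[len(follower.log):])


b1, b2 = Replica("broker-1"), Replica("broker-2")
leader = b2

produce_acks_1(leader, "order_99")
replicate(leader, b1)                      # order_99 is replicated in time

ack = produce_acks_1(leader, "order_100")  # the producer sees success...
# ...but broker-2 crashes here, before broker-1 fetches order_100.
new_leader = b1

print(ack)             # ACK
print(new_leader.log)  # ['order_99']
```

The producer holds a success ACK for order_100, yet the new leader's log does not contain it.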

acks=all (or acks=-1): Full Replica Acknowledgement

The leader waits until all brokers in the In-Sync Replica set (ISR) have written the message before sending the ACK. Even if the leader crashes after the ACK, at least one replica has the message and can serve as the new leader without data loss.

The strength of acks=all depends on how many replicas are in the ISR at the time of the write. min.insync.replicas (minISR) puts a floor under that number: this broker- or topic-level setting specifies the minimum number of replicas (including the leader) that must be in the ISR for an acks=all write to be accepted.

acks=all WORKING CORRECTLY:

Cluster: 3 brokers. Replication factor = 3. min.insync.replicas = 2.

t=0ms:  Producer sends message
t=1ms:  Leader (Broker 1) writes to log
t=2ms:  Broker 2 fetches the message from Broker 1, writes it, and confirms.
        Broker 3 has lagged too long and was already removed from the ISR.
        ISR = {B1, B2}. Every ISR member has the message, and 2 >= minISR=2.
t=3ms:  Leader sends ACK → Producer: "Success!"
t=4ms:  Broker 1 CRASHES

t=5ms:  Broker 2 elected as new leader (has the message)
        Message safely delivered. No loss.

min.insync.replicas = 2 → the ISR must contain at least 2 replicas for a write to be accepted
With RF=3 and minISR=2: one replica can be down or out of sync without blocking writes
With RF=3 and minISR=3: all 3 replicas must be in sync; if any one drops out of the ISR → writes are rejected
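
The acknowledgement rule for acks=all can be written as a small predicate. This is a simplified sketch, assuming the leader acknowledges only when every current ISR member holds the record and the ISR is at least minISR in size; can_ack and log_end_offsets are hypothetical names, and a log end offset here means the next offset a replica will write.

```python
def can_ack(record_offset, log_end_offsets, isr, min_insync_replicas):
    """Return True if an acks=all write at `record_offset` can be acknowledged."""
    if len(isr) < min_insync_replicas:
        return False  # ISR too small: the write is not accepted at all
    # Every replica still in the ISR must already hold the record.
    return all(log_end_offsets[replica] > record_offset for replica in isr)


# B1 and B2 hold offsets 0..5; B3 has only reached offset 2.
log_end_offsets = {"B1": 6, "B2": 6, "B3": 3}

healthy = can_ack(5, log_end_offsets, isr={"B1", "B2"}, min_insync_replicas=2)
waiting = can_ack(5, log_end_offsets, isr={"B1", "B2", "B3"}, min_insync_replicas=2)
too_small = can_ack(5, log_end_offsets, isr={"B1"}, min_insync_replicas=2)

print(healthy, waiting, too_small)  # True False False
```

The middle case shows why a lagging replica matters: while B3 is still counted in the ISR the leader keeps waiting for it, and the write is acknowledged only once B3 catches up or is dropped from the ISR.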

What Happens When minISR Is Not Met

If the number of in-sync replicas falls below min.insync.replicas (due to broker failures), the leader rejects writes from acks=all producers with a NotEnoughReplicasException. This is correct behavior: it is safer to refuse writes than to lose data. The error is retriable, so the producer keeps retrying until delivery.timeout.ms expires; applications should handle the eventual failure and alert operations.

NOT-ENOUGH-REPLICAS SCENARIO:

Cluster: 3 brokers. min.insync.replicas = 2.
Brokers 2 and 3 both crash. Only Broker 1 remains.

ISR = {B1} only. 1 < minISR=2.

Producer tries to write → Broker 1 rejects: NotEnoughReplicasException
Producer retries → still rejected
...
Operator brings Broker 2 back online.
ISR = {B1, B2}. 2 >= minISR=2. Writes resume.

This is Kafka protecting data integrity over availability.
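
The reject-then-resume sequence can be sketched as follows. The exception class is a local stand-in for Kafka's NotEnoughReplicasException, and append_acks_all, send_with_retries, and isr_per_attempt are hypothetical names used only for this illustration.

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for Kafka's NotEnoughReplicasException."""


def append_acks_all(log, record, isr, min_insync_replicas):
    """Leader-side check: reject the write if the ISR is too small."""
    if len(isr) < min_insync_replicas:
        raise NotEnoughReplicasError(
            f"ISR size {len(isr)} < minISR {min_insync_replicas}")
    log.append(record)
    return len(log) - 1


def send_with_retries(log, record, isr_per_attempt, min_insync_replicas):
    """Retry once per entry in `isr_per_attempt`, the ISR seen at each attempt."""
    for attempt, isr in enumerate(isr_per_attempt):
        try:
            return attempt, append_acks_all(log, record, isr, min_insync_replicas)
        except NotEnoughReplicasError:
            continue  # a real producer would back off before retrying
    raise NotEnoughReplicasError("ISR never recovered; alert operations")


log = []
isr_timeline = [{"B1"}, {"B1"}, {"B1", "B2"}]  # Broker 2 returns before attempt 2
attempt, offset = send_with_retries(log, "order_100", isr_timeline,
                                    min_insync_replicas=2)

print(attempt, offset, log)  # 2 0 ['order_100']
```

Nothing is written during the first two attempts, so no record exists with only a single copy.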

Idempotent Producer: Eliminating Duplicates on Retry

The idempotent producer (enabled with enable.idempotence=true) eliminates duplicate writes caused by producer retries. It works by assigning each producer a unique Producer ID (PID) and tagging each message with a per-partition sequence number. The broker tracks the last sequence number received per PID per partition.

IDEMPOTENT PRODUCER SEQUENCE NUMBERS:

Producer PID=42 sends to orders-partition-0:

Send 1: PID=42, seq=0, message="order_A" → stored at offset 100 ✓
Send 2: PID=42, seq=1, message="order_B" → stored at offset 101 ✓
Send 3: PID=42, seq=2, message="order_C" → stored at offset 102 ✓
[ACK for seq=2 lost in network]

Producer retries:
Send 3 retry: PID=42, seq=2, message="order_C"
Broker: "Last seq from PID=42 on partition 0 was seq=2. Duplicate!"
Broker discards, resends ACK for offset 102.

No duplicate in the log. Sequence numbers maintain deduplication.

What if seq arrives out of order?
  Expected seq=3, got seq=5: Broker rejects with OutOfOrderSequence error.
  The producer stops and raises an error (data integrity violation).
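
The broker-side bookkeeping can be sketched as a toy partition that remembers the last sequence number per PID. This is a deliberate simplification with hypothetical names: it tracks only the most recent write per producer, and the exception class is a local stand-in for Kafka's OutOfOrderSequenceException.

```python
class OutOfOrderSequenceError(Exception):
    """Stand-in for Kafka's OutOfOrderSequenceException."""


class IdempotentPartition:
    def __init__(self):
        self.log = []
        self.last_seen = {}  # pid -> (last sequence number, offset it was stored at)

    def append(self, pid, seq, value):
        last_seq, last_offset = self.last_seen.get(pid, (-1, None))
        if seq == last_seq:
            return last_offset  # retry of the last write: re-ACK, store nothing
        if seq != last_seq + 1:
            raise OutOfOrderSequenceError(
                f"expected seq={last_seq + 1}, got seq={seq}")
        self.log.append(value)
        offset = len(self.log) - 1
        self.last_seen[pid] = (seq, offset)
        return offset


partition = IdempotentPartition()
offsets = [partition.append(42, seq, value)
           for seq, value in enumerate(["order_A", "order_B", "order_C"])]
retry_offset = partition.append(42, 2, "order_C")  # retry after a lost ACK

try:
    partition.append(42, 5, "order_F")  # gap: seq=3 was expected
    gap_rejected = False
except OutOfOrderSequenceError:
    gap_rejected = True

print(offsets, retry_offset)  # [0, 1, 2] 2
print(partition.log)          # ['order_A', 'order_B', 'order_C']
print(gap_rejected)           # True
```

The retry returns the original offset without appending, so the log holds each message once.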

Kafka Transactions: Exactly-Once Across Multiple Partitions

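Idempotent producers guarantee exactly-once delivery to a single partition for the lifetime of one producer session (a restarted producer receives a new PID, so the broker cannot recognize its earlier sends). But what if your producer writes to multiple partitions (or multiple topics) and you need all writes to succeed atomically, so that either all happen or none do?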

Kafka transactions solve this. A transactional producer wraps multiple writes in a transaction. Either all writes commit (become visible to consumers) or all are aborted. Consumers configured with isolation.level=read_committed see only committed data — in-progress transactions are invisible to them.

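KAFKA TRANSACTION PATTERN (Python sketch, assuming the confluent-kafka client):

from confluent_kafka import Producer, KafkaException

producer = Producer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "transactional.id": "order-service-1",  # assumed ID, stable across restarts
})
producer.init_transactions()  # once, before the first transaction

try:
    producer.begin_transaction()

    producer.produce("orders", key="order_1", value="created")
    producer.produce("inventory", key="item_A", value="reserved")
    producer.produce("analytics", key="event", value="order_created")

    producer.commit_transaction()     # all 3 writes become visible atomically
except KafkaException as e:
    if e.args[0].txn_requires_abort():
        producer.abort_transaction()  # all 3 writes are marked aborted
    else:
        raise                         # retriable or fatal error: let the caller decide

Consumer with isolation.level=read_committed sees EITHER:
  All 3 messages (if committed)
  OR none of them (if aborted or in-progress)

Consumer with isolation.level=read_uncommitted sees everything, including
  messages from in-progress and aborted transactions (may see partial writes)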
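
The difference between the two isolation levels can be illustrated with an in-memory model. This is a rough sketch with hypothetical names (TransactionalLog and its methods), and it simply filters records by transaction status; a real read_committed consumer also stops reading at the first still-open transaction, which this model ignores.

```python
class TransactionalLog:
    def __init__(self):
        self.records = []  # (txn_id, topic, value) in append order
        self.status = {}   # txn_id -> "open", "committed", or "aborted"

    def begin(self, txn_id):
        self.status[txn_id] = "open"

    def send(self, txn_id, topic, value):
        self.records.append((txn_id, topic, value))

    def commit(self, txn_id):
        self.status[txn_id] = "committed"

    def abort(self, txn_id):
        self.status[txn_id] = "aborted"

    def read(self, isolation_level):
        if isolation_level == "read_uncommitted":
            return [(topic, value) for _, topic, value in self.records]
        # read_committed: only records whose transaction has committed
        return [(topic, value) for txn_id, topic, value in self.records
                if self.status[txn_id] == "committed"]


log = TransactionalLog()
log.begin("txn-1")
log.send("txn-1", "orders", "created")
log.send("txn-1", "inventory", "reserved")
before_commit = log.read("read_committed")  # transaction still open
dirty_read = log.read("read_uncommitted")   # partial transaction is visible
log.send("txn-1", "analytics", "order_created")
log.commit("txn-1")
after_commit = log.read("read_committed")

print(before_commit)      # []
print(len(dirty_read))    # 2
print(len(after_commit))  # 3
```

The read_committed view jumps from nothing to all three writes, while the read_uncommitted view exposes the half-finished transaction.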

Choosing the Right Delivery Guarantee

DECISION GUIDE:

Use case: Click tracking, page views, metrics collection
  Tolerance for loss: Yes   Tolerance for duplicates: Yes
  Choice: acks=0, no retries
  Reason: Speed matters, occasional loss is fine, consumer aggregates anyway

Use case: Application event logs, user activity
  Tolerance for loss: No    Tolerance for duplicates: Yes (idempotent consumer)
  Choice: acks=all, retries enabled (acks=1 only if rare loss on leader failover is acceptable)
  Reason: No data loss with acks=all, consumer deduplicates on its side

Use case: Financial transactions, order lifecycle, inventory updates
  Tolerance for loss: No    Tolerance for duplicates: No
  Choice: enable.idempotence=true, acks=all, min.insync.replicas=2
  Reason: Exactly-once required, correctness over speed

Use case: Atomic writes spanning multiple topics (order + inventory + analytics)
  Tolerance for loss: No    Tolerance for duplicates: No   Must be atomic: Yes
  Choice: Kafka transactions + read_committed consumers
  Reason: All-or-nothing across multiple topics required
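
The decision guide can be condensed into a small helper that maps requirements to producer settings. This is one possible encoding, not a recommendation from any official source: producer_settings is a hypothetical function, the retry count and transactional.id value are illustrative, and min.insync.replicas is left out because it is set on the broker or topic rather than on the producer.

```python
def producer_settings(tolerates_loss, tolerates_duplicates,
                      needs_atomic_multi_topic=False):
    """Map the decision guide's questions to producer configuration keys."""
    if needs_atomic_multi_topic:
        return {"acks": "all", "enable.idempotence": True,
                "transactional.id": "example-txn-id"}  # hypothetical ID
    if tolerates_loss:
        return {"acks": "0", "retries": 0, "enable.idempotence": False}
    if tolerates_duplicates:
        return {"acks": "all", "retries": 5}           # illustrative retry count
    return {"acks": "all", "enable.idempotence": True}


click_tracking = producer_settings(tolerates_loss=True, tolerates_duplicates=True)
event_logs = producer_settings(tolerates_loss=False, tolerates_duplicates=True)
payments = producer_settings(tolerates_loss=False, tolerates_duplicates=False)
order_flow = producer_settings(False, False, needs_atomic_multi_topic=True)

print(click_tracking["acks"], event_logs["acks"], payments["enable.idempotence"])
# 0 all True
```

Consumers of the transactional case would additionally need isolation.level=read_committed, which is a consumer setting and so is not part of the returned dictionary.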

Key Points

  • Three delivery semantics: at-most-once (may lose, no duplicates), at-least-once (no loss, may duplicate), exactly-once (no loss, no duplicates).
  • acks=0: no ACK, fastest, at-most-once. acks=1: leader ACK, balanced, risk of loss on leader crash. acks=all: ISR ACK, safest, strongest durability guarantee.
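  • min.insync.replicas should be raised above its default of 1 when using acks=all; otherwise a write can be acknowledged with the leader as the only in-sync copy. Recommended: RF=3, minISR=2 for most production workloads.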
  • Idempotent producers (enable.idempotence=true) eliminate duplicates from retries using PID and sequence numbers per partition.
  • Kafka transactions provide atomic, exactly-once writes across multiple partitions and topics. Consumers must use isolation.level=read_committed to benefit.
