How Kafka Producers Send Messages to Topics

The Kafka producer is the entry point for all data flowing into Kafka. Everything you store in Kafka first passes through a producer — whether it is an application you write, a connector pulling from a database, or an IoT device streaming sensor data. Understanding how producers work lets you tune them for speed, reliability, and efficiency.

What a Kafka Producer Does

A Kafka producer is a client library embedded in your application. It takes the data your application wants to send, serializes it into bytes, determines which partition to route it to, batches messages for efficiency, compresses them if configured, and sends them to the appropriate broker. All of this happens transparently with a simple send() call from your application code.

The Producer's Internal Pipeline

When your application calls producer.send(record), the message does not go directly to the broker. It travels through a series of internal steps inside the producer.

PRODUCER INTERNAL PIPELINE:

Your App: producer.send(ProducerRecord)
           ↓
[1] SERIALIZER: Converts key and value objects to byte arrays
           ↓
[2] PARTITIONER: Determines which partition to route to (hash of key)
           ↓
[3] RECORD ACCUMULATOR (buffer): Holds messages in memory by topic-partition
           ↓   (batches accumulate here until batch.size or linger.ms triggers)
[4] SENDER THREAD: Background thread picks full/ready batches
           ↓
[5] NETWORK CLIENT: Sends batch to the correct broker over TCP
           ↓
[KAFKA BROKER]: Stores message, returns ACK

If ACK received: success callback fired
If error: retry logic engages (up to retries count)

The Record Accumulator Buffer

Between your application and the broker sits the record accumulator — an in-memory buffer organized by topic-partition. Instead of sending each message immediately (inefficient), the producer holds messages in the buffer and sends them in batches. This dramatically reduces the number of network round trips and increases throughput.

Two settings control when a batch leaves the buffer:

batch.size: Maximum bytes in one batch. When a batch reaches this size, it is sent immediately. Default 16 KB.

linger.ms: Maximum time to wait before sending an incomplete batch. Even if the batch isn't full, it gets sent after this many milliseconds. Default 0 ms (send immediately, no waiting). Setting linger.ms to 5-20 ms allows more messages to accumulate per batch, increasing throughput at the cost of slight latency.

BATCHING BEHAVIOR:

linger.ms=0 (default):
  msg1 arrives → batch created → immediately sent (1-message batch, inefficient)
  msg2 arrives → new batch → immediately sent
  100 messages = 100 network calls

linger.ms=10 (10ms wait):
  msg1-msg15 arrive in 10ms window → batched together → sent as 1 call
  msg16-msg34 arrive in next 10ms → sent as 1 call
  100 messages = ~6 network calls → better throughput

Producer Configuration: The Critical Properties

bootstrap.servers

The list of broker addresses the producer uses to discover the cluster. Provide at least two for resilience. Format: broker1:9092,broker2:9092

key.serializer / value.serializer

Classes that convert your message key and value to byte arrays. Common choices: StringSerializer for text, IntegerSerializer for numbers, KafkaAvroSerializer for Avro-encoded data.

acks (Acknowledgement Level)

Controls how many broker replicas must confirm receipt before the producer considers a message successfully sent. This is the single most important reliability setting.

ACK SETTINGS AND THEIR TRADE-OFFS:

acks=0: NO ACKNOWLEDGEMENT
  Producer fires message and forgets.
  Fastest possible: no waiting for any response.
  Risk: if broker crashes immediately after receive, message is LOST.
  Use for: telemetry data where occasional loss is acceptable.

acks=1: LEADER ACKNOWLEDGEMENT (default)
  Leader broker confirms it wrote the message.
  Fast: only waits for leader, not replicas.
  Risk: if leader crashes BEFORE replicas sync, message LOST.
  Use for: most general workloads. Good balance of speed and safety.

acks=all (or -1): ALL IN-SYNC REPLICA ACKNOWLEDGEMENT
  All brokers in the ISR must confirm they wrote the message.
  Slowest: waits for all replicas. No data loss risk.
  Requires: min.insync.replicas ≥ 2 for meaningful protection.
  Use for: financial data, audit logs, any data where loss is unacceptable.

Speed: acks=0 > acks=1 > acks=all
Safety: acks=all > acks=1 > acks=0

retries and retry.backoff.ms

When a send fails with a retriable error (network hiccup, leader election in progress), the producer retries automatically. Default retries = 2147483647 (effectively infinite) in modern Kafka. retry.backoff.ms = 100ms between retries.

max.in.flight.requests.per.connection

How many unacknowledged request batches can be in flight to a single broker simultaneously. Higher values increase throughput. Setting this to 1 guarantees strict ordering (no out-of-order delivery even on retry). For idempotent producers, a maximum of 5 is safe while still maintaining ordering.

Idempotent Producer: Safe Retries Without Duplicates

Network failures cause a subtle problem: the producer sends a message, the broker receives and stores it, but the network fails before the ACK returns. The producer thinks the message was lost and retries — sending the same message again. The broker now has two copies. This is a duplicate.

The idempotent producer solves this. Each producer instance gets a unique Producer ID (PID), and each message gets a sequence number. The broker tracks the last sequence number per PID per partition. If the same sequence number arrives again, the broker discards the duplicate instead of storing it twice.

IDEMPOTENT PRODUCER FLOW:

Producer sends: PID=42, seq=100, message="order_5000 created"
Broker receives, stores, assigns offset 342.
ACK lost in network.

Producer retries: PID=42, seq=100, message="order_5000 created"
Broker sees: "I already stored seq=100 from PID=42. This is a duplicate."
Broker discards duplicate.
Broker sends ACK for original offset 342.

No duplicate in topic. Producer receives success.

Enable with: enable.idempotence=true
(Also sets: acks=all, retries=Integer.MAX_VALUE, max.in.flight=5)

Producer Error Handling

Producer errors fall into two categories. Retriable errors are temporary conditions — leader election in progress, network timeout, broker temporarily unavailable. The producer retries automatically. Non-retriable errors are permanent — message too large, invalid topic, authorization failure. The producer reports the error to the application and does not retry.

For retriable errors, the producer's retry mechanism works transparently. For non-retriable errors, your application code receives an exception via the callback function attached to the send() call.

Key Points

  • Kafka producers serialize messages, determine partition routing, batch messages in the record accumulator, and send batches to brokers.
  • batch.size and linger.ms control batching behavior. Higher linger.ms improves throughput at the cost of latency.
  • acks controls reliability: acks=0 (no guarantee), acks=1 (leader confirms), acks=all (all in-sync replicas confirm).
  • Idempotent producers (enable.idempotence=true) prevent duplicate messages during retries by tracking Producer IDs and sequence numbers.
  • Retriable errors trigger automatic retries. Non-retriable errors surface to the application as exceptions.

Leave a Comment