Kafka with KRaft Mode: Running Kafka Without ZooKeeper

Apache Kafka ran on Apache ZooKeeper for the first decade of its existence. ZooKeeper managed cluster metadata, broker registration, leader election, and configuration storage. While this worked, it added operational complexity — every Kafka cluster needed a separate ZooKeeper ensemble to manage, monitor, and maintain. KRaft (Kafka Raft Metadata mode) eliminates this dependency entirely by building cluster coordination directly into Kafka itself using the Raft consensus protocol.

Why ZooKeeper Had to Go

Running two distributed systems — Kafka and ZooKeeper — for a single data platform created real problems at scale.

Operational complexity: Teams needed expertise in two systems. Monitoring, alerting, capacity planning, and disaster recovery procedures had to cover both. ZooKeeper outages caused Kafka cluster instability even when all Kafka brokers were healthy.

Metadata scalability limits: ZooKeeper stored all cluster metadata (every topic, every partition, every replica assignment) in an in-memory tree structure. At very large scales (hundreds of thousands of partitions), ZooKeeper became a bottleneck. Controller failovers slowed dramatically because a newly elected controller had to load the full metadata set from ZooKeeper before it could resume work, and then had to propagate that state to every broker.

Leader election speed: ZooKeeper-based leader elections for partition leadership changes took seconds because of ZooKeeper's watch notification and session timeout mechanisms. During this window, the affected partition was unavailable.

Quorum coupling: ZooKeeper requires a majority quorum to operate. In a network partition, ZooKeeper nodes on the minority side stop serving requests, which blocks Kafka controller operations even if the Kafka brokers on that side are healthy within their own network segment.

ZOOKEEPER-BASED KAFKA (before KRaft):

External ZooKeeper Cluster:   ZK1, ZK2, ZK3 (separate machines to manage)
  ↕ stores metadata, manages leader election
Kafka Broker Cluster:         B1, B2, B3, B4, B5

Problems:
  → 2 distributed systems to operate
  → ZooKeeper bottleneck at large partition counts
  → Slow leader elections (seconds)
  → ZooKeeper team knowledge separate from Kafka team knowledge

KRAFT-BASED KAFKA (modern):

Kafka Controller Cluster:  B1, B2, B3 (KRaft controllers, inside Kafka)
Kafka Broker Cluster:      B4, B5, B6, B7, B8 (data brokers)
  OR
Combined:                  B1, B2, B3 (both controller and broker roles)

Benefits:
  → 1 system to operate
  → Metadata scales to millions of partitions
  → Sub-second leader elections
  → Simpler architecture and operations

The Raft Consensus Protocol in Brief

KRaft uses the Raft distributed consensus algorithm to manage cluster metadata. Raft is designed to be understandable and correct — it is widely used in distributed databases, coordination services, and storage systems.

In Raft, a cluster of nodes elects one leader. The leader handles all writes. Followers replicate all writes from the leader. If the leader fails, a new election selects a new leader from the followers. A write is committed only after a majority (quorum) of nodes confirms it. This majority rule guarantees that no committed write is lost as long as a majority of nodes survives: a 3-node cluster tolerates one failure and a 5-node cluster tolerates two.
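
The majority rule reduces to a small piece of arithmetic. The following is a minimal sketch of that arithmetic only, not Kafka code; the function names are illustrative.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority is still reachable."""
    return voters - quorum_size(voters)


for n in (1, 3, 4, 5):
    print(f"{n} controllers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Note that four controllers tolerate no more failures than three, which is why controller quorums are normally sized with an odd number of nodes.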

RAFT LEADER ELECTION IN KRAFT:

KRaft controller cluster: 3 nodes (quorum = 2 required for commit)

Normal operation:
  Controller 1 (Active Leader) → handles all metadata writes
  Controller 2 (Follower) → replicates metadata from C1
  Controller 3 (Follower) → replicates metadata from C1

Controller 1 crashes:

  C2 and C3 detect missing heartbeats from C1.
  C2 starts election: "I am a candidate. Vote for me."
  C3 votes for C2 (C2's log is at least as current as C3's).
  C2 wins election (has majority: 2 of the 3 voters, its own vote plus C3's).
  C2 becomes new Active Leader.

  Total time: typically 100-300 milliseconds once the failure is detected
  (detection itself depends on the configured quorum timeouts).
  (vs. ZooKeeper-based: typically 5-30 seconds)

Metadata writes resume immediately after C2 becomes leader.
Data brokers update their cache from the new controller.
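
The voting step in the walkthrough above can be sketched in a few lines. This is a simplified illustration of the Raft rule, not KRaft's implementation: it leaves out timeouts, the one-vote-per-epoch rule, and networking, and the names LogPosition, grants_vote, and wins_election are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    epoch: int   # leader epoch of the last record in the node's log
    offset: int  # offset of the last record in the node's log


def grants_vote(candidate: LogPosition, voter: LogPosition) -> bool:
    """A voter supports a candidate whose log is at least as current as its own."""
    return (candidate.epoch, candidate.offset) >= (voter.epoch, voter.offset)


def wins_election(votes: int, total_voters: int) -> bool:
    """A candidate needs a strict majority of ALL voters, not only the live ones."""
    return votes > total_voters // 2


# C1 has crashed. C2 stands for election and C3 decides whether to support it.
c2 = LogPosition(epoch=3, offset=10024)
c3 = LogPosition(epoch=3, offset=10020)

votes_for_c2 = 1 + int(grants_vote(c2, c3))  # C2 always votes for itself
print(wins_election(votes_for_c2, total_voters=3))
```

Comparing the epoch before the offset means a node with a longer but older log cannot win against a node that has seen a more recent leader.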

KRaft Architecture: Roles and Modes

In a KRaft cluster, every Kafka process has a configured role that determines what responsibilities it holds. A process can be a controller, a broker, or both.

Controller Role

Controllers form the Raft quorum. They store and replicate all cluster metadata: the list of topics, partition counts, replica assignments, partition leader assignments, configuration overrides, and ACLs. The active controller (Raft leader) is the only controller that handles writes. Follower controllers replicate from the active controller and, because they already hold the full metadata log, can take over as leader without first reloading state.

Controllers do not serve producer or consumer data traffic. In large production clusters, dedicated controller nodes focus entirely on metadata management, giving data brokers the full resources of each machine.

Broker Role

Brokers store partition data and serve producer and consumer requests, the traditional Kafka broker job. In KRaft, brokers cache cluster metadata locally. They fetch metadata updates from the active controller and maintain a current view of the cluster state without querying a central metadata service for every request.

Combined Role (Small Clusters)

In small deployments (development, single-region low-scale production), a node can act as both controller and broker simultaneously. This reduces the minimum server count needed for a functional cluster. Three nodes in combined mode give you a 3-node controller quorum and a 3-node broker cluster on the same machines.

DEPLOYMENT TOPOLOGY OPTIONS:

OPTION A: COMBINED MODE (small clusters, ≤ a few dozen brokers)
  Node 1: roles=controller,broker
  Node 2: roles=controller,broker
  Node 3: roles=controller,broker

  Benefits: Fewer machines. Simple setup.
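  Limits: Controller metadata operations compete with broker data operations.
          Controllers cannot be rolled or scaled separately from brokers, and the
          Kafka documentation does not recommend combined mode for critical deployments.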

OPTION B: ISOLATED MODE (large production clusters)
  Controller 1: roles=controller   (dedicated, no data serving)
  Controller 2: roles=controller
  Controller 3: roles=controller
  Broker 4:  roles=broker          (dedicated data serving)
  Broker 5:  roles=broker
  ...
  Broker N:  roles=broker

  Benefits: Controllers don't compete with data traffic.
            Scale controllers and brokers independently.
  Use when: > 30 brokers, heavy metadata churn, strict latency SLAs.
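
The two topologies differ only in the process.roles value each node receives. The sketch below renders the role-related lines of server.properties for both options. The hostnames and helper names are hypothetical, and a real configuration needs listeners and storage settings as well.

```python
def quorum_voters(controllers: dict[int, str], port: int = 9093) -> str:
    """Build the controller.quorum.voters value from {node_id: hostname}."""
    return ",".join(
        f"{node_id}@{host}:{port}" for node_id, host in sorted(controllers.items())
    )


def node_properties(node_id: int, roles: list[str], voters: str) -> str:
    """Render the role-related lines of one node's server.properties."""
    return "\n".join([
        f"process.roles={','.join(roles)}",
        f"node.id={node_id}",
        f"controller.quorum.voters={voters}",
    ])


controllers = {1: "host1", 2: "host2", 3: "host3"}  # hypothetical hostnames
voters = quorum_voters(controllers)

# Option A: every node is both broker and controller.
combined = {
    nid: node_properties(nid, ["broker", "controller"], voters) for nid in controllers
}

# Option B: three dedicated controllers plus two dedicated brokers (ids 4 and 5).
isolated = {nid: node_properties(nid, ["controller"], voters) for nid in controllers}
isolated.update({nid: node_properties(nid, ["broker"], voters) for nid in (4, 5)})

print(isolated[4])
```

In both options every node receives the same controller.quorum.voters string, because brokers need it to find the controllers.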

The Metadata Log: Kafka's New Source of Truth

In KRaft, all cluster metadata is stored in a special internal, single-partition Kafka topic called the metadata log (named __cluster_metadata). The controllers are the voting replicas of this topic and replicate it with the Raft protocol rather than standard Kafka replication; brokers follow it as non-voting observers. The metadata log is an append-only event log: every cluster state change is a record appended to this log.

METADATA LOG EVENTS (examples):

offset 0:  TopicRecord {name: "orders", id: "abc123", ...}
offset 1:  PartitionRecord {topicId: "abc123", partition: 0, leader: 1, ISR: [1,2,3]}
offset 2:  PartitionRecord {topicId: "abc123", partition: 1, leader: 2, ISR: [2,3,1]}
offset 3:  ConfigRecord {resource: TOPIC, name: "orders", key: "retention.ms", value: "86400000"}
offset 4:  BrokerRegistrationRecord {brokerId: 4, endpoints: [...], epoch: 1}
offset 5:  PartitionChangeRecord {topicId: "abc123", partition: 0, leader: 3}  ← leader change
...

BROKER LOCAL METADATA CACHE:
  Each broker maintains a local copy of current cluster state.
  Broker reads the metadata log from controllers (like a consumer reads a topic).
  Broker applies each metadata record to update its local view.
  Broker offset in metadata log = how current the broker's view is.

No ZooKeeper needed. No separate metadata service needed.
The metadata log IS the cluster state.
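
To make "the log is the state" concrete, here is a minimal sketch of how a broker could rebuild its local view by replaying records in order. The record shapes mirror the simplified examples above rather than Kafka's real record schemas, and apply_record and replay are illustrative names.

```python
def apply_record(state: dict, record: dict) -> None:
    """Apply one metadata record to a local view of the cluster."""
    kind = record["type"]
    if kind == "TopicRecord":
        state["topics"][record["id"]] = {"name": record["name"], "partitions": {}}
    elif kind == "PartitionRecord":
        partitions = state["topics"][record["topicId"]]["partitions"]
        partitions[record["partition"]] = {
            "leader": record["leader"],
            "isr": record["isr"],
        }
    elif kind == "PartitionChangeRecord":
        partitions = state["topics"][record["topicId"]]["partitions"]
        partitions[record["partition"]]["leader"] = record["leader"]
    # Other record types are ignored in this sketch.


def replay(log: list[dict]) -> dict:
    """Rebuild cluster state from scratch by applying every record in order."""
    state = {"topics": {}, "applied_offset": -1}
    for offset, record in enumerate(log):
        apply_record(state, record)
        state["applied_offset"] = offset  # how current this view is
    return state


log = [
    {"type": "TopicRecord", "name": "orders", "id": "abc123"},
    {"type": "PartitionRecord", "topicId": "abc123", "partition": 0,
     "leader": 1, "isr": [1, 2, 3]},
    {"type": "PartitionRecord", "topicId": "abc123", "partition": 1,
     "leader": 2, "isr": [2, 3, 1]},
    {"type": "PartitionChangeRecord", "topicId": "abc123", "partition": 0,
     "leader": 3},
]

state = replay(log)
print(state["topics"]["abc123"]["partitions"][0]["leader"])
```

Two nodes that have applied the same prefix of the log hold identical state, so comparing applied offsets is enough to tell how far behind a broker is.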

Setting Up Kafka in KRaft Mode: Step by Step

STEP 1: Generate a Cluster UUID
  A unique ID that all nodes in this cluster share.
  
  KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
  echo "$KAFKA_CLUSTER_ID"
  # Output: MkU3OEVBNTcwNTJENDM2Qg

STEP 2: Configure Each Node (server.properties for KRaft)

  # Role: this node is both controller and broker (combined mode)
  process.roles=broker,controller
  # Unique integer per node
  node.id=1
  
  # Controller quorum: list of all controller nodes (identical on every node)
  controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
  
  # Listeners
  listeners=PLAINTEXT://:9092,CONTROLLER://:9093
  inter.broker.listener.name=PLAINTEXT
  controller.listener.names=CONTROLLER
  # Use the hostname that clients and other brokers can reach this node on
  advertised.listeners=PLAINTEXT://host1:9092
  
  # Data directories
  log.dirs=/var/kafka/data

STEP 3: Format the Storage Directory on Each Node
  
  bin/kafka-storage.sh format \
    -t "$KAFKA_CLUSTER_ID" \
    -c config/kraft/server.properties
    
  Output: Formatting /var/kafka/data with metadata.version 3.7-IV4

STEP 4: Start Each Node
  
  bin/kafka-server-start.sh config/kraft/server.properties
  
  # Or as a system service:
  systemctl start kafka

STEP 5: Verify Cluster Health
  
  bin/kafka-metadata-quorum.sh \
    --bootstrap-server localhost:9092 describe --status
    
  Output:
    ClusterId:              MkU3OEVBNTcwNTJENDM2Qg
    LeaderId:               1
    LeaderEpoch:            3
    HighWatermark:          10024
    MaxFollowerLag:         0
    MaxFollowerLagTimeMs:   -1
    CurrentVoters:          [1,2,3]
    CurrentObservers:       []
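
The status output from Step 5 is easy to check in a script. The sketch below assumes the exact "Key: value" layout shown above, which may differ between Kafka versions. Treating a negative LeaderId as "no leader" and the max_lag threshold are assumptions of this example, as are the function names.

```python
def parse_quorum_status(text: str) -> dict[str, str]:
    """Turn 'Key:   value' lines into a dictionary of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def quorum_problems(
    status: dict[str, str], expected_voters: int, max_lag: int = 0
) -> list[str]:
    """Return human-readable findings; an empty list means nothing looked wrong."""
    problems = []
    if int(status["LeaderId"]) < 0:  # assumption: a negative id means no leader
        problems.append("no active controller")
    voters = [v for v in status["CurrentVoters"].strip("[]").split(",") if v.strip()]
    if len(voters) != expected_voters:
        problems.append(f"expected {expected_voters} voters, found {len(voters)}")
    if int(status["MaxFollowerLag"]) > max_lag:
        problems.append("a follower is lagging behind the leader")
    return problems


SAMPLE = """\
ClusterId:              MkU3OEVBNTcwNTJENDM2Qg
LeaderId:               1
LeaderEpoch:            3
HighWatermark:          10024
MaxFollowerLag:         0
MaxFollowerLagTimeMs:   -1
CurrentVoters:          [1,2,3]
CurrentObservers:       []
"""

print(quorum_problems(parse_quorum_status(SAMPLE), expected_voters=3))
```

In practice the text would come from running the kafka-metadata-quorum.sh command and capturing its standard output. A small amount of follower lag is normal on a busy cluster, so max_lag would usually be set above zero.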

KRaft Configuration Properties Reference

CRITICAL KRAFT CONFIGURATION PROPERTIES:

process.roles
  Values: broker | controller | broker,controller
  Required: Yes. Defines this node's responsibilities.

node.id
  Type: integer, unique per node in the cluster
  Required: Yes. Replaces broker.id in KRaft mode.

controller.quorum.voters
  Format: id1@host1:port1,id2@host2:port2,id3@host3:port3
  Required: Yes. Lists all controller nodes and their ports.
  Rule: Must be the same on ALL nodes in the cluster.

controller.listener.names
  Default: none (required in KRaft mode)
  The listener name(s) used for controller-to-controller and broker-to-controller communication. CONTROLLER is a common choice.

metadata.log.dir
  Default: first directory in log.dirs
  Where the metadata log is stored. Keep on a fast, dedicated disk.

metadata.max.idle.interval.ms
  Default: 500ms
  How often the active controller writes a no-op heartbeat to the metadata log.
  Keeps all brokers' metadata cache up to date even during quiet periods.
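
Several of these rules can be checked before a node is ever started. The following is a rough sketch of such a pre-flight check, assuming the statically configured quorum described above. It is not Kafka's own validation logic, covers only a handful of rules, and uses illustrative function names.

```python
import re

VOTER_PATTERN = re.compile(r"^([0-9]+)@([^:@,\s]+):([0-9]+)$")
VALID_ROLES = {"broker", "controller"}
REQUIRED = ("process.roles", "node.id", "controller.quorum.voters",
            "controller.listener.names")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def validate_kraft(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = [f"missing required property: {k}" for k in REQUIRED if not props.get(k)]
    if errors:
        return errors

    roles = set(props["process.roles"].split(","))
    if not roles <= VALID_ROLES:
        errors.append(f"unknown role(s): {sorted(roles - VALID_ROLES)}")

    node_id = None
    if re.fullmatch(r"[0-9]+", props["node.id"]):
        node_id = int(props["node.id"])
    else:
        errors.append("node.id must be a non-negative integer")

    voter_ids = set()
    for entry in props["controller.quorum.voters"].split(","):
        match = VOTER_PATTERN.match(entry.strip())
        if match is None:
            errors.append(f"malformed voter entry: {entry!r}")
        else:
            voter_ids.add(int(match.group(1)))

    if node_id is not None:
        if "controller" in roles and node_id not in voter_ids:
            errors.append("a controller's node.id must appear in controller.quorum.voters")
        if roles == {"broker"} and node_id in voter_ids:
            errors.append("a broker-only node.id must not appear in controller.quorum.voters")
    return errors


EXAMPLE = """\
# combined-mode node
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
controller.listener.names=CONTROLLER
"""

print(validate_kraft(parse_properties(EXAMPLE)))
```

Running the same check against every node's file is also a cheap way to confirm that controller.quorum.voters is identical everywhere.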

Migrating from ZooKeeper to KRaft

Kafka 3.x provides a migration path from ZooKeeper-based clusters to KRaft. The migration is a multi-step process that runs both ZooKeeper and KRaft simultaneously during the transition, then gradually moves metadata management to KRaft controllers, and finally removes ZooKeeper entirely.

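MIGRATION OVERVIEW (preview in Kafka 3.5, production-ready from Kafka 3.6):

Phase 1: Deploy KRaft controllers alongside existing ZooKeeper cluster
  → ZK still manages metadata. KRaft controllers are deployed with migration enabled and wait for the brokers.

Phase 2: Enable migration on the brokers
  → Brokers are restarted (rolling) with migration enabled and register with the KRaft controllers.
  → Existing metadata is copied from ZooKeeper into the KRaft metadata log.

Phase 3: KRaft controller takes over in dual-write mode
  → The KRaft active controller replaces the ZooKeeper-based controller.
  → It writes metadata to the KRaft metadata log AND mirrors it to ZooKeeper, so brokers still in ZooKeeper mode keep working and rollback remains possible.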

Phase 4: Migrate brokers to KRaft-only mode
  → Each broker restarted in KRaft mode (one at a time, rolling restart).
  → No downtime. Rolling restart keeps cluster operational throughout.

Phase 5: Remove ZooKeeper
  → ZooKeeper ensemble decommissioned.
  → Pure KRaft cluster operational.

Total migration time: hours for typical production clusters.
No data loss. No downtime if done with rolling restarts.

KRaft vs ZooKeeper: The Key Differences

COMPARISON TABLE:

Aspect                    ZooKeeper Mode          KRaft Mode
──────────────────────────────────────────────────────────────────
Separate service needed   Yes (ZooKeeper)         No
Minimum nodes (Kafka)     3 Kafka + 3 ZK = 6     3 (combined mode)
Leader election speed     5-30 seconds            100-300 ms
Max stable partitions     ~200,000                Millions (tested at 3M+)
Metadata storage          ZooKeeper znodes        Kafka metadata log
Startup time (large)      Slow (loads from ZK)    Fast (local metadata log)
Supported since           Original Kafka          Kafka 2.8 (early access)
                                                  Kafka 3.3 (production-ready for new clusters)
Future support            Deprecated in 3.5,      All future development
                          removed in 4.0

When to Use KRaft Today

KRaft has been production-ready for new clusters since Kafka 3.3 and is the recommended mode for all new deployments. ZooKeeper mode was deprecated in Kafka 3.5 and removed entirely in Kafka 4.0, so it remains available only on 3.x releases. Confluent's managed Kafka service (Confluent Cloud) runs on KRaft.

For existing ZooKeeper-based clusters, plan the migration while you are still on a 3.x release, because Kafka 4.0 cannot run in ZooKeeper mode. For all new Kafka deployments, start with KRaft from day one.

Key Points

KRaft eliminates the ZooKeeper dependency by building cluster coordination directly into Kafka using the Raft consensus protocol.
KRaft controllers form a Raft quorum that stores and replicates all cluster metadata in an internal metadata log (a special Kafka topic).
Nodes can have three roles: controller only, broker only, or combined (both). Combined mode simplifies small cluster setups.
KRaft reduces leader election time from seconds to milliseconds and scales to millions of partitions versus ZooKeeper's ~200,000 limit.
Setting up KRaft requires: generating a cluster UUID, configuring process.roles, node.id, and controller.quorum.voters, formatting storage on each node, then starting the nodes.
KRaft is production-ready in Kafka 3.3+ and the recommended mode for all new Kafka deployments. ZooKeeper mode was deprecated in Kafka 3.5 and removed in Kafka 4.0.
