Kafka with KRaft Mode: Running Kafka Without ZooKeeper
Apache Kafka ran on Apache ZooKeeper for the first decade of its existence. ZooKeeper managed cluster metadata, broker registration, leader election, and configuration storage. While this worked, it added operational complexity — every Kafka cluster needed a separate ZooKeeper ensemble to manage, monitor, and maintain. KRaft (Kafka Raft Metadata mode) eliminates this dependency entirely by building cluster coordination directly into Kafka itself using the Raft consensus protocol.
Why ZooKeeper Had to Go
Running two distributed systems — Kafka and ZooKeeper — for a single data platform created real problems at scale.
Operational complexity: Teams needed expertise in two systems. Monitoring, alerting, capacity planning, and disaster recovery procedures had to cover both. ZooKeeper outages caused Kafka cluster instability even when all Kafka brokers were healthy.
Metadata scalability limits: ZooKeeper stored all cluster metadata — every topic, every partition, every replica assignment — in an in-memory tree structure. At very large scales (hundreds of thousands of partitions), ZooKeeper became a bottleneck. Broker startups and leader elections slowed dramatically because brokers had to load all metadata from ZooKeeper on startup.
Leader election speed: ZooKeeper-based leader elections for partition leadership changes took seconds because of ZooKeeper's watch notification and session timeout mechanisms. During this window, the affected partition was unavailable.
Coupled availability: ZooKeeper required a majority quorum to operate. During a network partition, the minority side of the ZooKeeper ensemble stopped serving requests, which blocked Kafka controller operations even when the Kafka brokers in that network segment were healthy.
ZOOKEEPER-BASED KAFKA (before KRaft):
External ZooKeeper Cluster: ZK1, ZK2, ZK3 (separate machines to manage)
↕ stores metadata, manages leader election
Kafka Broker Cluster: B1, B2, B3, B4, B5
Problems:
→ 2 distributed systems to operate
→ ZooKeeper bottleneck at large partition counts
→ Slow leader elections (seconds)
→ ZooKeeper team knowledge separate from Kafka team knowledge
KRAFT-BASED KAFKA (modern):
Kafka Controller Cluster: B1, B2, B3 (KRaft controllers, inside Kafka)
Kafka Broker Cluster: B4, B5, B6, B7, B8 (data brokers)
OR
Combined: B1, B2, B3 (both controller and broker roles)
Benefits:
→ 1 system to operate
→ Metadata scales to millions of partitions
→ Sub-second leader elections
→ Simpler architecture and operations
The Raft Consensus Protocol in Brief
KRaft uses the Raft distributed consensus algorithm to manage cluster metadata. Raft is designed to be understandable and correct — it is widely used in distributed databases, coordination services, and storage systems.
In Raft, a cluster of nodes elects one leader. The leader handles all writes, and followers replicate those writes from the leader. If the leader fails, an election selects a new leader from the followers. A write is committed only after a majority (quorum) of nodes has confirmed it. This majority rule guarantees that committed data is not lost as long as only a minority of nodes fails.
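The majority rule comes down to a little integer arithmetic. The sketch below is illustrative only; the function names are made up for this example and are not part of any Kafka API.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of nodes that forms a majority of `voters`."""
    if voters < 1:
        raise ValueError("a Raft cluster needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority is still reachable."""
    return voters - quorum_size(voters)


def is_committed(acks: int, voters: int) -> bool:
    """A write counts as committed once a majority of voters has stored it."""
    return acks >= quorum_size(voters)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"voters={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

A 4-node quorum tolerates one failure, the same as a 3-node quorum, which is why controller quorums are normally sized at 3 or 5.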
RAFT LEADER ELECTION IN KRAFT:
KRaft controller cluster: 3 nodes (quorum = 2 required for commit)
Normal operation:
Controller 1 (Active Leader) → handles all metadata writes
Controller 2 (Follower) → replicates metadata from C1
Controller 3 (Follower) → replicates metadata from C1
Controller 1 crashes:
C2 and C3 detect missing heartbeats from C1.
C2 starts election: "I am a candidate. Vote for me."
C3 votes for C2 (C2's log is at least as current as C3's).
C2 wins election (has majority: 2 of the 3 voters).
C2 becomes new Active Leader.
Total time: typically 100-300 milliseconds.
(vs. ZooKeeper-based: typically 5-30 seconds)
Metadata writes resume immediately after C2 becomes leader.
Data brokers update their cache from the new controller.
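The vote in the walkthrough follows the textbook Raft rule: a voter supports a candidate only if the candidate's term is current and its log is at least as up to date as the voter's own. Below is a minimal sketch of that rule. It models generic Raft, not Kafka's actual controller code, and the `Voter` class and `run_election` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voter:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_vote_request(self, candidate_id: int, term: int,
                            last_log_term: int, last_log_index: int) -> bool:
        # Reject candidates that are behind on term.
        if term < self.current_term:
            return False
        # A newer term resets any vote cast in an older term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Compare logs by last term first, then by last index.
        log_ok = (last_log_term, last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Voter, reachable_peers: List[Voter],
                 total_voters: int) -> bool:
    """Candidate bumps its term, votes for itself, and asks reachable peers."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1
    for peer in reachable_peers:
        if peer.handle_vote_request(candidate.node_id, candidate.current_term,
                                    candidate.last_log_term,
                                    candidate.last_log_index):
            votes += 1
    return votes >= total_voters // 2 + 1


if __name__ == "__main__":
    # C1 has crashed; C2 and C3 hold identical logs from term 3.
    c2 = Voter(2, current_term=3, voted_for=1, last_log_term=3, last_log_index=10)
    c3 = Voter(3, current_term=3, voted_for=1, last_log_term=3, last_log_index=10)
    print("C2 elected:", run_election(c2, [c3], total_voters=3))
```

The log comparison is what keeps committed metadata safe: a controller that missed recent records cannot gather a majority, because at least one member of any majority holds those records and will refuse the vote.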
KRaft Architecture: Roles and Modes
In a KRaft cluster, every Kafka process has a configured role that determines what responsibilities it holds. A process can be a controller, a broker, or both.
Controller Role
Controllers form the Raft quorum. They store and replicate all cluster metadata: the list of topics, partition counts, replica assignments, partition leader assignments, configuration overrides, and ACLs. The active controller (Raft leader) is the only controller that handles writes. Follower controllers replicate from the active controller and are ready to take over as leader instantly.
Controllers do not serve producer or consumer data traffic. In large production clusters, dedicated controller nodes focus entirely on metadata management, giving data brokers the full resources of each machine.
Broker Role
Brokers store partition data and serve producer and consumer requests, which is the traditional Kafka broker job. In KRaft, brokers cache cluster metadata locally. They fetch metadata updates from the active controller, much as a follower fetches from a partition leader, and so maintain a current view of the cluster state without querying a central metadata service for every request.
Combined Role (Small Clusters)
In small deployments (development, testing, small non-critical clusters), a node can act as both controller and broker simultaneously. This reduces the minimum server count needed for a functional cluster. Three nodes in combined mode give you a 3-node controller quorum and a 3-node broker cluster on the same machines. The Apache Kafka documentation does not recommend combined mode for critical deployments, because the controllers are less isolated from broker load.
DEPLOYMENT TOPOLOGY OPTIONS:
OPTION A: COMBINED MODE (small clusters, ≤ a few dozen brokers)
Node 1: roles=controller,broker
Node 2: roles=controller,broker
Node 3: roles=controller,broker
Benefits: Fewer machines. Simple setup.
Limits: Controller metadata operations compete with broker data operations.
OPTION B: ISOLATED MODE (large production clusters)
Controller 1: roles=controller (dedicated, no data serving)
Controller 2: roles=controller
Controller 3: roles=controller
Broker 4: roles=broker (dedicated data serving)
Broker 5: roles=broker
...
Broker N: roles=broker
Benefits: Controllers don't compete with data traffic.
Scale controllers and brokers independently.
Use when: > 30 brokers, heavy metadata churn, strict latency SLAs.
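The two topologies can be told apart mechanically from each node's process.roles value. The following sketch classifies a cluster layout; the `parse_roles` and `describe_topology` helpers are hypothetical and only mirror the rules described in this section.

```python
from typing import Dict, FrozenSet

VALID_ROLES = frozenset({"broker", "controller"})


def parse_roles(value: str) -> FrozenSet[str]:
    """Turn a process.roles string such as 'broker,controller' into a set."""
    roles = frozenset(part.strip() for part in value.split(",") if part.strip())
    if not roles:
        raise ValueError("process.roles must name at least one role")
    unknown = roles - VALID_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return roles


def describe_topology(nodes: Dict[int, str]) -> dict:
    """`nodes` maps node.id to that node's process.roles value."""
    parsed = {node_id: parse_roles(value) for node_id, value in nodes.items()}
    controllers = sorted(n for n, r in parsed.items() if "controller" in r)
    brokers = sorted(n for n, r in parsed.items() if "broker" in r)
    combined = sorted(n for n, r in parsed.items() if r == VALID_ROLES)
    if not combined:
        mode = "isolated"
    elif combined == controllers:
        mode = "combined"
    else:
        mode = "mixed"
    return {"mode": mode, "controllers": controllers, "brokers": brokers}


if __name__ == "__main__":
    option_a = {1: "controller,broker", 2: "controller,broker", 3: "controller,broker"}
    option_b = {1: "controller", 2: "controller", 3: "controller",
                4: "broker", 5: "broker"}
    print(describe_topology(option_a))
    print(describe_topology(option_b))
```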
The Metadata Log: Kafka's New Source of Truth
In KRaft, all cluster metadata is stored in a special internal single-partition Kafka topic called the metadata log (internally named __cluster_metadata). The controllers are its voting replicas and replicate it with the Raft protocol (not the standard Kafka replication); brokers do not vote, but they fetch the log and keep a local copy. The metadata log is an append-only event log: every cluster state change is a record appended to this log.
METADATA LOG EVENTS (examples):
offset 0: TopicRecord {name: "orders", id: "abc123", ...}
offset 1: PartitionRecord {topicId: "abc123", partition: 0, leader: 1, ISR: [1,2,3]}
offset 2: PartitionRecord {topicId: "abc123", partition: 1, leader: 2, ISR: [2,3,1]}
offset 3: ConfigRecord {resource: TOPIC, name: "orders", key: "retention.ms", value: "86400000"}
offset 4: BrokerRegistrationRecord {brokerId: 4, endpoints: [...], epoch: 1}
offset 5: PartitionChangeRecord {topicId: "abc123", partition: 0, leader: 3} ← leader change
...
BROKER LOCAL METADATA CACHE:
Each broker maintains a local copy of current cluster state.
Broker reads the metadata log from controllers (like a consumer reads a topic).
Broker applies each metadata record to update its local view.
Broker offset in metadata log = how current the broker's view is.
No ZooKeeper needed. No separate metadata service needed.
The metadata log IS the cluster state.
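To make the replay idea concrete, here is a minimal sketch of a broker-side cache built by applying records in offset order. The record shapes copy the illustrative examples above rather than Kafka's real record schemas, and `new_cache`, `apply_record`, and `catch_up` are hypothetical names.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

EXAMPLE_LOG: List[Record] = [
    {"type": "TopicRecord", "name": "orders", "id": "abc123"},
    {"type": "PartitionRecord", "topicId": "abc123", "partition": 0,
     "leader": 1, "isr": [1, 2, 3]},
    {"type": "PartitionRecord", "topicId": "abc123", "partition": 1,
     "leader": 2, "isr": [2, 3, 1]},
    {"type": "ConfigRecord", "name": "orders", "key": "retention.ms",
     "value": "86400000"},
    {"type": "BrokerRegistrationRecord", "brokerId": 4, "epoch": 1},
    {"type": "PartitionChangeRecord", "topicId": "abc123", "partition": 0,
     "leader": 3},
]


def new_cache() -> Dict[str, Any]:
    return {"topics": {}, "configs": {}, "brokers": {}, "last_applied_offset": -1}


def apply_record(cache: Dict[str, Any], record: Record) -> None:
    kind = record["type"]
    if kind == "TopicRecord":
        cache["topics"][record["id"]] = {"name": record["name"], "partitions": {}}
    elif kind == "PartitionRecord":
        partitions = cache["topics"][record["topicId"]]["partitions"]
        partitions[record["partition"]] = {
            "leader": record["leader"], "isr": list(record["isr"])}
    elif kind == "PartitionChangeRecord":
        state = cache["topics"][record["topicId"]]["partitions"][record["partition"]]
        # Only the fields present in the change record are overwritten.
        for field in ("leader", "isr"):
            if field in record:
                state[field] = record[field]
    elif kind == "ConfigRecord":
        cache["configs"].setdefault(record["name"], {})[record["key"]] = record["value"]
    elif kind == "BrokerRegistrationRecord":
        cache["brokers"][record["brokerId"]] = {"epoch": record["epoch"]}
    # Record types this sketch does not know about are skipped.


def catch_up(cache: Dict[str, Any], log: List[Record]) -> Dict[str, Any]:
    """Apply every record after the cache's last applied offset."""
    for offset in range(cache["last_applied_offset"] + 1, len(log)):
        apply_record(cache, log[offset])
        cache["last_applied_offset"] = offset
    return cache


if __name__ == "__main__":
    cache = catch_up(new_cache(), EXAMPLE_LOG)
    print(cache["topics"]["abc123"]["partitions"][0])
    print("applied through offset", cache["last_applied_offset"])
```

Because the cache remembers its last applied offset, a broker that falls behind only needs the records it has not yet seen, not a full reload of cluster state.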
Setting Up Kafka in KRaft Mode: Step by Step
STEP 1: Generate a Cluster UUID
A unique ID that all nodes in this cluster share.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
echo $KAFKA_CLUSTER_ID
# Output: MkU3OEVBNTcwNTJENDM2Qg
STEP 2: Configure Each Node (server.properties for KRaft)
# Role: this node is both controller and broker (combined mode)
process.roles=broker,controller
# Unique integer per node
node.id=1
# Controller quorum: list of all controller nodes
controller.quorum.voters=1@localhost:9093,2@host2:9093,3@host3:9093
# Listeners
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
advertised.listeners=PLAINTEXT://localhost:9092
# Data directories
log.dirs=/var/kafka/data
STEP 3: Format the Storage Directory on Each Node
bin/kafka-storage.sh format \
-t $KAFKA_CLUSTER_ID \
-c config/kraft/server.properties
Output: Formatting /var/kafka/data with metadata.version 3.7-IV4
STEP 4: Start Each Node
bin/kafka-server-start.sh config/kraft/server.properties
# Or as a system service:
systemctl start kafka
STEP 5: Verify Cluster Health
bin/kafka-metadata-quorum.sh \
--bootstrap-server localhost:9092 describe --status
Output:
ClusterId: MkU3OEVBNTcwNTJENDM2Qg
LeaderId: 1
LeaderEpoch: 3
HighWatermark: 10024
MaxFollowerLag: 0
MaxFollowerLagTimeMs: -1
CurrentVoters: [1,2,3]
CurrentObservers: []
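For automated health checks, the status output can be parsed with a few lines of code. The sketch below assumes the exact "Key: value" layout shown above, which can differ between Kafka versions. The function names and the lag threshold of 100 records are arbitrary choices for illustration.

```python
from typing import Dict, List


def parse_quorum_status(text: str) -> Dict[str, str]:
    """Split 'Key: value' lines into a dictionary of raw strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def parse_id_list(value: str) -> List[int]:
    """Parse a list such as '[1,2,3]' or '[]' into integers."""
    inner = value.strip().strip("[]")
    return [int(part) for part in inner.split(",") if part.strip()]


def check_quorum(status: Dict[str, str], max_lag: int = 100) -> List[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    voters = parse_id_list(status.get("CurrentVoters", "[]"))
    leader = int(status.get("LeaderId", "-1"))
    if leader not in voters:
        problems.append(f"no active leader among voters {voters}")
    lag = int(status.get("MaxFollowerLag", "0"))
    if lag > max_lag:
        problems.append(f"follower lag {lag} exceeds {max_lag} records")
    return problems


SAMPLE_OUTPUT = """\
ClusterId: MkU3OEVBNTcwNTJENDM2Qg
LeaderId: 1
LeaderEpoch: 3
HighWatermark: 10024
MaxFollowerLag: 0
MaxFollowerLagTimeMs: -1
CurrentVoters: [1,2,3]
CurrentObservers: []
"""

if __name__ == "__main__":
    print(check_quorum(parse_quorum_status(SAMPLE_OUTPUT)))
```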
KRaft Configuration Properties Reference
CRITICAL KRAFT CONFIGURATION PROPERTIES:
process.roles
  Values: broker | controller | broker,controller
  Required: Yes. Defines this node's responsibilities.
node.id
  Type: integer, unique per node in the cluster
  Required: Yes. Replaces broker.id in KRaft mode.
controller.quorum.voters
  Format: id1@host1:port1,id2@host2:port2,id3@host3:port3
  Required: Yes. Lists all controller nodes and their ports.
  Rule: Must be the same on ALL nodes in the cluster.
controller.listener.names
  Default: none. Must be set in KRaft mode (the examples here use CONTROLLER).
  The listener name used for controller-to-controller and broker-to-controller communication.
metadata.log.dir
  Default: first directory in log.dirs
  Where the metadata log is stored. Keep on a fast, dedicated disk.
metadata.max.idle.interval.ms
  Default: 500ms
  How often the active controller writes a no-op heartbeat to the metadata log.
  Keeps all brokers' metadata cache up to date even during quiet periods.
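Several of these properties have to agree with each other, so a pre-flight check can catch mistakes before a node is formatted. The sketch below parses controller.quorum.voters and runs a few sanity checks on a node's properties. The helper names are hypothetical, and the rules encode this section's description plus the assumption that a broker-only node must not be listed as a voter.

```python
from typing import Dict, List, Tuple


def parse_quorum_voters(value: str) -> Dict[int, Tuple[str, int]]:
    """Parse 'id@host:port,...' into {id: (host, port)}."""
    voters: Dict[int, Tuple[str, int]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        node_id, at, endpoint = entry.partition("@")
        host, colon, port = endpoint.rpartition(":")
        if not (at and colon and node_id.isdigit() and host and port.isdigit()):
            raise ValueError(f"malformed voter entry: {entry!r}")
        if int(node_id) in voters:
            raise ValueError(f"duplicate voter id: {node_id}")
        voters[int(node_id)] = (host, int(port))
    return voters


def validate_node_config(props: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one node's properties."""
    errors = []
    roles = {r.strip() for r in props.get("process.roles", "").split(",") if r.strip()}
    if not roles or not roles <= {"broker", "controller"}:
        errors.append("process.roles must be broker, controller, or broker,controller")
    node_id = props.get("node.id", "")
    if not node_id.isdigit():
        errors.append("node.id must be a non-negative integer")
    if not props.get("controller.listener.names"):
        errors.append("controller.listener.names must be set")
    try:
        voters = parse_quorum_voters(props.get("controller.quorum.voters", ""))
    except ValueError as exc:
        errors.append(f"controller.quorum.voters: {exc}")
        return errors
    if node_id.isdigit():
        listed = int(node_id) in voters
        if "controller" in roles and not listed:
            errors.append("a controller's node.id must appear in controller.quorum.voters")
        if roles == {"broker"} and listed:
            errors.append("a broker-only node.id must not appear in controller.quorum.voters")
    return errors


if __name__ == "__main__":
    combined_node = {
        "process.roles": "broker,controller",
        "node.id": "1",
        "controller.quorum.voters": "1@localhost:9093,2@host2:9093,3@host3:9093",
        "controller.listener.names": "CONTROLLER",
    }
    print(validate_node_config(combined_node))
```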
Migrating from ZooKeeper to KRaft
Kafka 3.x provides a migration path from ZooKeeper-based clusters to KRaft (early access in 3.4, production-ready from 3.6). The migration is a multi-step process: the cluster runs with both ZooKeeper and a KRaft controller quorum during the transition, metadata management moves to the KRaft controllers, and ZooKeeper is removed at the end.
MIGRATION OVERVIEW (Kafka 3.5+):
Phase 1: Deploy KRaft controllers alongside the existing ZooKeeper cluster
→ ZK still manages metadata. KRaft controllers are deployed with migration enabled but are not yet in charge.
Phase 2: Enable migration on the brokers (rolling restart)
→ Existing metadata is copied from ZooKeeper into the KRaft metadata log.
→ KRaft controllers now hold all metadata.
Phase 3: Transfer leadership to KRaft (dual-write mode)
→ KRaft active controller takes over from the ZooKeeper-based controller.
→ It writes metadata to BOTH the KRaft metadata log AND ZooKeeper, which keeps rollback possible.
Phase 4: Migrate brokers to KRaft-only mode
→ Each broker restarted in KRaft mode (one at a time, rolling restart).
→ No downtime. Rolling restart keeps cluster operational throughout.
Phase 5: Finalize and remove ZooKeeper
→ Migration settings removed from the controllers. ZooKeeper ensemble decommissioned.
→ Pure KRaft cluster operational.
Total migration time: hours for typical production clusters.
No data loss. No downtime if done with rolling restarts.
KRaft vs ZooKeeper: The Key Differences
COMPARISON TABLE:
Aspect                    ZooKeeper Mode          KRaft Mode
──────────────────────────────────────────────────────────────────
Separate service needed   Yes (ZooKeeper)         No
Minimum nodes (Kafka)     3 Kafka + 3 ZK = 6      3 (combined mode)
Leader election speed     5-30 seconds            100-300 ms
Max stable partitions     ~200,000                Millions (tested at 3M+)
Metadata storage          ZooKeeper znodes        Kafka metadata log
Startup time (large)      Slow (loads from ZK)    Fast (local metadata log)
Supported since           Original Kafka          Kafka 2.8 (early access)
                                                  Kafka 3.3 (production-ready for new clusters)
Future support            Deprecated in 3.5,      All future development
                          removed in 4.0
When to Use KRaft Today
KRaft mode has been production-ready for new clusters since Kafka 3.3 and is recommended for all new deployments. ZooKeeper mode was deprecated in Kafka 3.5, no longer receives new feature development, and is removed entirely in Kafka 4.0. Confluent's managed Kafka service (Confluent Cloud) runs on KRaft.
For existing ZooKeeper-based clusters, complete the migration on a 3.x release before upgrading to Kafka 4.0 or later, because 4.0 cannot run against ZooKeeper. For all new Kafka deployments, start with KRaft from day one.
Key Points
- KRaft eliminates the ZooKeeper dependency by building cluster coordination directly into Kafka using the Raft consensus protocol.
- KRaft controllers form a Raft quorum that stores and replicates all cluster metadata in an internal metadata log (a special Kafka topic).
- Nodes can have three roles: controller only, broker only, or combined (both). Combined mode simplifies small cluster setups.
- KRaft reduces leader election time from seconds to milliseconds and scales to millions of partitions versus ZooKeeper's ~200,000 limit.
- Setting up KRaft requires: generating a cluster UUID, configuring process.roles, node.id, and controller.quorum.voters on each node, formatting storage, then starting the nodes.
- KRaft is production-ready for new clusters in Kafka 3.3+ and the recommended mode for all new Kafka deployments. ZooKeeper mode is deprecated as of Kafka 3.5 and removed in Kafka 4.0.
