Understanding Topics and Partitions in Apache Kafka
Topics and partitions are the two most fundamental building blocks of Kafka's data storage. Every message in Kafka lives inside a partition, and every partition belongs to a topic. Getting these two concepts crystal clear gives you a solid foundation for understanding how Kafka stores data, scales throughput, and delivers messages to multiple consumers simultaneously.
What Is a Kafka Topic
A Kafka topic is a named category or feed to which records are published. Think of a topic as a folder in a filing cabinet. You label each folder by the type of documents it holds — "invoices," "employee-records," "project-reports." In Kafka, you create topics like "user-signups," "payment-transactions," "sensor-readings," or "page-views."
Topics are the logical containers for your event streams. Producers decide which topic to write to. Consumers decide which topics to read from. The naming of topics matters — good topic names communicate exactly what type of event they contain.
Topics Are Append-Only
Once a message is written to a topic, it cannot be modified, and it is not deleted until the retention period expires and Kafka removes it automatically. New messages always append to the end. This append-only behavior is a large part of what makes Kafka fast: sequential disk writes are much faster than random writes, and consumers can likewise read sequentially rather than jumping around the disk.
TOPIC: payment-transactions
──────────────────────────────────────────────────────────────────
Time →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→
[pay_001]  [pay_002]  [pay_003]  [pay_004]  [pay_005]  [new →]
$50        $120       $8.99      $4,500     $23.00
Offset 0   Offset 1   Offset 2   Offset 3   Offset 4
Rules: No updates. No deletes (until retention). Only appends.
Consumers read left-to-right. Order is preserved.
──────────────────────────────────────────────────────────────────
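The append-only rule can be sketched with a small in-memory model. This is an illustration only, not how Kafka is implemented; the class name AppendOnlyLog and its methods are hypothetical and exist only for this example.

```python
class AppendOnlyLog:
    """Toy in-memory model of an append-only log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Add a record at the end and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, value) pairs from the given offset to the end, in order."""
        return list(enumerate(self._records[offset:], start=offset))


log = AppendOnlyLog()
for amount in ["$50", "$120", "$8.99", "$4,500", "$23.00"]:
    log.append(amount)

# Read everything from offset 3 onward, oldest first.
print(log.read_from(3))
```

The model deliberately offers no update or delete method: the only ways to interact with the log are to append at the end or to read forward from an offset.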
Topic Naming Best Practices
Good topic naming prevents confusion as your Kafka cluster grows. Several widely-used naming conventions exist. The key is picking one convention and staying consistent.
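Use dot-separated or hyphen-separated names: For example, orders.created, payments.processed, or user-activity. Kafka only accepts ASCII letters, digits, dots, hyphens, and underscores in topic names, so spaces and other special characters are rejected outright. Pick either dots or underscores as your separator rather than mixing them, because Kafka warns that the two can collide in metric names.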
Include the domain and event type: A topic named orders.created is far clearer than orders or events. It tells you what domain it belongs to (orders) and what event it represents (created).
Avoid overly generic names: Topics named events, data, or messages become dumping grounds for unrelated data, making them impossible to consume cleanly.
Consider environment prefixes: Large organizations often prefix topic names with the environment: prod.orders.created, staging.orders.created, dev.orders.created. This prevents development data from contaminating production consumers.
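One way to keep a convention consistent is to check names before a topic is created. The sketch below assumes a hypothetical team convention of <env>.<domain>.<event>; the function name check_topic_name, the environment list, and the set of generic names are illustrative choices, not anything Kafka provides. The legality rule (letters, digits, dot, underscore, hyphen, at most 249 characters) mirrors what Kafka itself enforces.

```python
import re

# Characters and length Kafka accepts in a topic name; "." and ".." are rejected.
LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]{1,249}")

# Hypothetical team convention: <env>.<domain>.<event>, lowercase, hyphens allowed.
CONVENTION = re.compile(r"(prod|staging|dev)\.[a-z0-9-]+\.[a-z0-9-]+")

# Names the guidelines above call out as too generic.
GENERIC_NAMES = {"events", "data", "messages"}


def check_topic_name(name):
    """Return a list of problems found; an empty list means the name passes."""
    problems = []
    if name in (".", "..") or not LEGAL_TOPIC.fullmatch(name):
        problems.append("not a legal Kafka topic name")
    if name in GENERIC_NAMES:
        problems.append("too generic")
    if not CONVENTION.fullmatch(name):
        problems.append("does not follow <env>.<domain>.<event>")
    return problems


for candidate in ["prod.orders.created", "events", "user signups"]:
    print(candidate, "->", check_topic_name(candidate) or "ok")
```

A check like this fits naturally into whatever script or pipeline a team uses to create topics, so that badly named topics are caught before they exist.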
What Is a Kafka Partition
A partition is the physical unit that Kafka actually stores data in. Every topic is divided into one or more partitions. Partitions are what enable Kafka to scale — they allow data to be distributed across multiple brokers and allow multiple consumers to read in parallel.
The Bookshelf Analogy
Imagine a library with thousands of books on a single subject — say, history. Storing all these books on one shelf would create a bottleneck: only one librarian can find and retrieve books from that shelf at a time. Divide the books across 10 shelves in 10 different rooms, and 10 librarians can retrieve books simultaneously. Kafka partitions work exactly this way.
ONE PARTITION (slow, single-threaded):
Topic: orders
Partition 0: [o1][o2][o3][o4][o5][o6][o7][o8][o9][o10]
← one consumer reads sequentially, one at a time
THREE PARTITIONS (fast, parallel):
Topic: orders (3 partitions)
Partition 0: [o1][o4][o7][o10] ← Consumer A reads this
Partition 1: [o2][o5][o8] ← Consumer B reads this
Partition 2: [o3][o6][o9] ← Consumer C reads this
3 consumers work in parallel = 3x throughput
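The diagram above can be modeled with a simple round-robin assignment of partitions to consumers. This is a simplified sketch; Kafka's real assignors (range, round-robin, sticky, and cooperative sticky) are more involved, and the function name assign_partitions is hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions, three consumers: one partition each.
print(assign_partitions(3, ["A", "B", "C"]))

# Three partitions, four consumers: consumer D receives nothing and sits idle.
print(assign_partitions(3, ["A", "B", "C", "D"]))

# One partition, three consumers: only consumer A has work.
print(assign_partitions(1, ["A", "B", "C"]))
```

The second and third calls show why partition count caps parallelism: adding consumers beyond the number of partitions does not add throughput.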
Each Partition Is an Independent Ordered Log
Within a single partition, messages are strictly ordered. Message at offset 0 came before offset 1, offset 1 before offset 2, and so on. This order is guaranteed and permanent.
Across partitions, there is no guaranteed order. A message at offset 5 in partition 0 might have arrived before or after offset 2 in partition 1. If global ordering matters to you, use a single partition — but you sacrifice parallelism. If ordering only matters within a group of related messages, use message keys to route related messages to the same partition (covered in Topic 9).
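Key-based routing can be sketched as a stable hash of the key taken modulo the partition count. Kafka's default partitioner uses a murmur2 hash of the serialized key; the sketch below substitutes CRC32 from the Python standard library purely as a stand-in, so the partition numbers it prints will not match what a real producer would choose. The function name partition_for and the sample events are made up for the example.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 here as a stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add-to-cart"),
    ("user-42", "checkout"),
    ("user-7", "logout"),
]

# Route every event to a partition based on its key.
partitions = {p: [] for p in range(3)}
for key, event in events:
    partitions[partition_for(key, 3)].append((key, event))

for p, records in partitions.items():
    print(p, records)
```

Because the same key always hashes to the same partition, all of user-42's events end up in one partition in the order they were produced, even though the topic as a whole has no global order.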
How Partitions Are Stored on Disk
Each partition on a broker is stored as a directory on disk. Inside the directory, Kafka stores messages in segment files. Kafka rolls over to a new segment when the active one reaches its maximum size (log.segment.bytes, 1 GB by default) or its maximum age (log.roll.hours, 7 days by default). Retention works on whole segments: a closed segment is deleted once it falls outside the topic's retention limits.
DISK LAYOUT FOR orders TOPIC (3 partitions on 3 brokers):
BROKER 1:                       BROKER 2:                       BROKER 3:
/kafka-data/                    /kafka-data/                    /kafka-data/
  orders-0/                       orders-1/                       orders-2/
    00000000000000000000.log        00000000000000000000.log        00000000000000000000.log
    00000000000000000000.index      00000000000000000000.index      00000000000000000000.index
    00000000000001048576.log        ...                             ...
    00000000000001048576.index
File name = starting offset of that segment.
New segment created when current segment reaches max size.
Old segments deleted when retention period expires.
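The naming rule makes it cheap to find the segment that holds a given offset: sort the base offsets and pick the last one that is not greater than the target. The following is a minimal sketch of that idea, not Kafka's own lookup code; the function names are hypothetical.

```python
from bisect import bisect_right


def segment_file_name(base_offset):
    """Segment files are named after their first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def segment_for_offset(base_offsets, target_offset):
    """Return the base offset of the segment holding target_offset.

    base_offsets must be sorted ascending, as a sorted directory listing would be.
    """
    index = bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset is older than the earliest retained segment")
    return base_offsets[index]


# Base offsets taken from the orders-0 directory in the layout above.
bases = [0, 1048576]
print(segment_file_name(segment_for_offset(bases, 1500000)))
```

Zero-padding the names means a plain alphabetical directory listing is already in offset order, which is what makes the binary search possible.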
How Many Partitions Should You Create
Choosing the right number of partitions per topic is one of the most important Kafka design decisions. Too few partitions limit throughput and parallelism. Too many partitions add overhead and slow down leader elections during failures.
Factors That Determine Partition Count
Target throughput: Estimate how many megabytes per second you need to produce and consume. As a rough rule of thumb, a single partition on modern hardware handles 10–50 MB/s, but the real figure depends on message size, replication, and compression, so measure it on your own cluster. Divide your target throughput by the per-partition throughput to get an approximate partition count.
Number of consumers: You can only have as many active consumers in a consumer group as there are partitions. If you want 10 consumers in a group reading in parallel, you need at least 10 partitions.
Number of brokers: For best performance, distribute partitions evenly across brokers. A 3-broker cluster with a 9-partition topic gives 3 partitions per broker — perfectly balanced.
Retention and disk usage: More partitions mean more small files on disk. With millions of partitions across a cluster, the file system overhead becomes significant. Do not create partitions you do not need.
PARTITION PLANNING EXAMPLE
Requirements:
- Target produce throughput: 100 MB/s
- Per-partition capacity: 20 MB/s
- Desired consumer parallelism: 10 consumers
- Number of brokers: 5
Calculation:
Partitions for throughput: 100 / 20 = 5 partitions minimum
Partitions for consumers: 10 (to match consumer count)
Even distribution across 5 brokers: 10 partitions = 2 per broker
Decision: Create 10 partitions for this topic.
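The same arithmetic can be written as a small helper. This is one possible way to combine the factors, under the assumption that you take the larger of the throughput and consumer requirements and then round up to a multiple of the broker count; the function name suggest_partition_count is hypothetical and the result is a starting point, not a rule.

```python
import math


def suggest_partition_count(target_mb_s, per_partition_mb_s, consumers, brokers):
    """Take the larger of the throughput and consumer needs, rounded up to a multiple of brokers."""
    for_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    needed = max(for_throughput, consumers)
    return math.ceil(needed / brokers) * brokers


# The planning example above: 100 MB/s target, 20 MB/s per partition,
# 10 consumers, 5 brokers.
print(suggest_partition_count(100, 20, 10, 5))
```

With the numbers from the example this prints 10. If the team wanted 12 consumers instead, the same call would return 15, the next multiple of 5 that can keep every consumer busy.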
The Challenge of Changing Partition Count
You can increase the number of partitions for a topic after creation, but you cannot decrease it. This is a critical limitation. Increasing the count also changes how message keys map to partitions: Kafka does not move existing messages, so new messages with a given key may land in a different partition than older messages with the same key, which breaks ordering guarantees for key-based routing. Plan your partition count thoughtfully at topic creation time. When in doubt, start with a slightly higher count than you think you need.
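The remapping effect is easy to see with a hash-modulo sketch. As an assumption for illustration, CRC32 stands in for the murmur2 hash that Kafka's default partitioner uses, so the exact counts differ from a real cluster, but the mechanism is the same. The key names are made up.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash modulo partition count (CRC32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]

# Keys whose target partition changes when the topic grows from 6 to 12 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]

print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 12")
```

Every key in the moved list has older messages sitting in one partition and newer messages arriving in another, so a consumer can no longer rely on seeing that key's history in order.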
Topics and Partitions Across Brokers
In a multi-broker cluster, Kafka distributes partitions across brokers to balance load. This distribution happens automatically when you create a topic, based on the number of brokers available and the replication factor.
Partition Distribution Example
Cluster: 3 brokers (IDs: 1, 2, 3)
Topic: user-activity
Partitions: 6
Replication Factor: 3
Kafka assigns:
Partition | Leader | Replicas
P0        | B1     | B1, B2, B3
P1        | B2     | B2, B3, B1
P2        | B3     | B3, B1, B2
P3        | B1     | B1, B3, B2
P4        | B2     | B2, B1, B3
P5        | B3     | B3, B2, B1
Leader is spread: B1 leads P0,P3 | B2 leads P1,P4 | B3 leads P2,P5
Each partition's replicas sit on different brokers (a follower never shares a broker with its leader)
Each broker stores all 6 partitions (as leader or follower)
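The table above follows a round-robin pattern that can be sketched in a few lines. This is a simplified model written for this example, not Kafka's actual assignment code: a real cluster picks a random starting broker and can also take rack placement into account. The function name assign_replicas is hypothetical.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Round-robin leaders, with follower order shifted after each full pass over the brokers."""
    n = len(broker_ids)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    shift = 0
    for partition in range(num_partitions):
        if partition > 0 and partition % n == 0:
            shift += 1  # vary the follower order on each new pass
        first = partition % n
        replicas = [broker_ids[first]]  # first replica is the preferred leader
        for j in range(replication_factor - 1):
            step = 1 + (shift + j) % (n - 1)
            replicas.append(broker_ids[(first + step) % n])
        assignment[partition] = replicas
    return assignment


for partition, replicas in assign_replicas(6, 3, [1, 2, 3]).items():
    print(f"P{partition} leader=B{replicas[0]} replicas={replicas}")
```

Called with 6 partitions, replication factor 3, and brokers 1 to 3, the sketch reproduces the leader and replica columns of the table, and it never places two replicas of one partition on the same broker.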
Internal Kafka Topics
Kafka itself creates several internal topics that you normally do not interact with directly but should know about.
__consumer_offsets: This topic stores the current offset position for every consumer group and every partition. When a consumer commits its offset (marks a message as processed), Kafka writes that commit to this topic. It is the source of truth for where each consumer group is in its reading.
__transaction_state: Used by Kafka's transactional API, which underpins exactly-once semantics, to track the state of in-progress transactions. It is only relevant when you use transactions.
INTERNAL TOPICS (auto-created by Kafka):
__consumer_offsets
→ Stores: consumer-group-A reading orders-P0 at offset 142
→ Stores: consumer-group-B reading orders-P0 at offset 89
→ Stores: consumer-group-A reading orders-P1 at offset 201
...50 partitions of this internal topic by default
__transaction_state
→ Tracks in-flight exactly-once transactions
...50 partitions of this internal topic by default
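Conceptually, the data in __consumer_offsets behaves like a map from (group, topic, partition) to a committed offset. The sketch below models only that idea with a plain dictionary; it does not reflect the topic's real record format, and the function names commit and position are hypothetical.

```python
# (group, topic, partition) -> committed offset
committed = {}


def commit(group, topic, partition, offset):
    """Record a group's position; a later commit for the same key replaces the earlier one."""
    committed[(group, topic, partition)] = offset


def position(group, topic, partition):
    """Return the committed offset, or None if the group has never committed."""
    return committed.get((group, topic, partition))


# The three entries from the diagram above.
commit("consumer-group-A", "orders", 0, 142)
commit("consumer-group-B", "orders", 0, 89)
commit("consumer-group-A", "orders", 1, 201)

# Group A makes progress on partition 0; group B's position is unaffected.
commit("consumer-group-A", "orders", 0, 150)

print(position("consumer-group-A", "orders", 0))
print(position("consumer-group-B", "orders", 0))
```

Because each group has its own entries, two groups can read the same partition at completely different speeds without interfering with each other.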
Compacted Topics: A Special Topic Type
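Standard Kafka topics delete old messages based on time or size. Compacted topics use a different retention strategy: a background process removes older messages that share a key with a newer one, so the topic always retains at least the most recent message for each key. This makes a compacted topic behave like a key-value store, because a consumer that reads it from the beginning can rebuild the latest state for every key.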
Think of a compacted topic like a hotel room assignment board. The board shows the current guest in each room. When a new guest checks in to room 101, you don't keep the old record of the previous guest in room 101 forever — you update the board to show the new occupant. A compacted topic does the same thing.
STANDARD TOPIC (time/size retention):
user-preferences: [u1:dark-mode] [u2:light-mode] [u1:light-mode] [u3:dark-mode]
After retention: all old messages eventually deleted, including history
COMPACTED TOPIC (log compaction):
user-preferences: [u1:dark-mode] [u2:light-mode] [u1:light-mode] [u3:dark-mode]
After compaction: [u2:light-mode] [u1:light-mode] [u3:dark-mode]
↑ old u1:dark-mode removed, only latest u1 state kept
Use cases: User preferences, change data capture (CDC) from databases,
application configuration, latest entity state.
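The effect of compaction on the example above can be sketched as "keep the last record per key, in original order". This is a simplified model only: real log compaction runs in the background on closed segments, and a record with a null value (a tombstone) eventually removes its key altogether. The function name compact is hypothetical.

```python
def compact(records):
    """Keep only the last record for each key, preserving the survivors' original order."""
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [record for i, record in enumerate(records) if last_index[record[0]] == i]


log = [
    ("u1", "dark-mode"),
    ("u2", "light-mode"),
    ("u1", "light-mode"),
    ("u3", "dark-mode"),
]

print(compact(log))
```

The earlier u1 record is dropped while u2, the newer u1, and u3 keep their relative order, matching the "After compaction" line in the diagram.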
Creating Topics: CLI and Configuration
Creating a topic requires specifying at minimum: the topic name, the number of partitions, and the replication factor.
# Create a topic with 6 partitions, replication factor 3:
bin/kafka-topics.sh \
  --create \
  --topic user-activity \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092
# Create a compacted topic:
bin/kafka-topics.sh \
  --create \
  --topic user-preferences \
  --partitions 3 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --bootstrap-server localhost:9092
# Describe a topic (shows partition distribution):
bin/kafka-topics.sh \
  --describe \
  --topic user-activity \
  --bootstrap-server localhost:9092
# Increase partition count (cannot decrease):
bin/kafka-topics.sh \
  --alter \
  --topic user-activity \
  --partitions 12 \
  --bootstrap-server localhost:9092
# Delete a topic:
bin/kafka-topics.sh \
  --delete \
  --topic user-activity \
  --bootstrap-server localhost:9092
Auto-Creation of Topics
Kafka has a setting called auto.create.topics.enable in the broker configuration. When set to true (the default), Kafka automatically creates a topic when a producer writes to a non-existent topic, using the broker's default partition count (num.partitions) and replication factor (default.replication.factor). While convenient in development, this setting creates problems in production: a typo in a topic name creates an unwanted topic that silently accumulates data. Most production teams set this to false and manage topics explicitly.
Key Points
- A Kafka topic is a named, append-only log that stores a stream of related events. Messages in a topic cannot be modified once written.
- Topics are divided into partitions. Partitions enable parallelism, scalability, and distribution across multiple brokers.
- Within a partition, messages are strictly ordered by offset. Across partitions, there is no ordering guarantee.
- Partition count determines maximum consumer parallelism. You can have at most one active consumer per partition in a consumer group.
- Partition count can be increased but not decreased. Plan partition counts carefully at topic creation time.
- Compacted topics retain only the latest message per key, making them useful as key-value stores for current state.
- Disable auto.create.topics.enable in production to prevent accidental topic creation from typos.
