Understanding Topics and Partitions in Apache Kafka

Topics and partitions are the two most fundamental building blocks of Kafka's data storage. Every message in Kafka lives inside a partition, and every partition belongs to a topic. Getting these two concepts crystal clear gives you a solid foundation for understanding how Kafka stores data, scales throughput, and delivers messages to multiple consumers simultaneously.

What Is a Kafka Topic

A Kafka topic is a named category or feed to which records are published. Think of a topic as a folder in a filing cabinet. You label each folder by the type of documents it holds — "invoices," "employee-records," "project-reports." In Kafka, you create topics like "user-signups," "payment-transactions," "sensor-readings," or "page-views."

Topics are the logical containers for your event streams. Producers decide which topic to write to. Consumers decide which topics to read from. The naming of topics matters — good topic names communicate exactly what type of event they contain.

Topics Are Append-Only

Once a message is written to a topic, it cannot be modified or deleted (until the retention period expires and Kafka automatically removes it). Messages always append to the end. This append-only behavior is what makes Kafka fast — sequential disk writes are much faster than random writes, and they enable consumers to read sequentially rather than jumping around the disk.

TOPIC: payment-transactions (one partition shown)
──────────────────────────────────────────────────────────────────
Time →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

[pay_001] [pay_002] [pay_003] [pay_004] [pay_005]  [new →]
  $50       $120      $8.99    $4,500     $23.00

Offset 0  Offset 1  Offset 2  Offset 3  Offset 4

Rules: No updates. No deletes (until retention). Only appends.
Consumers read left-to-right. Order is preserved.
──────────────────────────────────────────────────────────────────
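The append-only rules in the diagram can be modeled in a few lines. This is a minimal sketch, not Kafka's implementation: the PartitionLog class and its method names are hypothetical, and the log lives in memory rather than on disk. It only illustrates that a write always lands at the end and returns the next offset, while reads walk forward from a chosen offset.

```python
class PartitionLog:
    """Toy in-memory model of an append-only log (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record at the end and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, from_offset, max_records=10):
        """Return (offset, value) pairs in order, starting at from_offset."""
        end = min(from_offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(from_offset, end)]


log = PartitionLog()
for amount in ["$50", "$120", "$8.99", "$4,500", "$23.00"]:
    log.append(amount)

next_offset = log.append("$75.00")        # new record goes to the end
first_three = log.read(0, max_records=3)  # consumers read in offset order
```

There is deliberately no update or delete method: the only way to change the log is to append to it.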

Topic Naming Best Practices

Good topic naming prevents confusion as your Kafka cluster grows. Several widely-used naming conventions exist. The key is picking one convention and staying consistent.

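Use dot-separated or hyphen-separated names: For example, orders.created, payments.processed, or user-activity. Kafka only permits letters, digits, dots, hyphens, and underscores in topic names, so spaces and other special characters are not allowed. It is also safest not to mix dots and underscores, because Kafka's metric names can collide when both are used.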

Include the domain and event type: A topic named orders.created is far clearer than orders or events. It tells you what domain it belongs to (orders) and what event it represents (created).

Avoid overly generic names: Topics named events, data, or messages become dumping grounds for unrelated data, making them impossible to consume cleanly.

Consider environment prefixes: Large organizations often prefix topic names with the environment: prod.orders.created, staging.orders.created, dev.orders.created. This prevents development data from contaminating production consumers.
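A naming convention is easier to keep when it is checked automatically. The sketch below is one possible approach, assuming a house convention of env.domain.event in lowercase; that convention, the CONVENTION pattern, and the check_topic_name function are illustrative assumptions, not something Kafka enforces. The LEGAL_NAME pattern reflects Kafka's own character and length limits.

```python
import re

# Kafka's own rules: ASCII letters, digits, '.', '_' and '-', at most
# 249 characters, and not the literal names "." or "..".
LEGAL_NAME = re.compile(r"[a-zA-Z0-9._-]{1,249}")

# Hypothetical house convention: <env>.<domain>.<event>, all lowercase.
CONVENTION = re.compile(r"(prod|staging|dev)\.[a-z0-9-]+\.[a-z0-9-]+")


def check_topic_name(name):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if name in (".", "..") or not LEGAL_NAME.fullmatch(name):
        problems.append("not a legal Kafka topic name")
    if not CONVENTION.fullmatch(name):
        problems.append("does not follow <env>.<domain>.<event>")
    return problems


ok = check_topic_name("prod.orders.created")
too_generic = check_topic_name("events")
```

A check like this could run in code review or in a topic-provisioning script, so that generic names such as events are caught before the topic exists.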

What Is a Kafka Partition

A partition is the physical unit that Kafka actually stores data in. Every topic is divided into one or more partitions. Partitions are what enable Kafka to scale — they allow data to be distributed across multiple brokers and allow multiple consumers to read in parallel.

The Bookshelf Analogy

Imagine a library with thousands of books on a single subject, say history. Storing all these books on one shelf would create a bottleneck: only one librarian can find and retrieve books from that shelf at a time. Divide the books across 10 shelves in 10 different rooms, and 10 librarians can retrieve books simultaneously. Kafka partitions apply the same idea to messages.

ONE PARTITION (slow, single-threaded):
Topic: orders
  Partition 0: [o1][o2][o3][o4][o5][o6][o7][o8][o9][o10]
               ← one consumer reads sequentially, one at a time

THREE PARTITIONS (fast, parallel):
Topic: orders (3 partitions)
  Partition 0: [o1][o4][o7][o10] ← Consumer A reads this
  Partition 1: [o2][o5][o8]      ← Consumer B reads this
  Partition 2: [o3][o6][o9]      ← Consumer C reads this
               3 consumers work in parallel = up to 3x throughput
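The mapping of partitions to consumers in the diagram can be sketched as a simple round-robin deal. This is a simplified illustration only: Kafka's real assignment strategies are more involved, and the assign_round_robin function is a hypothetical name. It does show the key consequence, which is that a consumer beyond the partition count receives nothing.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers one at a time, like cards."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


three = assign_round_robin([0, 1, 2], ["A", "B", "C"])
four = assign_round_robin([0, 1, 2], ["A", "B", "C", "D"])
# In `four`, consumer "D" ends up with an empty list: it sits idle
# because there are only 3 partitions to hand out.
```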

Each Partition Is an Independent Ordered Log

Within a single partition, messages are strictly ordered. Message at offset 0 came before offset 1, offset 1 before offset 2, and so on. This order is guaranteed and permanent.

Across partitions, there is no guaranteed order. A message at offset 5 in partition 0 might have arrived before or after offset 2 in partition 1. If global ordering matters to you, use a single partition — but you sacrifice parallelism. If ordering only matters within a group of related messages, use message keys to route related messages to the same partition (covered in Topic 9).
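Key-based routing comes down to hashing the key and taking the remainder by the partition count. The sketch below assumes zlib.crc32 as a stand-in hash so that it runs with the standard library alone; Kafka's default partitioner uses a murmur2 hash of the serialized key, so real partition numbers will differ. The partition_for function and the sample events are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition index.

    zlib.crc32 is a stand-in (assumption); Kafka's default partitioner
    hashes the serialized key with murmur2 instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add-to-cart"),
    ("user-42", "checkout"),
]

# Simulate 3 partitions as lists; each event is appended to its key's partition.
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key, 3), []).append((key, event))

user_42 = [e for k, e in partitions[partition_for("user-42", 3)] if k == "user-42"]
```

Because every user-42 event hashes to the same partition, the three events stay in the order they were produced, regardless of what happens in the other partitions.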

How Partitions Are Stored on Disk

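Each partition on a broker is stored as a directory on disk, named after the topic and partition number (for example, orders-0). Inside the directory, Kafka stores messages in segment files, each paired with index files. Kafka closes the active segment and starts a new one when it reaches a maximum size (log.segment.bytes, default 1 GB) or a maximum age (log.roll.hours, default 7 days), whichever comes first. Closed segments are deleted once they fall outside the topic's retention limits.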

DISK LAYOUT FOR orders TOPIC (3 partitions on 3 brokers):

BROKER 1:                    BROKER 2:                    BROKER 3:
/kafka-data/                 /kafka-data/                 /kafka-data/
  orders-0/                    orders-1/                    orders-2/
    00000000000000000000.log     00000000000000000000.log     00000000000000000000.log
    00000000000000000000.index   00000000000000000000.index   00000000000000000000.index
    00000000000001048576.log     ...                          ...
    00000000000001048576.index

File name = starting offset of that segment.
New segment created when the active segment reaches max size or max age.
Old segments deleted once they fall outside the retention limits.
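Because each file name is the segment's starting offset, finding the segment that holds a given offset is a binary search over the sorted starting offsets. The following is a rough sketch of that idea; the function names are hypothetical, and it does not check the upper end of the log.

```python
import bisect


def segment_file_name(base_offset):
    """Segment files are named after their first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def segment_for_offset(base_offsets, offset):
    """Return the base offset of the segment that would hold `offset`.

    base_offsets must be sorted ascending.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset is older than the oldest retained segment")
    return base_offsets[i]


# Base offsets matching the two segments shown for orders-0 above.
base_offsets = [0, 1048576]
name = segment_file_name(segment_for_offset(base_offsets, 1500000))
```

Within the chosen segment, the .index file plays a similar role: it maps offsets to byte positions so that a read does not have to scan the whole .log file.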

How Many Partitions Should You Create

Choosing the right number of partitions per topic is one of the most important Kafka design decisions. Too few partitions limit throughput and parallelism. Too many partitions add overhead and slow down leader elections during failures.

Factors That Determine Partition Count

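Target throughput: Estimate how many megabytes per second you need to produce and consume. As a rough rule of thumb, a single partition on modern hardware can handle 10–50 MB/s, but the real figure depends on message size, replication, and hardware, so measure it on your own cluster. Divide your target throughput by the per-partition throughput and round up to get an approximate partition count.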

Number of consumers: You can only have as many active consumers in a consumer group as there are partitions. If you want 10 consumers in a group reading in parallel, you need at least 10 partitions.

Number of brokers: For best performance, distribute partitions evenly across brokers. A 3-broker cluster with a 9-partition topic gives 3 partitions per broker — perfectly balanced.

Retention and disk usage: Every partition keeps its own segment and index files and holds open file handles on the broker. Across many thousands of partitions, this file system and metadata overhead becomes significant. Do not create partitions you do not need.

PARTITION PLANNING EXAMPLE

Requirements:
  - Target produce throughput: 100 MB/s
  - Per-partition capacity: 20 MB/s
  - Desired consumer parallelism: 10 consumers
  - Number of brokers: 5

Calculation:
  Partitions for throughput: 100 / 20 = 5 partitions minimum
  Partitions for consumers: 10 (to match consumer count)
  Even distribution across 5 brokers: 10 partitions = 2 per broker

Decision: Create 10 partitions for this topic.
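The same calculation can be written as a small helper. This is a sketch of the reasoning in the example, not an official sizing formula; the plan_partitions function and its parameter names are made up for illustration, and rounding up to a multiple of the broker count is one reasonable way to get an even spread.

```python
import math


def plan_partitions(target_mb_s, per_partition_mb_s, consumers, brokers):
    """Pick a partition count that covers throughput and consumer parallelism."""
    for_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    needed = max(for_throughput, consumers)
    # Round up to a multiple of the broker count for an even distribution.
    return math.ceil(needed / brokers) * brokers


# Inputs from the planning example: 100 MB/s target, 20 MB/s per partition,
# 10 consumers, 5 brokers.
count = plan_partitions(100, 20, 10, 5)
```

With the example's inputs the consumer requirement dominates, so the helper returns 10, which is 2 partitions per broker.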

The Challenge of Changing Partition Count

You can increase the number of partitions for a topic after creation, but you cannot decrease it. This is a critical limitation. Increasing the count also changes how message keys map to partitions: existing messages stay where they are, but new messages with the same key may land in a different partition, which breaks ordering guarantees for key-based routing. Plan your partition count thoughtfully at topic creation time. When in doubt, start with a slightly higher count than you think you need.
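The effect on key routing is easy to see in a simulation. The sketch below assumes a hash-modulo partitioner with zlib.crc32 as a stand-in hash (Kafka's default partitioner uses murmur2, so the specific keys that move will differ). The key names are invented, and 6 and 12 are just example partition counts.

```python
import zlib


def partition_for(key, num_partitions):
    """Hash-modulo routing; crc32 is a stand-in for Kafka's murmur2 (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]

# Keys whose target partition changes when the topic grows from 6 to 12 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]
fraction_moved = len(moved) / len(keys)
```

With a well-distributed hash, doubling the partition count sends roughly half of the keys to a new partition. For each of those keys, older messages sit in one partition and newer messages in another, so a consumer can no longer rely on seeing them in order.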

Topics and Partitions Across Brokers

In a multi-broker cluster, Kafka distributes partitions across brokers to balance load. This distribution happens automatically when you create a topic, based on the number of brokers available and the replication factor.

Partition Distribution Example

Cluster: 3 brokers (IDs: 1, 2, 3)
Topic: user-activity
Partitions: 6
Replication Factor: 3

Kafka assigns:

Partition | Leader | Replicas
  P0      | B1     | B1, B2, B3
  P1      | B2     | B2, B3, B1
  P2      | B3     | B3, B1, B2
  P3      | B1     | B1, B3, B2
  P4      | B2     | B2, B1, B3
  P5      | B3     | B3, B2, B1

Leader is spread: B1 leads P0,P3 | B2 leads P1,P4 | B3 leads P2,P5
Each partition's 3 replicas sit on 3 different brokers (the leader is the first replica listed)
Each broker stores a copy of all 6 partitions (as leader or follower)
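The table above follows a rotating pattern that can be reproduced in code. This sketch is loosely modeled on Kafka's rack-unaware assignment logic, with one simplifying assumption: it always starts from the first broker, whereas a real cluster typically starts from a randomly chosen broker, so the output of --describe on your cluster will likely differ. The assign_replicas function is a hypothetical name.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Spread leaders round-robin and shift followers on each wrap-around."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        first = p % n   # leader rotates across brokers
        shift = p // n  # grows by one each time the leaders wrap around
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            step = 1 + (shift + j) % (n - 1)
            replicas.append(brokers[(first + step) % n])
        assignment[p] = replicas
    return assignment


layout = assign_replicas(["B1", "B2", "B3"], num_partitions=6, replication_factor=3)
leaders = {p: replicas[0] for p, replicas in layout.items()}
```

The first broker in each list is the leader. Shifting the followers on each wrap-around means P0 and P3 share a leader but list their followers in a different order, which matters when a failed leader has to be replaced.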

Internal Kafka Topics

Kafka itself creates several internal topics that you normally do not interact with directly but should know about.

__consumer_offsets: This topic stores the current offset position for every consumer group and every partition. When a consumer commits its offset (marks a message as processed), Kafka writes that commit to this topic. It is the source of truth for where each consumer group is in its reading.

__transaction_state: Used by Kafka's transaction coordinator to track the state of in-progress transactions, which underpin exactly-once semantics. Only relevant when using Kafka's transactional API.

INTERNAL TOPICS (auto-created by Kafka):

__consumer_offsets
  → Stores: consumer-group-A reading orders-P0 at offset 142
  → Stores: consumer-group-B reading orders-P0 at offset 89
  → Stores: consumer-group-A reading orders-P1 at offset 201
  ...50 partitions of this internal topic by default

__transaction_state
  → Tracks in-flight exactly-once transactions
  ...50 partitions of this internal topic by default
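Conceptually, __consumer_offsets is a keyed log in which the latest commit for each (group, topic, partition) wins. The sketch below models that with a dictionary and also shows how a group ID could be mapped to one of the 50 partitions. The mapping reflects my understanding of the broker's behavior (the absolute value of the group ID's Java string hash, modulo the partition count) and should be treated as an assumption. The helper names are hypothetical, and the hash helper is only valid for ASCII group IDs.

```python
def java_string_hash(s):
    """Java's String.hashCode() for ASCII text, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, num_partitions=50):
    """Which __consumer_offsets partition holds this group's commits (assumed rule)."""
    h = java_string_hash(group_id)
    positive = 0 if h == -2**31 else abs(h)  # avoid the Integer.MIN_VALUE edge case
    return positive % num_partitions


# Toy model of committed offsets: the latest commit per key replaces the old one.
committed = {}


def commit(group, topic, partition, offset):
    committed[(group, topic, partition)] = offset


commit("consumer-group-A", "orders", 0, 100)
commit("consumer-group-A", "orders", 0, 142)
commit("consumer-group-B", "orders", 0, 89)
```

Keeping all of a group's commits in one partition of the internal topic means a single broker, the leader of that partition, can act as the coordinator for that group.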

Compacted Topics: A Special Topic Type

Standard Kafka topics delete old messages based on time or size. Compacted topics use a different retention strategy: they keep at least the most recent message for each unique message key. A consumer that reads the topic from the beginning can rebuild the latest state for every key, which makes a compacted topic behave like the changelog of a key-value store.

Think of a compacted topic like a hotel room assignment board. The board shows the current guest in each room. When a new guest checks in to room 101, you don't keep the old record of the previous guest in room 101 forever — you update the board to show the new occupant. A compacted topic does the same thing.

STANDARD TOPIC (time/size retention):
user-preferences: [u1:dark-mode] [u2:light-mode] [u1:light-mode] [u3:dark-mode]
After retention: all old messages eventually deleted, including history

COMPACTED TOPIC (log compaction):
user-preferences: [u1:dark-mode] [u2:light-mode] [u1:light-mode] [u3:dark-mode]
After compaction: [u2:light-mode] [u1:light-mode] [u3:dark-mode]
               ↑ old u1:dark-mode removed, only latest u1 state kept

Use cases: User preferences, change data capture (CDC) from databases,
           application configuration, latest entity state.
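The before-and-after picture can be reproduced with a short simulation. This is a simplified model of the outcome, not of how the broker's log cleaner works: real compaction runs in the background, leaves the active segment untouched, and treats records with a null value as delete markers. The compact function is a hypothetical name.

```python
def compact(records):
    """Keep only the last record per key, preserving the log order of survivors."""
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]


user_preferences = [
    ("u1", "dark-mode"),
    ("u2", "light-mode"),
    ("u1", "light-mode"),
    ("u3", "dark-mode"),
]

compacted = compact(user_preferences)
latest_state = dict(compacted)  # what a consumer would rebuild by replaying the topic
```

The surviving records keep their original relative order, which is why u2 appears before u1 after compaction.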

Creating Topics: CLI and Configuration

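When creating a topic, you specify the topic name and, in most cases, the number of partitions and the replication factor. Recent Kafka versions fall back to broker defaults if the last two are omitted, but setting them explicitly avoids surprises.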

# Create a topic with 6 partitions, replication factor 3:
bin/kafka-topics.sh \
  --create \
  --topic user-activity \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092

# Create a compacted topic:
bin/kafka-topics.sh \
  --create \
  --topic user-preferences \
  --partitions 3 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --bootstrap-server localhost:9092

# Describe a topic (shows partition distribution):
bin/kafka-topics.sh \
  --describe \
  --topic user-activity \
  --bootstrap-server localhost:9092

# Increase partition count (cannot decrease):
bin/kafka-topics.sh \
  --alter \
  --topic user-activity \
  --partitions 12 \
  --bootstrap-server localhost:9092

# Delete a topic:
bin/kafka-topics.sh \
  --delete \
  --topic user-activity \
  --bootstrap-server localhost:9092

Auto-Creation of Topics

Kafka has a broker setting called auto.create.topics.enable. When it is set to true (the default), Kafka automatically creates a topic the first time a client requests one that does not exist, for example when a producer writes to it. The new topic gets the broker defaults for partition count (num.partitions) and replication factor (default.replication.factor). While convenient in development, this causes problems in production: a typo in a topic name creates an unwanted topic that silently accumulates data. Most production teams set this to false and manage topics explicitly.

Key Points

  • A Kafka topic is a named, append-only log that stores a stream of related events. Messages in a topic cannot be modified once written.
  • Topics are divided into partitions. Partitions enable parallelism, scalability, and distribution across multiple brokers.
  • Within a partition, messages are strictly ordered by offset. Across partitions, there is no ordering guarantee.
  • Partition count determines maximum consumer parallelism. You can have at most one active consumer per partition in a consumer group.
  • Partition count can be increased but not decreased. Plan partition counts carefully at topic creation time.
  • Compacted topics retain only the latest message per key, making them useful as key-value stores for current state.
  • Disable auto.create.topics.enable in production to prevent accidental topic creation from typos.
