Event Hub Core Concepts
Azure Event Hub is built around five foundational concepts: Events, Partitions, Consumer Groups, Offsets, and Throughput Units. Mastering these concepts is essential for designing, scaling, and troubleshooting any Event Hub solution.
1. Event
An event in Event Hub is any unit of data sent by a producer. Unlike Azure Event Grid events which have a defined schema with required fields, Event Hub events are schema-agnostic. An event is simply a byte array with optional properties.
Event Hub Event Structure
| Component | Description | Example |
|---|---|---|
| Body | Raw byte array containing the event payload | JSON string, binary data, CSV row |
| Properties | User-defined key-value pairs attached to the event | { "deviceId": "sensor-42", "region": "US-East" } |
| System Properties | Metadata set by Event Hub automatically on arrival | Offset, sequence number, enqueued time, partition key |
Example Event Body (JSON format)
{
  "deviceId": "sensor-42",
  "timestamp": "2024-06-15T10:00:05Z",
  "temperature": 72.4,
  "humidity": 55.2,
  "status": "normal"
}
Event Hub does not validate or parse the body. The producer and consumer must agree on the format independently.
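Because the body is just bytes, the producer/consumer format agreement lives entirely in application code. A minimal in-memory sketch of that contract (the event shape here is illustrative, not the SDK's EventData type):

```python
import json

# Illustrative event model: Event Hub sees only an opaque byte-array body
# plus optional user-defined key-value properties.
def make_event(payload, properties=None):
    return {
        "body": json.dumps(payload).encode("utf-8"),  # opaque bytes to the service
        "properties": properties or {},
    }

# The consumer must know, by convention only, that the body is UTF-8 JSON.
def read_event(event):
    return json.loads(event["body"].decode("utf-8"))

event = make_event(
    {"deviceId": "sensor-42", "temperature": 72.4},
    properties={"region": "US-East"},
)
assert read_event(event)["temperature"] == 72.4
```

If the producer switched to CSV or a binary format, Event Hub would deliver the bytes unchanged; only the consumer's parsing code would need to follow.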
Maximum Event Size
A single event can be up to 1 MB in the Standard, Premium, and Dedicated tiers (256 KB in Basic). A batch of events (EventDataBatch) is limited to the maximum message size for the namespace, which is also 1 MB by default. Batching multiple small events into a single send operation improves throughput efficiency.
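The batching rule can be sketched as a greedy packer: fill a batch until the next event would push it past the size limit, then start a new one. This is a simplified model, not the SDK's EventDataBatch API, and it assumes the 1 MB default limit:

```python
MAX_BATCH_BYTES = 1_048_576  # 1 MB, the assumed default batch limit

def batch_events(bodies):
    """Greedily pack event bodies into batches under the size limit."""
    batches, current, size = [], [], 0
    for body in bodies:
        if len(body) > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the maximum event size")
        if size + len(body) > MAX_BATCH_BYTES and current:
            batches.append(current)      # current batch is full: seal it
            current, size = [], 0
        current.append(body)
        size += len(body)
    if current:
        batches.append(current)
    return batches

# 3,000 events of 500 bytes each total ~1.43 MB, so two send operations suffice
# instead of 3,000 individual sends.
batches = batch_events([b"x" * 500] * 3000)
assert len(batches) == 2
```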
2. Partition
A partition is an ordered, immutable sequence of events stored in an Event Hub. Events arrive at the Event Hub and are assigned to a partition. Within a partition, events maintain their arrival order. Events across different partitions have no guaranteed order relative to each other.
Partition Concept Diagram
Event Hub: "telemetry" (4 partitions)
Partition 0: [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4] --> newest events
             (earliest)                              (latest)
Partition 1: [Offset 0][Offset 1][Offset 2][Offset 3]
Partition 2: [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4][Offset 5]
Partition 3: [Offset 0][Offset 1][Offset 2]
Each partition is like a separate ordered log.
Events written to Partition 0 have no ordering relationship with events in Partition 2.
How Events Are Assigned to Partitions
Three assignment strategies are available:
| Strategy | How It Works | When to Use |
|---|---|---|
| Round-robin (default) | Events distributed evenly across all partitions automatically | Maximum throughput with no ordering requirement |
| Partition Key | Events with the same key always go to the same partition | Events that must be processed in order per entity (e.g., per device) |
| Explicit Partition ID | Producer specifies exactly which partition to use | Advanced scenarios requiring strict partition control |
Partition Key Example
Producer publishes sensor readings with partitionKey = deviceId.
Device "sensor-01" always goes to Partition 0.
Device "sensor-02" always goes to Partition 1.
Device "sensor-03" always goes to Partition 2.
A consumer reading Partition 0 receives all events from sensor-01 in order.
Events from sensor-01 are never mixed with events from sensor-02.
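The key-to-partition mapping can be modeled as a stable hash. The real Event Hubs service uses its own internal hash function; this sketch only demonstrates the property that matters, namely that the same key always lands on the same partition:

```python
import hashlib

# Illustrative routing only; the actual Event Hubs hash algorithm differs.
def partition_for(key, partition_count):
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# The same key always maps to the same partition, so per-key ordering holds.
assert partition_for("sensor-01", 4) == partition_for("sensor-01", 4)
assert all(0 <= partition_for(k, 4) < 4 for k in ("sensor-01", "sensor-02"))
```

Note that the mapping depends on the partition count, which is one reason changing the count disturbs per-key ordering guarantees for in-flight workloads.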
Partition Count – Key Design Decision
Partition count is set at Event Hub creation time and cannot be decreased. In Premium and Dedicated tiers, it can be increased. The number of partitions directly determines the maximum parallelism of consumers.
| Partitions | Maximum Parallel Consumers | Typical Scenario |
|---|---|---|
| 4 | 4 concurrent reader instances per consumer group | Good for moderate workloads |
| 16 | 16 concurrent reader instances per consumer group | High throughput scenarios |
| 32 | 32 concurrent reader instances per consumer group | Very high throughput scenarios |
| 100+ | 100+ concurrent reader instances | Premium or Dedicated tier; extreme scale |
A consumer instance can read from multiple partitions, but one partition can only be read by one consumer instance at a time within the same consumer group. Setting partitions too low limits maximum throughput. Setting too high wastes resources and increases cost unnecessarily.
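The ownership rule above (one partition per reader, but possibly many partitions per reader) can be sketched as a simple round-robin assignment. The worker names are hypothetical; real processor clients negotiate ownership dynamically through a checkpoint store:

```python
# Balanced partition ownership within one consumer group: every partition has
# exactly one owner, while one owner may hold several partitions.
def assign_partitions(partition_ids, consumer_ids):
    ownership = {c: [] for c in consumer_ids}
    for i, p in enumerate(partition_ids):
        ownership[consumer_ids[i % len(consumer_ids)]].append(p)
    return ownership

# 4 partitions shared by 2 workers: each worker reads 2 partitions.
owners = assign_partitions([0, 1, 2, 3], ["worker-a", "worker-b"])
assert owners == {"worker-a": [0, 2], "worker-b": [1, 3]}

# More consumers than partitions: the extra instance sits idle, which is why
# partition count caps the useful parallelism of a consumer group.
idle = assign_partitions([0, 1], ["w1", "w2", "w3"])
assert idle["w3"] == []
```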
3. Consumer Group
A consumer group is a logical view of an entire Event Hub stream. Each consumer group reads the complete event stream independently. Multiple consumer groups read the same events simultaneously without interfering with each other.
Consumer Group Diagram
Event Hub: "telemetry" (all partitions contain the same stream of events)

Consumer Group "analytics":
  Stream Analytics app reads all 4 partitions; processes for anomaly detection.
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "archive":
  Azure Function reads all 4 partitions; writes events to cold storage.
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "alerts":
  Custom Java app reads all 4 partitions; sends threshold alerts.
  Current position: Partition0=Offset3, Partition1=Offset2 (this group is behind)

Note: "archive" and "analytics" are at the same position.
"alerts" is behind but does NOT slow down the other two groups.
The $Default consumer group exists in every Event Hub. Additional consumer groups must be created manually in the Standard, Premium, and Dedicated tiers. Basic tier supports only one consumer group ($Default).
Consumer Group Limits
| Tier | Maximum Consumer Groups per Event Hub |
|---|---|
| Basic | 1 ($Default only) |
| Standard | 20 |
| Premium | 100 |
| Dedicated | 1,000 |
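The key property of consumer groups is that each one is an independent cursor over the same log. A minimal sketch over a single partition's stream:

```python
# Each consumer group tracks its own position in the shared, immutable log.
log = [f"event-{i}" for i in range(6)]            # one partition's event stream
positions = {"analytics": 0, "archive": 0, "alerts": 0}

def read_next(group):
    event = log[positions[group]]                 # reading never removes events
    positions[group] += 1                         # only this group's cursor moves
    return event

for _ in range(6):
    read_next("analytics")                        # analytics is fully caught up
read_next("alerts")                               # alerts has read only one event

assert positions == {"analytics": 6, "archive": 0, "alerts": 1}
# A slow group never delays the others; each advances its own position.
```

This is why adding a new downstream application is cheap: create a new consumer group and it starts reading the full stream without touching existing readers.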
4. Offset
An offset is the position of an event within a partition. Every event stored in a partition has a unique offset. (Strictly, the offset is a byte position in the partition log; the strictly sequential counter is the sequence number. The diagrams here use simple sequential offsets 0, 1, 2... to illustrate the idea.) Consumers track their current offset to know which events they have already processed and which are still unread.
Partition 0 events:
Offset:   0     1     2     3     4     5     6
Events: [E-A] [E-B] [E-C] [E-D] [E-E] [E-F] [E-G]

Consumer processed up to Offset 3 (E-D).
Consumer resumes from Offset 4 (E-E) after restart.
Reading From Different Positions
| Starting Position | Description | Use Case |
|---|---|---|
| Earliest (Beginning) | Read from the very first retained event in the partition | Initial data load; replay all historical events |
| Latest (End) | Read only new events arriving after the consumer starts | Real-time processing; no interest in historical data |
| Specific Offset | Read from a specific offset number | Resuming after a crash at a known position |
| Specific Timestamp | Read events from a specific date and time | Reprocessing events after a bug fix for a specific time window |
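The four starting positions in the table can be expressed as a function that maps a position choice to an index in the retained events. The event list and field names here are illustrative, not the SDK's types:

```python
from datetime import datetime, timezone

# One partition's retained events, each with an offset and enqueued time.
events = [
    {"offset": i, "enqueued": datetime(2024, 6, 15, 10, 0, i, tzinfo=timezone.utc)}
    for i in range(5)
]

def start_index(events, position, value=None):
    if position == "earliest":
        return 0                               # replay everything retained
    if position == "latest":
        return len(events)                     # only events arriving after now
    if position == "offset":
        return next(i for i, e in enumerate(events) if e["offset"] == value)
    if position == "timestamp":
        return next(i for i, e in enumerate(events) if e["enqueued"] >= value)

assert start_index(events, "earliest") == 0
assert start_index(events, "latest") == 5
assert start_index(events, "offset", 3) == 3
assert start_index(events, "timestamp",
                   datetime(2024, 6, 15, 10, 0, 2, tzinfo=timezone.utc)) == 2
```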
Checkpointing
Checkpointing is the process of saving the current offset to a durable storage location. When a consumer restarts after a failure, it reads the saved checkpoint to resume processing from where it left off, rather than starting over.
Consumer processes events:
Event at Offset 10 --> processed --> checkpoint saved: Offset 10
Event at Offset 11 --> processed --> checkpoint saved: Offset 11
Event at Offset 12 --> processed --> checkpoint saved: Offset 12
Event at Offset 13 --> Consumer crashes!

Consumer restarts:
Reads checkpoint --> last saved = Offset 12
Resumes from Offset 13

No events are lost. Events 10, 11, and 12 are not reprocessed.
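A checkpoint store can be sketched with a local file standing in for the durable store (production processors use Azure Blob Storage for this role):

```python
import json, os, tempfile

# Local file as a stand-in durable checkpoint store for one partition.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "partition-0.json")

def save_checkpoint(offset):
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset}, f)       # persist last processed offset

def load_checkpoint():
    try:
        with open(checkpoint_path) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return -1                              # no checkpoint: start from the beginning

save_checkpoint(12)                            # processed through offset 12, then crash
resume_from = load_checkpoint() + 1            # on restart, continue at the next offset
assert resume_from == 13
```

Checkpointing after every event maximizes accuracy but adds storage round-trips; many processors checkpoint every N events or every few seconds, accepting that a crash may reprocess a small window of already-handled events.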
The Event Processor Host (discussed in the Sending and Receiving Events topic) handles checkpointing automatically using Azure Blob Storage; its successor in current SDKs is the EventProcessorClient.
5. Throughput Units (TUs)
A throughput unit is the purchased capacity unit for an Event Hub namespace in the Standard tier. Each throughput unit provides:
| Direction | Capacity per Throughput Unit |
|---|---|
| Ingress (incoming data) | 1 MB/second OR 1,000 events/second (whichever is hit first) |
| Egress (outgoing data to consumers) | 2 MB/second OR 4,096 events/second |
A Standard-tier namespace can have 1 to 40 throughput units (20 by default; up to 40 via a quota increase request). If the ingress or egress limits are exceeded, Event Hub throttles requests and returns ServerBusy errors. Enabling Auto-Inflate automatically increases throughput units when limits are approached; note that it only scales up and never scales back down automatically.
Throughput Unit Calculation Example
Scenario: 500 IoT devices, each sending 5 events/second; each event = 500 bytes.

Ingress rate calculation:
Total events/second = 500 devices * 5 events = 2,500 events/second
Total data/second = 2,500 * 500 bytes = 1,250,000 bytes = 1.25 MB/second

One TU supports 1 MB/second OR 1,000 events/second of ingress, whichever limit is hit first.
The data rate (1.25 MB/second) alone would need 2 TUs, but the event rate (2,500 events/second) needs 3 TUs.
The stricter limit wins: this workload requires at least 3 TUs.
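The sizing rule generalizes to a small function: compute the TUs demanded by each ingress limit and take the larger. This assumes the Standard-tier limits quoted above:

```python
import math

# Required Standard-tier TUs: the stricter of the two per-TU ingress limits,
# 1 MB/second and 1,000 events/second, determines the answer.
def required_tus(events_per_second, bytes_per_event):
    by_events = math.ceil(events_per_second / 1_000)
    by_bytes = math.ceil(events_per_second * bytes_per_event / 1_000_000)
    return max(by_events, by_bytes)

# 2,500 events/s of 500-byte events is only 1.25 MB/s, but the event-count
# limit dominates, so 3 TUs are needed rather than 2.
assert required_tus(2_500, 500) == 3
# A lighter workload fits in a single TU.
assert required_tus(500, 500) == 1
```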
Event Hub Terminology Summary
| Term | Simple Explanation |
|---|---|
| Event | A single piece of data (a record) sent to Event Hub |
| Partition | An ordered log; events in one partition stay in order |
| Consumer Group | A named view of the stream; each group reads independently |
| Offset | The position number of an event inside a partition |
| Checkpoint | A saved offset so consumers can resume after a restart |
| Throughput Unit | A unit of capacity controlling how much data Event Hub handles |
| Sequence Number | An automatically assigned number identifying each event's position in a partition |
| Partition Key | A string value used to route events to the same partition |
Summary
Events are the data units. Partitions organize events into ordered logs. Consumer groups provide isolated reading for multiple independent consumers. Offsets track position within a partition. Throughput units control ingestion capacity. These five concepts define the architecture of every Azure Event Hub solution.
