Event Hub Core Concepts

Azure Event Hubs is built around five foundational concepts: Events, Partitions, Consumer Groups, Offsets, and Throughput Units. Mastering these concepts is essential for designing, scaling, and troubleshooting any Event Hub solution.

1. Event

An event in Event Hub is any unit of data sent by a producer. Unlike Azure Event Grid events, which follow a defined schema with required fields, Event Hub events are schema-agnostic. An event is simply a byte array with optional properties.

Event Hub Event Structure

Component          Description                                          Example
Body               Raw byte array containing the event payload          JSON string, binary data, CSV row
Properties         User-defined key-value pairs attached to the event   { "deviceId": "sensor-42", "region": "US-East" }
System Properties  Metadata set by Event Hub automatically on arrival   Offset, sequence number, enqueued time, partition key

Example Event Body (JSON format)

{
  "deviceId": "sensor-42",
  "timestamp": "2024-06-15T10:00:05Z",
  "temperature": 72.4,
  "humidity": 55.2,
  "status": "normal"
}

Event Hub does not validate or parse the body. The producer and consumer must agree on the format independently.
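
Because the body is opaque bytes to the service, producer and consumer typically share a serialization convention out of band. A minimal sketch of such a convention (plain Python with UTF-8 JSON bodies; no Azure SDK involved, and the function names here are illustrative, not SDK APIs):

```python
import json

def encode_event(payload, properties=None):
    """Producer side: serialize the payload to a UTF-8 JSON byte array.
    Event Hub would carry `body` untouched; `properties` travel alongside
    it as user-defined key-value pairs."""
    body = json.dumps(payload).encode("utf-8")
    return body, properties or {}

def decode_event(body):
    """Consumer side: the consumer must know the body is UTF-8 JSON,
    because Event Hub itself never validates or parses it."""
    return json.loads(body.decode("utf-8"))

body, props = encode_event(
    {"deviceId": "sensor-42", "temperature": 72.4},
    {"region": "US-East"},
)
assert decode_event(body) == {"deviceId": "sensor-42", "temperature": 72.4}
```

If the producer switched to, say, Avro without telling the consumer, the service would deliver the bytes just the same and the consumer would fail at decode time, which is why the contract lives entirely with the applications.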

Maximum Event Size

A single event can be up to 1 MB in size in the Standard tier and above (256 KB in the Basic tier). A batch of events (EventDataBatch) is limited to the maximum message size for the namespace, which is also 1 MB by default. Batching multiple small events into a single send operation improves throughput efficiency.
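
The packing behavior can be sketched as a greedy loop: keep adding events to the current batch until the next one would push it over the cap, then start a new batch. This is a simplified model of what EventDataBatch enforces (the real batch also counts AMQP framing overhead, which this sketch ignores):

```python
MAX_BATCH_BYTES = 1_048_576  # 1 MB cap, illustrative

def pack_batches(event_bodies):
    """Greedily pack raw event bodies into batches that each stay under
    the size cap; an add that would exceed the cap starts a new batch,
    mirroring EventDataBatch.add raising when the batch is full."""
    batches, current, current_size = [], [], 0
    for body in event_bodies:
        size = len(body)
        if size > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the maximum event size")
        if current_size + size > MAX_BATCH_BYTES:
            batches.append(current)
            current, current_size = [], 0
        current.append(body)
        current_size += size
    if current:
        batches.append(current)
    return batches

# 300 events of 5 KB each (~1.5 MB total) cannot fit in one 1 MB batch:
batches = pack_batches([b"x" * 5_000] * 300)
assert len(batches) == 2
```

Sending two batches of ~150 events each costs two network round trips instead of 300, which is where the throughput gain comes from.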

2. Partition

A partition is an ordered, immutable sequence of events stored in an Event Hub. Events arrive at the Event Hub and are assigned to a partition. Within a partition, events maintain their arrival order. Events across different partitions have no guaranteed order relative to each other.

Partition Concept Diagram

Event Hub: "telemetry" (4 partitions)

Partition 0:  [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4]  --> newest events
              (earliest)                                             (latest)

Partition 1:  [Offset 0][Offset 1][Offset 2][Offset 3]

Partition 2:  [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4][Offset 5]

Partition 3:  [Offset 0][Offset 1][Offset 2]

Each partition is like a separate ordered log.
Events written to Partition 0 have no ordering relationship with events in Partition 2.

How Events Are Assigned to Partitions

Three assignment strategies are available:

Strategy               How It Works                                                    When to Use
Round-robin (default)  Events distributed evenly across all partitions automatically   Maximum throughput with no ordering requirement
Partition Key          Events with the same key always go to the same partition        Events that must be processed in order per entity (e.g., per device)
Explicit Partition ID  Producer specifies exactly which partition to use               Advanced scenarios requiring strict partition control

Partition Key Example

Producer publishes sensor readings with partitionKey = deviceId

Device "sensor-01" always goes to Partition 0
Device "sensor-02" always goes to Partition 1
Device "sensor-03" always goes to Partition 2

Consumer reading Partition 0 receives all events from sensor-01 in order.
Events from sensor-01 are never mixed with events from sensor-02.
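
Partition-key routing works because the service applies a stable hash to the key and takes it modulo the partition count. A sketch of that idea follows; note the actual Event Hubs service uses its own internal hash function, not SHA-256, so this only illustrates the stability property:

```python
import hashlib

def partition_for_key(partition_key, partition_count):
    """Stable mapping: the same key yields the same partition on every
    call. (Illustrative only -- the Event Hubs service uses its own
    internal hash, not SHA-256 mod N.)"""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# The same device id always lands on the same partition:
p = partition_for_key("sensor-01", 4)
assert partition_for_key("sensor-01", 4) == p
assert 0 <= p < 4
```

Two different keys may still hash to the same partition; the guarantee is only that one key never moves between partitions, which is exactly what per-entity ordering requires.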

Partition Count – Key Design Decision

Partition count is set at Event Hub creation time and cannot be decreased. In Premium and Dedicated tiers, it can be increased. The number of partitions directly determines the maximum parallelism of consumers.

Partitions  Maximum Parallel Consumers                         Maximum Throughput (approx.)
4           4 concurrent reader instances per consumer group   Good for moderate workloads
16          16 concurrent reader instances per consumer group  High throughput scenarios
32          32 concurrent reader instances per consumer group  Very high throughput scenarios
100+        100+ concurrent reader instances                   Premium or Dedicated tier; extreme scale

A consumer instance can read from multiple partitions, but one partition can only be read by one consumer instance at a time within the same consumer group. Setting partitions too low limits maximum throughput. Setting too high wastes resources and increases cost unnecessarily.
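
The "one reader per partition per group" rule shapes how work is divided. The sketch below shows the resulting ownership pattern with a simple round-robin split; in reality the SDK's load balancer negotiates ownership dynamically through checkpoint storage, so this only models the steady-state shape:

```python
def assign_partitions(partition_count, consumer_ids):
    """Round-robin ownership: every partition gets exactly one owner
    within the consumer group, and a consumer may own several
    partitions. Consumers beyond the partition count sit idle."""
    assignment = {cid: [] for cid in consumer_ids}
    for p in range(partition_count):
        owner = consumer_ids[p % len(consumer_ids)]
        assignment[owner].append(p)
    return assignment

# 4 partitions, 2 consumers -> each consumer reads 2 partitions:
a = assign_partitions(4, ["worker-a", "worker-b"])
assert a == {"worker-a": [0, 2], "worker-b": [1, 3]}

# 6 consumers for 4 partitions -> two consumers receive nothing:
b = assign_partitions(4, ["w0", "w1", "w2", "w3", "w4", "w5"])
assert [len(v) for v in b.values()].count(0) == 2
```

This is why partition count is the ceiling on parallelism: adding a fifth consumer to a four-partition hub gains nothing within a single consumer group.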

3. Consumer Group

A consumer group is a logical view of an entire Event Hub stream. Each consumer group reads the complete event stream independently. Multiple consumer groups read the same events simultaneously without interfering with each other.

Consumer Group Diagram

Event Hub: "telemetry"
All partitions contain the same stream of events.

Consumer Group "analytics":
  StreamAnalytics app reads all 4 partitions
  Processes for anomaly detection
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "archive":
  Azure Function reads all 4 partitions
  Writes events to cold storage
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "alerts":
  Custom Java app reads all 4 partitions
  Sends threshold alerts
  Current position: Partition0=Offset3, Partition1=Offset2 (this group is behind)

Note: "archive" and "analytics" are at the same position.
"alerts" is behind but does NOT slow down the other two groups.

The $Default consumer group exists in every Event Hub. Additional consumer groups must be created manually in the Standard, Premium, and Dedicated tiers. Basic tier supports only one consumer group ($Default).
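
The isolation between groups comes down to each group keeping its own offset cursor per partition. A minimal in-memory sketch of the positions from the diagram above (the "alerts" entries for partitions 2 and 3 are assumed to start at 0, since the diagram omits them):

```python
# Each consumer group tracks its own offset per partition; advancing
# one group's cursor never moves the others.
positions = {
    "analytics": {0: 12, 1: 9, 2: 15, 3: 11},
    "archive":   {0: 12, 1: 9, 2: 15, 3: 11},
    "alerts":    {0: 3,  1: 2, 2: 0,  3: 0},  # assumed starting values
}

def advance(group, partition, new_offset):
    """Move one group's cursor on one partition; all other groups
    are untouched -- they read the same underlying stream."""
    positions[group][partition] = new_offset

advance("alerts", 0, 4)
assert positions["alerts"][0] == 4
assert positions["analytics"][0] == 12  # unaffected by the slow group
```

Because the cursors are independent state rather than a shared queue pointer, a lagging group like "alerts" can fall arbitrarily far behind (up to the retention window) without back-pressuring the others.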

Consumer Group Limits

Tier       Maximum Consumer Groups per Event Hub
Basic      1 ($Default only)
Standard   20
Premium    100
Dedicated  1,000

4. Offset

An offset is the position of an event within a partition. Every event stored in a partition has a unique offset. (In practice the offset is a byte position in the partition's log; the examples below use small sequential numbers for readability.) Consumers track their current offset to know which events they have already processed and which are still unread.

Partition 0 events:

Offset:     0      1      2      3      4      5      6
Events:  [E-A]  [E-B]  [E-C]  [E-D]  [E-E]  [E-F]  [E-G]

Consumer processed up to Offset 3 (E-D).
Consumer resumes from Offset 4 (E-E) after restart.

Reading From Different Positions

Starting Position     Description                                               Use Case
Earliest (Beginning)  Read from the very first retained event in the partition  Initial data load; replay all historical events
Latest (End)          Read only new events arriving after the consumer starts   Real-time processing; no interest in historical data
Specific Offset       Read from a specific offset number                        Resuming after a crash at a known position
Specific Timestamp    Read events from a specific date and time                 Reprocessing events after a bug fix for a specific time window
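
The four starting positions can be modeled as a small resolver over a partition log. This is a conceptual sketch (the log is a plain list whose indices stand in for offsets; real events carry an enqueued-time system property that plays the role of the timestamp here):

```python
def resolve_start(partition_log, position):
    """Translate a starting-position choice into the index of the first
    event to read. `position` is "earliest", "latest", ("offset", n),
    or ("timestamp", t); log entries are (enqueued_time, event) pairs."""
    if position == "earliest":
        return 0
    if position == "latest":
        return len(partition_log)  # only events enqueued after this point
    kind, value = position
    if kind == "offset":
        return value
    if kind == "timestamp":
        # first event enqueued at or after the requested time
        for i, (t, _event) in enumerate(partition_log):
            if t >= value:
                return i
        return len(partition_log)
    raise ValueError("unknown starting position: %r" % (position,))

log = [(100, "E-A"), (105, "E-B"), (110, "E-C")]
assert resolve_start(log, "earliest") == 0
assert resolve_start(log, "latest") == 3
assert resolve_start(log, ("offset", 1)) == 1
assert resolve_start(log, ("timestamp", 106)) == 2
```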

Checkpointing

Checkpointing is the process of saving the current offset to a durable storage location. When a consumer restarts after a failure, it reads the saved checkpoint to resume processing from where it left off, rather than starting over.

Consumer processes events:
  Event at Offset 10 --> processed --> checkpoint saved: Offset 10
  Event at Offset 11 --> processed --> checkpoint saved: Offset 11
  Event at Offset 12 --> processed --> checkpoint saved: Offset 12
  Event at Offset 13 --> Consumer crashes!

Consumer restarts:
  Reads checkpoint --> last saved = Offset 12
  Resumes from Offset 13

No events are lost. Events 10, 11, 12 are not reprocessed.

The Event Processor Host (EventProcessorClient in the current SDKs; discussed in the Sending and Receiving Events topic) handles checkpointing automatically using Azure Blob Storage.
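
The crash-and-resume trace above can be sketched end to end with an in-memory checkpoint store standing in for Blob Storage (the class and function names here are illustrative, not SDK APIs):

```python
class CheckpointStore:
    """Durable offset store -- in-memory here; the event processor
    uses Azure Blob Storage for the same role."""
    def __init__(self):
        self._saved = {}

    def save(self, partition, offset):
        self._saved[partition] = offset

    def load(self, partition):
        return self._saved.get(partition)  # None means no checkpoint yet

def process(events, store, partition=0, crash_at=None):
    """Process events from the last checkpoint onward, checkpointing
    after each one; optionally 'crash' before a given offset."""
    last = store.load(partition)
    start = 0 if last is None else last + 1
    for offset in range(start, len(events)):
        if offset == crash_at:
            return  # simulated crash: not processed, not checkpointed
        store.save(partition, offset)  # processed, then checkpointed

store = CheckpointStore()
events = ["E-%d" % i for i in range(15)]
process(events, store, crash_at=13)  # crashes before handling offset 13
assert store.load(0) == 12           # last durable checkpoint survives
process(events, store)               # restart resumes from offset 13
assert store.load(0) == 14
```

Note that checkpointing after every event (as in the trace above) maximizes durability at the cost of one storage write per event; real processors often checkpoint every N events or on a timer, accepting some reprocessing after a crash.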

5. Throughput Units (TUs)

A throughput unit is the purchased capacity unit for an Event Hub namespace in the Standard tier. Each throughput unit provides:

Direction                            Capacity per Throughput Unit
Ingress (incoming data)              1 MB/second OR 1,000 events/second (whichever is hit first)
Egress (outgoing data to consumers)  2 MB/second OR 4,096 events/second

A namespace can have 1 to 40 throughput units in Standard tier. If the ingress or egress limits are exceeded, Event Hub throttles requests and returns ServerBusy errors. Enabling Auto-Inflate automatically increases throughput units when limits are approached.

Throughput Unit Calculation Example

Scenario:
  500 IoT devices each sending 5 events/second
  Each event = 500 bytes

Ingress rate calculation:
  Total events/second = 500 devices * 5 events = 2,500 events/second
  Total data/second   = 2,500 * 500 bytes = 1,250,000 bytes = 1.25 MB/second

One TU supports 1 MB/second OR 1,000 events/second of ingress, whichever is hit first.
By data rate:  1.25 MB/second requires 2 TUs.
By event rate: 2,500 events/second requires 3 TUs.
The scenario therefore needs at least 3 TUs — the event-rate limit is hit before the data-rate limit.
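
Because a TU caps both the data rate and the event rate, sizing must satisfy whichever constraint is tighter. A small calculator for the ingress side (using the decimal MB convention from the worked example above):

```python
import math

def required_ingress_tus(events_per_sec, bytes_per_event):
    """TUs needed so that neither ingress limit is exceeded:
    1 MB/second AND 1,000 events/second per TU -- both constraints
    must hold, so take the larger requirement."""
    mb_per_sec = events_per_sec * bytes_per_event / 1_000_000
    by_data = math.ceil(mb_per_sec / 1)           # 1 MB/s per TU
    by_events = math.ceil(events_per_sec / 1_000)  # 1,000 events/s per TU
    return max(by_data, by_events)

# 500 devices * 5 events/s, 500 bytes each:
assert required_ingress_tus(2_500, 500) == 3  # event rate dominates here
```

Swapping the workload to fewer, larger events flips which limit binds: 800 events/second of 5 KB each is only 0.8k events/s but 4 MB/s, so the data rate would demand 4 TUs.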

Event Hub Terminology Summary

Term             Simple Explanation
Event            A single piece of data (a record) sent to Event Hub
Partition        An ordered log; events in one partition stay in order
Consumer Group   A named view of the stream; each group reads independently
Offset           The position number of an event inside a partition
Checkpoint       A saved offset so consumers can resume after a restart
Throughput Unit  A unit of capacity controlling how much data Event Hub handles
Sequence Number  An automatically assigned number identifying each event's position in a partition
Partition Key    A string value used to route events to the same partition

Summary

Events are the data units. Partitions organize events into ordered logs. Consumer groups provide isolated reading for multiple independent consumers. Offsets track position within a partition. Throughput units control ingestion capacity. These five concepts define the architecture of every Azure Event Hub solution.
