Event Hub Core Concepts
Azure Event Hub is built around five foundational concepts: Events, Partitions, Consumer Groups, Offsets, and Throughput Units. Mastering these concepts is essential for designing, scaling, and troubleshooting any Event Hub solution.
1. Event
An event in Event Hub is any unit of data sent by a producer. Unlike Azure Event Grid events which have a defined schema with required fields, Event Hub events are schema-agnostic. An event is simply a byte array with optional properties.
Event Hub Event Structure
| Component | Description | Example |
|---|---|---|
| Body | Raw byte array containing the event payload | JSON string, binary data, CSV row |
| Properties | User-defined key-value pairs attached to the event | { "deviceId": "sensor-42", "region": "US-East" } |
| System Properties | Metadata set by Event Hub automatically on arrival | Offset, sequence number, enqueued time, partition key |
Example Event Body (JSON format)
{
  "deviceId": "sensor-42",
  "timestamp": "2024-06-15T10:00:05Z",
  "temperature": 72.4,
  "humidity": 55.2,
  "status": "normal"
}
Event Hub does not validate or parse the body. The producer and consumer must agree on the format independently.
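Because the body is just bytes, the producer/consumer format agreement lives entirely in application code. A minimal in-memory sketch of that contract (the event shape here is illustrative, not the SDK's EventData type):

```python
import json

# Illustrative event model: Event Hub sees only an opaque byte-array body
# plus optional user-defined key-value properties.
def make_event(payload, properties=None):
    return {
        "body": json.dumps(payload).encode("utf-8"),  # opaque bytes to the service
        "properties": properties or {},
    }

# The consumer must know, by convention only, that the body is UTF-8 JSON.
def read_event(event):
    return json.loads(event["body"].decode("utf-8"))

event = make_event(
    {"deviceId": "sensor-42", "temperature": 72.4},
    properties={"region": "US-East"},
)
assert read_event(event)["temperature"] == 72.4
```

If the producer switched to CSV or a binary format, Event Hub would deliver the bytes unchanged; only the consumer's parsing code would need to follow.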
Maximum Event Size
A single event can be up to 1 MB in the Standard, Premium, and Dedicated tiers (256 KB in Basic). A batch of events (EventDataBatch) is limited to the maximum message size for the namespace, which is also 1 MB by default. Batching multiple small events into a single send operation improves throughput efficiency.
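The batching rule can be sketched as a greedy packer: fill a batch until the next event would push it past the size limit, then start a new one. This is a simplified model, not the SDK's EventDataBatch API, and it assumes the 1 MB default limit:

```python
MAX_BATCH_BYTES = 1_048_576  # 1 MB, the assumed default batch limit

def batch_events(bodies):
    """Greedily pack event bodies into batches under the size limit."""
    batches, current, size = [], [], 0
    for body in bodies:
        if len(body) > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the maximum event size")
        if size + len(body) > MAX_BATCH_BYTES and current:
            batches.append(current)      # current batch is full: seal it
            current, size = [], 0
        current.append(body)
        size += len(body)
    if current:
        batches.append(current)
    return batches

# 3,000 events of 500 bytes each total ~1.43 MB, so two send operations suffice
# instead of 3,000 individual sends.
batches = batch_events([b"x" * 500] * 3000)
assert len(batches) == 2
```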
2. Partition
A partition is an ordered, immutable sequence of events stored in an Event Hub. Events arrive at the Event Hub and are assigned to a partition. Within a partition, events maintain their arrival order. Events across different partitions have no guaranteed order relative to each other.
Partition Concept Diagram
Event Hub: "telemetry" (4 partitions)
Partition 0: [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4] --> newest events
             (earliest)                              (latest)
Partition 1: [Offset 0][Offset 1][Offset 2][Offset 3]
Partition 2: [Offset 0][Offset 1][Offset 2][Offset 3][Offset 4][Offset 5]
Partition 3: [Offset 0][Offset 1][Offset 2]
Each partition is like a separate ordered log.
Events written to Partition 0 have no ordering relationship with events in Partition 2.
How Events Are Assigned to Partitions
Three assignment strategies are available:
| Strategy | How It Works | When to Use |
|---|---|---|
| Round-robin (default) | Events distributed evenly across all partitions automatically | Maximum throughput with no ordering requirement |
| Partition Key | Events with the same key always go to the same partition | Events that must be processed in order per entity (e.g., per device) |
| Explicit Partition ID | Producer specifies exactly which partition to use | Advanced scenarios requiring strict partition control |
Partition Key Example
Producer publishes sensor readings with partitionKey = deviceId.
Device "sensor-01" always goes to Partition 0.
Device "sensor-02" always goes to Partition 1.
Device "sensor-03" always goes to Partition 2.
A consumer reading Partition 0 receives all events from sensor-01 in order.
Events from sensor-01 are never mixed with events from sensor-02.
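The key-to-partition mapping can be modeled as a stable hash. The real Event Hubs service uses its own internal hash function; this sketch only demonstrates the property that matters, namely that the same key always lands on the same partition:

```python
import hashlib

# Illustrative routing only; the actual Event Hubs hash algorithm differs.
def partition_for(key, partition_count):
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# The same key always maps to the same partition, so per-key ordering holds.
assert partition_for("sensor-01", 4) == partition_for("sensor-01", 4)
assert all(0 <= partition_for(k, 4) < 4 for k in ("sensor-01", "sensor-02"))
```

Note that the mapping depends on the partition count, which is one reason changing the count disturbs per-key ordering guarantees for in-flight workloads.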
Partition Count – Key Design Decision
Partition count is set at Event Hub creation time and cannot be decreased. In Premium and Dedicated tiers, it can be increased. The number of partitions directly determines the maximum parallelism of consumers.
| Partitions | Maximum Parallel Consumers | Typical Scenario |
|---|---|---|
| 4 | 4 concurrent reader instances per consumer group | Good for moderate workloads |
| 16 | 16 concurrent reader instances per consumer group | High throughput scenarios |
| 32 | 32 concurrent reader instances per consumer group | Very high throughput scenarios |
| 100+ | 100+ concurrent reader instances | Premium or Dedicated tier; extreme scale |
A consumer instance can read from multiple partitions, but one partition can only be read by one consumer instance at a time within the same consumer group. Setting partitions too low limits maximum throughput. Setting too high wastes resources and increases cost unnecessarily.
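The ownership rule above (one partition per reader, but possibly many partitions per reader) can be sketched as a simple round-robin assignment. The worker names are hypothetical; real processor clients negotiate ownership dynamically through a checkpoint store:

```python
# Balanced partition ownership within one consumer group: every partition has
# exactly one owner, while one owner may hold several partitions.
def assign_partitions(partition_ids, consumer_ids):
    ownership = {c: [] for c in consumer_ids}
    for i, p in enumerate(partition_ids):
        ownership[consumer_ids[i % len(consumer_ids)]].append(p)
    return ownership

# 4 partitions shared by 2 workers: each worker reads 2 partitions.
owners = assign_partitions([0, 1, 2, 3], ["worker-a", "worker-b"])
assert owners == {"worker-a": [0, 2], "worker-b": [1, 3]}

# More consumers than partitions: the extra instance sits idle, which is why
# partition count caps the useful parallelism of a consumer group.
idle = assign_partitions([0, 1], ["w1", "w2", "w3"])
assert idle["w3"] == []
```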
3. Consumer Group
A consumer group is a logical view of an entire Event Hub stream. Each consumer group reads the complete event stream independently. Multiple consumer groups read the same events simultaneously without interfering with each other.
Consumer Group Diagram
Event Hub: "telemetry" (all partitions contain the same stream of events)

Consumer Group "analytics":
  Stream Analytics app reads all 4 partitions; processes for anomaly detection.
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "archive":
  Azure Function reads all 4 partitions; writes events to cold storage.
  Current position: Partition0=Offset12, Partition1=Offset9, Partition2=Offset15, Partition3=Offset11

Consumer Group "alerts":
  Custom Java app reads all 4 partitions; sends threshold alerts.
  Current position: Partition0=Offset3, Partition1=Offset2 (this group is behind)

Note: "archive" and "analytics" are at the same position.
"alerts" is behind but does NOT slow down the other two groups.
The $Default consumer group exists in every Event Hub. Additional consumer groups must be created manually in the Standard, Premium, and Dedicated tiers. Basic tier supports only one consumer group ($Default).
Consumer Group Limits
| Tier | Maximum Consumer Groups per Event Hub |
|---|---|
| Basic | 1 ($Default only) |
| Standard | 20 |
| Premium | 100 |
| Dedicated | 1,000 |
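The key property of consumer groups is that each one is an independent cursor over the same log. A minimal sketch over a single partition's stream:

```python
# Each consumer group tracks its own position in the shared, immutable log.
log = [f"event-{i}" for i in range(6)]            # one partition's event stream
positions = {"analytics": 0, "archive": 0, "alerts": 0}

def read_next(group):
    event = log[positions[group]]                 # reading never removes events
    positions[group] += 1                         # only this group's cursor moves
    return event

for _ in range(6):
    read_next("analytics")                        # analytics is fully caught up
read_next("alerts")                               # alerts has read only one event

assert positions == {"analytics": 6, "archive": 0, "alerts": 1}
# A slow group never delays the others; each advances its own position.
```

This is why adding a new downstream application is cheap: create a new consumer group and it starts reading the full stream without touching existing readers.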
4. Offset
An offset is the position of an event within a partition. Every event stored in a partition has a unique offset. (Strictly, the offset is a byte position in the partition log; the strictly sequential counter is the sequence number. The diagrams here use simple sequential offsets 0, 1, 2... to illustrate the idea.) Consumers track their current offset to know which events they have already processed and which are still unread.
Partition 0 events:
Offset:   0     1     2     3     4     5     6
Events: [E-A] [E-B] [E-C] [E-D] [E-E] [E-F] [E-G]

Consumer processed up to Offset 3 (E-D).
Consumer resumes from Offset 4 (E-E) after restart.
Reading From Different Positions
| Starting Position | Description | Use Case |
|---|---|---|
| Earliest (Beginning) | Read from the very first retained event in the partition | Initial data load; replay all historical events |
| Latest (End) | Read only new events arriving after the consumer starts | Real-time processing; no interest in historical data |
| Specific Offset | Read from a specific offset number | Resuming after a crash at a known position |
| Specific Timestamp | Read events from a specific date and time | Reprocessing events after a bug fix for a specific time window |
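The four starting positions in the table can be expressed as a function that maps a position choice to an index in the retained events. The event list and field names here are illustrative, not the SDK's types:

```python
from datetime import datetime, timezone

# One partition's retained events, each with an offset and enqueued time.
events = [
    {"offset": i, "enqueued": datetime(2024, 6, 15, 10, 0, i, tzinfo=timezone.utc)}
    for i in range(5)
]

def start_index(events, position, value=None):
    if position == "earliest":
        return 0                               # replay everything retained
    if position == "latest":
        return len(events)                     # only events arriving after now
    if position == "offset":
        return next(i for i, e in enumerate(events) if e["offset"] == value)
    if position == "timestamp":
        return next(i for i, e in enumerate(events) if e["enqueued"] >= value)

assert start_index(events, "earliest") == 0
assert start_index(events, "latest") == 5
assert start_index(events, "offset", 3) == 3
assert start_index(events, "timestamp",
                   datetime(2024, 6, 15, 10, 0, 2, tzinfo=timezone.utc)) == 2
```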
Checkpointing
Checkpointing is the process of saving the current offset to a durable storage location. When a consumer restarts after a failure, it reads the saved checkpoint to resume processing from where it left off, rather than starting over.
Consumer processes events:
Event at Offset 10 --> processed --> checkpoint saved: Offset 10
Event at Offset 11 --> processed --> checkpoint saved: Offset 11
Event at Offset 12 --> processed --> checkpoint saved: Offset 12
Event at Offset 13 --> Consumer crashes!

Consumer restarts:
Reads checkpoint --> last saved = Offset 12
Resumes from Offset 13

No events are lost. Events 10, 11, and 12 are not reprocessed.
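A checkpoint store can be sketched with a local file standing in for the durable store (production processors use Azure Blob Storage for this role):

```python
import json, os, tempfile

# Local file as a stand-in durable checkpoint store for one partition.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "partition-0.json")

def save_checkpoint(offset):
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset}, f)       # persist last processed offset

def load_checkpoint():
    try:
        with open(checkpoint_path) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return -1                              # no checkpoint: start from the beginning

save_checkpoint(12)                            # processed through offset 12, then crash
resume_from = load_checkpoint() + 1            # on restart, continue at the next offset
assert resume_from == 13
```

Checkpointing after every event maximizes accuracy but adds storage round-trips; many processors checkpoint every N events or every few seconds, accepting that a crash may reprocess a small window of already-handled events.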
The Event Processor Host (discussed in the Sending and Receiving Events topic) handles checkpointing automatically using Azure Blob Storage; its successor in current SDKs is the EventProcessorClient.
5. Throughput Units (TUs)
A throughput unit is the purchased capacity unit for an Event Hub namespace in the Standard tier. Each throughput unit provides:
| Direction | Capacity per Throughput Unit |
|---|---|
| Ingress (incoming data) | 1 MB/second OR 1,000 events/second (whichever is hit first) |
| Egress (outgoing data to consumers) | 2 MB/second OR 4,096 events/second |
A Standard-tier namespace can have 1 to 40 throughput units (20 by default; up to 40 via a quota increase request). If the ingress or egress limits are exceeded, Event Hub throttles requests and returns ServerBusy errors. Enabling Auto-Inflate automatically increases throughput units when limits are approached; note that it only scales up and never scales back down automatically.
Throughput Unit Calculation Example
Scenario: 500 IoT devices, each sending 5 events/second; each event = 500 bytes.

Ingress rate calculation:
Total events/second = 500 devices * 5 events = 2,500 events/second
Total data/second = 2,500 * 500 bytes = 1,250,000 bytes = 1.25 MB/second

One TU supports 1 MB/second OR 1,000 events/second of ingress, whichever limit is hit first.
The data rate (1.25 MB/second) alone would need 2 TUs, but the event rate (2,500 events/second) needs 3 TUs.
The stricter limit wins: this workload requires at least 3 TUs.
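The sizing rule generalizes to a small function: compute the TUs demanded by each ingress limit and take the larger. This assumes the Standard-tier limits quoted above:

```python
import math

# Required Standard-tier TUs: the stricter of the two per-TU ingress limits,
# 1 MB/second and 1,000 events/second, determines the answer.
def required_tus(events_per_second, bytes_per_event):
    by_events = math.ceil(events_per_second / 1_000)
    by_bytes = math.ceil(events_per_second * bytes_per_event / 1_000_000)
    return max(by_events, by_bytes)

# 2,500 events/s of 500-byte events is only 1.25 MB/s, but the event-count
# limit dominates, so 3 TUs are needed rather than 2.
assert required_tus(2_500, 500) == 3
# A lighter workload fits in a single TU.
assert required_tus(500, 500) == 1
```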
Event Hub Terminology Summary
| Term | Simple Explanation |
|---|---|
| Event | A single piece of data (a record) sent to Event Hub |
| Partition | An ordered log; events in one partition stay in order |
| Consumer Group | A named view of the stream; each group reads independently |
| Offset | The position number of an event inside a partition |
| Checkpoint | A saved offset so consumers can resume after a restart |
| Throughput Unit | A unit of capacity controlling how much data Event Hub handles |
| Sequence Number | An automatically assigned number identifying each event's position in a partition |
| Partition Key | A string value used to route events to the same partition |
Summary
Events are the data units. Partitions organize events into ordered logs. Consumer groups provide isolated reading for multiple independent consumers. Offsets track position within a partition. Throughput units control ingestion capacity. These five concepts define the architecture of every Azure Event Hub solution.
