Azure Event Hub
IoT devices, website clickstreams, application logs, and financial transaction streams generate millions of events every second. Traditional messaging systems were not designed for this throughput and quickly become a bottleneck at such volumes. Azure Event Hub is a fully managed, real-time data ingestion service designed to receive, buffer, and process millions of events per second from any source.
What is Azure Event Hub?
Azure Event Hub is a big data streaming platform and event ingestion service. It acts as the "front door" for an event pipeline — collecting enormous volumes of events and making them available for downstream processing, analytics, and storage. Think of it as a massive, fast-moving conveyor belt that receives data from thousands of sources and delivers it to processors that analyze or store it.
Event Hub vs Service Bus
| Feature | Azure Event Hub | Azure Service Bus |
|---|---|---|
| Purpose | Big data streaming and event ingestion | Enterprise messaging and workflow decoupling |
| Volume | Millions of events per second | Thousands of messages per second |
| Message Size | Up to 256 KB (Basic), 1 MB (Standard and above) | Up to 256 KB (Standard), 100 MB (Premium) |
| Consumer Model | Pull-based — consumers read at their own pace from a log | Push or pull — messages removed after consumption |
| Replay | Events can be replayed — consumers read from any offset | No replay — consumed messages are removed |
| Ordering | Per partition | FIFO with sessions |
| Best For | IoT telemetry, clickstream, log aggregation, analytics pipelines | Order processing, payment workflows, task queuing |
Event Hub Architecture
```
Producers (Event Sources)         Event Hub               Consumers
┌────────────────────────┐     ┌──────────────┐     ┌─────────────────────┐
│ IoT Device 1 (sensor)  │───► │              │───► │ Azure Stream        │
│ IoT Device 2 (sensor)  │───► │  Namespace   │     │ Analytics (real-time│
│ Web App clickstream    │───► │              │     │ aggregation)        │
│ Mobile App events      │───► │  Event Hub   │     ├─────────────────────┤
│ Application logs       │───► │ (Partitions  │───► │ Azure Functions     │
│ Payment events         │───► │  1,2,3...)   │     │ (process each event)│
└────────────────────────┘     │              │     ├─────────────────────┤
                               │  Retention:  │───► │ Azure Data Lake     │
                               │  1-90 days   │     │ (cold storage)      │
                               └──────────────┘     └─────────────────────┘
```
Key Concepts
Partitions
An Event Hub is divided into partitions — parallel lanes for events. Each partition is an ordered, immutable sequence of events. Partitions allow multiple consumers to read from the Event Hub simultaneously — each consumer handles one or more partitions. More partitions mean higher parallelism and throughput.
Events with the same partition key always go to the same partition, preserving order for related events (e.g., all events from the same device go to the same partition in order).
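Partition-key routing can be pictured with a small sketch. This is purely illustrative — Azure Event Hub uses its own internal hash, not the one below — but it shows the key property: the mapping is deterministic, so events sharing a key always land on the same partition.

```python
import hashlib

# Illustrative sketch of partition-key routing (NOT the actual hash
# Azure Event Hub uses): events sharing a key land on one partition.
def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a stable partition index."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# All events from "device-42" always map to the same partition,
# so their relative order is preserved within that partition.
assert assign_partition("device-42", 4) == assign_partition("device-42", 4)
```

Because the assignment depends only on the key and the partition count, any producer instance routes a given device's events to the same partition without coordination.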
Consumer Groups
A consumer group is an independent view of the entire Event Hub. Multiple consumer groups can read the same events independently — each group maintains its own offset (position in the stream). One consumer group might feed a real-time dashboard, another feeds a data warehouse, and a third drives alerts — all reading the same events without interfering with each other.
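The independence of consumer groups comes down to each group tracking its own position. A minimal sketch (an in-memory stand-in for one partition's log, not SDK code):

```python
# Sketch: two consumer groups reading the same partition log independently.
# Each group keeps its own offset, so neither affects the other.
log = ["evt-0", "evt-1", "evt-2", "evt-3"]  # one partition's event sequence
offsets = {"dashboard": 0, "warehouse": 0}  # per-consumer-group positions

def read_next(group: str):
    """Return the next unread event for a group and advance its offset."""
    if offsets[group] >= len(log):
        return None
    event = log[offsets[group]]
    offsets[group] += 1
    return event

read_next("dashboard")  # "evt-0"
read_next("dashboard")  # "evt-1"
read_next("warehouse")  # "evt-0" — unaffected by the dashboard group
```

Nothing is removed from the log when a group reads: the dashboard group being two events ahead has no effect on what the warehouse group sees.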
Offset and Checkpointing
Each event in a partition has an offset — a sequence number identifying its position. Consumers track their offset (called checkpointing) so they know which events have been processed. If a consumer crashes and restarts, it resumes from its last checkpoint without missing events. Events handled after the last saved checkpoint may be processed a second time, so Event Hub consumers should be idempotent (at-least-once delivery).
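The resume logic can be sketched in a few lines. In practice the checkpoint is persisted to durable storage (e.g. a blob container); here it is just a variable:

```python
# Sketch of checkpoint-based resume: a consumer records the offset of the
# last processed event; after a crash it skips everything at or before it.
events = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]  # (offset, payload) pairs

def process_from_checkpoint(events, checkpoint):
    """Process only events newer than the checkpoint; return the new one."""
    processed = []
    for offset, payload in events:
        if offset <= checkpoint:
            continue  # already handled before the crash
        processed.append(payload)
        checkpoint = offset  # in practice, persisted to durable storage
    return processed, checkpoint

# Resuming with checkpoint 1 re-reads only offsets 2 and 3.
processed, checkpoint = process_from_checkpoint(events, 1)
```

If the consumer had processed offset 2 but crashed before saving the checkpoint, offset 2 would be delivered again on restart — which is exactly why idempotent processing matters.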
Event Retention
Unlike Service Bus (where messages are deleted after consumption), Event Hub retains events for a configurable period: 1 day on Basic, up to 7 days on Standard, and up to 90 days on Premium and Dedicated. This enables:
- Replaying events to reprocess with updated business logic
- Adding new downstream consumers that catch up on historical data
- Debugging by re-examining what events occurred
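All three benefits rest on the same primitive: reading the retained log from an arbitrary starting offset. A small sketch of the idea (again an in-memory stand-in, not SDK code):

```python
# Sketch: because events are retained, a consumer can replay from any
# offset — e.g. to reprocess history with updated logic, or to let a
# brand-new consumer backfill before switching to live events.
retained = [{"offset": i, "value": i * 10} for i in range(5)]

def replay(stream, from_offset: int):
    """Return every retained event at or after from_offset."""
    return [e for e in stream if e["offset"] >= from_offset]

full = replay(retained, 0)     # a new consumer catches up on all history
partial = replay(retained, 3)  # reprocess only the most recent events
```

The real SDK exposes the same idea as a starting position: a consumer can begin at the earliest retained event, the latest, or a specific offset or timestamp.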
Event Hub Tiers
| Tier | Throughput Units / Processing Units | Max Retention | Key Features |
|---|---|---|---|
| Basic | Up to 20 TUs (1 TU = 1 MB/s ingress, 2 MB/s egress) | 1 day | 1 consumer group |
| Standard | Up to 20 TUs (auto-inflate available) | 7 days | 20 consumer groups, Event Capture |
| Premium | Processing Units (PUs), fully isolated | 90 days | Schema Registry, dynamic partitions, VNet |
| Dedicated | Capacity Units (CUs), dedicated hardware | 90 days | Highest throughput, private cluster |
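The throughput-unit figures in the table lend themselves to back-of-envelope sizing. A sketch using 1 TU = 1 MB/s ingress (note: real TUs also cap ingress at 1,000 events per second per TU, which this simplification ignores):

```python
import math

# Back-of-envelope capacity estimate using the table's figure of
# 1 TU = 1 MB/s ingress (ignores the per-TU events/second cap).
def throughput_units_needed(events_per_second: int, avg_event_bytes: int,
                            tu_ingress_bytes: int = 1_000_000) -> int:
    ingress_bytes_per_second = events_per_second * avg_event_bytes
    return max(1, math.ceil(ingress_bytes_per_second / tu_ingress_bytes))

# 100,000 events/s at 5 KB each = 500 MB/s of ingress → ~500 TUs,
# far beyond Standard's 20-TU cap, pointing at Premium or Dedicated.
throughput_units_needed(100_000, 5_000)  # 500
```

Running the numbers this way is a quick check on whether a workload fits Basic/Standard or needs the Premium or Dedicated tiers.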
Event Capture
Event Capture automatically saves streaming events to Azure Blob Storage or Azure Data Lake Storage in Apache Avro format at regular intervals. This bridges the gap between real-time streaming and batch analytics — the raw event data is preserved for long-term storage and historical analysis even while it is being processed in real time.
Event Capture Flow
```
IoT Devices ──► Event Hub ──► Real-time Processing (Stream Analytics)
                    │
                    └──► Event Capture ──► Azure Data Lake Storage
                         (automatic, every few minutes)
                         (Avro format, organized by date/time)
```
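The "organized by date/time" layout follows Capture's default blob-naming convention, {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}. A sketch of what that path construction looks like (an approximation of the convention, not code from the service):

```python
from datetime import datetime, timezone

# Sketch of Event Capture's default blob-naming layout:
# {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}
# Capture writes one Avro file per partition per capture interval.
def capture_blob_path(namespace: str, hub: str, partition_id: int,
                      ts: datetime) -> str:
    return (f"{namespace}/{hub}/{partition_id}/"
            f"{ts:%Y/%m/%d/%H/%M/%S}.avro")

capture_blob_path("my-ns", "telemetry", 0,
                  datetime(2024, 5, 1, 13, 30, 0, tzinfo=timezone.utc))
# "my-ns/telemetry/0/2024/05/01/13/30/00.avro"
```

This date-partitioned layout is what makes the captured files easy to query by time range from batch tools like Synapse or Spark.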
Schema Registry
The Schema Registry, available in the Standard tier and above, stores and manages event schemas (structures). When producers and consumers use the registry, they ensure that event data conforms to a defined format. This prevents schema mismatches that could cause consumers to crash when receiving unexpectedly structured events.
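The kind of guarantee a schema registry provides can be illustrated with a toy validator. The real registry works with serialization formats such as Avro; the dictionary "schema" below is a hypothetical stand-in:

```python
# Sketch: the kind of check a schema registry enables — verify that an
# event's fields and types match a registered schema before consuming it.
SCHEMA = {"device_id": str, "temperature": float}  # hypothetical schema

def conforms(event: dict, schema: dict) -> bool:
    """True if the event has exactly the schema's fields, correctly typed."""
    if set(event) != set(schema):
        return False
    return all(isinstance(event[k], t) for k, t in schema.items())

conforms({"device_id": "d1", "temperature": 21.5}, SCHEMA)   # True
conforms({"device_id": "d1", "temperature": "hot"}, SCHEMA)  # False
```

Rejecting a malformed event at the boundary is far cheaper than letting it crash a downstream consumer mid-stream.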
Common Event Hub Integration Patterns
- IoT Telemetry Pipeline: Millions of IoT sensors → Event Hub → Azure Stream Analytics (real-time aggregation) → Power BI dashboard + Azure Data Lake (storage).
- Clickstream Analytics: Website user clicks → Event Hub → Azure Functions (process each event) → Cosmos DB (user behavior store) + Azure Synapse (analytics).
- Log Aggregation: Application logs from 100 microservices → Event Hub → Azure Monitor Log Analytics workspace.
- Financial Fraud Detection: Transaction events → Event Hub → Azure Stream Analytics (detect patterns) → Alert if fraud pattern detected.
Key Takeaways
- Azure Event Hub is designed for massive-scale event ingestion — millions of events per second from any source.
- Partitions enable parallel processing; more partitions mean higher throughput and more concurrent consumers.
- Consumer groups allow multiple independent applications to read the same events simultaneously.
- Events are retained (not deleted after consumption) for 1 to 90 days, enabling replay and late-joining consumers.
- Event Capture automatically saves streaming data to Blob Storage or Data Lake for long-term batch analytics.
