Azure Event Hub

IoT devices, website clickstreams, application logs, and financial transaction streams generate millions of events every second. Traditional messaging systems are not built for this volume; they are designed for discrete business messages, not continuous high-throughput streams. Azure Event Hub is a fully managed, real-time data ingestion service designed to receive, buffer, and process millions of events per second from any source.

What is Azure Event Hub?

Azure Event Hub is a big data streaming platform and event ingestion service. It acts as the "front door" for an event pipeline — collecting enormous volumes of events and making them available for downstream processing, analytics, and storage. Think of it as a massive, fast-moving conveyor belt that receives data from thousands of sources and delivers it to processors that analyze or store it.

Event Hub vs Service Bus

  Feature          Azure Event Hub                               Azure Service Bus
  Purpose          Big data streaming and event ingestion        Enterprise messaging and workflow decoupling
  Volume           Millions of events per second                 Thousands of messages per second
  Message size     Up to 1 MB (256 KB on the Basic tier)         Up to 100 MB (Premium)
  Consumer model   Pull-based; consumers read from a log         Push or pull; messages are removed
                   at their own pace                             after consumption
  Replay           Supported; consumers can re-read from         Not supported; consumed messages
                   any offset                                    are removed
  Ordering         Per partition                                 FIFO with sessions
  Best for         IoT telemetry, clickstream, log               Order processing, payment workflows,
                   aggregation, analytics pipelines              task queuing

Event Hub Architecture

  Producers (Event Sources)            Event Hub              Consumers
  ┌────────────────────────┐          ┌──────────────┐       ┌─────────────────────┐
  │ IoT Device 1 (sensor)  │──────►   │              │──────►│ Azure Stream        │
  │ IoT Device 2 (sensor)  │──────►   │  Namespace   │       │ Analytics (real-time│
  │ Web App clickstream    │──────►   │              │──────►│ aggregation)        │
  │ Mobile App events      │──────►   │  Event Hub   │       ├─────────────────────┤
  │ Application logs       │──────►   │  (Partitions │──────►│ Azure Functions     │
  │ Payment events         │──────►   │   1,2,3...)  │       │ (process each event)│
  └────────────────────────┘          │              │──────►├─────────────────────┤
                                      │  Retention:  │       │ Azure Data Lake     │
                                      │  1-90 days   │       │ (cold storage)      │
                                      └──────────────┘       └─────────────────────┘

Key Concepts

Partitions

An Event Hub is divided into partitions — parallel lanes for events. Each partition is an ordered, immutable sequence of events. Partitions allow multiple consumers to read from the Event Hub simultaneously — each consumer handles one or more partitions. More partitions mean higher parallelism and throughput. Note that on the Basic and Standard tiers the partition count is fixed at creation, so choose it with peak parallelism in mind.

Events with the same partition key always go to the same partition, preserving order for related events (e.g., all events from the same device go to the same partition in order).
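The "same key, same partition" guarantee can be illustrated with a small stdlib sketch. Event Hub uses its own internal hash; the `assign_partition` function below is a hypothetical stand-in that only demonstrates why a stable hash keeps related events together:

```python
import hashlib

def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition via a stable hash.
    (Illustrative only; Event Hub's internal hash differs.)"""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Every event keyed "device-42" lands in the same partition, in send order.
p1 = assign_partition("device-42", 4)
p2 = assign_partition("device-42", 4)
assert p1 == p2
```

Because the hash depends only on the key, a producer never has to remember which partition a device was assigned to — the mapping is recomputed identically for every event.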

Consumer Groups

A consumer group is an independent view of the entire Event Hub. Multiple consumer groups can read the same events independently — each group maintains its own offset (position in the stream). One consumer group might feed a real-time dashboard, another feeds a data warehouse, and a third drives alerts — all reading the same events without interfering with each other.
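The key point — each group has its own cursor over the same log — can be sketched in a few lines of plain Python (hypothetical group names, no SDK involved):

```python
# Each consumer group keeps its own offset into the same shared event log.
event_log = [f"event-{i}" for i in range(5)]   # the partition's events
offsets = {"dashboard": 0, "warehouse": 0}     # one independent cursor per group

def read_next(group: str):
    """Return the next unread event for a group and advance only its cursor."""
    pos = offsets[group]
    if pos >= len(event_log):
        return None  # caught up; nothing new yet
    offsets[group] = pos + 1
    return event_log[pos]

read_next("dashboard")   # dashboard reads event-0
read_next("dashboard")   # ... then event-1
read_next("warehouse")   # warehouse independently starts at event-0
```

Reading never mutates the log itself, which is why a slow warehouse loader cannot starve or block a real-time dashboard.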

Offset and Checkpointing

Each event in a partition has an offset — a sequence number identifying its position. Consumers periodically record their offset (called checkpointing) so they know which events have been processed. If a consumer crashes and restarts, it resumes from its last checkpoint rather than from the beginning of the stream. Delivery is effectively at-least-once: events processed after the last checkpoint but before the crash are delivered again on restart, so consumers should handle duplicates idempotently.
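A minimal checkpointing loop, using a local JSON file as the checkpoint store (the real SDK persists checkpoints to Blob Storage; the file name and events here are hypothetical):

```python
import json
import os
import tempfile

checkpoint_file = os.path.join(tempfile.gettempdir(), "eh_demo_checkpoint.json")

def save_checkpoint(offset: int) -> None:
    """Persist the next offset to read (survives consumer restarts)."""
    with open(checkpoint_file, "w") as f:
        json.dump({"offset": offset}, f)

def load_checkpoint() -> int:
    """Return the saved offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_file) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0

if os.path.exists(checkpoint_file):
    os.remove(checkpoint_file)  # start fresh for this demo

events = ["e0", "e1", "e2", "e3"]          # the partition's retained log
for i in range(load_checkpoint(), len(events)):
    handled = events[i]                     # process the event here
    save_checkpoint(i + 1)                  # record progress after processing
# A restarted consumer would call load_checkpoint() and resume at offset 4.
```

Checkpointing after every event minimizes re-delivery on a crash but costs a write per event; real consumers usually checkpoint in batches and accept a few duplicates instead.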

Event Retention

Unlike Service Bus (where messages are deleted after consumption), Event Hub retains events for a configurable period (1 to 90 days, depending on tier). This enables:

  • Replaying events to reprocess with updated business logic
  • Adding new downstream consumers that catch up on historical data
  • Debugging by re-examining what events occurred

Event Hub Tiers

  Tier       Capacity                                     Max Retention   Key Features
  Basic      Up to 20 TUs (1 TU = 1 MB/s ingress,         1 day           1 consumer group
             2 MB/s egress)
  Standard   Up to 20 TUs (auto-inflate available)        7 days          20 consumer groups, Event Capture
  Premium    Processing Units (PUs), fully isolated       90 days         Schema Registry, dynamic partitions, VNet
  Dedicated  Capacity Units (CUs), dedicated hardware     90 days         Highest throughput, private cluster

Event Capture

Event Capture automatically saves streaming events to Azure Blob Storage or Azure Data Lake Storage in Apache Avro format at regular intervals. This bridges the gap between real-time streaming and batch analytics — the raw event data is preserved for long-term storage and historical analysis even while it is being processed in real time.

Event Capture Flow

  IoT Devices ──► Event Hub ──► Real-time Processing (Stream Analytics)
                      │
                      └──► Event Capture ──► Azure Data Lake Storage
                           (Automatic, every few minutes)
                           (Avro format, organized by date/time)
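The "organized by date/time" layout can be made concrete. The sketch below builds a path following Capture's default naming convention ({Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}); hubs configured with a custom capture format will differ, and the namespace and hub names are hypothetical:

```python
from datetime import datetime, timezone

def capture_blob_path(namespace: str, eventhub: str, partition_id: int,
                      when: datetime) -> str:
    """Build a Capture-style blob path for one capture window.
    Follows the default naming convention; custom formats differ."""
    return (f"{namespace}/{eventhub}/{partition_id}/"
            f"{when.year}/{when.month:02}/{when.day:02}/"
            f"{when.hour:02}/{when.minute:02}/{when.second:02}")

path = capture_blob_path("myns", "telemetry", 0,
                         datetime(2024, 1, 5, 9, 7, 3, tzinfo=timezone.utc))
# → "myns/telemetry/0/2024/01/05/09/07/03"
```

Because the path encodes partition and timestamp, batch jobs can cheaply select "partition 0, January 2024" by prefix without scanning every blob.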

Schema Registry

The Schema Registry (available on the Standard tier and above) stores and manages message schemas (structures) centrally. When producers and consumers use the registry, they ensure that event data conforms to a defined format. This prevents schema mismatches that could cause consumers to crash when receiving unexpectedly structured events.
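To see why schema validation helps, here is a stdlib-only sketch that checks events against a hypothetical field/type schema before processing. The real Schema Registry stores Avro or JSON schemas centrally and the SDK serializers enforce them; this only mimics the idea:

```python
# Hypothetical schema: required field names mapped to expected Python types.
TELEMETRY_SCHEMA = {"device_id": str, "temperature": float}

def conforms(event: dict, schema: dict) -> bool:
    """True if every schema field is present with the expected type."""
    return all(isinstance(event.get(name), ftype)
               for name, ftype in schema.items())

ok = conforms({"device_id": "d1", "temperature": 21.5}, TELEMETRY_SCHEMA)  # True
bad = conforms({"device_id": "d1"}, TELEMETRY_SCHEMA)                      # False
```

A consumer can route non-conforming events to a dead-letter store instead of crashing mid-stream, which is the failure mode the registry is designed to prevent.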

Common Event Hub Integration Patterns

  • IoT Telemetry Pipeline: Millions of IoT sensors → Event Hub → Azure Stream Analytics (real-time aggregation) → Power BI dashboard + Azure Data Lake (storage).
  • Clickstream Analytics: Website user clicks → Event Hub → Azure Functions (process each event) → Cosmos DB (user behavior store) + Azure Synapse (analytics).
  • Log Aggregation: Application logs from 100 microservices → Event Hub → Azure Monitor Log Analytics workspace.
  • Financial Fraud Detection: Transaction events → Event Hub → Azure Stream Analytics (detect patterns) → Alert if fraud pattern detected.

Key Takeaways

  • Azure Event Hub is designed for massive-scale event ingestion — millions of events per second from any source.
  • Partitions enable parallel processing; more partitions mean higher throughput and more concurrent consumers.
  • Consumer groups allow multiple independent applications to read the same events simultaneously.
  • Events are retained (not deleted after consumption) for 1 to 90 days, enabling replay and late-joining consumers.
  • Event Capture automatically saves streaming data to Blob Storage or Data Lake for long-term batch analytics.
