Event Grid Dead Letter, Retry Policy and Delivery Guarantees

Azure Event Grid provides at-least-once delivery: each event is delivered to every matching subscription at least once, unless its retry policy gives up first. When delivery fails, Event Grid retries according to that policy, and when the retries are exhausted, events can be saved to a dead-letter location instead of being lost. Understanding these mechanisms is critical for building resilient, production-ready event-driven systems.

Delivery Guarantees

At-Least-Once Delivery

Within the limits of its retry policy, Event Grid delivers every event to each matching subscription at least once. In the vast majority of cases, each event is delivered exactly once, but in rare scenarios involving network timeouts or partial failures, Event Grid may deliver the same event more than once.

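Handlers must therefore be idempotent: an idempotent handler produces the same result whether it processes an event once or several times. For example, a handler that creates a database record must check whether the record already exists before inserting it. Enforcing that check with a unique key (such as the orderId or the event id) is safer than a separate lookup, because two duplicate deliveries can arrive at the same time.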

Example of Idempotent Handler Logic:

Event arrives with orderId = "ORD-9821"

Handler checks: Does a record with orderId = "ORD-9821" exist in the database?
  If NO  --> Insert new record
  If YES --> Skip insertion (record already created on first delivery)

Result: Even if Event Grid delivers the event twice, only one record is created.
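The check-then-insert logic above can be sketched in Python with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions: the `orders` table and the `handle_order_placed` function are hypothetical, and the event shape follows the orderId/amount example used in this article. Rather than a separate lookup, the primary key lets the database itself reject the duplicate.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory store; the orders table is a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY makes the database reject a second row for the same order.
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")
    return conn


def handle_order_placed(conn: sqlite3.Connection, event: dict) -> bool:
    """Process an order event idempotently.

    Returns True if a new record was inserted, False if the event was a duplicate.
    """
    data = event["data"]
    # INSERT OR IGNORE folds the existence check and the insert into one statement,
    # so a redelivered event is skipped instead of creating a second record.
    cursor = conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, amount) VALUES (?, ?)",
        (data["orderId"], data["amount"]),
    )
    conn.commit()
    return cursor.rowcount == 1


if __name__ == "__main__":
    store = create_store()
    event = {"id": "evt-001", "data": {"orderId": "ORD-9821", "amount": 499.00}}
    print(handle_order_placed(store, event))  # True: first delivery inserts the row
    print(handle_order_placed(store, event))  # False: duplicate delivery is skipped
    print(store.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

`INSERT OR IGNORE` is SQLite syntax; other databases offer similar conflict handling, such as PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`.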

Event Ordering

Event Grid does not guarantee event ordering. Events from the same source may arrive at a handler out of sequence. When ordering matters, include a sequence number or timestamp in the event data and handle reordering in the application layer.
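One way to handle reordering in the application layer is a small buffer that holds early arrivals until the gap before them is filled. This sketch assumes the publisher adds a per-source `sequence` number to the event data, starting at 1; that field and the `ReorderBuffer` class are hypothetical and are not part of the Event Grid schema.

```python
import heapq


class ReorderBuffer:
    """Releases events in sequence order, holding back any that arrive early."""

    def __init__(self, first_sequence: int = 1) -> None:
        self.next_sequence = first_sequence
        self._heap = []        # (sequence, event) pairs waiting for a gap to fill
        self._waiting = set()  # sequence numbers currently held in the heap

    def add(self, event: dict) -> list:
        """Accept one event and return every event that is now ready, in order."""
        seq = event["data"]["sequence"]  # hypothetical publisher-supplied field
        # Drop duplicates and events that were already released.
        if seq < self.next_sequence or seq in self._waiting:
            return []
        self._waiting.add(seq)
        heapq.heappush(self._heap, (seq, event))
        ready = []
        # Release the contiguous run that starts at the next expected number.
        while self._heap and self._heap[0][0] == self.next_sequence:
            released_seq, released_event = heapq.heappop(self._heap)
            self._waiting.discard(released_seq)
            ready.append(released_event)
            self.next_sequence += 1
        return ready


if __name__ == "__main__":
    buffer = ReorderBuffer()
    print([e["id"] for e in buffer.add({"id": "evt-2", "data": {"sequence": 2}})])  # []
    print([e["id"] for e in buffer.add({"id": "evt-1", "data": {"sequence": 1}})])  # ['evt-1', 'evt-2']
```

A gap that never fills would hold later events forever, so a real implementation would also need a timeout or a rule for skipping missing numbers.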

Retry Policy

When event delivery fails (the handler returns an error response or does not respond within 30 seconds), Event Grid automatically retries delivery. The retry policy defines how many delivery attempts are made and for how long.

Default Retry Behavior

Property                     Default Value              Maximum Value
Maximum delivery attempts    30                         30
Event time-to-live (TTL)     1440 minutes (24 hours)    1440 minutes

Event Grid retries until it reaches either the maximum number of attempts or the event TTL — whichever comes first.

Exponential Backoff with Jitter

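Event Grid uses exponential backoff with jitter between retry attempts: each retry waits longer than the previous one, and jitter adds a small random offset to every wait so that retries for many failed events do not hit the handler at the same moment. Retries are best effort, and Event Grid may skip some of them when an endpoint is consistently unhealthy or appears to be overwhelmed.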

Retry Schedule (approximate):

Attempt 1: Immediate delivery fails (handler returns 503)
Wait:  10 seconds + jitter
Attempt 2: Delivery fails
Wait:  30 seconds + jitter
Attempt 3: Delivery fails
Wait:  1 minute + jitter
Attempt 4: Delivery fails
Wait:  5 minutes + jitter
Attempt 5: Delivery fails
Wait:  10 minutes + jitter
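...later waits grow to 30 minutes, 1 hour, 3 hours, 6 hours, then every 12 hours...
Final attempt fails (30 attempts reached or 24-hour TTL expired, whichever comes first)
  --> Event goes to dead-letter location (if configured)
  --> OR event is dropped (if no dead-letter location is configured)

Because the waits grow this long, the default 24-hour TTL is typically reached after roughly a dozen attempts, well before the 30-attempt limit.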
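The schedule above can be modeled as a lookup table plus a random offset. This is a sketch for reasoning about timing, not Event Grid's implementation: the waits after the first five are taken from Azure's published best-effort schedule (30 minutes, 1 hour, 3 hours, 6 hours, then every 12 hours), and the jitter of up to plus or minus 10% of each wait is an assumption chosen for illustration.

```python
import random
from datetime import timedelta
from typing import Optional

# Approximate waits before attempts 2, 3, 4, ...; every later retry waits 12 hours.
BASE_WAITS = [
    timedelta(seconds=10),
    timedelta(seconds=30),
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=10),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(hours=3),
    timedelta(hours=6),
]
STEADY_WAIT = timedelta(hours=12)


def wait_before_attempt(attempt: int, jitter_fraction: float = 0.1,
                        rng: Optional[random.Random] = None) -> timedelta:
    """Return the wait before the given attempt number (attempt 1 is immediate).

    jitter_fraction is an assumed bound: the base wait is scaled by a random
    factor in [1 - jitter_fraction, 1 + jitter_fraction].
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return timedelta(0)
    index = attempt - 2
    base = BASE_WAITS[index] if index < len(BASE_WAITS) else STEADY_WAIT
    rng = rng or random.Random()
    factor = 1 + rng.uniform(-jitter_fraction, jitter_fraction)
    return base * factor
```

Passing `jitter_fraction=0` returns the base schedule exactly, and passing a seeded `random.Random` makes the jittered waits reproducible in tests.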

Configuring Retry Policy for Custom Topics

Retry policy is configured at the event subscription level. Two properties can be customized:

Property                    Description                                                          Range
maxDeliveryAttempts         Maximum number of delivery attempts before giving up                 1 to 30
eventTimeToLiveInMinutes    Maximum time to retry delivery before dead-lettering or dropping    1 to 1440 minutes

Setting Retry Policy via Azure CLI
az eventgrid event-subscription create \
  --name my-subscription \
  --source-resource-id /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.EventGrid/topics/mytopic \
  --endpoint https://my-handler.azurewebsites.net/api/events \
  --max-delivery-attempts 10 \
  --event-ttl 60

This configures a maximum of 10 delivery attempts and an event TTL of 60 minutes. Event Grid stops retrying as soon as either limit is reached, whichever comes first; the event is then dead-lettered (if a dead-letter location is configured) or dropped.
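To see which limit ends retrying first for a given subscription, you can walk the approximate schedule on a simulated clock. This sketch ignores jitter and skipped retries, assumes the later waits follow Azure's published best-effort schedule (30 minutes, 1 hour, 3 hours, 6 hours, then every 12 hours), and uses a hypothetical helper name, `attempts_before_giving_up`. It also enforces the allowed ranges from the table above.

```python
from datetime import timedelta
from typing import Tuple

# Approximate wait before attempts 2, 3, 4, ...; later retries wait 12 hours each.
WAITS = [timedelta(seconds=10), timedelta(seconds=30), timedelta(minutes=1),
         timedelta(minutes=5), timedelta(minutes=10), timedelta(minutes=30),
         timedelta(hours=1), timedelta(hours=3), timedelta(hours=6)]


def attempts_before_giving_up(max_attempts: int, ttl_minutes: int) -> Tuple[int, str]:
    """Return (attempts made, limit reached) for a handler that always fails."""
    if not 1 <= max_attempts <= 30:
        raise ValueError("maxDeliveryAttempts must be between 1 and 30")
    if not 1 <= ttl_minutes <= 1440:
        raise ValueError("eventTimeToLiveInMinutes must be between 1 and 1440")
    ttl = timedelta(minutes=ttl_minutes)
    elapsed = timedelta(0)  # attempt 1 happens immediately
    attempts = 1
    while attempts < max_attempts:
        index = attempts - 1
        wait = WAITS[index] if index < len(WAITS) else timedelta(hours=12)
        if elapsed + wait > ttl:
            # The next attempt would fall after the TTL, so retrying ends here.
            return attempts, "ttl"
        elapsed += wait
        attempts += 1
    return attempts, "max_attempts"


if __name__ == "__main__":
    print(attempts_before_giving_up(10, 60))    # (7, 'ttl')
    print(attempts_before_giving_up(30, 1440))  # (11, 'ttl')
```

Under these assumptions, the CLI example above stops after about 7 attempts, because the 60-minute TTL is reached before the tenth attempt.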

When Event Grid Stops Retrying

Event Grid stops retrying immediately and dead-letters the event (or drops it if no dead-letter location is configured) in three specific scenarios:

  • The handler returns HTTP 400 Bad Request: Event Grid treats this as a permanent error, so retrying will not help
  • The handler returns HTTP 413 Request Entity Too Large: the event is too large for the handler to accept
  • The handler returns HTTP 403 Forbidden: the handler refused access (an authorization failure), and retrying will not fix this

For 5xx errors (server errors), Event Grid retries because the failure is likely temporary. For 400-level client errors listed above, retrying would produce the same failure.
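These rules can be summarized as a small classification function. It is a sketch based only on the codes this section lists: the success set uses Event Grid's documented success codes (200, 201, 202, 203, 204), and treating every other failure, including a timeout (represented here as `None`), as retryable is a simplification.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"      # delivery is complete
    RETRY = "retry"          # transient failure: schedule another attempt
    PERMANENT = "permanent"  # stop retrying: dead-letter or drop the event


SUCCESS_CODES = {200, 201, 202, 203, 204}
# Codes this section lists as permanent failures.
PERMANENT_CODES = {400, 403, 413}


def classify_response(status: Optional[int]) -> Outcome:
    """Classify a handler response; None means the handler did not respond in time."""
    if status is None:
        return Outcome.RETRY
    if status in SUCCESS_CODES:
        return Outcome.SUCCESS
    if status in PERMANENT_CODES:
        return Outcome.PERMANENT
    # 5xx and any other unlisted code are treated as transient in this sketch.
    return Outcome.RETRY
```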

Dead Lettering

Dead lettering is the process of saving undeliverable events to a storage location instead of discarding them. When an event cannot be delivered (its retries are exhausted, its TTL expires, or the handler returns a permanent error), Event Grid writes it to the Azure Blob Storage container designated as the dead-letter location.

Dead Letter Flow Diagram

Event Published to Topic
       |
       v
Event Grid delivers to Handler
       |
       v
Handler fails (returns 5xx or times out)
       |
       v
Event Grid retries (exponential backoff, up to 30 attempts / 24 hours)
       |
       v
All retries exhausted
       |
       +────────── Dead Letter configured? ──────────+
       |                                             |
      YES                                           NO
       |                                             |
       v                                             v
Event written to                               Event dropped
Azure Blob Storage                             (lost forever)
(dead-letter container)
       |
       v
Developer investigates and reprocesses manually
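The flow in the diagram can be simulated end to end. This sketch uses a fake handler that returns HTTP status codes and a simulated clock instead of real waits. The wait table follows Azure's published best-effort schedule without jitter, and `deliver`, its return strings, and the `lastStatus` key are hypothetical names chosen for illustration.

```python
from datetime import timedelta
from typing import Callable, Optional

WAITS = [timedelta(seconds=10), timedelta(seconds=30), timedelta(minutes=1),
         timedelta(minutes=5), timedelta(minutes=10), timedelta(minutes=30),
         timedelta(hours=1), timedelta(hours=3), timedelta(hours=6)]
PERMANENT_CODES = {400, 403, 413}


def deliver(handler: Callable[[int], Optional[int]], max_attempts: int = 30,
            ttl: timedelta = timedelta(hours=24),
            dead_letter: Optional[list] = None) -> str:
    """Simulate delivery of one event and return its final disposition.

    handler receives the attempt number and returns an HTTP status,
    or None to represent a timeout. dead_letter stands in for the container.
    """
    elapsed = timedelta(0)
    attempt = 1
    while True:
        status = handler(attempt)
        if status is not None and 200 <= status <= 204:  # documented success codes
            return "delivered"
        permanent = status in PERMANENT_CODES
        index = attempt - 1
        wait = WAITS[index] if index < len(WAITS) else timedelta(hours=12)
        exhausted = attempt >= max_attempts or elapsed + wait > ttl
        if permanent or exhausted:
            break
        elapsed += wait
        attempt += 1
    # Retries exhausted, TTL reached, or a permanent error was returned.
    if dead_letter is not None:
        dead_letter.append({"deliveryAttempts": attempt, "lastStatus": status})
        return "dead-lettered"
    return "dropped"
```

For example, `deliver(lambda n: 503, dead_letter=[])` returns `"dead-lettered"`, while the same call without a `dead_letter` list returns `"dropped"`.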

Configuring Dead Lettering

Dead lettering requires an Azure Blob Storage account and container. The Event Grid subscription writes dead-lettered events as JSON blobs.

Setting Up Dead Letter via Azure Portal
  1. Create an Azure Storage Account and a container (e.g., "deadletterevents")
  2. Navigate to the Event Grid topic and open the event subscription
  3. Scroll to Additional Features
  4. Under Dead-letter destination, select Storage account
  5. Choose the storage account and container
  6. Save the subscription

Setting Up Dead Letter via Azure CLI
az eventgrid event-subscription create \
  --name my-subscription \
  --source-resource-id /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.EventGrid/topics/mytopic \
  --endpoint https://my-handler.azurewebsites.net/api/events \
  --deadletter-endpoint /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/mystorage/blobServices/default/containers/deadletterevents
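Because the dead-letter endpoint is a long resource ID, it can help to assemble it from its parts. This hypothetical helper reproduces the path format used in the command above; the subscription ID, resource group, account, and container values in the usage example are placeholders.

```python
def deadletter_endpoint(subscription_id: str, resource_group: str,
                        storage_account: str, container: str) -> str:
    """Build the blob-container resource ID passed to --deadletter-endpoint."""
    for name, value in [("subscription_id", subscription_id),
                        ("resource_group", resource_group),
                        ("storage_account", storage_account),
                        ("container", container)]:
        # An empty or slash-containing segment would produce a malformed ID.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        f"/blobServices/default/containers/{container}"
    )


if __name__ == "__main__":
    print(deadletter_endpoint("{sub}", "{rg}", "mystorage", "deadletterevents"))
```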

Dead Letter Event Format

When an event is dead-lettered, Event Grid adds extra metadata to explain why delivery failed.

Dead Letter Blob Content:
{
  "deadLetterReason": "MaxDeliveryAttemptsExceeded",
  "deliveryAttempts": 30,
  "lastDeliveryOutcome": "DeliveryFailed",
  "publishTime": "2024-06-15T09:30:00.000Z",
  "lastDeliveryAttemptTime": "2024-06-16T09:29:55.000Z",
  "subscription": "my-subscription",
  "topic": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.EventGrid/topics/mytopic",
  "event": {
    "id": "evt-001",
    "eventType": "App.OrderPlaced",
    "subject": "/orders/ORD-9821",
    "data": { "orderId": "ORD-9821", "amount": 499.00 },
    "eventTime": "2024-06-15T09:30:00Z"
  }
}

Dead Letter Reasons

Reason                         Description
MaxDeliveryAttemptsExceeded    All retry attempts were used without successful delivery
TimeToLiveExpired              Event TTL expired before successful delivery
DeliveryTimeout                Handler did not respond within the timeout period
UnsupportedDelivery            Endpoint configuration error or endpoint type mismatch
NoMatchingSubscriptions        No subscriptions matched the event (this reason is for publisher-side logging)
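When investigating dead-lettered events, a short script can count blobs by reason and pull out the original events for reprocessing. This sketch works on blob text you have already downloaded from the dead-letter container; it reads the nested layout shown in the example blob above and, as a precaution, also accepts an array of records or a record whose event fields sit at the top level. `summarize_dead_letters` is a hypothetical helper.

```python
import json
from collections import Counter

# Dead-letter metadata fields named in this article.
METADATA_FIELDS = {"deadLetterReason", "deliveryAttempts", "lastDeliveryOutcome",
                   "publishTime", "lastDeliveryAttemptTime", "subscription"}


def summarize_dead_letters(blob_texts: list) -> tuple:
    """Return (Counter of deadLetterReason values, original events for replay)."""
    reasons = Counter()
    events = []
    for text in blob_texts:
        payload = json.loads(text)
        # Accept either one record or an array of records per blob.
        records = payload if isinstance(payload, list) else [payload]
        for record in records:
            reasons[record.get("deadLetterReason", "Unknown")] += 1
            if "event" in record:
                # Nested layout, as in the example blob shown above.
                events.append(record["event"])
            else:
                # Flat layout: strip the metadata and keep the event's own fields.
                events.append({k: v for k, v in record.items()
                               if k not in METADATA_FIELDS})
    return reasons, events
```

Replaying the extracted events is only safe once the handler fault is fixed, and it relies on the handler being idempotent.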

Monitoring Delivery Health

Azure Monitor provides metrics for Event Grid topic and subscription health. Key metrics to watch:

Metric                    Description
Published Events          Total events received by the topic
Matched Events            Events that matched at least one subscription filter
Delivered Events          Events successfully delivered to handlers
Delivery Failed Events    Events that failed delivery (across all retries)
Dead Lettered Events      Events moved to the dead-letter location
Dropped Events            Events discarded because no dead-letter location is configured

Set Azure Monitor alerts on Dead Lettered Events and Delivery Failed Events to detect delivery problems early.
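As a rough illustration of the alert conditions, the function below evaluates metric totals for one time window. In Azure the rule itself would be an Azure Monitor alert rather than application code; the `WindowTotals` and `alerts_for` names are hypothetical, and the failure threshold of 10 is an assumption to tune per workload.

```python
from dataclasses import dataclass


@dataclass
class WindowTotals:
    """Metric totals for one evaluation window, as read from Azure Monitor."""
    delivery_failed: int  # Delivery Failed Events
    dead_lettered: int    # Dead Lettered Events
    dropped: int          # Dropped Events


def alerts_for(window: WindowTotals, max_failures: int = 10) -> list:
    """Return alert messages; max_failures is an assumed, workload-specific threshold."""
    alerts = []
    if window.dropped > 0:
        alerts.append(f"{window.dropped} event(s) dropped; is a dead-letter location configured?")
    if window.dead_lettered > 0:
        alerts.append(f"{window.dead_lettered} event(s) dead-lettered; inspect the container")
    if window.delivery_failed > max_failures:
        alerts.append(f"{window.delivery_failed} failed deliveries exceed the limit of {max_failures}")
    return alerts
```

Alerting on any dead-lettered or dropped event, rather than on a rate, reflects the idea that each such event needs a human decision.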

Best Practices for Reliable Event Delivery

Practice                                        Why It Matters
Always configure a dead-letter location         Prevents silent event loss when handlers fail
Make handlers idempotent                        Handles duplicate deliveries safely
Return HTTP 200 quickly                         Acknowledge within Event Grid's 30-second response window and hand slow processing to a durable queue, so late responses do not trigger retries and duplicate deliveries
Avoid returning HTTP 400 for temporary issues   HTTP 400 stops retries permanently; use 5xx for temporary failures
Monitor dead-letter container regularly         Investigate and reprocess dead-lettered events promptly
Set alerts on delivery failure metrics          Detect and respond to handler failures before events accumulate
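The "return HTTP 200 quickly" and "avoid HTTP 400 for temporary issues" practices meet in the handler's response logic. This is a minimal sketch, not a web framework integration: `handle_delivery` is a hypothetical function that receives the raw request body, it assumes the Event Grid event schema (each request body is a JSON array), and it omits the endpoint validation handshake that webhook subscriptions require. The in-memory `queue.Queue` stands in for a durable queue, because anything held only in memory is lost if the process stops after returning 200.

```python
import json
import queue

# In-memory stand-in for a durable queue; the 1000-item limit is an arbitrary assumption.
work_queue = queue.Queue(maxsize=1000)


def handle_delivery(body: bytes, q: queue.Queue = work_queue) -> int:
    """Return the HTTP status to send back to Event Grid for one delivery."""
    try:
        events = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Malformed payload: a retry would fail the same way, so 400 is appropriate.
        return 400
    if not isinstance(events, list):
        return 400
    # Single-threaded sketch: check capacity before enqueuing anything.
    if q.maxsize and q.qsize() + len(events) > q.maxsize:
        # Temporary overload: a 503 tells Event Grid to retry later.
        return 503
    for event in events:
        q.put_nowait(event)
    # Acknowledge right away; the slow work happens later, off the queue.
    return 200
```

A worker that drains the queue can then take as long as it needs, while duplicate deliveries caused by a 503 are absorbed by the idempotent handler logic.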

Summary

Azure Event Grid provides at-least-once delivery with an exponential backoff retry policy. Configuring dead-letter storage ensures that no event is silently lost when all retries fail. Idempotent handlers, proper HTTP response codes, and Azure Monitor alerts together form a complete, resilient delivery strategy.
