Service Bus Geo Disaster Recovery
Azure Service Bus Geo Disaster Recovery (Geo-DR) protects messaging namespaces against regional outages. It pairs a primary namespace in one Azure region with a secondary namespace in a different region. The entity metadata — queues, topics, subscriptions, and rules — replicates from primary to secondary continuously. If the primary region fails, a single failover command makes the secondary namespace the new primary, and applications resume messaging using the same alias endpoint without any code changes.
Important: What Geo-DR Replicates and What It Does Not
| Item | Replicated? | Notes |
|---|---|---|
| Queue definitions | Yes | Name, properties, settings all replicated |
| Topic and subscription definitions | Yes | Including filters and actions |
| Shared Access Policies | Yes | Authorization rules replicated |
| Messages in queues | No | In-flight messages are NOT replicated |
| Messages in subscriptions | No | Unprocessed messages stay in primary only |
Geo-DR is a metadata replication feature, not a data replication feature. Applications must handle message loss during failover through their own retry and idempotency mechanisms.
Geo-DR Architecture
Alias endpoint (always active): sb://myshopns.servicebus.windows.net
              |
              v
+---------------------------+                        +-------------------------------+
| Primary: myshopns-primary |                        | Secondary: myshopns-secondary |
| Region : East US          |  Metadata replication  | Region : West US              |
| [Queue: orders]           | ---------------------> | [Queue: orders]               |
| [Topic: order-events]     |                        | [Topic: order-events]         |
+---------------------------+                        +-------------------------------+
(active for sends/receives)                          (passive: receives no messages
                                                      until failover is triggered)
Setting Up Geo-DR — Step by Step
Step 1: Create Primary and Secondary Namespaces
# Primary namespace (East US)
az servicebus namespace create \
  --resource-group rg-messaging-prod \
  --name myshopns-primary \
  --location eastus \
  --sku Premium

# Secondary namespace (West US), same tier
az servicebus namespace create \
  --resource-group rg-messaging-prod \
  --name myshopns-secondary \
  --location westus \
  --sku Premium
Step 2: Create the Pairing (Alias)
# Get the secondary namespace resource ID; the alias is created on the primary and points at this partner
SECONDARY_ID=$(az servicebus namespace show \
  --resource-group rg-messaging-prod \
  --name myshopns-secondary \
  --query id --output tsv)

# Create the alias pairing
az servicebus georecovery-alias create \
  --resource-group rg-messaging-prod \
  --namespace-name myshopns-primary \
  --alias myshopns \
  --partner-namespace "$SECONDARY_ID"
Step 3: Check Pairing Status
az servicebus georecovery-alias show \
  --resource-group rg-messaging-prod \
  --namespace-name myshopns-primary \
  --alias myshopns

# Output includes:
#   "provisioningState": "Succeeded"
#   "role": "Primary"
#   "pendingReplicationOperationsCount": 0   (0 = fully synced)
Replication States:
  Pending   --> Pairing request submitted, replication starting
  Syncing   --> Entity metadata copying from primary to secondary
  Succeeded --> Fully paired and in sync, ready for failover
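The status check above can be turned into a wait loop so that later automation does not proceed before the pairing is in sync. The following is a minimal sketch, assuming the resource names used in the earlier steps and an authenticated Azure CLI session. The helper names `pairing_ready` and `wait_for_pairing`, the 10-second interval, and the attempt limit are illustrative choices, not part of the Azure CLI.

```shell
# Succeeds (exit 0) only when the pairing reports Succeeded with nothing left to replicate.
# Arguments: <provisioningState> <pendingReplicationOperationsCount>
# Assumption of this sketch: an empty or "None" count (field absent) is treated as zero.
pairing_ready() {
  [ "$1" = "Succeeded" ] || return 1
  case "${2:-0}" in
    0|None) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls the alias until pairing_ready passes or the attempts run out.
# Usage: wait_for_pairing [max_attempts]
wait_for_pairing() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    state=$(az servicebus georecovery-alias show \
      --resource-group rg-messaging-prod \
      --namespace-name myshopns-primary \
      --alias myshopns \
      --query provisioningState --output tsv)
    pending=$(az servicebus georecovery-alias show \
      --resource-group rg-messaging-prod \
      --namespace-name myshopns-primary \
      --alias myshopns \
      --query pendingReplicationOperationsCount --output tsv)
    if pairing_ready "$state" "$pending"; then
      echo "Pairing is in sync"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 10
  done
  echo "Pairing did not reach a synced state in time" >&2
  return 1
}
```

Calling `wait_for_pairing 60` would poll for up to roughly ten minutes before giving up.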
Alias Endpoint — How Applications Connect
Applications connect using the alias endpoint, not the primary or secondary namespace endpoint directly. The alias always points to the currently active namespace.
Alias endpoint: sb://myshopns.servicebus.windows.net

  Before failover --> resolves to --> myshopns-primary.servicebus.windows.net
  After failover  --> resolves to --> myshopns-secondary.servicebus.windows.net

Application connection string uses the ALIAS and never changes:

  Endpoint=sb://myshopns.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>
// .NET: connect using the alias, not the primary namespace name
using Azure.Messaging.ServiceBus;

string aliasConnectionString =
    "Endpoint=sb://myshopns.servicebus.windows.net/;...";
await using var client = new ServiceBusClient(aliasConnectionString);
// Works before AND after failover; no code change needed
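To confirm that an application really is configured with the alias, and to see where the alias currently points, it can help to pull the host out of the connection string and resolve it. The sketch below assumes the alias and rule names used in this article. `endpoint_host` and `alias_connection_string` are hypothetical helper names, and the DNS lookup is left as a commented example because it needs network access.

```shell
# Prints the host name from the Endpoint part of a Service Bus connection string.
endpoint_host() {
  printf '%s\n' "$1" | sed -n 's|^Endpoint=sb://\([^/;]*\).*|\1|p'
}

# Fetches the alias connection string for a namespace-level authorization rule.
# Requires an authenticated Azure CLI session; nothing runs until the function is called.
alias_connection_string() {
  az servicebus georecovery-alias authorization-rule keys list \
    --resource-group rg-messaging-prod \
    --namespace-name myshopns-primary \
    --alias myshopns \
    --name RootManageSharedAccessKey \
    --query aliasPrimaryConnectionString --output tsv
}

# Example (needs network access): show which namespace the alias resolves to right now.
# nslookup "$(endpoint_host "$(alias_connection_string)")"
```

If `endpoint_host` prints a name ending in `-primary` or `-secondary`, the application is bound to a specific namespace rather than the alias and will not follow a failover.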
Initiating a Failover
When to Fail Over
- Primary region is experiencing a confirmed outage (Azure Service Health alert)
- Primary namespace is unreachable for an extended period
- Planned region evacuation or maintenance window
Failover Command
# Trigger failover: secondary becomes primary
az servicebus georecovery-alias fail-over \
  --resource-group rg-messaging-prod \
  --namespace-name myshopns-secondary \
  --alias myshopns
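Because failover breaks the pairing, a runbook may want a guard that confirms the command is aimed at the namespace currently holding the Secondary role. This is a minimal sketch under the naming assumptions of this article; `can_initiate_failover` and `fail_over_alias` are hypothetical helpers, and the role check relies on the `role` field shown in the pairing status output.

```shell
# Succeeds only for the role that is allowed to initiate a failover.
# Argument: <role reported by georecovery-alias show>
can_initiate_failover() {
  [ "$1" = "Secondary" ]
}

# Checks the role of myshopns-secondary, then triggers failover if it is the secondary.
# Requires an authenticated Azure CLI session; nothing runs until the function is called.
fail_over_alias() {
  role=$(az servicebus georecovery-alias show \
    --resource-group rg-messaging-prod \
    --namespace-name myshopns-secondary \
    --alias myshopns \
    --query role --output tsv)
  if can_initiate_failover "$role"; then
    az servicebus georecovery-alias fail-over \
      --resource-group rg-messaging-prod \
      --namespace-name myshopns-secondary \
      --alias myshopns
  else
    echo "Refusing to fail over: role is '$role', expected 'Secondary'" >&2
    return 1
  fi
}
```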
What Happens During Failover
Before failover:
  Alias: myshopns --> Primary (East US)   [Active]
                  --> Secondary (West US) [Passive]

Failover triggered:
  Step 1: Pairing is broken
  Step 2: Secondary namespace promoted to Primary role
  Step 3: Alias DNS updated: myshopns --> Secondary (West US)

After failover:
  Alias: myshopns --> Secondary (now acting as Primary) [Active]
  Old primary (East US) is now unlinked and standalone

Recovery:
  After East US recovers:
  Step 1: Create a new secondary namespace (e.g., in Central US)
  Step 2: Create a new pairing using the current primary (West US)
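The recovery steps can be sketched with the same commands used for the initial setup, this time run against the namespace that is now primary. This is an illustration only: the namespace name `myshopns-dr2`, the `centralus` location, and the helper name `repair_pairing` are assumptions, and the sketch presumes the alias can be paired again from the current primary as described above.

```shell
# Hypothetical helper: pairs the current primary (myshopns-secondary, West US)
# with a freshly created secondary namespace.
# Requires an authenticated Azure CLI session; nothing runs until the function is called.
repair_pairing() {
  new_secondary="myshopns-dr2"   # assumed name for the new secondary

  # Step 1: create the new secondary in another region, same tier
  az servicebus namespace create \
    --resource-group rg-messaging-prod \
    --name "$new_secondary" \
    --location centralus \
    --sku Premium || return 1

  new_id=$(az servicebus namespace show \
    --resource-group rg-messaging-prod \
    --name "$new_secondary" \
    --query id --output tsv) || return 1

  # Step 2: pair the current primary with the new secondary under the same alias
  az servicebus georecovery-alias create \
    --resource-group rg-messaging-prod \
    --namespace-name myshopns-secondary \
    --alias myshopns \
    --partner-namespace "$new_id"
}
```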
Availability Zones vs Geo-DR
| Feature | Availability Zones | Geo-DR |
|---|---|---|
| Protection scope | Zone failure within one region | Full regional failure |
| Failover | Automatic — no action needed | Manual — trigger failover command |
| Message replication | Yes — within region | No — metadata only |
| Recovery time | Seconds (automatic) | Minutes (manual trigger) |
| Tier required | Premium | Premium |
Message Loss During Failover — Mitigation Strategies
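Since messages are not replicated, any messages still sitting in the primary namespace at the moment of failover are unavailable through the alias and should be treated as lost until the old primary can be reached again. Mitigate this with the following approaches.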
Strategy 1 — Active-Active Dual Publish
Sender publishes to BOTH primary and secondary namespaces simultaneously.
Consumers read from the active alias endpoint.
After failover, no messages are lost because secondary already has them.
[Sender] --> [Primary namespace: orders queue]
--> [Secondary namespace: orders queue]
Consumer reads from alias (active namespace only).
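Downside: After failover the secondary still holds copies of messages that were already consumed from the primary, so consumers must discard duplicates (for example, by tracking MessageId). Duplicate detection must also be enabled on each entity so that sender retries are not stored twice.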
Strategy 2 — Application-Level Retry with Idempotency
Sender catches ServiceBusException.
Waits for failover to complete (poll alias endpoint health).
Re-sends failed messages.
Consumer implements idempotency:
if (OrderAlreadyProcessed(orderId)) { complete; return; }
ProcessOrder(orderId);
MarkAsProcessed(orderId);
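The idempotency check in Strategy 2 can be illustrated with a small ledger of processed order IDs. The sketch below uses a temporary file as a stand-in for a durable store such as a database table. `LEDGER` and `handle_order` are hypothetical names, and a real consumer would also complete the Service Bus message after handling it.

```shell
# File that records processed order IDs; a stand-in for a durable store.
LEDGER="${LEDGER:-$(mktemp)}"

# Handles an order at most once and prints what it did.
# Argument: <order_id>
handle_order() {
  order_id="$1"
  # Already recorded: skip the work (the message would simply be completed).
  if grep -qxF -e "$order_id" "$LEDGER"; then
    echo "skipped $order_id"
    return 0
  fi
  echo "processed $order_id"               # real business logic would go here
  printf '%s\n' "$order_id" >> "$LEDGER"   # mark as processed
}
```

Processing and marking are two separate steps here, so a crash between them would lead to one repeat. A production design would typically record the ID in the same transaction as the business change.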
Testing Failover in a Safe Environment
1. Set up pairing in staging environment
2. Send 100 test messages to primary queue
3. Process 50 messages and leave 50 unprocessed
4. Trigger failover manually
5. Observe: alias now points to secondary
6. Verify: 50 unprocessed messages are gone (expected, since messages are not replicated)
7. Verify: new sends go to secondary and consumers receive them
8. Document the RTO (recovery time objective) measured
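For the last step, the measured RTO can be captured by timing how long a health probe takes to succeed after failover is triggered. This is a minimal sketch: `measure_rto` is a hypothetical helper, and the probe command is supplied by the caller (for example, a script that sends and receives one test message through the alias).

```shell
# Runs the given probe command once per second until it succeeds,
# then prints the elapsed time in whole seconds.
# Usage: measure_rto <command> [args...]
measure_rto() {
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    sleep 1
  done
  end=$(date +%s)
  echo $((end - start))
}
```

A test run might call `measure_rto ./send_probe.sh` immediately after issuing the failover command, where `send_probe.sh` is an assumed script that exits 0 only when a message round trip through the alias succeeds.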
Geo-DR Best Practices
- Always connect via the alias endpoint — never hardcode primary or secondary namespace names in application code
- Enable Duplicate Detection on all queues and topics when using active-active dual publish
- Design consumers to be idempotent — message re-delivery after failover must not cause double processing
- Set up Azure Monitor alerts on the primary namespace to detect outages early
- Test failover in staging at least quarterly — untested failover procedures fail under pressure
- After failover, pair the new primary with a new secondary as soon as possible; until then the namespace has no disaster recovery protection
- Document the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for the application
Summary
Azure Service Bus Geo Disaster Recovery pairs a primary namespace with a secondary namespace in a different region and replicates entity metadata between them. Applications use a stable alias endpoint that automatically resolves to whichever namespace is currently active. When the primary region fails, a single failover command promotes the secondary to active within minutes. Because messages themselves are not replicated, applications must implement idempotent consumers and retry logic to handle potential message loss during failover. Geo-DR is available only in the Premium tier and works best when combined with availability zones for zone-level protection within a region.
