Service Bus Geo Disaster Recovery

Azure Service Bus Geo Disaster Recovery (Geo-DR) protects messaging namespaces against regional outages. It pairs a primary namespace in one Azure region with a secondary namespace in a different region. The entity metadata — queues, topics, subscriptions, and rules — replicates from primary to secondary continuously. If the primary region fails, a single failover command makes the secondary namespace the new primary, and applications resume messaging using the same alias endpoint without any code changes.

Important: What Geo-DR Replicates and What It Does Not

Item                                Replicated?  Notes
----------------------------------  -----------  -------------------------------------------------
Queue definitions                   Yes          Name, properties, and settings are all replicated
Topic and subscription definitions  Yes          Including filters and actions
Shared Access Policies              Yes          Authorization rules replicated
Messages in queues                  No           In-flight messages are NOT replicated
Messages in subscriptions           No           Unprocessed messages stay in primary only

Geo-DR is a metadata replication feature, not a data replication feature. Applications must handle message loss during failover through their own retry and idempotency mechanisms.

Geo-DR Architecture

+---------------------------+       Alias Endpoint (always active)
| Primary: myshopns-primary |  <--- sb://myshopns.servicebus.windows.net
| Region : East US          |
|  [Queue: orders]          |       Metadata replication
|  [Topic: order-events]    | -----+-----------------------------+
+---------------------------+      |                             |
        (Active for sends/receives)|                             v
                                   |    +---------------------------+
                                   |    | Secondary: myshopns-sec   |
                                   |    | Region : West US          |
                                   |    |  [Queue: orders]          |
                                   |    |  [Topic: order-events]    |
                                   |    +---------------------------+
                                   |           (passive — receives
                                   |            no messages until
                                   |            failover is triggered)

Setting Up Geo-DR — Step by Step

Step 1: Create Primary and Secondary Namespaces

# Primary namespace (East US)
az servicebus namespace create \
  --resource-group rg-messaging-prod \
  --name myshopns-primary \
  --location eastus \
  --sku Premium

# Secondary namespace (West US), same tier as the primary
az servicebus namespace create \
  --resource-group rg-messaging-prod \
  --name myshopns-secondary \
  --location westus \
  --sku Premium
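
Before pairing, it can help to confirm that both namespaces are on the Premium tier and sit in different regions. The following is a minimal sketch, assuming a logged-in Azure CLI. The function name check_pairing_prereqs is hypothetical and not part of the Azure CLI.

```shell
# Hypothetical helper: check that two namespaces look suitable for Geo-DR pairing.
# Usage: check_pairing_prereqs <resource-group> <primary-ns> <secondary-ns>
check_pairing_prereqs() {
  local rg="$1" primary="$2" secondary="$3"
  local ns sku primary_loc secondary_loc

  # Both namespaces must be Premium.
  for ns in "$primary" "$secondary"; do
    sku=$(az servicebus namespace show \
      --resource-group "$rg" --name "$ns" \
      --query sku.name --output tsv) || return 1
    if [ "$sku" != "Premium" ]; then
      echo "ERROR: $ns is on tier '$sku'; Geo-DR requires Premium" >&2
      return 1
    fi
  done

  # The namespaces must live in different regions.
  primary_loc=$(az servicebus namespace show \
    --resource-group "$rg" --name "$primary" \
    --query location --output tsv) || return 1
  secondary_loc=$(az servicebus namespace show \
    --resource-group "$rg" --name "$secondary" \
    --query location --output tsv) || return 1
  if [ "$primary_loc" = "$secondary_loc" ]; then
    echo "ERROR: both namespaces are in '$primary_loc'" >&2
    return 1
  fi

  echo "OK: $primary ($primary_loc) and $secondary ($secondary_loc) can be paired"
}
```

For the namespaces in this walkthrough, the call would be: check_pairing_prereqs rg-messaging-prod myshopns-primary myshopns-secondary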

Step 2: Create the Pairing (Alias)

# Get the secondary namespace resource ID (the partner in the pairing)
SECONDARY_ID=$(az servicebus namespace show \
  --resource-group rg-messaging-prod --name myshopns-secondary \
  --query id --output tsv)

# Create the alias pairing on the primary, pointing at the secondary
az servicebus georecovery-alias create \
  --resource-group rg-messaging-prod --namespace-name myshopns-primary \
  --alias myshopns --partner-namespace "$SECONDARY_ID"

Step 3: Check Pairing Status

az servicebus georecovery-alias show \
  --resource-group rg-messaging-prod \
  --namespace-name myshopns-primary \
  --alias myshopns

# Output includes:
#   "provisioningState": "Succeeded"
#   "role": "Primary"
#   "pendingReplicationOperationsCount": 0  (0 = fully synced)

Replication States:
  Pending  --> Pairing request submitted, replication starting
  Syncing  --> Entity metadata copying from primary to secondary
  Succeeded --> Fully paired and in sync — ready for failover
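
In automation, the status check above is usually polled rather than run once. The sketch below waits until provisioningState reports Succeeded and no replication operations are pending. The function name wait_for_pairing and its default attempt and delay values are assumptions made for illustration.

```shell
# Hypothetical helper: poll the alias until the pairing is provisioned and in sync.
# Usage: wait_for_pairing <resource-group> <namespace> <alias> [attempts] [delay-seconds]
wait_for_pairing() {
  local rg="$1" ns="$2" alias_name="$3"
  local attempts="${4:-30}" delay="${5:-10}"
  local i=1 state pending

  while [ "$i" -le "$attempts" ]; do
    state=$(az servicebus georecovery-alias show \
      --resource-group "$rg" --namespace-name "$ns" --alias "$alias_name" \
      --query provisioningState --output tsv)
    pending=$(az servicebus georecovery-alias show \
      --resource-group "$rg" --namespace-name "$ns" --alias "$alias_name" \
      --query pendingReplicationOperationsCount --output tsv)

    if [ "$state" = "Failed" ]; then
      echo "Pairing failed" >&2
      return 1
    fi

    # Assumption: an empty or "None" count means nothing is pending.
    case "$pending" in
      ""|None|0)
        if [ "$state" = "Succeeded" ]; then
          echo "Pairing in sync after $i check(s)"
          return 0
        fi
        ;;
    esac

    sleep "$delay"
    i=$((i + 1))
  done

  echo "Timed out waiting for pairing to sync" >&2
  return 1
}
```

A typical call would be: wait_for_pairing rg-messaging-prod myshopns-primary myshopns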

Alias Endpoint — How Applications Connect

Applications connect using the alias endpoint, not the primary or secondary namespace endpoint directly. The alias always points to the currently active namespace.

Alias endpoint:
  sb://myshopns.servicebus.windows.net

Before failover --> resolves to --> myshopns-primary.servicebus.windows.net
After failover  --> resolves to --> myshopns-secondary.servicebus.windows.net

Application connection string uses the ALIAS and never changes:
  Endpoint=sb://myshopns.servicebus.windows.net/;
  SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>

// .NET: connect using the alias, not the primary namespace name
using Azure.Messaging.ServiceBus;

string aliasConnectionString =
    "Endpoint=sb://myshopns.servicebus.windows.net/;...";

await using var client = new ServiceBusClient(aliasConnectionString);
// Works before AND after failover, no code change needed
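
The alias connection string can be read from the CLI instead of being assembled by hand. The sketch below wraps the georecovery-alias authorization-rule keys list command and selects the aliasPrimaryConnectionString field. The wrapper name get_alias_connection_string is hypothetical, and the rule name defaults to the RootManageSharedAccessKey policy used above.

```shell
# Hypothetical helper: print the alias connection string for an authorization rule.
# Usage: get_alias_connection_string <resource-group> <namespace> <alias> [rule-name]
get_alias_connection_string() {
  local rg="$1" ns="$2" alias_name="$3"
  local rule="${4:-RootManageSharedAccessKey}"

  az servicebus georecovery-alias authorization-rule keys list \
    --resource-group "$rg" \
    --namespace-name "$ns" \
    --alias "$alias_name" \
    --name "$rule" \
    --query aliasPrimaryConnectionString \
    --output tsv
}
```

For example: get_alias_connection_string rg-messaging-prod myshopns-primary myshopns. In production, a narrower policy than RootManageSharedAccessKey is usually preferable.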

Initiating a Failover

When to Fail Over

  • Primary region is experiencing a confirmed outage (Azure Service Health alert)
  • Primary namespace is unreachable for an extended period
  • Planned region evacuation or maintenance window

Failover Command

# Trigger failover (secondary becomes primary)
az servicebus georecovery-alias fail-over \
  --resource-group rg-messaging-prod \
  --namespace-name myshopns-secondary \
  --alias myshopns
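
Because failover breaks the pairing, a small guard in runbook scripts can prevent the command from being issued against the wrong namespace. The sketch below checks that the target namespace currently reports the Secondary role before it triggers the failover. The function name safe_failover is an assumption for illustration.

```shell
# Hypothetical helper: fail over only if the target namespace is the Secondary.
# Usage: safe_failover <resource-group> <secondary-namespace> <alias>
safe_failover() {
  local rg="$1" secondary_ns="$2" alias_name="$3" role

  role=$(az servicebus georecovery-alias show \
    --resource-group "$rg" --namespace-name "$secondary_ns" --alias "$alias_name" \
    --query role --output tsv) || return 1

  if [ "$role" != "Secondary" ]; then
    echo "Refusing to fail over: $secondary_ns has role '$role', expected 'Secondary'" >&2
    return 1
  fi

  az servicebus georecovery-alias fail-over \
    --resource-group "$rg" --namespace-name "$secondary_ns" --alias "$alias_name"
}
```

For the pairing in this walkthrough: safe_failover rg-messaging-prod myshopns-secondary myshopns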

What Happens During Failover

Before failover:
  Alias: myshopns --> Primary (East US) [Active]
                  --> Secondary (West US) [Passive]

Failover triggered:
  Step 1: Pairing is broken
  Step 2: Secondary namespace promoted to Primary role
  Step 3: Alias DNS updated: myshopns --> Secondary (West US)

After failover:
  Alias: myshopns --> Secondary (now acting as Primary) [Active]
  Old primary (East US) is now unlinked and standalone

Recovery:
  After East US recovers:
  Step 1: Create a new secondary namespace (for example, in East US or another region)
  Step 2: Create a new pairing with the current primary (West US) as primary and the new namespace as secondary

Availability Zones vs Geo-DR

Feature              Availability Zones              Geo-DR
-------------------  ------------------------------  --------------------------------
Protection scope     Zone failure within one region  Full regional failure
Failover             Automatic, no action needed     Manual, trigger failover command
Message replication  Yes, within region              No, metadata only
Recovery time        Seconds (automatic)             Minutes (manual trigger)
Tier required        Premium                         Premium

Message Loss During Failover — Mitigation Strategies

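Since messages are not replicated, any messages still sitting in the primary namespace at failover time are unavailable to consumers on the new primary. They remain in the old primary until that region recovers, and may be lost if it does not. Mitigate this with the following approaches.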

Strategy 1 — Active-Active Dual Publish

Sender publishes to BOTH primary and secondary namespaces simultaneously.
Consumers read from the active alias endpoint.
After failover, no messages are lost because secondary already has them.

[Sender] --> [Primary namespace: orders queue]
         --> [Secondary namespace: orders queue]

Consumer reads from alias (active namespace only).

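Downside: the secondary accumulates copies of messages that consumers already processed from the primary, so consumers must be idempotent after failover. Duplicate detection should also be enabled, but it only suppresses repeated sends with the same MessageId to the same queue or topic.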

Strategy 2 — Application-Level Retry with Idempotency

Sender catches ServiceBusException.
Waits for failover to complete (poll alias endpoint health).
Re-sends failed messages.

Consumer implements idempotency:
  if (OrderAlreadyProcessed(orderId)) { complete; return; }
  ProcessOrder(orderId);
  MarkAsProcessed(orderId);
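
The sender-side wait-and-resend loop can be sketched as a generic retry wrapper with exponential backoff. This is an illustration in shell rather than the .NET SDK, and the function name retry_with_backoff is hypothetical. The command being retried stands in for whatever sends a message through the alias endpoint.

```shell
# Hypothetical helper: retry a command, doubling the wait after each failure.
# Usage: retry_with_backoff <max-attempts> <initial-delay-seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2

  while true; do
    # Run the command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call such as retry_with_backoff 6 5 ./send-order.sh order-1001 (where send-order.sh is a hypothetical sender script) would retry for roughly two and a half minutes. Resending can deliver a message twice, which is why the consumer-side idempotency check above matters.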

Testing Failover in a Safe Environment

# 1. Set up pairing in staging environment
# 2. Send 100 test messages to primary queue
# 3. Process 50 messages — leave 50 unprocessed
# 4. Trigger failover manually
# 5. Observe: alias now points to secondary
# 6. Verify: 50 unprocessed messages are gone (expected — not replicated)
# 7. Verify: new sends go to secondary and consumers receive them
# 8. Document the RTO (recovery time objective) measured
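
Step 8 is easier to repeat if the measurement is scripted. The sketch below reports how many seconds pass until a probe command first succeeds. It is meant to be started right after the failover is triggered. The function name measure_rto and the probe script in the usage note are assumptions, not part of any Azure tooling.

```shell
# Hypothetical helper: print the seconds elapsed until a probe command succeeds.
# Usage: measure_rto <timeout-seconds> <probe-command> [args...]
measure_rto() {
  local timeout="$1" start elapsed
  shift
  start=$(date +%s)

  # Keep probing once per second until the probe succeeds or the timeout passes.
  until "$@" >/dev/null 2>&1; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Probe still failing after ${elapsed}s" >&2
      return 1
    fi
    sleep 1
  done

  echo $(( $(date +%s) - start ))
}
```

For example, measure_rto 900 ./send-test-message.sh would print the observed recovery time in seconds, assuming send-test-message.sh is a script of your own that sends one message through the alias endpoint and exits non-zero on failure.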

Geo-DR Best Practices

  • Always connect via the alias endpoint — never hardcode primary or secondary namespace names in application code
  • Enable Duplicate Detection on all queues and topics when using active-active dual publish
  • Design consumers to be idempotent — message re-delivery after failover must not cause double processing
  • Set up Azure Monitor alerts on the primary namespace to detect outages early
  • Test failover in staging at least quarterly — untested failover procedures fail under pressure
  • After failover, establish a new secondary pairing immediately — do not leave the recovered namespace unpaired
  • Document the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for the application

Summary

Azure Service Bus Geo Disaster Recovery pairs a primary namespace with a secondary namespace in a different region and replicates entity metadata between them. Applications use a stable alias endpoint that automatically resolves to whichever namespace is currently active. When the primary region fails, a single failover command promotes the secondary to active within minutes. Because messages themselves are not replicated, applications must implement idempotent consumers and retry logic to handle potential message loss during failover. Geo-DR is available only in the Premium tier and works best when combined with availability zones for zone-level protection within a region.
