Service Bus Best Practices and Patterns
Building reliable, high-performance messaging systems with Azure Service Bus requires more than knowing the API. This topic consolidates the most important architectural patterns, design principles, operational practices, and performance techniques for building Service Bus solutions that work correctly at scale, under failure conditions, and over the long term.
1. Core Reliability Principles
Always Use Peek-Lock Mode
- Receive-and-Delete mode: the message is deleted the moment it is received. If the app crashes before processing finishes, the message is lost forever.
- Peek-Lock mode (always use this): the message stays locked for the duration of processing. If the app crashes, the lock expires and the message returns to the queue. No data loss.
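With the Azure.Messaging.ServiceBus SDK, the receive mode is chosen when the processor is created. A minimal sketch, assuming `client` is an existing `ServiceBusClient` and the queue name is a placeholder:

```csharp
var processor = client.CreateProcessor("orders-processing", new ServiceBusProcessorOptions
{
    // PeekLock is the SDK default; never switch to ReceiveAndDelete.
    ReceiveMode = ServiceBusReceiveMode.PeekLock,
    AutoCompleteMessages = false // settle each message explicitly after processing
});
```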
Never Auto-Complete in Code — Complete Manually
```csharp
// Wrong: AutoCompleteMessages = true
// The message is deleted even if business logic fails after receipt.

// Correct: AutoCompleteMessages = false
processor.ProcessMessageAsync += async args =>
{
    try
    {
        await ProcessBusinessLogic(args.Message);
        await args.CompleteMessageAsync(args.Message); // explicit success
    }
    catch
    {
        await args.AbandonMessageAsync(args.Message); // explicit failure, message is redelivered
    }
};
```
Set Appropriate MaxDeliveryCount
| Scenario | Recommended MaxDeliveryCount |
|---|---|
| Fast processing, low complexity | 3 to 5 |
| Medium complexity, external API calls | 5 to 10 |
| Long-running, network-dependent | 10 to 15 |
| Critical financial transactions | 3 (fail fast and dead-letter for investigation) |
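MaxDeliveryCount is a queue-level setting. One way to set it at creation time, sketched with the `ServiceBusAdministrationClient` (namespace and queue name are placeholders):

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus.Administration;

var admin = new ServiceBusAdministrationClient(
    "myshop-orders-prod.servicebus.windows.net", new DefaultAzureCredential());

await admin.CreateQueueAsync(new CreateQueueOptions("payments-capture")
{
    MaxDeliveryCount = 3,                  // fail fast for financial transactions
    LockDuration = TimeSpan.FromSeconds(60)
});
```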
2. Idempotent Consumer Pattern
Service Bus guarantees at-least-once delivery — not exactly-once. A message may be delivered more than once due to lock expiry, consumer crash before completing, or failover. Consumers must handle duplicate delivery without side effects.
```csharp
// Wrong — processes every delivery without an idempotency check
public async Task ProcessOrder(string orderId)
{
    await ChargeCustomer(orderId); // charged twice if the message is delivered twice!
    await ShipOrder(orderId);
}

// Correct — idempotent consumer
public async Task ProcessOrder(string orderId)
{
    if (await _db.OrderAlreadyProcessed(orderId))
    {
        _logger.LogInformation("Duplicate delivery for {OrderId} — skipping.", orderId);
        return; // safe to skip
    }

    await ChargeCustomer(orderId);
    await ShipOrder(orderId);
    await _db.MarkOrderProcessed(orderId);
}
```
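Application-level idempotency can be complemented by Service Bus duplicate detection, which discards any message whose `MessageId` was already seen within a configurable window. Note that this catches sender-side duplicates only, not redelivery after a lock expires, so the consumer-side check above is still needed. A sketch, assuming `admin` is an existing `ServiceBusAdministrationClient` and the window length is an example value:

```csharp
await admin.CreateQueueAsync(new CreateQueueOptions("orders-processing")
{
    RequiresDuplicateDetection = true,
    DuplicateDetectionHistoryTimeWindow = TimeSpan.FromMinutes(30)
});

// Sender side: use a stable business key as MessageId — never a new GUID per attempt.
var message = new ServiceBusMessage(orderJson) { MessageId = orderId };
```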
3. Outbox Pattern — Database + Service Bus Atomicity
A common problem: update a database AND send a Service Bus message atomically. Service Bus transactions work within Service Bus only. Database transactions work within the database only. The Outbox Pattern bridges both without a distributed transaction coordinator.
Step 1: Application writes business data AND an outbox record IN ONE DB TRANSACTION.
```sql
BEGIN TRANSACTION
INSERT INTO Orders (orderId, status) VALUES (101, 'Pending')
INSERT INTO Outbox (eventType, payload, status) VALUES ('OrderCreated', '{...}', 'Pending')
COMMIT
```
Step 2: A background worker reads the Outbox table.
SELECT * FROM Outbox WHERE status = 'Pending'
Step 3: Worker sends each outbox event to Service Bus.
await sender.SendMessageAsync(message);
Step 4: Worker marks outbox record as 'Sent'.
UPDATE Outbox SET status = 'Sent' WHERE id = @id
Result:
Business data and event emission are eventually consistent.
No distributed transaction needed.
If Service Bus is temporarily down, Outbox records accumulate and are sent when it recovers.
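The worker in Steps 2 through 4 can be sketched as a polling loop; `_db` and `_sender` are hypothetical helpers standing in for your data access layer and a shared `ServiceBusSender`:

```csharp
while (!cancellationToken.IsCancellationRequested)
{
    // Step 2: read pending outbox rows
    var pending = await _db.GetPendingOutboxRows();

    foreach (var row in pending)
    {
        // Step 3: publish the event to Service Bus
        await _sender.SendMessageAsync(new ServiceBusMessage(row.Payload)
        {
            MessageId = row.Id.ToString(), // stable ID, so duplicate detection can help
            Subject = row.EventType
        });

        // Step 4: mark as sent only after a successful send
        await _db.MarkOutboxRowSent(row.Id);
    }

    await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
}
```

If the worker crashes between Steps 3 and 4, the event is sent again on restart. This is exactly why consumers must be idempotent (Section 2).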
Outbox Flow Diagram
Application
|
| Single DB transaction
v
+----------------------------+
| Database |
| Orders table: row inserted|
| Outbox table: row inserted|
+----------------------------+
|
| Background worker (every 5 seconds)
v
+----------------------------+
| Read Pending Outbox rows |
| Send each to Service Bus |
| Mark row as Sent |
+----------------------------+
|
v
[Azure Service Bus Queue/Topic]
4. Competing Consumers Pattern
Scale message processing by running multiple consumer instances against the same queue. Service Bus distributes messages across all active consumers. Each message goes to exactly one consumer.
[Queue: orders]
  |
  |-- Consumer Instance 1 (processes ~33% of messages)
  |-- Consumer Instance 2 (processes ~33% of messages)
  |-- Consumer Instance 3 (processes ~33% of messages)

Scale based on queue depth:
  Low queue depth  --> 1 instance   (save cost)
  High queue depth --> 10 instances (throughput)
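Within a single instance, concurrency is controlled by `MaxConcurrentCalls`; across instances, simply starting more identical processors against the same queue is enough. A sketch, assuming `client` is an existing `ServiceBusClient`:

```csharp
var processor = client.CreateProcessor("orders", new ServiceBusProcessorOptions
{
    MaxConcurrentCalls = Environment.ProcessorCount, // parallel handlers per instance
    AutoCompleteMessages = false
});
// Deploy N copies of this process; Service Bus distributes messages across all of them,
// and each message is still delivered to exactly one consumer.
```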
5. Load Leveling Pattern
Service Bus acts as a buffer between high-burst senders and steady-rate consumers. Spikes in sender traffic are absorbed by the queue. Consumers process at a consistent rate without being overwhelmed.
WITHOUT Load Leveling:
Traffic spike: 10,000 req/sec --> API crashes (over capacity)
WITH Service Bus Load Leveling:
Traffic spike: 10,000 req/sec --> 10,000 messages/sec --> Queue (buffer)
Consumer: 1,000 msg/sec (steady)
Queue absorbs the spike.
Consumer drains the queue at its own pace.
No crash. No data loss.
6. Message Priority Pattern
Service Bus queues are FIFO — no native priority. Implement priority by using multiple queues with different consumer concurrency settings.
[Queue: orders-high-priority]   --> 10 consumer threads
[Queue: orders-normal-priority] --> 3 consumer threads
[Queue: orders-low-priority]    --> 1 consumer thread

Producer routing logic:
  if (order.amount > 10000)          send to orders-high-priority
  else if (order.type == "Express")  send to orders-normal-priority
  else                               send to orders-low-priority
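The producer routing logic can be sketched in C# (queue names match the diagram; `Order` and `_senders` are hypothetical application types):

```csharp
// _senders caches one ServiceBusSender per queue — senders are thread-safe and reusable.
string queueName = order.Amount > 10000        ? "orders-high-priority"
                 : order.Type   == "Express"   ? "orders-normal-priority"
                 :                               "orders-low-priority";

await _senders[queueName].SendMessageAsync(
    new ServiceBusMessage(JsonSerializer.Serialize(order)));
```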
7. Retry with Exponential Backoff
When a consumer fails to process a message due to a transient error (external API down, database timeout), it should back off before retrying rather than hammering the failing dependency.
Retry strategy using MaxDeliveryCount plus scheduled redelivery:

First delivery (DeliveryCount = 1): processing fails --> Abandon
    Wait: Service Bus redelivers after LockDuration expires (60 sec)
Second delivery (DeliveryCount = 2): processing fails --> complete + send a scheduled copy
    Wait: redelivery scheduled at UtcNow + 2 minutes
Third delivery (DeliveryCount = 3): processing fails --> complete + send a scheduled copy
    Wait: redelivery scheduled at UtcNow + 4 minutes
Fourth delivery (DeliveryCount = 4): processing fails --> Dead-letter
    Reason: "MaxRetriesExceeded"
    Action: alert the ops team to investigate

Note: plain deferral is not suitable for backoff, because a deferred message is never redelivered automatically; it can only be retrieved again by its sequence number. Sending a scheduled copy and completing the original achieves the delayed retry.
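One hedged way to implement such a schedule: on a transient failure, complete the original message and send a scheduled copy that carries the attempt count in an application property (a fresh copy resets DeliveryCount, so the counter must travel with the message). A sketch, assuming `processor` and `_sender` already exist and `retry` is our own property name:

```csharp
processor.ProcessMessageAsync += async args =>
{
    int attempt = args.Message.ApplicationProperties.TryGetValue("retry", out var r)
        ? (int)r : 1;
    try
    {
        await ProcessBusinessLogic(args.Message);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (Exception) when (attempt >= 4)
    {
        // Retries exhausted — dead-letter for investigation.
        await args.DeadLetterMessageAsync(args.Message, "MaxRetriesExceeded");
    }
    catch (Exception)
    {
        // Copy the message, bump the attempt counter, and schedule it with
        // exponential backoff: 1 min, 2 min, 4 min, ...
        var copy = new ServiceBusMessage(args.Message);
        copy.ApplicationProperties["retry"] = attempt + 1;
        await _sender.ScheduleMessageAsync(copy,
            DateTimeOffset.UtcNow + TimeSpan.FromMinutes(Math.Pow(2, attempt - 1)));
        await args.CompleteMessageAsync(args.Message);
    }
};
```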
8. Claim Check Pattern (Large Messages)
Instead of putting large payloads (files, images, reports) in messages, store them in Azure Blob Storage and include only a reference (URL or blob ID) in the message. This keeps messages small and avoids message size limits.
WITHOUT Claim Check:
Message body = entire 50 MB PDF report --> EXCEEDS 256 KB limit (Standard tier)
WITH Claim Check:
Step 1: Upload PDF to Azure Blob Storage
URL: https://mystorage.blob.core.windows.net/reports/report-q4.pdf
Step 2: Send Service Bus message with reference only
```json
{
    "reportId": "q4-2024",
    "blobUrl": "https://mystorage.blob.core.windows.net/reports/report-q4.pdf",
    "contentType": "application/pdf"
}
```
Step 3: Consumer reads message, downloads PDF from Blob URL, processes it.
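A sketch of the sender side using Azure.Storage.Blobs; `storageConnection`, `pdfStream`, and `sender` are assumed to exist, and container and blob names are placeholders:

```csharp
using Azure.Storage.Blobs;

// Step 1: upload the large payload to Blob Storage.
var blob = new BlobContainerClient(storageConnection, "reports")
    .GetBlobClient("report-q4.pdf");
await blob.UploadAsync(pdfStream, overwrite: true);

// Step 2: send only the claim check (the blob reference) over Service Bus.
await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(new
{
    reportId = "q4-2024",
    blobUrl = blob.Uri.ToString(),
    contentType = "application/pdf"
})));
```

The consumer then resolves the claim: it reads the message, downloads the blob from `blobUrl`, and processes it.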
9. Saga Pattern — Distributed Transactions
When a business process spans multiple services (Order → Inventory → Payment → Shipping), each step publishes an event and the next step listens. If any step fails, compensating transactions roll back previous steps.
Order Saga:
  Step 1: OrderService publishes "OrderCreated"
  Step 2: InventoryService reads "OrderCreated" --> reserves stock --> publishes "StockReserved"
  Step 3: PaymentService reads "StockReserved" --> charges card --> publishes "PaymentDone"
  Step 4: ShippingService reads "PaymentDone" --> ships order --> publishes "OrderShipped"

If Step 3 (Payment) fails:
  PaymentService publishes "PaymentFailed"
  InventoryService reads "PaymentFailed" --> releases reserved stock (compensating action)
  OrderService reads "PaymentFailed" --> cancels the order

Each step is a separate Service Bus message. Each service is independent and fault-tolerant.
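A compensating step is just an ordinary subscription handler. A sketch of InventoryService reacting to a payment failure; `PaymentFailedEvent`, `_inventory`, and `paymentFailedProcessor` are hypothetical names:

```csharp
// InventoryService subscribes to payment events and undoes its earlier reservation.
paymentFailedProcessor.ProcessMessageAsync += async args =>
{
    var evt = args.Message.Body.ToObjectFromJson<PaymentFailedEvent>();

    // Compensating action: reverse the "StockReserved" step for this order.
    await _inventory.ReleaseStock(evt.OrderId);

    await args.CompleteMessageAsync(args.Message);
};
```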
10. Namespace and Entity Naming Conventions
| Resource | Naming Convention | Example |
|---|---|---|
| Namespace | company-domain-env | myshop-orders-prod |
| Queue | domain-action | orders-processing, payments-capture |
| Topic | domain-events | order-events, inventory-events |
| Subscription | service-sub | inventory-sub, email-sub, billing-sub |
| SAS Policy | role-policy | sender-policy, receiver-policy |
11. Security Best Practices
- Use Managed Identity + RBAC for all Azure-hosted services — eliminate connection strings entirely
- Never use `RootManageSharedAccessKey` in application code — it has full namespace access
- Create entity-level SAS policies per role — separate Send, Listen, and Manage policies
- Store connection strings in Azure Key Vault and reference them via Key Vault references in App Settings
- Turn on the Disable Local Auth setting on production namespaces to block SAS authentication entirely, allowing Microsoft Entra ID only
- Use Private Endpoints on Premium namespaces for banking, healthcare, and regulated workloads
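Connecting with Managed Identity instead of a connection string is a one-line change. A sketch using `DefaultAzureCredential` (the namespace is a placeholder):

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus;

// No connection string anywhere. The app's managed identity must hold the
// "Azure Service Bus Data Sender" and/or "Data Receiver" RBAC role on the namespace.
var client = new ServiceBusClient(
    "myshop-orders-prod.servicebus.windows.net",
    new DefaultAzureCredential());
```

Locally, `DefaultAzureCredential` falls back to developer credentials (Azure CLI, Visual Studio), so the same code runs unchanged in every environment.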
12. Performance Best Practices
| Technique | Impact |
|---|---|
| Batch sending (CreateMessageBatchAsync) | Reduces round trips — one network call carries many messages |
| Prefetch (PrefetchCount) | Pre-fetches messages into local buffer — reduces latency for next message |
| AmqpWebSockets transport | Use port 443 — avoids firewall issues on AMQP port 5671 |
| One ServiceBusClient per app | Shared connection pool — avoids socket exhaustion |
| Use partitioned queues for >40K msg/sec | Distributes load across multiple brokers internally |
| Tune MaxConcurrentCalls | Match to CPU core count for CPU-bound processing |
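Batch sending from the first row can be sketched as follows, assuming `sender` is an existing `ServiceBusSender` and `messages` fits within one batch:

```csharp
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

foreach (var msg in messages)
{
    // TryAddMessage returns false when adding msg would exceed the batch size limit;
    // a production version would send the full batch and start a new one here.
    if (!batch.TryAddMessage(msg))
        throw new InvalidOperationException("Batch is full — send and start a new batch.");
}

// One network call for the entire batch instead of one per message.
await sender.SendMessagesAsync(batch);
```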
13. Operational Checklist
| Check | Action |
|---|---|
| DLQ growing | Alert when count > 0 for critical queues |
| Active message backlog | Alert when growing for > 10 min — scale consumers |
| Throttled requests | Alert immediately — upgrade MUs or reduce throughput |
| Namespace size | Alert at 70% — increase max size or drain faster |
| Connection count drop | Alert — indicates consumer app crash or connectivity issue |
| TTL expiry rate | Monitor — messages expiring = consumers too slow or TTL set too short |
14. Common Mistakes to Avoid
| Mistake | Consequence | Fix |
|---|---|---|
| Using Receive-and-Delete mode | Message lost if processing crashes | Always use Peek-Lock |
| New GUID as MessageId each retry | Duplicate detection never triggers | Use stable business key as MessageId |
| Ignoring the DLQ | Silent business event loss | Monitor and alert on DLQ growth |
| Creating new ServiceBusClient per request | Socket exhaustion, performance degradation | One client per app, registered as singleton |
| Not setting TTL on messages | Queue fills with stale, unprocessable messages | Set sensible TTL matching business SLA |
| Hardcoding connection strings in code | Secret exposure in version control | Use Azure Key Vault or Managed Identity |
| Not testing failover in staging | Failover fails under real disaster pressure | Run failover drill quarterly |
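The singleton fix from the table can be sketched with standard .NET dependency injection (the namespace is a placeholder):

```csharp
// Program.cs — register exactly one ServiceBusClient for the whole application.
// The client owns the AMQP connection; reusing it avoids socket exhaustion.
builder.Services.AddSingleton(_ =>
    new ServiceBusClient("myshop-orders-prod.servicebus.windows.net",
        new DefaultAzureCredential()));
```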
Architectural Pattern Summary
+----------------------+------------------------------+
| Pattern              | When to Use                  |
+----------------------+------------------------------+
| Competing Consumers  | Scale consumer throughput    |
| Load Leveling        | Handle traffic spikes        |
| Idempotent Consumer  | At-least-once delivery       |
| Outbox Pattern       | DB + messaging atomicity     |
| Claim Check          | Large payloads               |
| Priority Queues      | Tiered processing speed      |
| Saga                 | Distributed transactions     |
| Message Sessions     | Ordered group processing     |
| Dead Letter Monitor  | Failure detection            |
| Geo-DR with Alias    | Regional disaster recovery   |
+----------------------+------------------------------+
Summary
Building production-grade Azure Service Bus systems requires applying the right patterns to the right problems. Always use Peek-Lock with manual completion for reliable message processing. Design consumers to be idempotent since at-least-once delivery is guaranteed, not exactly-once. Use the Outbox Pattern to achieve atomicity between database writes and message publishing. Apply Load Leveling to absorb traffic spikes without crashing downstream services. Implement the Competing Consumers pattern to scale processing with more instances. Use Managed Identity and RBAC to eliminate secrets from application code. Monitor DLQ depth, active message count, and throttled requests as the three most important health indicators. These patterns and practices, applied consistently, result in messaging systems that are resilient, scalable, secure, and easy to operate in production.
