Service Bus Best Practices and Patterns
Building reliable, high-performance messaging systems with Azure Service Bus requires more than knowing the API. This topic consolidates the most important architectural patterns, design principles, operational practices, and performance techniques for building Service Bus solutions that work correctly at scale, under failure conditions, and over the long term.
1. Core Reliability Principles
Always Use Peek-Lock Mode
- Receive-and-Delete mode: the message is deleted the moment it is received. If the app crashes before processing finishes, the message is lost forever.
- Peek-Lock mode (always use this): the message stays locked for the duration of processing. If the app crashes, the lock expires and the message returns to the queue. No data loss.
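With the Azure.Messaging.ServiceBus SDK, the receive mode is chosen when the processor is created. A minimal sketch, assuming `client` is an existing `ServiceBusClient` and the queue name is a placeholder:

```csharp
var processor = client.CreateProcessor("orders-processing", new ServiceBusProcessorOptions
{
    // PeekLock is the SDK default; never switch to ReceiveAndDelete.
    ReceiveMode = ServiceBusReceiveMode.PeekLock,
    AutoCompleteMessages = false // settle each message explicitly after processing
});
```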
Never Auto-Complete in Code — Complete Manually
```csharp
// Wrong: AutoCompleteMessages = true
// The message is deleted even if business logic fails after receipt.

// Correct: AutoCompleteMessages = false
processor.ProcessMessageAsync += async args =>
{
    try
    {
        await ProcessBusinessLogic(args.Message);
        await args.CompleteMessageAsync(args.Message); // explicit success
    }
    catch
    {
        await args.AbandonMessageAsync(args.Message); // explicit failure, message is redelivered
    }
};
```
Set Appropriate MaxDeliveryCount
| Scenario | Recommended MaxDeliveryCount |
|---|---|
| Fast processing, low complexity | 3 to 5 |
| Medium complexity, external API calls | 5 to 10 |
| Long-running, network-dependent | 10 to 15 |
| Critical financial transactions | 3 (fail fast and dead-letter for investigation) |
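MaxDeliveryCount is a queue-level setting. One way to set it at creation time, sketched with the `ServiceBusAdministrationClient` (namespace and queue name are placeholders):

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus.Administration;

var admin = new ServiceBusAdministrationClient(
    "myshop-orders-prod.servicebus.windows.net", new DefaultAzureCredential());

await admin.CreateQueueAsync(new CreateQueueOptions("payments-capture")
{
    MaxDeliveryCount = 3,                  // fail fast for financial transactions
    LockDuration = TimeSpan.FromSeconds(60)
});
```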
2. Idempotent Consumer Pattern
Service Bus guarantees at-least-once delivery — not exactly-once. A message may be delivered more than once due to lock expiry, consumer crash before completing, or failover. Consumers must handle duplicate delivery without side effects.
```csharp
// Wrong — processes every delivery without an idempotency check
public async Task ProcessOrder(string orderId)
{
    await ChargeCustomer(orderId); // charged twice if the message is delivered twice!
    await ShipOrder(orderId);
}

// Correct — idempotent consumer
public async Task ProcessOrder(string orderId)
{
    if (await _db.OrderAlreadyProcessed(orderId))
    {
        _logger.LogInformation("Duplicate delivery for {OrderId} — skipping.", orderId);
        return; // safe to skip
    }

    await ChargeCustomer(orderId);
    await ShipOrder(orderId);
    await _db.MarkOrderProcessed(orderId);
}
```
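Application-level idempotency can be complemented by Service Bus duplicate detection, which discards any message whose `MessageId` was already seen within a configurable window. Note that this catches sender-side duplicates only, not redelivery after a lock expires, so the consumer-side check above is still needed. A sketch, assuming `admin` is an existing `ServiceBusAdministrationClient` and the window length is an example value:

```csharp
await admin.CreateQueueAsync(new CreateQueueOptions("orders-processing")
{
    RequiresDuplicateDetection = true,
    DuplicateDetectionHistoryTimeWindow = TimeSpan.FromMinutes(30)
});

// Sender side: use a stable business key as MessageId — never a new GUID per attempt.
var message = new ServiceBusMessage(orderJson) { MessageId = orderId };
```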
3. Outbox Pattern — Database + Service Bus Atomicity
A common problem: update a database AND send a Service Bus message atomically. Service Bus transactions work within Service Bus only. Database transactions work within the database only. The Outbox Pattern bridges both without a distributed transaction coordinator.
Step 1: Application writes business data AND an outbox record IN ONE DB TRANSACTION.
```sql
BEGIN TRANSACTION
INSERT INTO Orders (orderId, status) VALUES (101, 'Pending')
INSERT INTO Outbox (eventType, payload, status) VALUES ('OrderCreated', '{...}', 'Pending')
COMMIT
```
Step 2: A background worker reads the Outbox table.
SELECT * FROM Outbox WHERE status = 'Pending'
Step 3: Worker sends each outbox event to Service Bus.
await sender.SendMessageAsync(message);
Step 4: Worker marks outbox record as 'Sent'.
UPDATE Outbox SET status = 'Sent' WHERE id = @id
Result:
Business data and event emission are eventually consistent.
No distributed transaction needed.
If Service Bus is temporarily down, Outbox records accumulate and are sent when it recovers.
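The worker in Steps 2 through 4 can be sketched as a polling loop; `_db` and `_sender` are hypothetical helpers standing in for your data access layer and a shared `ServiceBusSender`:

```csharp
while (!cancellationToken.IsCancellationRequested)
{
    // Step 2: read pending outbox rows
    var pending = await _db.GetPendingOutboxRows();

    foreach (var row in pending)
    {
        // Step 3: publish the event to Service Bus
        await _sender.SendMessageAsync(new ServiceBusMessage(row.Payload)
        {
            MessageId = row.Id.ToString(), // stable ID, so duplicate detection can help
            Subject = row.EventType
        });

        // Step 4: mark as sent only after a successful send
        await _db.MarkOutboxRowSent(row.Id);
    }

    await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
}
```

If the worker crashes between Steps 3 and 4, the event is sent again on restart. This is exactly why consumers must be idempotent (Section 2).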
Outbox Flow Diagram
Application
|
| Single DB transaction
v
+----------------------------+
| Database |
| Orders table: row inserted|
| Outbox table: row inserted|
+----------------------------+
|
| Background worker (every 5 seconds)
v
+----------------------------+
| Read Pending Outbox rows |
| Send each to Service Bus |
| Mark row as Sent |
+----------------------------+
|
v
[Azure Service Bus Queue/Topic]
4. Competing Consumers Pattern
Scale message processing by running multiple consumer instances against the same queue. Service Bus distributes messages across all active consumers. Each message goes to exactly one consumer.
[Queue: orders]
  |
  |-- Consumer Instance 1 (processes ~33% of messages)
  |-- Consumer Instance 2 (processes ~33% of messages)
  |-- Consumer Instance 3 (processes ~33% of messages)

Scale based on queue depth:
  Low queue depth  --> 1 instance   (save cost)
  High queue depth --> 10 instances (throughput)
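Within a single instance, concurrency is controlled by `MaxConcurrentCalls`; across instances, simply starting more identical processors against the same queue is enough. A sketch, assuming `client` is an existing `ServiceBusClient`:

```csharp
var processor = client.CreateProcessor("orders", new ServiceBusProcessorOptions
{
    MaxConcurrentCalls = Environment.ProcessorCount, // parallel handlers per instance
    AutoCompleteMessages = false
});
// Deploy N copies of this process; Service Bus distributes messages across all of them,
// and each message is still delivered to exactly one consumer.
```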
5. Load Leveling Pattern
Service Bus acts as a buffer between high-burst senders and steady-rate consumers. Spikes in sender traffic are absorbed by the queue. Consumers process at a consistent rate without being overwhelmed.
WITHOUT Load Leveling:
Traffic spike: 10,000 req/sec --> API crashes (over capacity)
WITH Service Bus Load Leveling:
Traffic spike: 10,000 req/sec --> 10,000 messages/sec --> Queue (buffer)
Consumer: 1,000 msg/sec (steady)
Queue absorbs the spike.
Consumer drains the queue at its own pace.
No crash. No data loss.
6. Message Priority Pattern
Service Bus queues are FIFO — no native priority. Implement priority by using multiple queues with different consumer concurrency settings.
[Queue: orders-high-priority]   --> 10 consumer threads
[Queue: orders-normal-priority] --> 3 consumer threads
[Queue: orders-low-priority]    --> 1 consumer thread

Producer routing logic:
  if (order.amount > 10000)          send to orders-high-priority
  else if (order.type == "Express")  send to orders-normal-priority
  else                               send to orders-low-priority
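The producer routing logic can be sketched in C# (queue names match the diagram; `Order` and `_senders` are hypothetical application types):

```csharp
// _senders caches one ServiceBusSender per queue — senders are thread-safe and reusable.
string queueName = order.Amount > 10000        ? "orders-high-priority"
                 : order.Type   == "Express"   ? "orders-normal-priority"
                 :                               "orders-low-priority";

await _senders[queueName].SendMessageAsync(
    new ServiceBusMessage(JsonSerializer.Serialize(order)));
```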
7. Retry with Exponential Backoff
When a consumer fails to process a message due to a transient error (external API down, database timeout), it should back off before retrying rather than hammering the failing dependency.
Retry strategy using MaxDeliveryCount plus scheduled redelivery:

First delivery (DeliveryCount = 1): processing fails --> Abandon
    Wait: Service Bus redelivers after LockDuration expires (60 sec)
Second delivery (DeliveryCount = 2): processing fails --> complete + send a scheduled copy
    Wait: redelivery scheduled at UtcNow + 2 minutes
Third delivery (DeliveryCount = 3): processing fails --> complete + send a scheduled copy
    Wait: redelivery scheduled at UtcNow + 4 minutes
Fourth delivery (DeliveryCount = 4): processing fails --> Dead-letter
    Reason: "MaxRetriesExceeded"
    Action: alert the ops team to investigate

Note: plain deferral is not suitable for backoff, because a deferred message is never redelivered automatically; it can only be retrieved again by its sequence number. Sending a scheduled copy and completing the original achieves the delayed retry.
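One hedged way to implement such a schedule: on a transient failure, complete the original message and send a scheduled copy that carries the attempt count in an application property (a fresh copy resets DeliveryCount, so the counter must travel with the message). A sketch, assuming `processor` and `_sender` already exist and `retry` is our own property name:

```csharp
processor.ProcessMessageAsync += async args =>
{
    int attempt = args.Message.ApplicationProperties.TryGetValue("retry", out var r)
        ? (int)r : 1;
    try
    {
        await ProcessBusinessLogic(args.Message);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (Exception) when (attempt >= 4)
    {
        // Retries exhausted — dead-letter for investigation.
        await args.DeadLetterMessageAsync(args.Message, "MaxRetriesExceeded");
    }
    catch (Exception)
    {
        // Copy the message, bump the attempt counter, and schedule it with
        // exponential backoff: 1 min, 2 min, 4 min, ...
        var copy = new ServiceBusMessage(args.Message);
        copy.ApplicationProperties["retry"] = attempt + 1;
        await _sender.ScheduleMessageAsync(copy,
            DateTimeOffset.UtcNow + TimeSpan.FromMinutes(Math.Pow(2, attempt - 1)));
        await args.CompleteMessageAsync(args.Message);
    }
};
```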
8. Claim Check Pattern (Large Messages)
Instead of putting large payloads (files, images, reports) in messages, store them in Azure Blob Storage and include only a reference (URL or blob ID) in the message. This keeps messages small and avoids message size limits.
WITHOUT Claim Check:
Message body = entire 50 MB PDF report --> EXCEEDS 256 KB limit (Standard tier)
WITH Claim Check:
Step 1: Upload PDF to Azure Blob Storage
URL: https://mystorage.blob.core.windows.net/reports/report-q4.pdf
Step 2: Send Service Bus message with reference only
```json
{
    "reportId": "q4-2024",
    "blobUrl": "https://mystorage.blob.core.windows.net/reports/report-q4.pdf",
    "contentType": "application/pdf"
}
```
Step 3: Consumer reads message, downloads PDF from Blob URL, processes it.
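A sketch of the sender side using Azure.Storage.Blobs; `storageConnection`, `pdfStream`, and `sender` are assumed to exist, and container and blob names are placeholders:

```csharp
using Azure.Storage.Blobs;

// Step 1: upload the large payload to Blob Storage.
var blob = new BlobContainerClient(storageConnection, "reports")
    .GetBlobClient("report-q4.pdf");
await blob.UploadAsync(pdfStream, overwrite: true);

// Step 2: send only the claim check (the blob reference) over Service Bus.
await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(new
{
    reportId = "q4-2024",
    blobUrl = blob.Uri.ToString(),
    contentType = "application/pdf"
})));
```

The consumer then resolves the claim: it reads the message, downloads the blob from `blobUrl`, and processes it.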
9. Saga Pattern — Distributed Transactions
When a business process spans multiple services (Order → Inventory → Payment → Shipping), each step publishes an event and the next step listens. If any step fails, compensating transactions roll back previous steps.
Order Saga:
  Step 1: OrderService publishes "OrderCreated"
  Step 2: InventoryService reads "OrderCreated" --> reserves stock --> publishes "StockReserved"
  Step 3: PaymentService reads "StockReserved" --> charges card --> publishes "PaymentDone"
  Step 4: ShippingService reads "PaymentDone" --> ships order --> publishes "OrderShipped"

If Step 3 (Payment) fails:
  PaymentService publishes "PaymentFailed"
  InventoryService reads "PaymentFailed" --> releases reserved stock (compensating action)
  OrderService reads "PaymentFailed" --> cancels the order

Each step is a separate Service Bus message. Each service is independent and fault-tolerant.
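A compensating step is just an ordinary subscription handler. A sketch of InventoryService reacting to a payment failure; `PaymentFailedEvent`, `_inventory`, and `paymentFailedProcessor` are hypothetical names:

```csharp
// InventoryService subscribes to payment events and undoes its earlier reservation.
paymentFailedProcessor.ProcessMessageAsync += async args =>
{
    var evt = args.Message.Body.ToObjectFromJson<PaymentFailedEvent>();

    // Compensating action: reverse the "StockReserved" step for this order.
    await _inventory.ReleaseStock(evt.OrderId);

    await args.CompleteMessageAsync(args.Message);
};
```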
10. Namespace and Entity Naming Conventions
| Resource | Naming Convention | Example |
|---|---|---|
| Namespace | company-domain-env | myshop-orders-prod |
| Queue | domain-action | orders-processing, payments-capture |
| Topic | domain-events | order-events, inventory-events |
| Subscription | service-sub | inventory-sub, email-sub, billing-sub |
| SAS Policy | role-policy | sender-policy, receiver-policy |
11. Security Best Practices
- Use Managed Identity + RBAC for all Azure-hosted services — eliminate connection strings entirely
- Never use `RootManageSharedAccessKey` in application code — it has full namespace access
- Create entity-level SAS policies per role — separate Send, Listen, and Manage policies
- Store connection strings in Azure Key Vault and reference them via Key Vault references in App Settings
- Turn on the Disable Local Auth setting on production namespaces to block SAS authentication entirely, allowing Microsoft Entra ID only
- Use Private Endpoints on Premium namespaces for banking, healthcare, and regulated workloads
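Connecting with Managed Identity instead of a connection string is a one-line change. A sketch using `DefaultAzureCredential` (the namespace is a placeholder):

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus;

// No connection string anywhere. The app's managed identity must hold the
// "Azure Service Bus Data Sender" and/or "Data Receiver" RBAC role on the namespace.
var client = new ServiceBusClient(
    "myshop-orders-prod.servicebus.windows.net",
    new DefaultAzureCredential());
```

Locally, `DefaultAzureCredential` falls back to developer credentials (Azure CLI, Visual Studio), so the same code runs unchanged in every environment.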
12. Performance Best Practices
| Technique | Impact |
|---|---|
| Batch sending (CreateMessageBatchAsync) | Reduces round trips — one network call carries many messages |
| Prefetch (PrefetchCount) | Pre-fetches messages into local buffer — reduces latency for next message |
| AmqpWebSockets transport | Use port 443 — avoids firewall issues on AMQP port 5671 |
| One ServiceBusClient per app | Shared connection pool — avoids socket exhaustion |
| Use partitioned queues for >40K msg/sec | Distributes load across multiple brokers internally |
| Tune MaxConcurrentCalls | Match to CPU core count for CPU-bound processing |
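Batch sending from the first row can be sketched as follows, assuming `sender` is an existing `ServiceBusSender` and `messages` fits within one batch:

```csharp
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

foreach (var msg in messages)
{
    // TryAddMessage returns false when adding msg would exceed the batch size limit;
    // a production version would send the full batch and start a new one here.
    if (!batch.TryAddMessage(msg))
        throw new InvalidOperationException("Batch is full — send and start a new batch.");
}

// One network call for the entire batch instead of one per message.
await sender.SendMessagesAsync(batch);
```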
13. Operational Checklist
| Check | Action |
|---|---|
| DLQ growing | Alert when count > 0 for critical queues |
| Active message backlog | Alert when growing for > 10 min — scale consumers |
| Throttled requests | Alert immediately — upgrade MUs or reduce throughput |
| Namespace size | Alert at 70% — increase max size or drain faster |
| Connection count drop | Alert — indicates consumer app crash or connectivity issue |
| TTL expiry rate | Monitor — messages expiring = consumers too slow or TTL set too short |
14. Common Mistakes to Avoid
| Mistake | Consequence | Fix |
|---|---|---|
| Using Receive-and-Delete mode | Message lost if processing crashes | Always use Peek-Lock |
| New GUID as MessageId each retry | Duplicate detection never triggers | Use stable business key as MessageId |
| Ignoring the DLQ | Silent business event loss | Monitor and alert on DLQ growth |
| Creating new ServiceBusClient per request | Socket exhaustion, performance degradation | One client per app, registered as singleton |
| Not setting TTL on messages | Queue fills with stale, unprocessable messages | Set sensible TTL matching business SLA |
| Hardcoding connection strings in code | Secret exposure in version control | Use Azure Key Vault or Managed Identity |
| Not testing failover in staging | Failover fails under real disaster pressure | Run failover drill quarterly |
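The singleton fix from the table can be sketched with standard .NET dependency injection (the namespace is a placeholder):

```csharp
// Program.cs — register exactly one ServiceBusClient for the whole application.
// The client owns the AMQP connection; reusing it avoids socket exhaustion.
builder.Services.AddSingleton(_ =>
    new ServiceBusClient("myshop-orders-prod.servicebus.windows.net",
        new DefaultAzureCredential()));
```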
Architectural Pattern Summary
+----------------------+------------------------------+
| Pattern              | When to Use                  |
+----------------------+------------------------------+
| Competing Consumers  | Scale consumer throughput    |
| Load Leveling        | Handle traffic spikes        |
| Idempotent Consumer  | At-least-once delivery       |
| Outbox Pattern       | DB + messaging atomicity     |
| Claim Check          | Large payloads               |
| Priority Queues      | Tiered processing speed      |
| Saga                 | Distributed transactions     |
| Message Sessions     | Ordered group processing     |
| Dead Letter Monitor  | Failure detection            |
| Geo-DR with Alias    | Regional disaster recovery   |
+----------------------+------------------------------+
Summary
Building production-grade Azure Service Bus systems requires applying the right patterns to the right problems. Always use Peek-Lock with manual completion for reliable message processing. Design consumers to be idempotent since at-least-once delivery is guaranteed, not exactly-once. Use the Outbox Pattern to achieve atomicity between database writes and message publishing. Apply Load Leveling to absorb traffic spikes without crashing downstream services. Implement the Competing Consumers pattern to scale processing with more instances. Use Managed Identity and RBAC to eliminate secrets from application code. Monitor DLQ depth, active message count, and throttled requests as the three most important health indicators. These patterns and practices, applied consistently, result in messaging systems that are resilient, scalable, secure, and easy to operate in production.
