Service Bus Monitoring and Diagnostics
Monitoring Azure Service Bus is essential for detecting performance issues, catching message processing failures, identifying growing backlogs, and ensuring SLAs are met. Azure provides a built-in monitoring stack — Azure Monitor Metrics, Diagnostic Logs, Alerts, and Application Insights integration — that covers every layer of Service Bus observability without requiring third-party tools.
Monitoring Layers Overview
+----------------------------------------------------------+
|            Azure Service Bus Monitoring Stack            |
|                                                          |
| Layer 1: Metrics (numeric time-series data)              |
|   - Active message count, DLQ count, throughput, latency |
|                                                          |
| Layer 2: Diagnostic Logs (detailed operational events)   |
|   - Per-operation audit logs, errors, management events  |
|                                                          |
| Layer 3: Alerts (notify when thresholds breach)          |
|   - Email, SMS, PagerDuty, webhook, Logic App trigger    |
|                                                          |
| Layer 4: Dashboards (visual real-time overview)          |
|   - Azure Dashboard + Workbooks                          |
+----------------------------------------------------------+
Key Metrics to Monitor
| Metric Name | Description | Alert Condition |
|---|---|---|
| Active Messages | Messages waiting in queue / subscription | Growing continuously = consumer is slow or down |
| Dead-Lettered Messages | Messages in the Dead Letter Queue | Any increase = processing failures occurring |
| Scheduled Messages | Messages waiting for their scheduled time | Spike = bulk scheduling event |
| Incoming Messages | Messages sent per second/minute | Sudden drop = publisher is down |
| Outgoing Messages | Messages received (consumed) per second | Sudden drop = consumer is down |
| Server Errors | Errors from Service Bus service itself | Any non-zero = platform issue |
| User Errors | Errors from client operations (bad requests) | High rate = app code bugs |
| Throttled Requests | Requests rejected due to quota limits | Any value = namespace hitting throughput limit |
| Namespace Size Used % | Percentage of namespace storage used | Alert at 70%, critical at 90% |
| Connection Count | Active AMQP connections to namespace | Unexpected drop = connectivity issue |
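The "growing continuously" condition on Active Messages can be checked mechanically. A minimal sketch (plain Python, no Azure dependency; the sample values and the `backlog_growing` helper are hypothetical) that flags a backlog when every sample in a recent window is higher than the one before it:

```python
def backlog_growing(samples: list[int], window: int = 5) -> bool:
    """Return True when the last `window` samples of the ActiveMessages
    metric are strictly increasing, i.e. consumers are falling behind."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough data yet
    return all(a < b for a, b in zip(recent, recent[1:]))

# Hypothetical 1-minute samples of the ActiveMessages metric:
print(backlog_growing([120, 118, 250, 400, 610, 890, 1300]))  # True: steady growth
print(backlog_growing([120, 118, 125, 119, 122, 118, 121]))   # False: normal jitter
```

In practice the samples would come from Azure Monitor or the SDK runtime properties shown below; requiring a full window of strictly increasing values avoids alerting on ordinary fluctuation.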
Viewing Metrics in the Azure Portal
- Open the Service Bus Namespace in the Azure Portal
- Click Metrics under the Monitoring section in the left menu
- Select a metric from the dropdown (e.g., Active Messages)
- Set the time range (last 1 hour, 24 hours, 7 days)
- Add filters to scope to a specific queue or topic
- Split by entity name to compare multiple queues on the same chart
Viewing Metrics via CLI
# Get active message count for a queue (last 1 hour, 1-minute granularity)
az monitor metrics list \
  --resource /subscriptions/<sub-id>/resourceGroups/rg-messaging-prod/providers/Microsoft.ServiceBus/namespaces/myshopns \
  --metric "ActiveMessages" \
  --interval PT1M \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --output table
Get Queue Runtime Properties via SDK
using Azure.Messaging.ServiceBus.Administration;
var admin = new ServiceBusAdministrationClient(connectionString);
var props = await admin.GetQueueRuntimePropertiesAsync("orders");
Console.WriteLine($"Active messages : {props.Value.ActiveMessageCount}");
Console.WriteLine($"Dead-letter msgs : {props.Value.DeadLetterMessageCount}");
Console.WriteLine($"Scheduled messages : {props.Value.ScheduledMessageCount}");
Console.WriteLine($"Transfer DLQ msgs : {props.Value.TransferDeadLetterMessageCount}");
Console.WriteLine($"Total message count: {props.Value.TotalMessageCount}");
Console.WriteLine($"Size in bytes : {props.Value.SizeInBytes}");
Console.WriteLine($"Created at : {props.Value.CreatedAt}");
Console.WriteLine($"Modified at : {props.Value.ModifiedAt}");
Console.WriteLine($"Accessed at : {props.Value.AccessedAt}");
Enabling Diagnostic Logs
Diagnostic logs capture detailed events at the entity level: runtime audit logs record data-plane operations (send, receive, complete, abandon, dead-letter), while operational logs record management operations. They can be routed to Log Analytics, Azure Storage, or an Event Hub.
Enable via CLI
# Get the Log Analytics workspace resource ID
WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group rg-monitoring \
  --workspace-name my-logs-workspace \
  --query id --output tsv)
# Enable diagnostic logs on the namespace
az monitor diagnostic-settings create \
  --name "servicebus-diagnostics" \
  --resource /subscriptions/<sub-id>/resourceGroups/rg-messaging-prod/providers/Microsoft.ServiceBus/namespaces/myshopns \
  --workspace $WORKSPACE_ID \
  --logs '[
    {"category": "OperationalLogs", "enabled": true},
    {"category": "VNetAndIPFilteringLogs", "enabled": true},
    {"category": "RuntimeAuditLogs", "enabled": true}
  ]' \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'
Log Categories
| Log Category | Contains |
|---|---|
| OperationalLogs | Queue/topic create, update, delete operations |
| RuntimeAuditLogs | Send, receive, complete, abandon events per message |
| VNetAndIPFilteringLogs | Rejected connections from unauthorized IPs or VNets |
Querying Logs in Log Analytics
// KQL Query: Dead-lettered messages in the last 24 hours
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where Category == "OperationalLogs"
| where OperationName == "DeadLetterMessages"
| where TimeGenerated > ago(24h)
| project TimeGenerated, EntityName_s, MessageId_s, DeadLetterReason_s
| order by TimeGenerated desc
// KQL Query: Error rate by operation type (last 1 hour)
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where TimeGenerated > ago(1h)
| where ResultType == "Failed"
| summarize ErrorCount = count() by OperationName, bin(TimeGenerated, 5m)
| render timechart
// KQL Query: Top queues by message volume
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where TimeGenerated > ago(24h)
| where OperationName in ("Send", "Receive")
| summarize TotalOps = count() by EntityName_s, OperationName
| order by TotalOps desc
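The `summarize ... by bin(TimeGenerated, 5m)` aggregation in the error-rate query above can be illustrated in testable form. A sketch in plain Python (the record dictionaries mimic AzureDiagnostics rows; the field names follow the query, but the data is hypothetical):

```python
from collections import Counter
from datetime import datetime, timedelta

def error_counts_by_bin(records, bin_minutes=5):
    """Count failed operations per (OperationName, time-bin) bucket,
    mirroring: summarize ErrorCount = count()
               by OperationName, bin(TimeGenerated, 5m)"""
    counts = Counter()
    for rec in records:
        if rec["ResultType"] != "Failed":
            continue
        ts = rec["TimeGenerated"]
        # Round the timestamp down to the start of its 5-minute bin
        bin_start = ts - timedelta(minutes=ts.minute % bin_minutes,
                                   seconds=ts.second,
                                   microseconds=ts.microsecond)
        counts[(rec["OperationName"], bin_start)] += 1
    return counts

# Hypothetical log records shaped like AzureDiagnostics rows:
logs = [
    {"OperationName": "Send", "ResultType": "Failed",
     "TimeGenerated": datetime(2024, 1, 1, 10, 2)},
    {"OperationName": "Send", "ResultType": "Failed",
     "TimeGenerated": datetime(2024, 1, 1, 10, 4)},
    {"OperationName": "Receive", "ResultType": "Succeeded",
     "TimeGenerated": datetime(2024, 1, 1, 10, 3)},
]
print(error_counts_by_bin(logs))
# Both failures land in the 10:00-10:05 bin for "Send"; the success is ignored
```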
Setting Up Alerts
Alert: Dead Letter Queue Growing
az monitor metrics alert create \
  --name "ServiceBus-DLQ-Alert" \
  --resource-group rg-messaging-prod \
  --scopes /subscriptions/<sub-id>/.../namespaces/myshopns \
  --condition "avg DeadletteredMessages > 10" \
  --description "DLQ message count exceeds 10 — investigate consumer failures" \
  --action /subscriptions/<sub-id>/.../actionGroups/ops-alerts
Alert: Active Message Count Spike (Queue Backlog)
Metric     : Active Messages
Condition  : Greater than 5000
Time window: 5 minutes
Severity   : 2 (Warning)
Action     : Email ops-team@company.com

Interpretation: Queue backlog growing = consumers are not keeping up
with incoming messages. Action: Scale out consumer instances or
investigate consumer slowness.
Alert: Throttled Requests
Metric     : Throttled Requests
Condition  : Greater than 0
Time window: 1 minute
Severity   : 1 (Critical)
Action     : PagerDuty webhook

Interpretation: Namespace hitting throughput limits.
Action (Premium): Scale up Messaging Units.
Action (Standard): Reduce send/receive rate or upgrade to Premium.
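Alongside alerting, the client-side response to throttling is to retry with backoff rather than hammer the namespace. A minimal sketch in plain Python (the `ThrottledError` class and `flaky_send` sender are hypothetical stand-ins; real SDKs surface throttling as a server-busy error and have built-in retry policies):

```python
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK's server-busy / throttling error."""

def send_with_backoff(send_fn, max_attempts=5, base_delay=0.5):
    """Retry a throttled send with exponential backoff: 0.5s, 1s, 2s, ..."""
    for attempt in range(max_attempts):
        try:
            return send_fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulated sender that is throttled twice before succeeding:
attempts = {"n": 0}
def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottledError()
    return "sent"

print(send_with_backoff(flaky_send, base_delay=0.01))  # "sent" after 2 retries
```

Backoff buys time for the namespace to recover, but if Throttled Requests fires persistently the real fix is capacity (Premium Messaging Units) or a lower send rate.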
Monitoring DLQ Programmatically
// Poll DLQ count every 60 seconds and alert when it grows
async Task MonitorDlq(ServiceBusAdministrationClient admin, string queueName)
{
int previousDlqCount = 0;
while (true)
{
var props = await admin.GetQueueRuntimePropertiesAsync(queueName);
int currentDlqCount = (int)props.Value.DeadLetterMessageCount;
if (currentDlqCount > previousDlqCount)
{
Console.WriteLine($"[ALERT] DLQ growing: {previousDlqCount} -> {currentDlqCount}");
// Trigger notification — send alert email, call webhook, etc.
}
previousDlqCount = currentDlqCount;
await Task.Delay(TimeSpan.FromSeconds(60));
}
}
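The loop above alerts on every increase, which can be noisy for queues where occasional dead-lettering is expected. A sketch of a less chatty alternative (plain Python; the `DlqGrowthDetector` class is hypothetical) that alerts only when the DLQ grows by a threshold amount within a sliding window of samples:

```python
from collections import deque

class DlqGrowthDetector:
    """Alert only when the DLQ grows by at least `threshold` messages
    across the last `window` samples, instead of on every increase."""
    def __init__(self, threshold: int = 5, window: int = 10):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def observe(self, dlq_count: int) -> bool:
        self.samples.append(dlq_count)
        # Compare the newest sample against the oldest one still in the window
        return dlq_count - self.samples[0] >= self.threshold

detector = DlqGrowthDetector(threshold=5, window=10)
print(detector.observe(0))   # False: baseline sample
print(detector.observe(2))   # False: grew by 2, below threshold
print(detector.observe(6))   # True: grew by 6 since the first sample
```

Each `observe` call would be fed from `GetQueueRuntimePropertiesAsync` on the same polling cadence as the C# loop.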
Azure Monitor Workbook — Service Bus Dashboard
Azure Monitor Workbooks provide a custom visual dashboard for Service Bus. Create one by combining multiple metric charts and log queries on a single page. Useful charts to include:
- Active Messages per queue (line chart — detect backlogs)
- Dead-Lettered Messages per queue (bar chart — detect failures)
- Incoming vs Outgoing Messages (area chart — detect imbalance)
- Server Errors and User Errors (line chart — detect reliability issues)
- Throttled Requests over time (line chart — detect capacity issues)
- Namespace size used % (gauge — detect approaching capacity)
Integration with Application Insights
When using the .NET SDK, Application Insights automatically traces Service Bus operations if the Azure.Messaging.ServiceBus package and Application Insights SDK are both configured. Each send, receive, and complete operation appears as a dependency in Application Insights distributed traces.
// Install Application Insights SDK
// dotnet add package Microsoft.ApplicationInsights.AspNetCore

// In Program.cs
builder.Services.AddApplicationInsightsTelemetry();

// Service Bus operations automatically appear in:
//   Application Insights --> Investigate --> Dependencies
//   Showing: send time, receive time, duration, success/failure
Monitoring Best Practices
- Alert on DLQ count > 0 for mission-critical queues — every DLQ message is a failed business event
- Alert on Active Messages growing over 15+ minutes — indicates consumer lag or failure
- Send diagnostic logs to Log Analytics with a retention period matching compliance requirements (30–365 days)
- Create a Service Bus Workbook visible to the entire operations team
- Monitor Throttled Requests — this is the first sign of capacity exhaustion
- Review Namespace Size Used % weekly and plan capacity before it reaches 80%
- Use Application Insights distributed tracing to correlate a single user action across all Service Bus operations and downstream services
Summary
Azure Service Bus monitoring uses Azure Monitor Metrics for real-time numeric data, Diagnostic Logs for per-operation detail, Alerts for proactive notification, and Workbooks for visual dashboards. The most important metrics to track are active message count (for backlog detection), dead-lettered message count (for failure detection), and throttled requests (for capacity planning). SDK runtime properties give programmatic access to queue depths at any time. Application Insights provides distributed tracing that connects Service Bus operations to the broader application request flow. A well-monitored Service Bus namespace gives full visibility into message health, consumer performance, and system reliability.
