Service Bus Monitoring and Diagnostics

Monitoring Azure Service Bus is essential for detecting performance issues, catching message processing failures, identifying growing backlogs, and ensuring SLAs are met. Azure provides a built-in monitoring stack — Azure Monitor Metrics, Diagnostic Logs, Alerts, and Application Insights integration — that covers every layer of Service Bus observability without requiring third-party tools.

Monitoring Layers Overview

+----------------------------------------------------------+
| Azure Service Bus Monitoring Stack                       |
|                                                          |
| Layer 1: Metrics         (numeric time-series data)      |
|   - Active message count, DLQ count, throughput, latency |
|                                                          |
| Layer 2: Diagnostic Logs (detailed operational events)   |
|   - Per-operation audit logs, errors, management events  |
|                                                          |
| Layer 3: Alerts          (notify when thresholds breach) |
|   - Email, SMS, PagerDuty, webhook, Logic App trigger    |
|                                                          |
| Layer 4: Dashboards      (visual real-time overview)     |
|   - Azure Dashboard + Workbooks                          |
+----------------------------------------------------------+

Key Metrics to Monitor

Metric Name            | Description                                  | Alert Condition
-----------------------+----------------------------------------------+------------------------------------------------
Active Messages        | Messages waiting in a queue / subscription   | Growing continuously = consumer is slow or down
Dead-Lettered Messages | Messages in the Dead Letter Queue            | Any increase = processing failures occurring
Scheduled Messages     | Messages waiting for their scheduled time    | Spike = bulk scheduling event
Incoming Messages      | Messages sent per second/minute              | Sudden drop = publisher is down
Outgoing Messages      | Messages received (consumed) per second      | Sudden drop = consumer is down
Server Errors          | Errors from the Service Bus service itself   | Any non-zero = platform issue
User Errors            | Errors from client operations (bad requests) | High rate = app code bugs
Throttled Requests     | Requests rejected due to quota limits        | Any value = namespace hitting throughput limit
Namespace Size Used %  | Percentage of namespace storage used         | Alert at 70%, critical at 90%
Connection Count       | Active AMQP connections to the namespace     | Unexpected drop = connectivity issue
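The "growing continuously" condition on Active Messages is easy to check once counts are sampled over time. A minimal sketch in Python, using pure threshold logic with hypothetical sample values and no Azure SDK calls:

```python
# Detect a continuously growing backlog from sampled Active Messages counts.
# A backlog counts as "growing" when every sample in the window increases
# over the previous one; a single dip resets the streak.

def backlog_growing(samples, window=5):
    """Return True if the last `window` samples increase monotonically."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    return all(b > a for a, b in zip(tail, tail[1:]))

# Sampled every minute (hypothetical values)
healthy = [120, 95, 130, 110, 105, 98]      # fluctuating: consumers keeping up
lagging = [100, 240, 410, 580, 790, 1020]   # strictly increasing: consumers behind

print(backlog_growing(healthy))  # False
print(backlog_growing(lagging))  # True
```

In production the samples would come from Azure Monitor metrics or the SDK's runtime properties; the detection logic stays the same.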

Viewing Metrics in the Azure Portal

  1. Open the Service Bus Namespace in the Azure Portal
  2. Click Metrics under the Monitoring section in the left menu
  3. Select a metric from the dropdown (e.g., Active Messages)
  4. Set the time range (last 1 hour, 24 hours, 7 days)
  5. Add filters to scope to a specific queue or topic
  6. Split by entity name to compare multiple queues on the same chart

Viewing Metrics via CLI

# Get active message count for a queue (last 1 hour, 1-minute granularity)
az monitor metrics list \
  --resource /subscriptions/<sub-id>/resourceGroups/rg-messaging-prod/providers/Microsoft.ServiceBus/namespaces/myshopns \
  --metric "ActiveMessages" \
  --interval PT1M \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --output table
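With `--output json` instead of `table`, the response can be post-processed. The sketch below assumes a trimmed version of the `az monitor metrics list` JSON shape; the field names reflect the CLI's typical response and should be verified against real output:

```python
import json

# Extract the most recent non-null average from `az monitor metrics list
# --output json`. The payload below is a trimmed, assumed response shape.

sample = json.loads("""
{"value": [{"name": {"value": "ActiveMessages"},
            "timeseries": [{"data": [
              {"timeStamp": "2024-01-01T10:00:00Z", "average": 120.0},
              {"timeStamp": "2024-01-01T10:01:00Z", "average": 135.0},
              {"timeStamp": "2024-01-01T10:02:00Z", "average": null}]}]}]}
""")

def latest_average(payload):
    points = payload["value"][0]["timeseries"][0]["data"]
    # Walk backwards past trailing nulls (gaps at the end of the time window)
    for point in reversed(points):
        if point.get("average") is not None:
            return point["average"]
    return None

print(latest_average(sample))  # 135.0
```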

Get Queue Runtime Properties via SDK

using Azure.Messaging.ServiceBus.Administration;

var admin = new ServiceBusAdministrationClient(connectionString);

var props = await admin.GetQueueRuntimePropertiesAsync("orders");

Console.WriteLine($"Active messages    : {props.Value.ActiveMessageCount}");
Console.WriteLine($"Dead-letter msgs   : {props.Value.DeadLetterMessageCount}");
Console.WriteLine($"Scheduled messages : {props.Value.ScheduledMessageCount}");
Console.WriteLine($"Transfer DLQ msgs  : {props.Value.TransferDeadLetterMessageCount}");
Console.WriteLine($"Total message count: {props.Value.TotalMessageCount}");
Console.WriteLine($"Size in bytes      : {props.Value.SizeInBytes}");
Console.WriteLine($"Created at         : {props.Value.CreatedAt}");
Console.WriteLine($"Modified at        : {props.Value.ModifiedAt}");
Console.WriteLine($"Accessed at        : {props.Value.AccessedAt}");
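These counts also support simple derived health checks. A hypothetical Python sketch (the function and threshold are illustrative, not part of any SDK) that flags a queue whose dead-letter share is too high:

```python
# Flag a queue whose dead-letter ratio exceeds a threshold.
# Counts mirror the runtime properties above; the values are hypothetical.

def dlq_ratio_alert(active, dead_letter, threshold=0.05):
    """Return (ratio, alert), where alert is True when the DLQ's share
    of all messages exceeds the threshold."""
    total = active + dead_letter
    if total == 0:
        return 0.0, False
    ratio = dead_letter / total
    return ratio, ratio > threshold

ratio, alert = dlq_ratio_alert(active=950, dead_letter=50)
print(f"DLQ ratio: {ratio:.1%}, alert: {alert}")  # DLQ ratio: 5.0%, alert: False
```

A ratio-based check complements an absolute-count alert: 50 dead-lettered messages is routine noise on a queue moving millions of messages but a serious signal on a low-volume queue.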

Enabling Diagnostic Logs

Diagnostic logs capture every operation — send, receive, complete, abandon, dead-letter — at the entity level. They can be sent to a Log Analytics workspace, an Azure Storage account, or an Event Hub.

Enable via CLI

# Get Log Analytics Workspace ID
WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group rg-monitoring \
  --workspace-name my-logs-workspace \
  --query id --output tsv)

# Enable diagnostic logs on the namespace
az monitor diagnostic-settings create \
  --name "servicebus-diagnostics" \
  --resource /subscriptions/<sub-id>/resourceGroups/rg-messaging-prod/providers/Microsoft.ServiceBus/namespaces/myshopns \
  --workspace $WORKSPACE_ID \
  --logs '[
    {"category": "OperationalLogs",         "enabled": true},
    {"category": "VNetAndIPFilteringLogs",  "enabled": true},
    {"category": "RuntimeAuditLogs",        "enabled": true}
  ]' \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'

Log Categories

Log Category           | Contains
-----------------------+------------------------------------------------------
OperationalLogs        | Queue/topic create, update, delete operations
RuntimeAuditLogs       | Send, receive, complete, abandon events per message
VNetAndIPFilteringLogs | Rejected connections from unauthorized IPs or VNets

Querying Logs in Log Analytics

// KQL Query: Dead-lettered messages in the last 24 hours
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where Category == "OperationalLogs"
| where OperationName == "DeadLetterMessages"
| where TimeGenerated > ago(24h)
| project TimeGenerated, EntityName_s, MessageId_s, DeadLetterReason_s
| order by TimeGenerated desc

// KQL Query: Error rate by operation type (last 1 hour)
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where TimeGenerated > ago(1h)
| where ResultType == "Failed"
| summarize ErrorCount = count() by OperationName, bin(TimeGenerated, 5m)
| render timechart

// KQL Query: Top queues by message volume
AzureDiagnostics
| where ResourceType == "NAMESPACES"
| where TimeGenerated > ago(24h)
| where OperationName in ("Send", "Receive")
| summarize TotalOps = count() by EntityName_s, OperationName
| order by TotalOps desc

Setting Up Alerts

Alert: Dead Letter Queue Growing

az monitor metrics alert create \
  --name "ServiceBus-DLQ-Alert" \
  --resource-group rg-messaging-prod \
  --scopes /subscriptions/<sub-id>/.../namespaces/myshopns \
  --condition "avg DeadletteredMessages > 10" \
  --description "DLQ message count exceeds 10 — investigate consumer failures" \
  --action /subscriptions/<sub-id>/.../actionGroups/ops-alerts

Alert: Active Message Count Spike (Queue Backlog)

Metric      : Active Messages
Condition   : Greater than 5000
Time window : 5 minutes
Severity    : 2 (Warning)
Action      : Email ops-team@company.com

Interpretation:
  Queue backlog growing = consumers are not keeping up with incoming messages.
  Action: Scale out consumer instances or investigate consumer slowness.
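How many instances a scale-out actually needs can be estimated with back-of-the-envelope arithmetic. A Python sketch, with all rates assumed for illustration:

```python
import math

# Estimate consumer instances needed to drain a backlog within a target time.
# All numbers are illustrative, not measured values.

def instances_needed(backlog, incoming_rate, per_consumer_rate, drain_minutes):
    """Rates are in messages/minute. Returns the minimum instance count whose
    combined throughput both absorbs incoming traffic and drains the backlog
    within `drain_minutes`."""
    required_rate = incoming_rate + backlog / drain_minutes
    return math.ceil(required_rate / per_consumer_rate)

# 5000-message backlog, 200 msg/min arriving, each consumer handles 150 msg/min
print(instances_needed(5000, 200, 150, drain_minutes=30))  # 3
```

If the formula never converges (incoming rate alone exceeds total consumer throughput), scaling out is the only option regardless of the drain target.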

Alert: Throttled Requests

Metric      : Throttled Requests
Condition   : Greater than 0
Time window : 1 minute
Severity    : 1 (Critical)
Action      : PagerDuty webhook

Interpretation:
  Namespace hitting throughput limits.
  Action (Premium): Scale up Messaging Units.
  Action (Standard): Reduce send/receive rate or upgrade to Premium.
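Clients should also treat throttling as a signal to slow down. The Service Bus SDKs retry ServerBusy responses with backoff by default; the schedule itself is simple to sketch in Python (base delay, cap, and jitter here are illustrative, not SDK defaults):

```python
import random

# Exponential backoff with jitter for throttled (ServerBusy) requests.
# Base delay, cap, and jitter values are illustrative.

def backoff_delays(attempts, base=0.8, cap=60.0, jitter=0.1):
    """Delay in seconds before each retry: doubles per attempt, capped,
    with a small random jitter to avoid synchronized retries."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + random.uniform(0, jitter * delay))
        # A real client would sleep(delay) here, then retry the send/receive
    return delays

print([round(d, 1) for d in backoff_delays(6)])
```

The jitter matters: without it, many throttled clients retry in lockstep and hit the namespace with synchronized bursts.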

Monitoring DLQ Programmatically

// Poll DLQ count every 60 seconds and alert when it grows
async Task MonitorDlq(ServiceBusAdministrationClient admin, string queueName)
{
    long previousDlqCount = 0;

    while (true)
    {
        var props = await admin.GetQueueRuntimePropertiesAsync(queueName);
        long currentDlqCount = props.Value.DeadLetterMessageCount;

        if (currentDlqCount > previousDlqCount)
        {
            Console.WriteLine($"[ALERT] DLQ growing: {previousDlqCount} -> {currentDlqCount}");
            // Trigger notification — send alert email, call webhook, etc.
        }

        previousDlqCount = currentDlqCount;
        await Task.Delay(TimeSpan.FromSeconds(60));
    }
}

Azure Monitor Workbook — Service Bus Dashboard

Azure Monitor Workbooks provide a custom visual dashboard for Service Bus. Create one by combining multiple metric charts and log queries on a single page. Useful charts to include:

  • Active Messages per queue (line chart — detect backlogs)
  • Dead-Lettered Messages per queue (bar chart — detect failures)
  • Incoming vs Outgoing Messages (area chart — detect imbalance)
  • Server Errors and User Errors (line chart — detect reliability issues)
  • Throttled Requests over time (line chart — detect capacity issues)
  • Namespace size used % (gauge — detect approaching capacity)

Integration with Application Insights

When using the .NET SDK, Application Insights automatically traces Service Bus operations if the Azure.Messaging.ServiceBus package and Application Insights SDK are both configured. Each send, receive, and complete operation appears as a dependency in Application Insights distributed traces.

// Install Application Insights SDK
// dotnet add package Microsoft.ApplicationInsights.AspNetCore

// In Program.cs
builder.Services.AddApplicationInsightsTelemetry();

// Service Bus operations automatically appear in:
// Application Insights --> Investigate --> Dependencies
// Showing: send time, receive time, duration, success/failure

Monitoring Best Practices

  • Alert on DLQ count > 0 for mission-critical queues — every DLQ message is a failed business event
  • Alert on Active Messages growing over 15+ minutes — indicates consumer lag or failure
  • Send diagnostic logs to Log Analytics with a retention period matching compliance requirements (30–365 days)
  • Create a Service Bus Workbook visible to the entire operations team
  • Monitor Throttled Requests — this is the first sign of capacity exhaustion
  • Review Namespace Size Used % weekly and plan capacity before it reaches 80%
  • Use Application Insights distributed tracing to correlate a single user action across all Service Bus operations and downstream services

Summary

Azure Service Bus monitoring uses Azure Monitor Metrics for real-time numeric data, Diagnostic Logs for per-operation detail, Alerts for proactive notification, and Workbooks for visual dashboards. The most important metrics to track are active message count (for backlog detection), dead-lettered message count (for failure detection), and throttled requests (for capacity planning). SDK runtime properties give programmatic access to queue depths at any time. Application Insights provides distributed tracing that connects Service Bus operations to the broader application request flow. A well-monitored Service Bus namespace gives full visibility into message health, consumer performance, and system reliability.
