Azure Monitor and Alerts

Deploying an application to Azure is only the beginning. Knowing whether it is healthy, fast, and performing correctly requires continuous monitoring. Azure Monitor is the central monitoring platform in Azure that collects, analyzes, and acts on telemetry data from Azure resources, applications, and on-premises environments.

What is Azure Monitor?

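Azure Monitor collects two main types of data from Azure resources. Platform metrics and the activity log arrive automatically, while detailed resource logs usually need a diagnostic setting that routes them to a destination such as a Log Analytics Workspace. The two types are: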

Metrics: Numerical values measured at regular intervals (every minute by default). Examples: CPU percentage, memory usage, HTTP request count, disk read bytes.
Logs: Timestamped records of events and activities. Examples: VM startup events, failed login attempts, application error messages, resource creation and deletion events.
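
To make the difference between the two data types concrete, here is a minimal Python sketch of their shapes. The class and field names are illustrative assumptions, not Azure Monitor's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetricSample:
    """One numeric measurement taken at a regular interval (hypothetical shape)."""
    resource: str
    name: str
    timestamp: datetime
    value: float


@dataclass(frozen=True)
class LogRecord:
    """One timestamped event with descriptive detail (hypothetical shape)."""
    resource: str
    timestamp: datetime
    category: str
    message: str


now = datetime(2024, 1, 15, 14, 32, tzinfo=timezone.utc)
cpu = MetricSample("myVM", "Percentage CPU", now, 94.0)
login = LogRecord("myVM", now, "Security", "Failed login attempt for user 'admin'")

# A metric is a number that can be charted or compared to a threshold directly.
print(cpu.value > 90)  # True
# A log carries descriptive text that has to be searched or parsed.
print("Failed" in login.message)  # True
```

The practical consequence is that metrics suit charts and threshold alerts, while logs suit searching and investigation.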

Azure Monitor Architecture

  Data Sources                  Azure Monitor                Actions
  ┌─────────────┐               ┌─────────────────────┐      ┌────────────────┐
  │ Azure VMs   │──────────────►│                     │─────►│  Alerts        │
  │ App Service │               │   Metrics Store     │      │  (email/SMS)   │
  │ SQL Database│──────────────►│   (Time-series DB)  │      └────────────────┘
  │ Kubernetes  │               │                     │      ┌────────────────┐
  │ On-premises │               │   Log Analytics     │─────►│  Dashboards    │
  │   servers   │──────────────►│   Workspace (Logs)  │      │  (Workbooks)   │
  │ Applications│               │                     │      └────────────────┘
  └─────────────┘               └─────────────────────┘      ┌────────────────┐
                                                        ────►│  Autoscale     │
                                                             └────────────────┘

Metrics in Azure Monitor

Metrics are lightweight numerical measurements stored in a time-series database for 93 days by default. They are collected automatically from most Azure resources at no extra cost.

Viewing Metrics – Metrics Explorer

The Metrics Explorer in the Azure Portal allows plotting any metric as a chart over a chosen time range. Multiple metrics can be overlaid on the same chart for comparison. Charts can be pinned to a dashboard for at-a-glance monitoring.

Common Metrics by Resource Type

Resource	Useful Metrics
Virtual Machine	CPU percentage, available memory bytes, disk read/write bytes, network in/out
App Service	Requests per second, response time, HTTP 5xx errors, CPU time
Azure SQL Database	DTU percentage, data space used, connection count, deadlock count
Storage Account	Total requests, ingress/egress bytes, availability percentage, latency
Azure Functions	Execution count, execution units, HTTP request failures

Logs and Log Analytics Workspace

Logs contain rich event data — much more detailed than metrics. They are stored in a Log Analytics Workspace, which is a centralized repository where logs from multiple Azure resources and even on-premises servers can be collected together.

Logs in the workspace are queried using Kusto Query Language (KQL) — a simple but powerful query language similar in concept to SQL.

Example KQL Queries

  // Find all VM heartbeats in the last hour
  Heartbeat
  | where TimeGenerated > ago(1h)
  | summarize count() by Computer

  // Find all HTTP 500 errors in the last 24 hours
  // (ResultCode is a string column, so compare against "500")
  AppRequests
  | where ResultCode == "500"
  | where TimeGenerated > ago(24h)
  | project TimeGenerated, Name, Url, DurationMs

  // Top 5 VMs by CPU usage in the last hour
  Perf
  | where ObjectName == "Processor" and CounterName == "% Processor Time"
  | where TimeGenerated > ago(1h)
  | summarize avg(CounterValue) by Computer
  | top 5 by avg_CounterValue desc
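
For readers more comfortable with general-purpose code than with KQL, the following sketch reproduces in plain Python what the third query computes: filter the rows, group by computer, average the values, and keep the top N. The sample rows are invented, the time filter is left out for brevity, and nothing here calls Azure.

```python
from collections import defaultdict

# Hypothetical rows standing in for the Perf table (all values are invented).
rows = [
    {"Computer": "vm-a", "CounterName": "% Processor Time", "CounterValue": 80.0},
    {"Computer": "vm-a", "CounterName": "% Processor Time", "CounterValue": 90.0},
    {"Computer": "vm-b", "CounterName": "% Processor Time", "CounterValue": 40.0},
    {"Computer": "vm-b", "CounterName": "Available MBytes", "CounterValue": 2048.0},
    {"Computer": "vm-c", "CounterName": "% Processor Time", "CounterValue": 95.0},
]


def top_by_avg_cpu(rows, n=5):
    """Mimic: where CounterName == ... | summarize avg() by Computer | top n."""
    totals = defaultdict(lambda: [0.0, 0])  # computer -> [sum, count]
    for row in rows:
        if row["CounterName"] != "% Processor Time":
            continue  # the `where` step drops unrelated counters
        acc = totals[row["Computer"]]
        acc[0] += row["CounterValue"]
        acc[1] += 1
    # the `summarize avg(CounterValue) by Computer` step
    averages = {computer: s / k for computer, (s, k) in totals.items()}
    # the `top n by avg_CounterValue desc` step
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


result = top_by_avg_cpu(rows, n=2)
print(result)  # [('vm-c', 95.0), ('vm-a', 85.0)]
```

Each pipe stage in the KQL query maps to one step in the function, which is a useful way to read longer queries: top to bottom, one transformation at a time.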

Azure Alerts

Alerts automatically notify the team when a monitored condition is met — for example, when CPU exceeds 90% or when an application throws more than 10 errors per minute. Alerts can also trigger automated remediation actions.

Alert Rule Components

Component	Description
Scope	The resource(s) being monitored (e.g., a specific VM or all VMs in a resource group)
Condition	The threshold or query that triggers the alert (e.g., CPU > 90% for 5 minutes)
Action Group	What happens when the alert fires — who gets notified and how
Alert Rule Name	A descriptive name shown in notifications and the alerts dashboard
Severity	0 (Critical) to 4 (Verbose) — helps prioritize response
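
The components in the table can be pictured as fields of a single record. The sketch below is a hypothetical Python model, not an Azure SDK type; the class name, field names, and validation are assumptions made for illustration. The severity names follow the 0 to 4 scale described above.

```python
from dataclasses import dataclass

# Names for severity levels 0-4.
SEVERITY_NAMES = {
    0: "Critical",
    1: "Error",
    2: "Warning",
    3: "Informational",
    4: "Verbose",
}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical container for the alert rule components listed above."""
    name: str
    scope: str
    condition: str
    action_group: str
    severity: int

    def __post_init__(self):
        # Reject severities outside the documented 0-4 range.
        if self.severity not in SEVERITY_NAMES:
            raise ValueError(f"severity must be 0-4, got {self.severity}")

    @property
    def severity_name(self) -> str:
        return SEVERITY_NAMES[self.severity]


rule = AlertRule(
    name="VM-HighCPU-Alert",
    scope="myVM",
    condition="Average Percentage CPU > 90 over 5 minutes",
    action_group="OpsTeam",
    severity=2,
)
print(rule.severity_name)  # Warning
```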

Action Groups

An Action Group defines the notifications to send and the actions to run when an alert fires. Multiple alert rules can share the same action group.

Email / SMS / Voice call: Notify specific people or teams.
Push notification: Azure mobile app notification.
Azure Function: Trigger a serverless function for automated remediation (e.g., auto-restart a service).
Logic App: Trigger a workflow — for example, create an incident in ServiceNow or post a message in Microsoft Teams.
Runbook: Execute an Azure Automation runbook to automatically fix the issue.
Webhook: Send an HTTP POST to any external system (PagerDuty, Slack, etc.).
ITSM connector: Create tickets in ServiceNow, System Center Service Manager, etc.
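
The fan-out behaviour of an action group, where one alert reaches several targets and several alert rules reuse one group, can be sketched as follows. This is a simplified, hypothetical model: the `ActionGroup` class and its methods are invented for illustration, and the receivers only append to a list instead of sending real email or HTTP requests.

```python
from typing import Callable, List

# Each receiver is modelled as a callable that takes the alert message.
Receiver = Callable[[str], None]


class ActionGroup:
    """Hypothetical model of an action group: a named list of receivers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._receivers: List[Receiver] = []

    def add_receiver(self, receiver: Receiver) -> None:
        self._receivers.append(receiver)

    def fire(self, message: str) -> int:
        """Deliver the message to every receiver; return how many were notified."""
        for receiver in self._receivers:
            receiver(message)
        return len(self._receivers)


sent: List[str] = []  # stands in for real delivery channels
ops = ActionGroup("OpsTeam")
ops.add_receiver(lambda msg: sent.append(f"email: {msg}"))
ops.add_receiver(lambda msg: sent.append(f"webhook: {msg}"))

# Two different alert rules reuse the same group.
ops.fire("VM-HighCPU-Alert fired")
ops.fire("App-ErrorRate-Alert fired")
print(len(sent))  # 4
```

Keeping the targets in the group rather than in each rule means a change of on-call contact is made in one place.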

Example Alert: High CPU on a VM

  Alert Rule: VM-HighCPU-Alert

  Scope:      myVM (Virtual Machine)
  Signal:     Metric — "Percentage CPU"
  Condition:  Average CPU > 90% over 5 minutes
  Severity:   2 (Warning)
  Action:     Action Group "OpsTeam"
              → Email: ops-team@company.com
              → SMS: +91-9876543210
              → Webhook: PagerDuty API

  When triggered:
  "ALERT FIRED: myVM CPU is at 94% (threshold: 90%)
   Time: 2024-01-15 14:32 UTC
   Severity: Warning"
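
The condition in this example is an average over a window, not a single reading, so a brief spike does not fire the alert. The sketch below shows that logic under simplifying assumptions: one sample per minute, a fixed five-sample window, and a hypothetical `should_fire` helper. It does not reproduce how Azure Monitor evaluates rules internally.

```python
from statistics import mean


def should_fire(samples, threshold=90.0, window=5):
    """Return True when the mean of the last `window` samples exceeds `threshold`."""
    if len(samples) < window:
        return False  # not enough data to evaluate the window yet
    return mean(samples[-window:]) > threshold


brief_spike = [50, 55, 99, 52, 51]  # one high reading, average 61.4
sustained = [92, 95, 93, 96, 94]    # consistently high, average 94

print(should_fire(brief_spike))  # False
print(should_fire(sustained))    # True
```

Averaging over a window trades a few minutes of detection delay for far fewer false alarms from momentary spikes.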

Azure Dashboards and Workbooks

Azure Dashboards

Azure Dashboards are customizable views in the Azure Portal where metric charts, resource tiles, and log query results can be pinned for at-a-glance visibility. Different dashboards can be created for different teams — an operations dashboard, a security dashboard, a cost dashboard — and shared with team members.

Azure Monitor Workbooks

Workbooks are interactive report documents that combine text, queries, metrics, and parameters into rich visual reports. They are used to create troubleshooting guides, analysis reports, and guided investigation documents. Azure provides built-in workbook templates for common scenarios like VM performance analysis, network monitoring, and security auditing.

Application Insights

Application Insights is the application performance monitoring (APM) component of Azure Monitor. It monitors live applications — tracking request rates, response times, failure rates, dependency calls, and user behavior. A small SDK is added to the application code (or auto-instrumentation is used for supported runtimes), and Application Insights automatically collects telemetry.

Live Metrics: See request rates and failures in real time as they happen.
Failure Analysis: View stack traces for the application exceptions that were captured.
Performance Analysis: Identify the slowest database queries, API calls, and page loads.
Availability Tests: Send test requests to the application from multiple locations around the world at a regular interval (5 minutes by default) and alert if it stops responding.
User Flow: Visualize the paths users take through the application and where they drop off.

Key Takeaways

Azure Monitor collects metrics (numbers over time) and logs (event records) from Azure resources; platform metrics are collected automatically, while detailed resource logs usually require a diagnostic setting.
Metrics Explorer allows visualizing and charting any metric over any time range.
Log Analytics Workspace centralizes logs; KQL is used to query and analyze them.
Alert rules trigger action groups when conditions are met, notifying teams via email or SMS and optionally running automated fixes.
Application Insights provides deep application-level monitoring including error tracking, performance analysis, and availability testing.
Azure Dashboards and Workbooks turn monitoring data into shared visual reports for teams.
