Grafana Loki Log Monitoring

Loki is a log aggregation system built by Grafana Labs. It collects log lines from applications and servers, indexes only their labels rather than the full log text to keep storage cheap, and lets you query them with a language called LogQL. Grafana and Loki work as a pair: Grafana visualises metrics from Prometheus and log lines from Loki on the same dashboard.

The Post-it Note Analogy

Imagine that every event in your application is written on a sticky note. Without Loki, the notes pile up on each server in separate files, and finding the note about an error means walking to each server and searching through the stacks. Loki gathers all the sticky notes onto one central board, labels each one by source, and lets you search across all of them in seconds from one place: Grafana.

How Loki Works

Applications / Servers
┌──────────┐  ┌──────────┐  ┌──────────┐
│ App A    │  │ App B    │  │ Server   │
│ /app/app │  │ /var/log │  │ /var/log │
└────┬─────┘  └─────┬────┘  └──────┬───┘
     │              │              │
     └──────────────┴──────────────┘
                    │
              Promtail / Alloy
              (log shipping agent)
                    │
                    ▼
               Loki (stores logs)
                    │
                    ▼
               Grafana (displays logs)

Promtail / Grafana Alloy

Promtail is the log shipping agent that runs on each server. It watches log files, reads new lines, adds labels (such as the server hostname and application name), and pushes the log lines to Loki. Grafana Alloy is the modern replacement for Promtail that handles logs, metrics, and traces in one agent.
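
As a rough illustration of what a shipping agent sends, the sketch below builds the JSON body accepted by Loki's push endpoint (POST /loki/api/v1/push): a list of streams, each with a label set and a list of [timestamp in nanoseconds, line] pairs. The helper name build_push_payload and the sample labels and log lines are assumptions made for this example. A real agent also handles file tailing, batching, and retries, none of which is shown here.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON body for Loki's POST /loki/api/v1/push endpoint.

    labels: dict of stream labels, e.g. {"job": "nginx"}
    lines:  list of raw log lines read from a file
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Hypothetical labels and log lines, with a fixed timestamp for repeatability.
payload = build_push_payload(
    {"job": "nginx", "instance": "server-01"},
    ["GET /index.html 200", "GET /missing 404"],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

The payload is only constructed and printed; sending it would need an HTTP client and a running Loki instance.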

Labels in Loki

Loki uses labels to identify log streams — exactly like Prometheus uses labels for metrics. A log stream is a unique combination of labels. For example:

{job="nginx", instance="server-01", env="production"}
{job="app-api", instance="server-02", env="staging"}

Each combination is a separate stream that Loki stores in its own chunks. Keep the number of unique label combinations low: high-cardinality labels (like user IDs) create too many streams and degrade Loki performance.
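
To see why one high-cardinality label matters, the sketch below computes an upper bound on the number of streams as the product of distinct values per label. The label values and the user_id label are hypothetical, and real stream counts depend on which combinations actually occur.

```python
def stream_count(label_values):
    """Upper bound on streams: the product of distinct values per label."""
    count = 1
    for values in label_values.values():
        count *= len(values)
    return count


# Hypothetical label sets for illustration.
low = {
    "job": ["nginx", "app-api"],
    "instance": ["server-01", "server-02"],
    "env": ["production", "staging"],
}
# Same labels plus a per-user label with 10,000 distinct values.
high = dict(low, user_id=[f"user-{n}" for n in range(10_000)])

low_streams = stream_count(low)
high_streams = stream_count(high)
print(low_streams, high_streams)
```

With these made-up numbers, a single extra label multiplies the possible streams by 10,000. A value like a user ID is usually better left in the log line and filtered at query time.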

LogQL – Loki Query Language

LogQL is Loki's query language. It has two types of expressions: log queries (return raw log lines) and metric queries (calculate numbers from log lines).

Basic Log Query – Filter by Label

{job="nginx"}

This returns all log lines from the nginx job within the selected time range. You can run it in Grafana's Explore view or in a Logs panel.

Log Pipeline – Filter by Content

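Add a line filter after the label selector to filter on the log content. The operators are |= (line contains), != (line does not contain), |~ (line matches a regex), and !~ (line does not match a regex).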

{job="nginx"} |= "error"

Returns only lines that contain the string "error". The match is case-sensitive.

{job="nginx"} != "health_check"

Returns all nginx lines except those containing "health_check" — useful for filtering out noisy health probe logs.

{job="nginx"} |= "error" |~ "5[0-9]{2}"

Returns lines containing "error" that also match the regex pattern for 5xx status codes.
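
The filter semantics above can be imitated in a few lines of Python. This is only a sketch over hypothetical sample lines: Loki evaluates filters server-side and uses Go's RE2 regex syntax, which matches Python's re module for a simple pattern like this one but not for every pattern.

```python
import re

# Hypothetical nginx-style log lines.
lines = [
    "GET /api 500 error upstream timed out",
    "GET /health_check 200",
    "GET /api 404 error not found",
    "GET /index.html 200",
]


def contains(entries, text):
    """Mimics |= : keep lines that contain the text."""
    return [line for line in entries if text in line]


def not_contains(entries, text):
    """Mimics != : keep lines that do not contain the text."""
    return [line for line in entries if text not in line]


def matches(entries, pattern):
    """Mimics |~ : keep lines where the regex matches anywhere in the line."""
    rx = re.compile(pattern)
    return [line for line in entries if rx.search(line)]


# Chained filters, as in: {job="nginx"} |= "error" |~ "5[0-9]{2}"
server_errors = matches(contains(lines, "error"), r"5[0-9]{2}")
without_probes = not_contains(lines, "health_check")
print(server_errors)
print(len(without_probes))
```

Because the regex is unanchored, it would also match a 5 followed by two digits inside a longer number, so a stricter pattern may be needed for real access logs.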

Line Format

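Use | line_format to rewrite each log line before it is displayed. This simplifies verbose log formats. The template can only reference labels, including fields extracted by a parser such as | json, so a parser stage normally comes first.

{job="app-api"} | json | line_format "{{.level}} - {{.message}}"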

JSON Log Parsing

When logs are in JSON format, extract individual fields with the | json parser. The extracted fields become queryable labels.

{job="app-api"} | json | level="error"

This parses each log line as JSON, extracts the level field, and filters to only error-level logs.
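
The same parse-then-filter idea, sketched in Python over hypothetical log lines. The function name json_stage is made up for this example, and the way unparseable lines are handled here (they yield no fields and are dropped by the filter) is a simplification rather than a description of Loki's internals.

```python
import json


def json_stage(line):
    """Return the top-level fields of a JSON log line, or None if parsing fails."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Hypothetical application log lines.
raw = [
    '{"level": "error", "message": "db timeout"}',
    '{"level": "info", "message": "request ok"}',
    "not json at all",
]

# Equivalent in spirit to: {job="app-api"} | json | level="error"
errors = []
for line in raw:
    fields = json_stage(line)
    if fields is not None and fields.get("level") == "error":
        errors.append(fields)

print(errors)
```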

Logfmt Parsing

Logfmt is a common structured log format that uses key=value pairs. Parse it with | logfmt:

{job="app-api"} | logfmt | duration > 1s

Returns only log lines where the duration field exceeds 1 second — a fast way to find slow requests.
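
A minimal Python sketch of the same idea: split key=value pairs, convert the duration to seconds, and keep the slow requests. The sample lines are hypothetical, the parser ignores escaped quotes, and only the ms, s, and m units are handled, so treat it as an illustration rather than a full logfmt implementation.

```python
import re

# key=value, where the value is either quoted or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')
UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_logfmt(line):
    """Parse a logfmt line into a dict (simplified)."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def to_seconds(text):
    """Convert a duration such as '250ms' or '1.5s' to seconds, else None."""
    m = re.fullmatch(r"([0-9.]+)(ms|s|m)", text)
    return float(m.group(1)) * UNITS[m.group(2)] if m else None


# Hypothetical request logs.
logs = [
    "level=info method=GET path=/api duration=250ms",
    'level=warn method=POST path=/upload duration=1.5s msg="slow upload"',
    "level=info method=GET path=/ duration=40ms",
]

# Equivalent in spirit to: {job="app-api"} | logfmt | duration > 1s
slow = []
for line in logs:
    fields = parse_logfmt(line)
    seconds = to_seconds(fields.get("duration", ""))
    if seconds is not None and seconds > 1.0:
        slow.append(fields["path"])

print(slow)
```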

Metric Queries – Count Log Lines

Wrap a log query in a metric function to calculate numbers over time instead of returning raw lines.

count_over_time({job="nginx"} |= "error" [5m])

Returns the number of error log lines per 5-minute window. Plot this in a time-series panel to see error rate trends.

rate({job="nginx"} |= "error" [5m])

Returns the average number of error lines per second over each 5-minute window. Comparable to Prometheus rate() but applied to log counts.
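
The relationship between the two functions is simple arithmetic: the rate is the count divided by the window length in seconds. The sketch below shows this over hypothetical timestamps. The exact treatment of the window boundaries is an assumption of this example.

```python
def count_over_time(timestamps, now, window_s):
    """Count entries whose timestamp falls in the window (now - window_s, now]."""
    return sum(1 for t in timestamps if now - window_s < t <= now)


def rate(timestamps, now, window_s):
    """Average entries per second over the same window."""
    return count_over_time(timestamps, now, window_s) / window_s


# Hypothetical arrival times of error lines, in seconds.
error_times = [10, 70, 130, 250, 290, 299, 400]
now, window = 300, 300  # evaluate at t=300s with a 5-minute window

errors_in_window = count_over_time(error_times, now, window)
errors_per_second = rate(error_times, now, window)
print(errors_in_window, errors_per_second)
```

Six of the seven timestamps fall inside the window, so the rate is 6 / 300 = 0.02 lines per second.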

Bytes Rate

bytes_rate({job="nginx"}[5m])

Returns the log volume in bytes per second. Useful for detecting log storms — sudden spikes in log output that may indicate application errors.

Adding Loki as a Data Source

Administration → Data Sources → Add data source → Loki
URL: http://localhost:3100
Save & Test
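
The URL entered here is the base address of Loki's HTTP API. As a rough illustration of the kind of request a client can make against that address, the sketch below builds a URL for the /loki/api/v1/query_range endpoint. The helper name query_range_url and the timestamps are assumptions for this example, and it does not claim to reproduce the exact requests Grafana sends.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,           # the LogQL expression
        "start": start_ns,        # range start, nanoseconds since the epoch
        "end": end_ns,            # range end, nanoseconds since the epoch
        "limit": limit,           # maximum number of entries to return
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical five-minute range.
url = query_range_url(
    "http://localhost:3100",
    '{job="nginx"}',
    1_700_000_000_000_000_000,
    1_700_000_300_000_000_000,
)
print(url)
```

Opening the printed URL with curl or a browser against a running Loki is a quick way to confirm the address works before saving the data source.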

Using the Logs Panel

Add a Logs panel to a dashboard and select Loki as the data source. Enter a LogQL query. The panel displays a scrollable list of log lines, colour-coded by log level (DEBUG, INFO, WARN, ERROR), with the newest lines first by default. For a continuously updating stream, use live tailing in the Explore view, where new lines are appended at the bottom.

Log Level Detection

Grafana automatically detects log levels and applies colour coding:

DEBUG → blue
INFO  → green
WARN  → yellow
ERROR → red
CRIT  → purple

Correlating Logs and Metrics on One Dashboard

The most powerful Grafana + Loki use case is placing a Logs panel next to a Time Series panel on the same dashboard. When you spot a CPU spike on the time series chart, you look at the Logs panel below it, filtered to the same time window, to see which errors were logged around the spike and narrow down its likely cause.

Dashboard layout:
┌─────────────────────────────────────────┐
│  CPU Usage (Prometheus — Time Series)   │
│         ___spike here___                │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│  Application Logs (Loki — Logs panel)   │
│  13:05:32 ERROR  OutOfMemoryError: ...  │
│  13:05:31 WARN   GC overhead limit ...  │
│  13:05:30 INFO   Request received ...   │
└─────────────────────────────────────────┘

Derived Fields – Linking Logs to Traces

Configure Derived Fields in the Loki data source settings to make trace IDs in log lines clickable. When a log line contains a trace ID, Grafana renders it as a hyperlink that opens the corresponding trace in Grafana Tempo. This connects logs directly to the request trace — the full story of a slow or failed request across all microservices.
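
A derived field is essentially a regex with a capture group plus a link template. The sketch below imitates that mechanism in Python. The traceID=... log format and the example.com URL are hypothetical. The ${__value.raw} placeholder mirrors the variable used in Grafana's derived-field URLs, but the substitution here is done by plain string replacement, not by Grafana.

```python
import re

# Hypothetical log format: the trace ID appears as traceID=<id>.
TRACE_RE = re.compile(r"traceID=(\w+)")
# Hypothetical link template; the placeholder is filled with the captured value.
URL_TEMPLATE = "https://grafana.example.com/trace/${__value.raw}"


def derive_link(line):
    """Return a trace link for the line, or None if it carries no trace ID."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return URL_TEMPLATE.replace("${__value.raw}", m.group(1))


link = derive_link('level=error traceID=abc123 msg="upstream timeout"')
no_link = derive_link("level=info msg=started")
print(link, no_link)
```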
