Redis Monitoring and Debugging

Running Redis in production without monitoring is like driving a car without a dashboard. You cannot see speed, fuel, or engine warnings until something breaks. Redis provides built-in commands and a set of metrics that tell you exactly what is happening inside the server at any moment.

The Control Room Analogy

A power plant has a control room full of gauges, lights, and alarms. Each gauge measures something different: temperature, load, voltage. Operators watch these gauges and act before a number goes out of range. Redis INFO, MONITOR, and SLOWLOG are your gauges, lights, and alarms.

Redis Health Monitoring Overview

  ┌──────────────────────────────────────────────────────────┐
  │                      Redis Server                        │
  │                                                          │
  │  Memory ──▶ used_memory, maxmemory                       │
  │  Clients ──▶ connected_clients, blocked_clients          │
  │  Performance ──▶ instantaneous_ops_per_sec, hit rate     │
  │  Persistence ──▶ rdb_last_save_time, aof_enabled         │
  │  Replication ──▶ role, connected_slaves, lag             │
  │  Errors ──▶ rejected_connections, evicted_keys           │
  │                                                          │
  └──────────────────────────────────────────────────────────┘
         │
         ▼
  redis-cli INFO  ← single command to see all of these

The INFO Command

INFO is the most important monitoring command. It returns statistics grouped into sections. You can request the full report or a specific section.

127.0.0.1:6379> INFO server        ← Redis version, OS, config file path
127.0.0.1:6379> INFO memory        ← memory usage details
127.0.0.1:6379> INFO clients       ← connected client counts
127.0.0.1:6379> INFO stats         ← commands processed, hits, misses
127.0.0.1:6379> INFO replication   ← primary/replica status
127.0.0.1:6379> INFO persistence   ← RDB/AOF state
127.0.0.1:6379> INFO all           ← everything at once
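
The raw INFO reply is plain text: section headers start with "#" and each statistic is a field:value line. A minimal sketch of turning that text into a nested dictionary might look like the following. The function name parse_info and the sample text are illustrative assumptions, not part of Redis or any client library.

```python
def parse_info(text):
    """Group raw INFO output into {section: {field: value}} (values stay strings)."""
    sections = {}
    current = "default"  # used only if a field appears before any "# Section" header
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section headers look like "# Memory".
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        field, sep, value = line.partition(":")
        if sep:
            sections.setdefault(current, {})[field] = value
    return sections


# Illustrative sample built from the numbers used in this lesson.
sample = """# Memory
used_memory:104857600
maxmemory:268435456

# Stats
keyspace_hits:9820000
keyspace_misses:180000
"""

info = parse_info(sample)
print(info["memory"]["used_memory"])
```

Client libraries usually do this parsing for you; the sketch is only meant to show how the sections and fields are laid out.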

Key Metrics to Watch in INFO memory

used_memory:           104857600   ← bytes Redis is using (100 MB)
used_memory_human:     100.00M
maxmemory:             268435456   ← limit set in config (256 MB)
maxmemory_human:       256.00M
mem_fragmentation_ratio: 1.2       ← ideal range: 1.0–1.5

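If mem_fragmentation_ratio > 2.0: the Redis process holds about twice the memory its data needs.
Enable active defragmentation (CONFIG SET activedefrag yes, jemalloc builds only) or restart the server.
MEMORY PURGE only asks the allocator to release unused pages; it does not defragment.
A ratio below 1.0 usually means part of Redis memory has been swapped to disk.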

Key Metrics to Watch in INFO stats

keyspace_hits:    9820000    ← requests served from cache
keyspace_misses:  180000     ← requests not found in cache

Cache hit rate = hits / (hits + misses) × 100
             = 9820000 / (9820000 + 180000) × 100
             = 98.2%

A healthy cache hit rate for most apps: above 90%.
If hit rate drops, your cache keys may be expiring too fast,
or new keys are displacing useful ones via eviction.
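
keyspace_hits and keyspace_misses are cumulative counters, so a rate computed from a single reading averages over everything since the server started (or since CONFIG RESETSTAT). To see whether the hit rate is dropping right now, one option is to compare two readings taken some time apart. The sketch below assumes the two readings are already available as dictionaries; the function names and the second set of numbers are hypothetical.

```python
from typing import Optional


def hit_rate(hits: int, misses: int) -> Optional[float]:
    """Hit rate in percent, or None when there were no lookups at all."""
    total = hits + misses
    if total == 0:
        return None
    return 100 * hits / total


def windowed_hit_rate(before: dict, after: dict) -> Optional[float]:
    """Hit rate for the interval between two INFO stats readings."""
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    if hits < 0 or misses < 0:
        # Counters went backwards: the server restarted or stats were reset.
        return None
    return hit_rate(hits, misses)


earlier = {"keyspace_hits": 9_820_000, "keyspace_misses": 180_000}
later = {"keyspace_hits": 9_900_000, "keyspace_misses": 200_000}  # hypothetical second reading

print(hit_rate(9_820_000, 180_000))       # lifetime rate: 98.2
print(windowed_hit_rate(earlier, later))  # rate for the interval only: 80.0
```

In this made-up case the lifetime rate still looks healthy while the most recent interval is well below 90%, which is the situation a lifetime average can hide.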

MONITOR – Watch Live Commands

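MONITOR streams every command processed by Redis in real time to your terminal. Use it briefly during debugging. Never leave it running in production: every command must also be formatted and sent to each MONITOR client, and the Redis documentation shows a single MONITOR client cutting throughput by more than 50% in a benchmark.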

127.0.0.1:6379> MONITOR

1710000001.234 [0 127.0.0.1:52100] "SET" "user:1001" "Alice"
1710000001.235 [0 127.0.0.1:52100] "GET" "session:abc"
1710000001.240 [0 127.0.0.1:52101] "INCR" "pageviews"

Each line shows:
  timestamp  [db source_ip:port]  command  arguments
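
When a short MONITOR capture has been saved to a file, it can help to break each line into its parts and count which commands dominate. The following is a rough sketch that assumes lines in the format shown above; the function name parse_monitor_line is made up for illustration, and escape sequences inside arguments are kept as they appear.

```python
import re
from collections import Counter

# timestamp [db client-address] "COMMAND" "arg" ...
MONITOR_LINE = re.compile(r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<addr>[^\]]+)\] (?P<rest>.*)$')
# A double-quoted token that may contain backslash escapes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_monitor_line(line):
    """Split one MONITOR line into a dict, or return None if it does not match."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    parts = QUOTED.findall(match["rest"])
    if not parts:
        return None
    return {
        "timestamp": float(match["ts"]),
        "db": int(match["db"]),
        "client": match["addr"],
        "command": parts[0].upper(),
        "args": parts[1:],
    }


captured = [
    '1710000001.234 [0 127.0.0.1:52100] "SET" "user:1001" "Alice"',
    '1710000001.235 [0 127.0.0.1:52100] "GET" "session:abc"',
    '1710000001.240 [0 127.0.0.1:52101] "INCR" "pageviews"',
]

parsed = [parse_monitor_line(line) for line in captured]
command_counts = Counter(entry["command"] for entry in parsed if entry)
print(command_counts.most_common())
```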

SLOWLOG – Find Slow Commands

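The slow log records every command whose execution time exceeds a configurable threshold. The measured time covers only command execution; it excludes network I/O and time spent waiting in the queue. Commands appearing here are candidates for optimization.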

Configure the threshold in redis.conf:
  slowlog-log-slower-than 10000   ← log commands taking > 10ms (10000 microseconds)
  slowlog-max-len 128             ← keep the last 128 slow commands

View the slow log:
  SLOWLOG GET 10   ← show the 10 most recent slow commands

Output:
  1) 1) (integer) 42          ← log entry ID
     2) (integer) 1710000050  ← Unix timestamp when it ran
     3) (integer) 18234       ← duration in microseconds (18ms!)
     4) 1) "KEYS"             ← the command that was slow
        2) "*"
     5) "127.0.0.1:52201"    ← client address and port
     6) ""                   ← client name (empty unless set with CLIENT SETNAME)

  → KEYS * ran in 18ms on a large keyspace. Replace it with SCAN.

Clear the slow log:
  SLOWLOG RESET
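
The nested reply from SLOWLOG GET is easier to read once each entry is flattened and the duration is converted to milliseconds. Below is a minimal sketch that assumes the reply is available as nested lists in the order shown above (some client libraries already return a friendlier structure). The function name summarize_slowlog and the second sample entry are hypothetical.

```python
def summarize_slowlog(entries, min_ms=0.0):
    """Flatten raw SLOWLOG GET entries and sort them slowest first."""
    rows = []
    for entry in entries:
        entry_id, started_at, duration_us, argv = entry[:4]
        duration_ms = duration_us / 1000  # the slow log reports microseconds
        if duration_ms < min_ms:
            continue
        rows.append({
            "id": entry_id,
            "timestamp": started_at,
            "duration_ms": duration_ms,
            "command": " ".join(str(arg) for arg in argv),
            # Address and name fields are absent on very old servers.
            "client": entry[4] if len(entry) > 4 else "",
        })
    rows.sort(key=lambda row: row["duration_ms"], reverse=True)
    return rows


raw_reply = [
    [41, 1710000031, 10450, ["SMEMBERS", "tags:all"], "127.0.0.1:52188", ""],  # made-up entry
    [42, 1710000050, 18234, ["KEYS", "*"], "127.0.0.1:52201", ""],             # entry from the lesson
]

for row in summarize_slowlog(raw_reply):
    print(row["duration_ms"], "ms", row["command"])
```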

SCAN – Safe Key Iteration (Replace KEYS)

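KEYS * blocks Redis until it scans every key. On a dataset with millions of keys, this freezes the server for seconds. SCAN returns a cursor and a small batch of keys per call, so each call blocks only briefly. COUNT is a hint rather than an exact batch size, and a key may be returned more than once during a full iteration, so the caller must tolerate duplicates.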

Iterating all keys with SCAN (safe):

  cursor = 0
  loop:
    result = SCAN cursor COUNT 100
    cursor = result[0]      ← new cursor for next call
    keys   = result[1]      ← batch of key names

    process keys...

    if cursor == "0": stop  ← full scan completed

Example in redis-cli:
  SCAN 0 COUNT 100
  1) "128"             ← next cursor
  2) 1) "user:1001"
     2) "session:abc"
     3) "product:55"
     ...

  SCAN 128 COUNT 100
  1) "0"               ← cursor 0 = scan complete
  2) 1) "leaderboard"
     ...
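
The cursor loop above can be sketched in Python as a generator. This version assumes a client object whose scan(cursor, match=..., count=...) method returns a (next_cursor, keys) pair, which is the shape redis-py uses; FakeClient is an in-memory stand-in written only so the example runs without a server, and scan_all is a hypothetical helper name.

```python
def scan_all(client, match=None, count=100):
    """Yield each key once, following the cursor until it comes back to 0."""
    seen = set()  # SCAN may return a key more than once; note this set grows with the keyspace
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor, match=match, count=count)
        for key in keys:
            if key not in seen:
                seen.add(key)
                yield key
        if int(cursor) == 0:  # cursor 0 means the full iteration is complete
            break


class FakeClient:
    """In-memory stand-in for demonstration only; not a real Redis client."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan(self, cursor, match=None, count=10):
        # Real servers use opaque cursors; a list offset is enough for the demo.
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        return (0 if next_cursor >= len(self._keys) else next_cursor, batch)


client = FakeClient(["user:1001", "session:abc", "product:55", "leaderboard"])
all_keys = list(scan_all(client, count=3))
print(all_keys)
```

With redis-py, the built-in scan_iter method wraps the same cursor loop, so a hand-written generator like this is mainly useful for understanding what happens underneath.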

CLIENT LIST – See All Connected Clients

127.0.0.1:6379> CLIENT LIST

id=5 addr=127.0.0.1:52100 fd=9 name= age=10 idle=0 flags=N
  db=0 sub=0 psub=0 multi=-1 qbuf=0 obl=0 oll=0 omem=0 cmd=client

Fields to watch:
  idle   → seconds since last command (high idle = stale connection)
  sub    → number of channel subscriptions (0 for normal clients)
  multi  → number of commands queued in a MULTI block (-1 = not in a transaction)
  cmd    → last command this client ran
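
Each client in the CLIENT LIST reply is one line of space-separated field=value pairs (the sample above is wrapped only to fit the page). A small sketch for picking out long-idle connections follows; the helper names and the second sample client are assumptions made for illustration.

```python
def parse_client_list(text):
    """Turn CLIENT LIST output into a list of {field: value} dicts, one per client."""
    clients = []
    for line in text.splitlines():
        fields = {}
        for token in line.split():
            field, sep, value = token.partition("=")
            if sep:
                fields[field] = value  # "name=" yields an empty string
        if "id" in fields:
            clients.append(fields)
    return clients


def idle_clients(clients, min_idle_seconds):
    """Clients whose last command was at least min_idle_seconds ago."""
    return [c for c in clients if int(c.get("idle", 0)) >= min_idle_seconds]


sample = (
    "id=5 addr=127.0.0.1:52100 fd=9 name= age=10 idle=0 flags=N "
    "db=0 sub=0 psub=0 multi=-1 qbuf=0 obl=0 oll=0 omem=0 cmd=client\n"
    # Hypothetical second client that has been idle for 15 minutes.
    "id=6 addr=127.0.0.1:52140 fd=10 name=worker age=3600 idle=900 flags=N "
    "db=0 sub=0 psub=0 multi=-1 qbuf=0 obl=0 oll=0 omem=0 cmd=get\n"
)

clients = parse_client_list(sample)
stale = idle_clients(clients, min_idle_seconds=600)
print([c["id"] for c in stale])
```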

MEMORY USAGE – Find Large Keys

Check memory used by a single key:
  MEMORY USAGE user:1001
  → (integer) 128   ← 128 bytes including Redis overhead

Find all large keys with SCAN + MEMORY USAGE in a loop,
or use the redis-cli built-in tool:
  redis-cli --memkeys   ← scans and reports top memory-consuming keys
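
If you run the SCAN + MEMORY USAGE loop yourself, the ranking step can be kept separate from the Redis calls. The sketch below assumes you have already collected (key, bytes) pairs; largest_keys is a hypothetical helper, and every size other than the 128 bytes shown above is made up.

```python
import heapq


def largest_keys(sizes, n=10):
    """Return the n (key, bytes) pairs with the highest memory usage."""
    # MEMORY USAGE returns nil for a key that expired between SCAN and the lookup.
    known = (pair for pair in sizes if pair[1] is not None)
    return heapq.nlargest(n, known, key=lambda pair: pair[1])


collected = [
    ("user:1001", 128),         # value from the lesson
    ("session:abc", 96),        # made-up size
    ("leaderboard", 5_242_880), # made-up size
    ("product:55", None),       # key vanished before MEMORY USAGE ran
]

for key, size in largest_keys(collected, n=2):
    print(key, size)
```

heapq.nlargest keeps only n candidates in memory at a time, which matters when the pairs come from a generator walking millions of keys.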

Common Production Alerts to Set Up

┌──────────────────────────────────────┬─────────────────────────┐
│  Metric                              │  Alert Threshold        │
├──────────────────────────────────────┼─────────────────────────┤
│  used_memory > 80% of maxmemory      │  Warning at 80%         │
│  Cache hit rate < 90%                │  Investigate            │
│  connected_clients > normal peak     │  Connection leak?       │
│  rejected_connections > 0            │  maxclients reached     │
│  replication lag > 10 seconds        │  Replica falling behind │
│  aof_last_bgrewrite_status = err     │  AOF rewrite failed     │
└──────────────────────────────────────┴─────────────────────────┘
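
Several rows of this table can be checked directly against INFO fields. The following sketch assumes the fields have already been read into a flat dictionary of strings; check_alerts, its parameters, and the unhealthy sample values are illustrative assumptions, and the thresholds are the ones from the table. Replication lag and the "normal peak" for connected_clients are left out because they depend on your own baseline.

```python
def check_alerts(info, memory_warn_pct=80.0, min_hit_rate_pct=90.0):
    """Return a list of alert messages for a flat dict of INFO fields."""
    alerts = []

    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    # maxmemory 0 means no limit is configured, so a percentage is undefined.
    if limit > 0 and 100 * used / limit > memory_warn_pct:
        alerts.append("used_memory above warning threshold")

    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    if hits + misses > 0 and 100 * hits / (hits + misses) < min_hit_rate_pct:
        alerts.append("cache hit rate below target")

    if int(info.get("rejected_connections", 0)) > 0:
        alerts.append("connections rejected (maxclients reached)")

    if info.get("aof_last_bgrewrite_status", "ok") != "ok":
        alerts.append("last AOF rewrite failed")

    return alerts


healthy = {
    "used_memory": "104857600", "maxmemory": "268435456",
    "keyspace_hits": "9820000", "keyspace_misses": "180000",
    "rejected_connections": "0", "aof_last_bgrewrite_status": "ok",
}
unhealthy = {  # made-up values chosen to trip every check
    "used_memory": "241591910", "maxmemory": "268435456",
    "keyspace_hits": "700", "keyspace_misses": "300",
    "rejected_connections": "12", "aof_last_bgrewrite_status": "err",
}

print(check_alerts(healthy))
print(check_alerts(unhealthy))
```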

Key Points

INFO gives a full health report. Use INFO memory, INFO stats, and INFO replication most often in production.
Track your cache hit rate from INFO stats. A rate above 90% is healthy.
MONITOR streams live commands and is useful for short debugging sessions. Stop it as soon as you are done.
SLOWLOG reveals commands that exceed your latency threshold. Replace KEYS * with SCAN to fix the most common slowdown.
Use MEMORY USAGE to find keys consuming disproportionate memory.
Set alerts on used_memory, hit rate, replication lag, and rejected connections before problems become outages.
