PostgreSQL Backup, Restore, and Performance Tuning

Two of the most critical responsibilities in managing a PostgreSQL database are ensuring data can be recovered after failure (backup and restore) and ensuring the database responds to queries quickly (performance tuning). This final topic brings together the operational skills needed to run PostgreSQL reliably and efficiently in a real-world environment.

Part 1: Backup and Restore

Types of Backups

Type | Description | Tool
Logical Backup | Exports SQL statements that recreate the database | pg_dump, pg_dumpall
Physical Backup | Copies raw data files from disk | pg_basebackup, filesystem copy
Continuous Archiving (PITR) | Ongoing WAL archiving; restore to any point in time | WAL archiving + pg_basebackup

pg_dump — Backup a Single Database

pg_dump exports one database as a SQL script or archive. It is run from the operating system shell, not inside psql.

# Plain SQL format (human-readable)
pg_dump -U postgres -d company_db -f company_db_backup.sql

# Custom archive format (compressed, best for large databases)
pg_dump -U postgres -d company_db -F c -f company_db.dump

# Table of contents: list what is in an archive
pg_restore --list company_db.dump

pg_dump Options

Option | Description
-U username | Connect as this user
-d dbname | Database to dump
-f filename | Output file name
-F c | Custom format (compressed)
-F p | Plain SQL format (the default)
-t tablename | Dump only a specific table
-s | Schema only (no data)
-a | Data only (no schema)
-j n | Parallel dump using n jobs (directory format only)

pg_dumpall — Backup All Databases

pg_dumpall -U postgres -f all_databases_backup.sql

pg_dumpall dumps every database in the PostgreSQL cluster, plus global objects such as roles and tablespaces. Only plain SQL format is supported.
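Because the file already contains the CREATE DATABASE statements, a pg_dumpall backup is restored by feeding it to psql while connected to any existing database (conventionally postgres):

```
# Restore a full-cluster dump: roles, databases, and data
psql -U postgres -d postgres -f all_databases_backup.sql
```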

Restoring with psql (Plain SQL Format)

# Create the target database first
createdb -U postgres company_db_restored

# Restore from the SQL file
psql -U postgres -d company_db_restored -f company_db_backup.sql

Restoring with pg_restore (Custom/Archive Format)

# Create the target database
createdb -U postgres company_db_restored

# Restore from a custom-format dump
pg_restore -U postgres -d company_db_restored company_db.dump

# Restore only a specific table
pg_restore -U postgres -d company_db_restored -t orders company_db.dump

# Parallel restore (faster on multi-core systems)
pg_restore -U postgres -d company_db_restored -j 4 company_db.dump

Physical Backup with pg_basebackup

pg_basebackup copies the entire PostgreSQL data directory. It is used as the foundation for streaming replication and Point-In-Time Recovery (PITR).

pg_basebackup -U postgres -D /var/backups/pgbase -Ft -z -P
  • -D: destination directory
  • -Ft: tar format
  • -z: gzip compression
  • -P: show progress
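A base backup alone is only a snapshot; Point-In-Time Recovery also requires the server to archive WAL continuously. A minimal postgresql.conf sketch (the archive destination path is illustrative):

```
# postgresql.conf — continuous WAL archiving
# (changing archive_mode requires a server restart)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/backups/wal/%f && cp %p /var/backups/wal/%f'
```

The `test ! -f` guard prevents an existing archived segment from being silently overwritten, which the PostgreSQL documentation recommends.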

Backup Best Practices

  • Schedule automated backups using cron (Linux/macOS) or Task Scheduler (Windows).
  • Store backups on a different physical machine or cloud storage — never only on the database server.
  • Test restores regularly — an untested backup is not a real backup.
  • Retain multiple backup generations (daily, weekly, monthly).
  • Use custom format (-F c) for large databases; it is smaller and supports parallel restore.
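The first bullet above can be sketched as a cron entry. The schedule, paths, and user are illustrative, and authentication is assumed to come from a .pgpass file:

```
# /etc/cron.d/pg-backup — nightly custom-format dump at 02:00
# (% characters must be escaped in crontab entries)
0 2 * * * postgres pg_dump -d company_db -F c -f /var/backups/company_db_$(date +\%Y\%m\%d).dump
```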

Part 2: Performance Tuning

Understanding EXPLAIN and EXPLAIN ANALYZE

EXPLAIN shows the query execution plan — how PostgreSQL intends to retrieve the data. EXPLAIN ANALYZE actually runs the query and shows real timing data alongside the plan.

-- Show the plan without running
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Run the query and show actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Reading the Query Plan

Key terms in an EXPLAIN output:

Term | Meaning
Seq Scan | Full table scan; reads every row. Slow on large tables without indexes.
Index Scan | Uses an index to locate rows. Fast for selective queries.
Bitmap Heap Scan | Combines index results with a heap scan. Used for moderate selectivity.
Hash Join | Joins two tables using a hash table. Efficient for large, unsorted datasets.
Nested Loop | Joins by looping over the inner table for each outer row. Fast for small tables.
cost=X..Y | Estimated startup cost..total cost, in planner units. Lower is better.
rows=N | Estimated number of rows returned.
actual time | Real elapsed time in milliseconds (EXPLAIN ANALYZE only).
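Put together, a single node of an EXPLAIN ANALYZE plan reads roughly like this (the numbers are illustrative, not real output):

```
Index Scan using idx_orders_customer_id on orders
  (cost=0.42..8.44 rows=12 width=64)
  (actual time=0.031..0.052 rows=10 loops=1)
```

Here the planner estimated 12 rows but 10 were actually returned. A large gap between estimated and actual row counts usually means statistics are stale and the table needs an ANALYZE.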

Indexing for Performance

Missing indexes are the most common performance problem. If EXPLAIN shows a Seq Scan on a large table when filtering by a column, an index on that column will often solve the problem.

-- Slow query (Seq Scan)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Add the index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Now uses Index Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

VACUUM and ANALYZE

PostgreSQL uses Multi-Version Concurrency Control (MVCC): deleted and updated rows are not immediately removed. Over time, these dead rows accumulate and slow down queries. VACUUM reclaims that space.

-- Remove dead rows (does not shrink the file)
VACUUM orders;

-- Remove dead rows and reclaim disk space (locks the table)
VACUUM FULL orders;

-- Update query planner statistics
ANALYZE orders;

-- Do both at once
VACUUM ANALYZE orders;

PostgreSQL runs autovacuum automatically in the background. Manual VACUUM is needed after large bulk operations (like inserting or deleting millions of rows at once).

Table Statistics and the Query Planner

The query planner uses statistics about column value distributions to estimate row counts and choose the best plan. Running ANALYZE keeps these statistics current.

-- View planner statistics for a table
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders';

Key postgresql.conf Settings for Performance

The postgresql.conf file controls global server behavior. These are the most impactful settings for performance:

Setting | Description | Recommended Starting Point
shared_buffers | Memory used for caching data pages | 25% of total RAM
work_mem | Memory per sort/hash operation per query | 4–16 MB (increase for complex sorts)
maintenance_work_mem | Memory for VACUUM, CREATE INDEX, etc. | 256 MB or more
effective_cache_size | Estimate of OS + PostgreSQL cache (planner hint) | 50–75% of total RAM
max_connections | Maximum simultaneous connections | 100–200 (use a connection pooler for more)
checkpoint_completion_target | Smooths out checkpoint I/O writes | 0.9
random_page_cost | Planner's cost estimate for random disk access | 1.1 for SSDs, 4.0 for spinning disks

-- View current settings
SHOW shared_buffers;
SHOW work_mem;

-- Check all settings
SELECT name, setting, unit FROM pg_settings
WHERE category LIKE '%Memory%';
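Settings can also be changed without editing postgresql.conf by hand, using ALTER SYSTEM (which writes to postgresql.auto.conf and requires superuser privileges):

```
-- Persistently change work_mem, then reload the configuration
ALTER SYSTEM SET work_mem = '16MB';
SELECT pg_reload_conf();
```

Note that some settings, including shared_buffers and max_connections, still require a full server restart to take effect.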

Connection Pooling

Each PostgreSQL connection consumes significant memory (~5–10 MB). For applications with many short-lived connections (web servers), a connection pooler like PgBouncer sits between the application and the database, reusing a smaller pool of actual database connections.
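A minimal PgBouncer configuration sketch; all values here are illustrative, and authentication setup is omitted (see the PgBouncer documentation for auth_file format):

```
; pgbouncer.ini
[databases]
company_db = host=127.0.0.1 port=5432 dbname=company_db

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
```

With transaction pooling, a server connection is returned to the pool at the end of each transaction, so a small pool can serve many application connections.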

Identifying Slow Queries

Enable the pg_stat_statements extension to track query performance over time. The module must first be listed in shared_preload_libraries in postgresql.conf (e.g. shared_preload_libraries = 'pg_stat_statements'), which requires a server restart:

-- Enable the extension (superuser required)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the top 10 slowest queries by total execution time
SELECT query,
       calls,
       ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
       ROUND(mean_exec_time::NUMERIC, 2) AS avg_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Identifying Table Bloat

-- Check for tables with many dead rows (need VACUUM)
SELECT relname AS table_name,
       n_dead_tup AS dead_rows,
       n_live_tup AS live_rows,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Checking Index Usage

-- Find indexes that are never used (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;

Unused indexes waste space and slow down writes. Dropping them improves INSERT and UPDATE performance.
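Once an index is confirmed unused (check every replica too, since idx_scan counters are tracked per server), it can be dropped without blocking writes; the index name below is illustrative:

```
-- CONCURRENTLY avoids taking an exclusive lock on the table;
-- it cannot run inside a transaction block
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy;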

Performance Tuning Checklist

  • Run EXPLAIN ANALYZE on slow queries to identify bottlenecks.
  • Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
  • Run VACUUM ANALYZE after large data loads or bulk deletions.
  • Tune shared_buffers, work_mem, and effective_cache_size in postgresql.conf.
  • Use pg_stat_statements to monitor which queries consume the most time.
  • Set random_page_cost = 1.1 if the database runs on SSD storage.
  • Use a connection pooler like PgBouncer for high-concurrency web applications.
  • Remove unused indexes identified via pg_stat_user_indexes.

Key Points

  • pg_dump creates logical backups of a single database; pg_dumpall backs up all databases and global roles.
  • Custom format (-F c) backups are compressed and support parallel restore with pg_restore -j.
  • pg_basebackup creates a physical copy of the entire data directory, used for replication and PITR.
  • Always test restores — an untested backup strategy provides false confidence.
  • EXPLAIN ANALYZE reveals the actual query execution plan and real timing data.
  • Missing indexes on filtered and joined columns are the most common cause of slow queries.
  • VACUUM removes dead rows from MVCC; ANALYZE updates planner statistics — autovacuum handles most cases automatically.
  • Tuning shared_buffers, work_mem, and effective_cache_size in postgresql.conf is the first step in server-level performance optimization.
