PostgreSQL Backup, Restore, and Performance Tuning
Two of the most critical responsibilities in managing a PostgreSQL database are ensuring data can be recovered after failure (backup and restore) and ensuring the database responds to queries quickly (performance tuning). This final topic brings together the operational skills needed to run PostgreSQL reliably and efficiently in a real-world environment.
Part 1: Backup and Restore
Types of Backups
| Type | Description | Tool |
|---|---|---|
| Logical Backup | Exports SQL statements that recreate the database | pg_dump, pg_dumpall |
| Physical Backup | Copies raw data files from disk | pg_basebackup, filesystem copy |
| Continuous Archiving (PITR) | Ongoing WAL archiving — restore to any point in time | WAL archiving + pg_basebackup |
pg_dump — Backup a Single Database
pg_dump exports one database as a SQL script or archive. It is run from the operating system shell, not inside psql.
# Plain SQL format (human-readable)
pg_dump -U postgres -d company_db -f company_db_backup.sql
# Custom archive format (compressed, best for large databases)
pg_dump -U postgres -d company_db -F c -f company_db.dump
# Table of contents: shows what is in an archive
pg_restore --list company_db.dump
pg_dump Options
| Option | Description |
|---|---|
-U username | Connect as this user |
-d dbname | Database to dump |
-f filename | Output file name |
-F c | Custom format (compressed) |
-F p | Plain SQL format (default) |
-t tablename | Dump only a specific table |
-s | Schema only (no data) |
-a | Data only (no schema) |
-j n | Parallel dump using n jobs (directory format only) |
pg_dumpall — Backup All Databases
pg_dumpall dumps every database in the PostgreSQL cluster, plus global objects such as roles and tablespaces. Only plain SQL format is supported.
pg_dumpall -U postgres -f all_databases_backup.sql
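A pg_dumpall script is restored with psql rather than pg_restore. Connect to an existing maintenance database (commonly postgres), since the script issues its own CREATE ROLE and CREATE DATABASE statements. A sketch, assuming the backup file from above:

```shell
# Restore a cluster-wide dump; the script recreates roles and databases itself
psql -U postgres -d postgres -f all_databases_backup.sql
```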
Restoring with psql (Plain SQL Format)
# Create the target database first
createdb -U postgres company_db_restored
# Restore from the SQL file
psql -U postgres -d company_db_restored -f company_db_backup.sql
Restoring with pg_restore (Custom/Archive Format)
# Create the target database
createdb -U postgres company_db_restored
# Restore from the custom dump
pg_restore -U postgres -d company_db_restored company_db.dump
# Restore only a specific table
pg_restore -U postgres -d company_db_restored -t orders company_db.dump
# Parallel restore (faster on multi-core systems)
pg_restore -U postgres -d company_db_restored -j 4 company_db.dump
Physical Backup with pg_basebackup
pg_basebackup copies the entire PostgreSQL data directory. It is used as the foundation for streaming replication and Point-In-Time Recovery (PITR).
pg_basebackup -U postgres -D /var/backups/pgbase -Ft -z -P
- -D: destination directory
- -Ft: tar format
- -z: gzip compression
- -P: show progress
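A base backup enables PITR only when combined with continuous WAL archiving. A minimal postgresql.conf sketch for archiving (the archive directory path is an assumption; changing archive_mode requires a server restart):

```
# postgresql.conf — continuous WAL archiving for PITR
wal_level = replica          # default since PostgreSQL 10
archive_mode = on            # requires a server restart
archive_command = 'cp %p /var/backups/wal_archive/%f'   # example destination
```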
Backup Best Practices
- Schedule automated backups using cron (Linux/macOS) or Task Scheduler (Windows).
- Store backups on a different physical machine or cloud storage — never only on the database server.
- Test restores regularly — an untested backup is not a real backup.
- Retain multiple backup generations (daily, weekly, monthly).
- Use custom format (-F c) for large databases; it is smaller and supports parallel restore.
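The scheduling advice above can be sketched as crontab entries that produce a date-stamped custom-format dump each night (paths and the 7-day retention policy are assumptions to adapt):

```
# crontab — 02:00 nightly dump, then prune dumps older than 7 days
0 2 * * * pg_dump -U postgres -d company_db -F c -f /var/backups/company_db_$(date +\%Y\%m\%d).dump
30 2 * * * find /var/backups -name 'company_db_*.dump' -mtime +7 -delete
```

Note that % must be escaped as \% inside a crontab line.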
Part 2: Performance Tuning
Understanding EXPLAIN and EXPLAIN ANALYZE
EXPLAIN shows the query execution plan — how PostgreSQL intends to retrieve the data. EXPLAIN ANALYZE actually runs the query and shows real timing data alongside the plan.
-- Show the plan without running
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Run the query and show actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Reading the Query Plan
Key terms in an EXPLAIN output:
| Term | Meaning |
|---|---|
| Seq Scan | Full table scan — reads every row. Slow on large tables without indexes. |
| Index Scan | Uses an index to locate rows. Fast for selective queries. |
| Bitmap Heap Scan | Combines index results with a heap scan. Used for moderate selectivity. |
| Hash Join | Joins two tables using a hash table. Efficient for large unsorted datasets. |
| Nested Loop | Joins by looping over the inner table for each outer row. Fast for small tables. |
| cost=X..Y | Estimated startup cost..total cost. Lower is better. |
| rows=N | Estimated number of rows returned. |
| actual time | Real elapsed time in milliseconds (EXPLAIN ANALYZE only). |
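Putting these terms together, an EXPLAIN ANALYZE run on an indexed lookup produces output shaped like this (costs and timings are illustrative, not measured):

```
Index Scan using idx_orders_customer_id on orders
  (cost=0.43..8.45 rows=5 width=64) (actual time=0.021..0.025 rows=3 loops=1)
  Index Cond: (customer_id = 42)
Planning Time: 0.080 ms
Execution Time: 0.041 ms
```

Compare rows= (the planner's estimate) with the actual rows count: a large mismatch usually means statistics are stale and ANALYZE is needed.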
Indexing for Performance
Missing indexes are the most common performance problem. If EXPLAIN shows a Seq Scan on a large table when filtering by a column, an index on that column will often solve the problem.
-- Slow query (Seq Scan)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Add the index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Now uses Index Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
VACUUM and ANALYZE
PostgreSQL uses Multi-Version Concurrency Control (MVCC): deleted and updated rows are not immediately removed. Over time, these dead rows accumulate and slow down queries. VACUUM reclaims that space.
-- Remove dead rows (does not shrink the file)
VACUUM orders;
-- Remove dead rows and reclaim disk space (locks the table)
VACUUM FULL orders;
-- Update query planner statistics
ANALYZE orders;
-- Do both at once
VACUUM ANALYZE orders;
PostgreSQL runs autovacuum automatically in the background. Manual VACUUM is needed after large bulk operations (like inserting or deleting millions of rows at once).
Table Statistics and the Query Planner
The query planner uses statistics about column value distributions to estimate row counts and choose the best plan. Running ANALYZE keeps these statistics current.
-- View planner statistics for a table
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders';
Key postgresql.conf Settings for Performance
The postgresql.conf file controls global server behavior. These are the most impactful settings for performance:
| Setting | Description | Recommended Starting Point |
|---|---|---|
shared_buffers | Memory used for caching data pages | 25% of total RAM |
work_mem | Memory per sort/hash operation per query | 4–16 MB (increase for complex sorts) |
maintenance_work_mem | Memory for VACUUM, CREATE INDEX, etc. | 256 MB or more |
effective_cache_size | Estimate of OS + PostgreSQL cache (planner hint) | 50–75% of total RAM |
max_connections | Maximum simultaneous connections | 100–200 (use a connection pooler for more) |
checkpoint_completion_target | Smooths out checkpoint I/O writes | 0.9 |
random_page_cost | Planner's cost estimate for random disk access | 1.1 for SSDs, 4.0 for spinning disks |
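As a concrete illustration, the starting points above translate to something like the following for a dedicated server with 16 GB of RAM on SSD storage (these values are assumptions to adapt, not universal recommendations):

```
# postgresql.conf — example starting values for a dedicated 16 GB server
shared_buffers = 4GB                  # ~25% of RAM
work_mem = 16MB                       # per sort/hash operation
maintenance_work_mem = 512MB
effective_cache_size = 12GB           # ~75% of RAM (planner hint only)
max_connections = 150
checkpoint_completion_target = 0.9
random_page_cost = 1.1                # SSD storage
```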
-- View current settings
SHOW shared_buffers;
SHOW work_mem;
-- Check all settings
SELECT name, setting, unit FROM pg_settings
WHERE category LIKE '%Memory%';
Connection Pooling
Each PostgreSQL connection consumes significant memory (~5–10 MB). For applications with many short-lived connections (web servers), a connection pooler like PgBouncer sits between the application and the database, reusing a smaller pool of actual database connections.
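A minimal pgbouncer.ini sketch of this setup (the host, credentials file, and pool sizes are assumptions):

```
[databases]
company_db = host=127.0.0.1 port=5432 dbname=company_db

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction       # reuse server connections between transactions
default_pool_size = 20        # actual PostgreSQL connections per database
max_client_conn = 500         # application-side connections accepted
```

The application connects to port 6432 as if it were PostgreSQL; PgBouncer multiplexes those clients over the small server-side pool.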
Identifying Slow Queries
Enable the pg_stat_statements extension to track query performance over time. The module must be preloaded at server start (shared_preload_libraries = 'pg_stat_statements' in postgresql.conf, followed by a restart) before it can collect data:
-- Enable the extension (superuser required)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find the top 10 slowest queries by total execution time
SELECT query,
calls,
ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Identifying Table Bloat
-- Check for tables with many dead rows (need VACUUM)
SELECT relname AS table_name,
n_dead_tup AS dead_rows,
n_live_tup AS live_rows,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
Checking Index Usage
-- Find indexes that are never used (candidates for removal)
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
Unused indexes waste space and slow down writes. Dropping them improves INSERT and UPDATE performance.
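Removing such an index is a single statement. A sketch using a hypothetical index name (first confirm the index does not back a PRIMARY KEY or UNIQUE constraint, which must be dropped via the constraint instead):

```sql
-- Drop an index confirmed to be unused (idx_orders_legacy is hypothetical)
DROP INDEX IF EXISTS idx_orders_legacy;
```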
Performance Tuning Checklist
- Run EXPLAIN ANALYZE on slow queries to identify bottlenecks.
- Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
- Run VACUUM ANALYZE after large data loads or bulk deletions.
- Tune shared_buffers, work_mem, and effective_cache_size in postgresql.conf.
- Use pg_stat_statements to monitor which queries consume the most time.
- Set random_page_cost = 1.1 if the database runs on SSD storage.
- Use a connection pooler like PgBouncer for high-concurrency web applications.
- Remove unused indexes identified via pg_stat_user_indexes.
Key Points
- pg_dump creates logical backups of a single database; pg_dumpall backs up all databases and global roles.
- Custom format (-F c) backups are compressed and support parallel restore with pg_restore -j.
- pg_basebackup creates a physical copy of the entire data directory, used for replication and PITR.
- Always test restores — an untested backup strategy provides false confidence.
- EXPLAIN ANALYZE reveals the actual query execution plan and real timing data.
- Missing indexes on filtered and joined columns are the most common cause of slow queries.
- VACUUM removes dead rows left behind by MVCC; ANALYZE updates planner statistics — autovacuum handles most cases automatically.
- Tuning shared_buffers, work_mem, and effective_cache_size in postgresql.conf is the first step in server-level performance optimization.
