PostgreSQL Backup, Restore, and Performance Tuning

Two of the most critical responsibilities in managing a PostgreSQL database are ensuring data can be recovered after failure (backup and restore) and ensuring the database responds to queries quickly (performance tuning). This final topic brings together the operational skills needed to run PostgreSQL reliably and efficiently in a real-world environment.

Part 1: Backup and Restore

Types of Backups

Type | Description | Tool
Logical Backup | Exports SQL statements that recreate the database | pg_dump, pg_dumpall
Physical Backup | Copies raw data files from disk | pg_basebackup, filesystem copy
Continuous Archiving (PITR) | Ongoing WAL archiving; restore to any point in time | WAL archiving + pg_basebackup

pg_dump — Backup a Single Database

pg_dump exports one database as a SQL script or archive. It is run from the operating system shell, not inside psql.

# Plain SQL format (human-readable)
pg_dump -U postgres -d company_db -f company_db_backup.sql

# Custom archive format (compressed, best for large databases)
pg_dump -U postgres -d company_db -F c -f company_db.dump

# Table of contents: list what is in an archive
pg_restore --list company_db.dump

pg_dump Options

Option | Description
-U username | Connect as this user
-d dbname | Database to dump
-f filename | Output file name
-F c | Custom format (compressed)
-F p | Plain SQL format (the default)
-t tablename | Dump only a specific table
-s | Schema only (no data)
-a | Data only (no schema)
-j n | Parallel dump using n jobs (directory format only)

pg_dumpall — Backup All Databases

pg_dumpall -U postgres -f all_databases_backup.sql

pg_dumpall dumps every database in the PostgreSQL cluster, plus global objects such as roles and tablespaces. Only plain SQL format is supported.
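Because the file already contains the CREATE DATABASE statements, a pg_dumpall backup is restored by feeding it to psql while connected to any existing database (conventionally postgres):

```
# Restore a full-cluster dump: roles, databases, and data
psql -U postgres -d postgres -f all_databases_backup.sql
```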

Restoring with psql (Plain SQL Format)

# Create the target database first
createdb -U postgres company_db_restored

# Restore from the SQL file
psql -U postgres -d company_db_restored -f company_db_backup.sql

Restoring with pg_restore (Custom/Archive Format)

# Create the target database
createdb -U postgres company_db_restored

# Restore from a custom-format dump
pg_restore -U postgres -d company_db_restored company_db.dump

# Restore only a specific table
pg_restore -U postgres -d company_db_restored -t orders company_db.dump

# Parallel restore (faster on multi-core systems)
pg_restore -U postgres -d company_db_restored -j 4 company_db.dump

Physical Backup with pg_basebackup

pg_basebackup copies the entire PostgreSQL data directory. It is used as the foundation for streaming replication and Point-In-Time Recovery (PITR).

pg_basebackup -U postgres -D /var/backups/pgbase -Ft -z -P
  • -D: destination directory
  • -Ft: tar format
  • -z: gzip compression
  • -P: show progress
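A base backup alone is only a snapshot; Point-In-Time Recovery also requires the server to archive WAL continuously. A minimal postgresql.conf sketch (the archive destination path is illustrative):

```
# postgresql.conf — continuous WAL archiving
# (changing archive_mode requires a server restart)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/backups/wal/%f && cp %p /var/backups/wal/%f'
```

The `test ! -f` guard prevents an existing archived segment from being silently overwritten, which the PostgreSQL documentation recommends.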

Backup Best Practices

  • Schedule automated backups using cron (Linux/macOS) or Task Scheduler (Windows).
  • Store backups on a different physical machine or cloud storage — never only on the database server.
  • Test restores regularly — an untested backup is not a real backup.
  • Retain multiple backup generations (daily, weekly, monthly).
  • Use custom format (-F c) for large databases; it is smaller and supports parallel restore.
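The first bullet above can be sketched as a cron entry. The schedule, paths, and user are illustrative, and authentication is assumed to come from a .pgpass file:

```
# /etc/cron.d/pg-backup — nightly custom-format dump at 02:00
# (% characters must be escaped in crontab entries)
0 2 * * * postgres pg_dump -d company_db -F c -f /var/backups/company_db_$(date +\%Y\%m\%d).dump
```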

Part 2: Performance Tuning

Understanding EXPLAIN and EXPLAIN ANALYZE

EXPLAIN shows the query execution plan — how PostgreSQL intends to retrieve the data. EXPLAIN ANALYZE actually runs the query and shows real timing data alongside the plan.

-- Show the plan without running
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Run the query and show actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Reading the Query Plan

Key terms in an EXPLAIN output:

Term | Meaning
Seq Scan | Full table scan; reads every row. Slow on large tables without indexes.
Index Scan | Uses an index to locate rows. Fast for selective queries.
Bitmap Heap Scan | Combines index results with a heap scan. Used for moderate selectivity.
Hash Join | Joins two tables using a hash table. Efficient for large, unsorted datasets.
Nested Loop | Joins by looping over the inner table for each outer row. Fast for small tables.
cost=X..Y | Estimated startup cost..total cost, in planner units. Lower is better.
rows=N | Estimated number of rows returned.
actual time | Real elapsed time in milliseconds (EXPLAIN ANALYZE only).
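Put together, a single node of an EXPLAIN ANALYZE plan reads roughly like this (the numbers are illustrative, not real output):

```
Index Scan using idx_orders_customer_id on orders
  (cost=0.42..8.44 rows=12 width=64)
  (actual time=0.031..0.052 rows=10 loops=1)
```

Here the planner estimated 12 rows but 10 were actually returned. A large gap between estimated and actual row counts usually means statistics are stale and the table needs an ANALYZE.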

Indexing for Performance

Missing indexes are the most common performance problem. If EXPLAIN shows a Seq Scan on a large table when filtering by a column, an index on that column will often solve the problem.

-- Slow query (Seq Scan)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Add the index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Now uses Index Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

VACUUM and ANALYZE

PostgreSQL uses Multi-Version Concurrency Control (MVCC): deleted and updated rows are not immediately removed. Over time, these dead rows accumulate and slow down queries. VACUUM reclaims that space.

-- Remove dead rows (does not shrink the file)
VACUUM orders;

-- Remove dead rows and reclaim disk space (locks the table)
VACUUM FULL orders;

-- Update query planner statistics
ANALYZE orders;

-- Do both at once
VACUUM ANALYZE orders;

PostgreSQL runs autovacuum automatically in the background. Manual VACUUM is needed after large bulk operations (like inserting or deleting millions of rows at once).

Table Statistics and the Query Planner

The query planner uses statistics about column value distributions to estimate row counts and choose the best plan. Running ANALYZE keeps these statistics current.

-- View planner statistics for a table
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders';

Key postgresql.conf Settings for Performance

The postgresql.conf file controls global server behavior. These are the most impactful settings for performance:

Setting | Description | Recommended Starting Point
shared_buffers | Memory used for caching data pages | 25% of total RAM
work_mem | Memory per sort/hash operation per query | 4–16 MB (increase for complex sorts)
maintenance_work_mem | Memory for VACUUM, CREATE INDEX, etc. | 256 MB or more
effective_cache_size | Estimate of OS + PostgreSQL cache (planner hint) | 50–75% of total RAM
max_connections | Maximum simultaneous connections | 100–200 (use a connection pooler for more)
checkpoint_completion_target | Smooths out checkpoint I/O writes | 0.9
random_page_cost | Planner's cost estimate for random disk access | 1.1 for SSDs, 4.0 for spinning disks

-- View current settings
SHOW shared_buffers;
SHOW work_mem;

-- Check all settings
SELECT name, setting, unit FROM pg_settings
WHERE category LIKE '%Memory%';
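Settings can also be changed without editing postgresql.conf by hand, using ALTER SYSTEM (which writes to postgresql.auto.conf and requires superuser privileges):

```
-- Persistently change work_mem, then reload the configuration
ALTER SYSTEM SET work_mem = '16MB';
SELECT pg_reload_conf();
```

Note that some settings, including shared_buffers and max_connections, still require a full server restart to take effect.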

Connection Pooling

Each PostgreSQL connection consumes significant memory (~5–10 MB). For applications with many short-lived connections (web servers), a connection pooler like PgBouncer sits between the application and the database, reusing a smaller pool of actual database connections.
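A minimal PgBouncer configuration sketch; all values here are illustrative, and authentication setup is omitted (see the PgBouncer documentation for auth_file format):

```
; pgbouncer.ini
[databases]
company_db = host=127.0.0.1 port=5432 dbname=company_db

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
```

With transaction pooling, a server connection is returned to the pool at the end of each transaction, so a small pool can serve many application connections.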

Identifying Slow Queries

Enable the pg_stat_statements extension to track query performance over time. The module must first be listed in shared_preload_libraries in postgresql.conf (e.g. shared_preload_libraries = 'pg_stat_statements'), which requires a server restart:

-- Enable the extension (superuser required)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the top 10 slowest queries by total execution time
SELECT query,
       calls,
       ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
       ROUND(mean_exec_time::NUMERIC, 2) AS avg_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Identifying Table Bloat

-- Check for tables with many dead rows (need VACUUM)
SELECT relname AS table_name,
       n_dead_tup AS dead_rows,
       n_live_tup AS live_rows,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Checking Index Usage

-- Find indexes that are never used (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;

Unused indexes waste space and slow down writes. Dropping them improves INSERT and UPDATE performance.
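Once an index is confirmed unused (check every replica too, since idx_scan counters are tracked per server), it can be dropped without blocking writes; the index name below is illustrative:

```
-- CONCURRENTLY avoids taking an exclusive lock on the table;
-- it cannot run inside a transaction block
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy;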

Performance Tuning Checklist

  • Run EXPLAIN ANALYZE on slow queries to identify bottlenecks.
  • Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
  • Run VACUUM ANALYZE after large data loads or bulk deletions.
  • Tune shared_buffers, work_mem, and effective_cache_size in postgresql.conf.
  • Use pg_stat_statements to monitor which queries consume the most time.
  • Set random_page_cost = 1.1 if the database runs on SSD storage.
  • Use a connection pooler like PgBouncer for high-concurrency web applications.
  • Remove unused indexes identified via pg_stat_user_indexes.

Key Points

  • pg_dump creates logical backups of a single database; pg_dumpall backs up all databases and global roles.
  • Custom format (-F c) backups are compressed and support parallel restore with pg_restore -j.
  • pg_basebackup creates a physical copy of the entire data directory, used for replication and PITR.
  • Always test restores — an untested backup strategy provides false confidence.
  • EXPLAIN ANALYZE reveals the actual query execution plan and real timing data.
  • Missing indexes on filtered and joined columns are the most common cause of slow queries.
  • VACUUM removes dead rows from MVCC; ANALYZE updates planner statistics — autovacuum handles most cases automatically.
  • Tuning shared_buffers, work_mem, and effective_cache_size in postgresql.conf is the first step in server-level performance optimization.
