FastAPI Performance Tuning and Best Practices

A working FastAPI app is a good start. A fast, reliable, maintainable FastAPI app requires deliberate choices around concurrency, database queries, caching, and code organization. This final topic brings together the patterns that separate hobby projects from production systems.

1. Run Multiple Workers

A single Uvicorn process uses one CPU core. For production, run multiple worker processes with Gunicorn so your app uses all available cores:

pip install gunicorn

gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

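Server with 4 CPU cores and 4 workers:

  Worker 1 ──→ separate process, can run on its own core
  Worker 2 ──→ separate process, can run on its own core
  Worker 3 ──→ separate process, can run on its own core
  Worker 4 ──→ separate process, can run on its own core

Workers are not pinned to cores; the operating system schedules each process onto whichever core is free.

Rule of thumb (from the Gunicorn documentation): workers = (2 × CPU cores) + 1
  4 cores → 9 workers

Treat this as a starting point and tune under load. Each worker is a full copy of the app with its own memory and its own database connection pool.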
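
The worker count can be computed rather than hard-coded. Gunicorn configuration files are plain Python, so a minimal sketch of a gunicorn.conf.py might look like the following. The helper name recommended_workers is hypothetical, and the bind address simply mirrors the command shown earlier.

```python
# gunicorn.conf.py: a sketch, not a tuned production config
import multiprocessing


def recommended_workers(cpu_cores: int) -> int:
    """Apply the (2 x cores) + 1 rule of thumb (hypothetical helper)."""
    return 2 * cpu_cores + 1


# Settings read by Gunicorn when started with:
#   gunicorn -c gunicorn.conf.py main:app
workers = recommended_workers(multiprocessing.cpu_count())
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8000"
```

Keeping the formula in a function makes it easy to swap in a different rule once load testing shows what the app actually needs.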

2. Avoid N+1 Query Problems

The N+1 problem occurs when code fetches a list of N records with one query, then loops over them and fetches each record's related data with a separate query:

# BAD — N+1 queries (1 query for users + N queries for posts)
users = db.query(User).all()
for user in users:
    user.posts   # lazy load: triggers a separate DB query each time

# GOOD — eager load in one query with joinedload
from sqlalchemy.orm import joinedload

users = db.query(User).options(joinedload(User.posts)).all()
# Single query with JOIN — all data in one trip

100 users:
  N+1:          1 + 100 = 101 database queries
  joinedload:   1 database query

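At 1000 requests/minute, that is 101,000 database queries per minute instead of 1,000. For large one-to-many collections, selectinload() (two queries in total) often scales better than a JOIN, because it avoids repeating each user's columns on every post row.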
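
The query counts above can be reproduced with a small experiment. This sketch uses the standard-library sqlite3 module instead of SQLAlchemy so it runs without any setup; the table layout and function names are illustrative assumptions, not part of the app described here.

```python
import sqlite3


def build_db(num_users: int = 100) -> sqlite3.Connection:
    """Create an in-memory database with one post per user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    """)
    ids = range(1, num_users + 1)
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in ids])
    conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                     [(i, f"post by user{i}") for i in ids])
    conn.commit()
    return conn


def count_n_plus_one(conn: sqlite3.Connection) -> int:
    """Fetch users, then posts per user; return how many SELECTs ran."""
    queries = 1
    users = conn.execute("SELECT id FROM users").fetchall()
    for (user_id,) in users:
        conn.execute("SELECT title FROM posts WHERE user_id = ?",
                     (user_id,)).fetchall()
        queries += 1
    return queries


def count_joined(conn: sqlite3.Connection):
    """Fetch users and posts together; return (SELECT count, rows)."""
    rows = conn.execute(
        "SELECT u.id, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id"
    ).fetchall()
    return 1, rows
```

Calling count_n_plus_one(build_db()) issues one SELECT for the users plus one per user, while count_joined gets the same data in a single round-trip.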

3. Add Response Caching

For data that changes rarely (product categories, config), cache the response so the database is not hit on every request:

pip install fastapi-cache2 redis

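from contextlib import asynccontextmanager
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis

@asynccontextmanager
async def lifespan(app):
    # Initialize the cache once at startup (Redis URL is an assumption); pass lifespan=lifespan to FastAPI()
    FastAPICache.init(RedisBackend(aioredis.from_url("redis://localhost")), prefix="fastapi-cache")
    yield

@app.get("/categories")
@cache(expire=300)   # cache for 300 seconds (5 minutes)
async def get_categories():
    return db.query(Category).all()   # sync query blocks the event loop; prefer an async session in real code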

First request:  DB query → result cached in Redis → returned
Next 300 sec:   No DB query → result from Redis cache → returned
After 300 sec:  Cache expires → next request queries DB again
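
The expiry behaviour in this timeline can be illustrated without Redis. The following is a minimal in-memory sketch of a time-to-live cache; TTLCache and get_or_compute are hypothetical names, and this is not how fastapi-cache2 is implemented internally.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with expiry (hypothetical helper)."""

    def __init__(self, expire: float,
                 clock: Callable[[], float] = time.monotonic):
        self.expire = expire
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # cache hit: compute() is not called
        value = compute()            # miss or expired: run the slow query
        self._store[key] = (now + self.expire, value)
        return value
```

With expire=300, the first call runs compute(), calls during the next 300 seconds return the stored value, and the first call after that runs compute() again.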

4. Use Pagination on All List Endpoints

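from fastapi import Query

# Never return unlimited lists
@app.get("/users")
def get_users(
    skip: int = Query(default=0, ge=0),
    limit: int = Query(default=20, ge=1, le=100),
):
    # A stable ORDER BY keeps pages consistent (assumes User.id is the primary key)
    return db.query(User).order_by(User.id).offset(skip).limit(limit).all()

# le=100 caps the page size: requests for more than 100 are rejected with a 422 error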
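
Clients usually also need to know whether another page exists. A rough sketch of a response envelope is shown here on a plain Python sequence; the paginate helper, its field names, and MAX_LIMIT are assumptions for illustration, not a FastAPI feature.

```python
from typing import Any, Dict, Sequence

MAX_LIMIT = 100  # mirrors the le=100 bound on the endpoint


def paginate(items: Sequence[Any], skip: int = 0, limit: int = 20) -> Dict[str, Any]:
    """Return one page of items plus metadata for fetching the next page."""
    if skip < 0 or not 1 <= limit <= MAX_LIMIT:
        raise ValueError("skip must be >= 0 and limit between 1 and 100")
    page = list(items[skip:skip + limit])
    # next_skip is None on the last page so clients know when to stop
    next_skip = skip + limit if skip + limit < len(items) else None
    return {"items": page, "total": len(items), "next_skip": next_skip}
```

In a real endpoint the slice would be the offset/limit query and total would come from a separate COUNT query.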

5. Add Database Indexes

# Without index: full table scan for every query
class User(Base):
    email = Column(String)   # searches scan ALL rows

# With index: direct lookup
class User(Base):
    email = Column(String, index=True)   # searches use the index instead of scanning the table

Index rules:
  Usually index: foreign keys and fields used often in .filter() or
                 .order_by() (primary keys are indexed automatically)
  Skip:          fields never used in queries,
                 columns with very few unique values (like boolean)
  Cost:          every index slows INSERT and UPDATE slightly and uses disk space
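
Whether a query actually uses an index can be checked by asking the database for its query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through the standard-library sqlite3 module; the table, index name, and query_plan helper are illustrative assumptions, and other databases expose plans through their own EXPLAIN syntax.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable detail text
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
lookup = "SELECT * FROM users WHERE email = ?"

# Plan before the index exists: a scan over the whole table
plan_before = query_plan(conn, lookup, ("a@example.com",))

conn.execute("CREATE INDEX ix_users_email ON users (email)")

# Plan after: a search that names the new index
plan_after = query_plan(conn, lookup, ("a@example.com",))
```

The exact wording of the plan varies between SQLite versions, but plan_before reports a scan and plan_after mentions ix_users_email.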

6. Structure Your Code for Growth

production-app/
├── main.py               ← app creation, middleware, lifespan
├── config.py             ← settings from environment variables
├── database.py           ← engine, session, Base
├── models/               ← SQLAlchemy models
├── schemas/              ← Pydantic schemas
├── services/             ← business logic
├── routers/              ← route handlers (thin)
├── dependencies.py       ← shared Depends functions
└── tests/                ← pytest tests

7. Use Pydantic Settings for Configuration

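from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Also load values from a local .env file; real environment variables take priority
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    secret_key: str
    debug: bool = False
    allowed_origins: list[str] = ["http://localhost:3000"]

settings = Settings()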

Pydantic reads from environment variables automatically and validates their types. No manual os.getenv() calls scattered throughout your code.

8. Log Structured Data

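import json
import logging

from fastapi import Request

logging.basicConfig(level=logging.INFO)   # without this, INFO messages are dropped by default
logger = logging.getLogger("myapp")

@app.middleware("http")
async def log_requests(request: Request, call_next):
    response = await call_next(request)
    # One JSON object per request, easy for log aggregators to parse
    logger.info(json.dumps({
        "method": request.method,
        "path":   request.url.path,
        "status": response.status_code,
    }))
    return response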
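
An alternative to calling json.dumps() at every log site is to move the JSON encoding into a logging formatter, so all messages from the logger share one shape. The following is a minimal standard-library sketch; JsonFormatter and the "fields" key are hypothetical names chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed as extra={"fields": {...}} are merged into the output
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

The middleware could then call logger.info("request", extra={"fields": {"method": request.method, "path": request.url.path, "status": response.status_code}}) and leave the encoding to the formatter.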

Performance Checklist

□ Multiple Gunicorn workers in production
□ Async routes for I/O-bound operations
□ Connection pooling for database (SQLAlchemy handles this)
□ Eager loading to avoid N+1 queries
□ Database indexes on filtered/sorted columns
□ Pagination on all list endpoints
□ Caching for frequently read, rarely changed data
□ Gzip middleware for large responses
□ Timeout on all external HTTP calls
□ Health check endpoint at /health
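
The timeout item on this checklist can be sketched with asyncio alone. In the example below, call_with_timeout is a hypothetical wrapper and slow_service stands in for a real HTTP request; most HTTP client libraries also accept a timeout argument directly, which is usually the simpler option.

```python
import asyncio
from typing import Any, Awaitable, Optional


async def call_with_timeout(call: Awaitable[Any], seconds: float) -> Optional[Any]:
    """Await an external call, returning None if it exceeds the time budget."""
    try:
        return await asyncio.wait_for(call, timeout=seconds)
    except asyncio.TimeoutError:
        # The awaited call is cancelled; the caller decides on a fallback
        return None


async def slow_service(delay: float) -> str:
    # Stand-in for a real HTTP request to another service
    await asyncio.sleep(delay)
    return "ok"
```

Without a timeout, one stalled upstream service can tie up requests indefinitely; with it, the handler gets control back after a bounded wait and can return an error or a cached value.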

Key Points

Run multiple Gunicorn workers in production to use all CPU cores.
Fix N+1 query problems with joinedload() to reduce database round-trips dramatically.
Cache slow, rarely-changing responses with Redis to avoid repeat database queries.
Always paginate list endpoints — never return an unlimited number of rows.
Use Pydantic Settings to manage configuration from environment variables in a type-safe way.
