FastAPI Performance Tuning and Best Practices
A working FastAPI app is a good start. A fast, reliable, maintainable FastAPI app requires deliberate choices around concurrency, database queries, caching, and code organization. This final topic brings together the patterns that separate hobby projects from production systems.
1. Run Multiple Workers
A single Uvicorn process uses one CPU core. For production, run multiple worker processes with Gunicorn so your app uses all available cores:
pip install gunicorn
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
Server with 4 CPU cores:
Worker 1 ──→ handles requests for CPU core 1
Worker 2 ──→ handles requests for CPU core 2
Worker 3 ──→ handles requests for CPU core 3
Worker 4 ──→ handles requests for CPU core 4
Rule of thumb: workers = (2 × CPU cores) + 1
4 cores → 9 workers
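The rule of thumb is simple enough to compute at deploy time. A minimal sketch, assuming os.cpu_count() reflects the cores actually available to the process (inside a container with CPU limits it may not); recommended_workers is a hypothetical helper, not part of Gunicorn:

```python
import os


def recommended_workers(cpu_cores: int) -> int:
    """Apply the (2 x cores) + 1 rule of thumb from the text."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores + 1


# os.cpu_count() can return None, so fall back to a single core.
detected_cores = os.cpu_count() or 1
print(f"{detected_cores} cores -> {recommended_workers(detected_cores)} workers")
```

The result could be passed to --workers. Treat it as a starting point and adjust it after load testing.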
2. Avoid N+1 Query Problems
The N+1 problem occurs when code runs one query to fetch a list of records, then loops over that list and runs one more query per record to fetch its related data:
# BAD — N+1 queries (1 query for users + N queries for posts)
users = db.query(User).all()
for user in users:
    user.posts  # triggers a separate DB query each time
# GOOD — eager load in one query with joinedload
from sqlalchemy.orm import joinedload
users = db.query(User).options(joinedload(User.posts)).all()
# Single query with JOIN — all data in one trip
100 users:
N+1: 1 + 100 = 101 database queries
joinedload: 1 database query
At 1000 requests/minute, that is 101,000 queries per minute instead of 1,000.
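The query counts can be reproduced without an ORM. The sketch below uses the standard-library sqlite3 module in place of SQLAlchemy, with a hypothetical CountingDB wrapper that counts the statements it runs. The table layout and helper names are assumptions made for the illustration:

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many SQL statements are executed."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.query_count = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(num_users: int = 100) -> sqlite3.Connection:
    """Create an in-memory database with one post per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
    )
    ids = range(1, num_users + 1)
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in ids])
    conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                     [(i, f"post by user{i}") for i in ids])
    return conn


def posts_n_plus_one(db: CountingDB) -> dict:
    """One query for the users, then one more query per user."""
    result = {}
    for (user_id,) in db.query("SELECT id FROM users"):
        rows = db.query("SELECT title FROM posts WHERE user_id = ?", (user_id,))
        result[user_id] = [title for (title,) in rows]
    return result


def posts_joined(db: CountingDB) -> dict:
    """A single JOIN query that returns users and posts together."""
    result = {}
    rows = db.query(
        "SELECT u.id, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    )
    for user_id, title in rows:
        titles = result.setdefault(user_id, [])
        if title is not None:  # users without posts still get an empty list
            titles.append(title)
    return result


conn = build_db(100)
slow, fast = CountingDB(conn), CountingDB(conn)
assert posts_n_plus_one(slow) == posts_joined(fast)  # same data either way
print(slow.query_count, fast.query_count)  # prints: 101 1
```

Both functions return identical data. Only the number of round-trips to the database differs.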
3. Add Response Caching
For data that changes rarely (product categories, config), cache the response so the database is not hit on every request:
pip install fastapi-cache2 redis
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
# Assumes FastAPICache.init(RedisBackend(...)) was called once at startup
@app.get("/categories")
@cache(expire=300)  # cache for 300 seconds (5 minutes)
async def get_categories():
    return db.query(Category).all()
First request: DB query → result cached in Redis → returned
Next 300 sec: No DB query → result from Redis cache → returned
After 300 sec: Cache expires → next request queries DB again
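The expiry timeline can be illustrated with a small in-process cache. This is not how fastapi-cache2 or Redis is implemented. It is a simplified sketch that stores results in a plain dictionary, with a hypothetical ttl_cache decorator and a fake clock so the 300-second window can be shown without waiting:

```python
import time
from functools import wraps


def ttl_cache(expire: float, clock=time.monotonic):
    """Cache a function's results in memory for `expire` seconds."""
    def decorator(func):
        store = {}  # maps call arguments -> (time stored, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            if entry is not None and now - entry[0] < expire:
                return entry[1]      # fresh: skip the expensive call
            value = func(*args)      # missing or expired: recompute
            store[args] = (now, value)
            return value

        return wrapper
    return decorator


# Demo with a fake clock so the expiry can be shown without sleeping.
fake_now = [0.0]
db_hits = []


@ttl_cache(expire=300, clock=lambda: fake_now[0])
def load_categories():
    db_hits.append(1)                # stands in for a database query
    return ["books", "games"]


load_categories()                    # first request: "queries the database"
fake_now[0] = 299.0
load_categories()                    # within 300 seconds: served from cache
fake_now[0] = 300.0
load_categories()                    # expired: "queries the database" again
print(len(db_hits))                  # prints: 2
```

A per-process dictionary like this is not shared between workers, which is one reason a shared store such as Redis is used once several workers are running.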
4. Use Pagination on All List Endpoints
from fastapi import Query

# Never return unlimited lists
@app.get("/users")
def get_users(
    skip: int = Query(default=0, ge=0),
    limit: int = Query(default=20, ge=1, le=100),
):
    return db.query(User).offset(skip).limit(limit).all()
# le=100 means the client cannot request more than 100 at once
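To see how a client walks through pages with skip and limit, here is a minimal sketch on a plain Python list. The paginate helper is hypothetical, and its ValueError stands in for the validation error FastAPI would return for an out-of-range parameter:

```python
def paginate(rows: list, skip: int = 0, limit: int = 20, max_limit: int = 100) -> list:
    """Return one page of rows, rejecting out-of-range parameters."""
    if skip < 0:
        raise ValueError("skip must be >= 0")
    if not 1 <= limit <= max_limit:
        raise ValueError(f"limit must be between 1 and {max_limit}")
    return rows[skip:skip + limit]


users = [f"user{i}" for i in range(1, 251)]   # 250 fake rows

first_page = paginate(users)                  # user1 .. user20
second_page = paginate(users, skip=20)        # user21 .. user40
last_page = paginate(users, skip=240)         # only 10 rows remain
```

Each request advances skip by limit. A page shorter than limit tells the client it has reached the end.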
5. Add Database Indexes
# Without index: full table scan for every query
class User(Base):
    email = Column(String)  # searches scan ALL rows

# With index: direct lookup
class User(Base):
    email = Column(String, index=True)  # searches go directly to the row
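Index rules:
Always index: primary keys (automatic), foreign keys,
fields frequently used in .filter() or .order_by()
Skip: fields never used in queries,
columns with very few unique values (like boolean)
Every index also adds work to inserts and updates, so add indexes where queries need them.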
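The database itself can confirm whether a query uses an index. The sketch below uses SQLite's EXPLAIN QUERY PLAN through the standard-library sqlite3 module. The table and the index name ix_users_email are assumptions for the example, other databases have their own EXPLAIN syntax, and the exact plan wording varies between SQLite versions:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

lookup = "SELECT * FROM users WHERE email = ?"
params = ("user500@example.com",)

plan_without_index = query_plan(conn, lookup, params)
conn.execute("CREATE INDEX ix_users_email ON users (email)")
plan_with_index = query_plan(conn, lookup, params)

# The first plan is a SCAN of the whole table; the second is a SEARCH
# that names the index.
print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Running the same check against slow production queries is a quick way to find columns that still need an index.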
6. Structure Your Code for Growth
production-app/
├── main.py ← app creation, middleware, lifespan
├── config.py ← settings from environment variables
├── database.py ← engine, session, Base
├── models/ ← SQLAlchemy models
├── schemas/ ← Pydantic schemas
├── services/ ← business logic
├── routers/ ← route handlers (thin)
├── dependencies.py ← shared Depends functions
└── tests/ ← pytest tests
7. Use Pydantic Settings for Configuration
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Also load values from a .env file, in addition to the process environment
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    secret_key: str
    debug: bool = False
    allowed_origins: list[str] = ["http://localhost:3000"]

settings = Settings()
Pydantic reads from environment variables automatically and validates their types. No manual os.getenv() calls scattered throughout your code.
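To make concrete what that saves, here is a rough hand-rolled equivalent using only the standard library. It is not how Pydantic works internally. ManualSettings, parse_bool, and load_settings are hypothetical names, and the accepted boolean spellings are a simplified subset chosen for the example:

```python
import os
from dataclasses import dataclass

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class ManualSettings:
    database_url: str
    secret_key: str
    debug: bool = False


def parse_bool(name: str, raw: str) -> bool:
    """Convert an environment string to a bool, failing loudly on junk."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


def load_settings(env=os.environ) -> ManualSettings:
    """Read and validate settings from a mapping of environment variables."""
    missing = [key for key in ("DATABASE_URL", "SECRET_KEY") if key not in env]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    return ManualSettings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        debug=parse_bool("DEBUG", env.get("DEBUG", "false")),
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
example = load_settings({
    "DATABASE_URL": "sqlite:///./app.db",
    "SECRET_KEY": "change-me",
    "DEBUG": "true",
})
```

Every new setting means another branch of parsing and error handling here, whereas the Settings class needs only one more annotated field.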
8. Log Structured Data
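import json
import logging

from fastapi import Request

# Without this, the default WARNING level silently drops info() calls
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("myapp")

@app.middleware("http")
async def log_requests(request: Request, call_next):
    response = await call_next(request)
    # One JSON object per request so log tools can parse the fields
    logger.info(json.dumps({
        "method": request.method,
        "path": request.url.path,
        "status": response.status_code,
    }))
    return response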
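The payoff of JSON log lines is that they can be parsed back into fields rather than matched with regular expressions. A minimal sketch using only the standard library, where an in-memory stream stands in for the log file and the logger name myapp.demo and the log_request helper are made up for the example:

```python
import io
import json
import logging

stream = io.StringIO()                  # stands in for a log file or stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("myapp.demo")
logger.setLevel(logging.INFO)
logger.propagate = False                # keep demo output out of the root logger
logger.addHandler(handler)


def log_request(method: str, path: str, status: int) -> None:
    """Emit one JSON object per request, as the middleware above does."""
    logger.info(json.dumps({"method": method, "path": path, "status": status}))


log_request("GET", "/users", 200)
log_request("POST", "/users", 422)
log_request("GET", "/missing", 404)

# Each line parses back into a dict, so filtering is a plain comparison.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
client_errors = [e["path"] for e in events if 400 <= e["status"] < 500]
print(client_errors)                    # prints: ['/users', '/missing']
```

Log aggregation tools can apply the same kind of field-based filtering across many servers.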
Performance Checklist
□ Multiple Gunicorn workers in production
□ Async routes for I/O-bound operations
□ Connection pooling for database (SQLAlchemy handles this)
□ Eager loading to avoid N+1 queries
□ Database indexes on filtered/sorted columns
□ Pagination on all list endpoints
□ Caching for frequently read, rarely changed data
□ Gzip middleware for large responses
□ Timeout on all external HTTP calls
□ Health check endpoint at /health
Key Points
- Run multiple Gunicorn workers in production to use all CPU cores.
- Fix N+1 query problems with joinedload() to reduce database round-trips dramatically.
- Cache slow, rarely-changing responses with Redis to avoid repeat database queries.
- Always paginate list endpoints — never return an unlimited number of rows.
- Use Pydantic Settings to manage configuration from environment variables in a type-safe way.
