GraphQL in Production – Best Practices
Running GraphQL in production is different from running it in development. Security, performance, observability, and schema management all require deliberate decisions. This topic brings together the practices that keep a GraphQL API reliable, fast, and safe under real-world traffic.
Production Checklist
Security
─────────
□ Disable introspection in production
□ Enforce depth limiting (max 7–10 levels)
□ Enforce query complexity limits
□ Add rate limiting by user and IP
□ Use persisted queries (allow-list mode for public APIs)
□ Validate and sanitize all input arguments
□ Mask error details; log real errors server-side only

Performance
────────────
□ Use DataLoader for every N+1-prone resolver
□ Add Redis cache for frequently read, rarely changed data
□ Set cacheControl hints on stable types
□ Enable query timeout (10–30 seconds max)
□ Profile slow resolvers with tracing

Observability
─────────────
□ Log every operation by name, duration, and user
□ Track error rates per operation name
□ Monitor P95/P99 latency per operation
□ Set alerts on error rate spikes
□ Use Apollo Studio or similar for operation analytics

Schema management
──────────────────
□ Version the schema in Git
□ Run schema checks before every deployment
□ Never remove or rename fields without a deprecation period
□ Use @deprecated and monitor usage before deleting fields
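To make the depth-limiting item concrete, here is a minimal sketch of how nesting depth can be measured. It assumes the operation has already been parsed into an AST shaped like graphql-js output (nodes with `selectionSet.selections`). The helper names `selectionDepth` and `assertMaxDepth` are hypothetical, and fragment spreads are not followed, so a production rule would need to resolve them as well.

```javascript
// Depth of a node = 1 + the depth of its deepest child selection.
// A field without a selectionSet is a leaf and contributes 0.
function selectionDepth(node) {
  if (!node.selectionSet) return 0;
  let deepest = 0;
  for (const child of node.selectionSet.selections) {
    deepest = Math.max(deepest, selectionDepth(child));
  }
  return 1 + deepest;
}

// Hypothetical guard: throws when an operation nests deeper than maxDepth.
function assertMaxDepth(operation, maxDepth) {
  const depth = selectionDepth(operation);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit of ${maxDepth}`);
  }
  return depth;
}

// Hand-built AST for the query: { user { posts { title } } }
const field = (name, selections) => ({
  kind: 'Field',
  name: { value: name },
  ...(selections ? { selectionSet: { selections } } : {}),
});
const exampleOperation = {
  kind: 'OperationDefinition',
  selectionSet: {
    selections: [field('user', [field('posts', [field('title')])])],
  },
};

console.log(selectionDepth(exampleOperation)); // 3 (user > posts > title)
```

In a real server this check would run during validation, before any resolver executes, so that an over-deep query is rejected cheaply.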
Schema Versioning Without /v2
GraphQL avoids URL versioning. Instead, you evolve the schema backward-compatibly. Add new fields freely. Deprecate old fields before removing them. This gives clients time to migrate without breaking anything.
Backward-compatible changes:     Breaking changes (avoid):
─────────────────────────────    ────────────────────────
Add a new field                  Remove or rename a field
Add a new type                   Change a field's type
Add an optional argument         Add a required argument
Deprecate an old field           Remove an enum value
Add a new enum value             Change argument type
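The distinction above can be checked mechanically before a deployment. The following is a simplified sketch, not a replacement for a real schema-checking tool: it assumes each schema version is available as a plain snapshot object of the form `{ TypeName: { fieldName: 'TypeString' } }`, which is a format invented for this example. The function name `listBreakingChanges` is hypothetical, and the sketch treats every type change as breaking, which is stricter than necessary.

```javascript
// Compare two schema snapshots and report removed types, removed fields,
// and changed field types. Added types and fields are ignored because
// they are backward-compatible.
function listBreakingChanges(oldSchema, newSchema) {
  const problems = [];
  for (const [typeName, oldFields] of Object.entries(oldSchema)) {
    const newFields = newSchema[typeName];
    if (!newFields) {
      problems.push(`Type ${typeName} was removed`);
      continue;
    }
    for (const [fieldName, oldType] of Object.entries(oldFields)) {
      if (!Object.hasOwn(newFields, fieldName)) {
        problems.push(`Field ${typeName}.${fieldName} was removed`);
      } else if (newFields[fieldName] !== oldType) {
        problems.push(
          `Field ${typeName}.${fieldName} changed type from ${oldType} to ${newFields[fieldName]}`
        );
      }
    }
  }
  return problems;
}

const before = { User: { id: 'ID!', email: 'String' } };
const after = { User: { id: 'ID!', emailAddress: 'String!' }, Post: { title: 'String' } };

console.log(listBreakingChanges(before, after));
// [ 'Field User.email was removed' ]
```

A CI step could fail the build whenever the returned list is non-empty, which forces removals to go through a deprecation period first.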
Schema Deprecation Workflow
Month 1: Add new field, keep old one
──────────────────────────────────────
type User {
  email: String @deprecated(reason: "Use emailAddress")
  emailAddress: String!   # New canonical field
}
Month 2: Track usage in Apollo Studio
──────────────────────────────────────
See which clients still query "email"
Notify those teams to migrate to "emailAddress"
Month 3: Remove the deprecated field
──────────────────────────────────────
type User {
  emailAddress: String!   # Old "email" field removed
}
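The Month 2 step depends on knowing who still queries the deprecated field. As a rough sketch, suppose field usage has been exported as records of the form `{ client, field, count }`. That record shape and the function name `clientsStillUsing` are assumptions made for this example, not the format of any particular analytics tool.

```javascript
// Return the clients that still query a deprecated field, busiest first.
function clientsStillUsing(usageRecords, deprecatedField) {
  const totals = new Map();
  for (const { client, field, count } of usageRecords) {
    if (field !== deprecatedField || count === 0) continue;
    totals.set(client, (totals.get(client) ?? 0) + count);
  }
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([client, total]) => ({ client, total }));
}

const usage = [
  { client: 'ios-app', field: 'User.email', count: 120 },
  { client: 'web-app', field: 'User.emailAddress', count: 900 },
  { client: 'admin-panel', field: 'User.email', count: 5 },
  { client: 'ios-app', field: 'User.email', count: 30 },
];

console.log(clientsStillUsing(usage, 'User.email'));
// [ { client: 'ios-app', total: 150 }, { client: 'admin-panel', total: 5 } ]
```

The field is safe to remove in Month 3 only once this list stays empty over a full release cycle of every client.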
Observability – Logging Every Operation
Apollo Server plugin for operation logging:
─────────────────────────────────────────────
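import { ApolloServer } from '@apollo/server';

const loggingPlugin = {
  async requestDidStart() {
    // Capture the start time here so the plugin does not depend on the
    // context function having set contextValue.startTime.
    const startTime = Date.now();
    return {
      async willSendResponse({ operationName, response, contextValue }) {
        const duration = Date.now() - startTime;
        // Only single-result responses carry a top-level errors array;
        // incremental (@defer/@stream) responses use a different body shape.
        const hasErrors =
          response.body.kind === 'single' &&
          !!response.body.singleResult.errors?.length;
        console.log(JSON.stringify({
          operation: operationName || 'anonymous',
          userId: contextValue.user?.id,
          duration,
          hasErrors,
        }));
      },
    };
  },
};

// typeDefs and resolvers are defined elsewhere in the application.
const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [loggingPlugin],
});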
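Log lines with `operation`, `duration`, and `hasErrors` fields, like those the plugin above emits, can be rolled up into the per-operation error rate and P95/P99 latency that the checklist asks for. The sketch below assumes the log lines have already been parsed into objects. The names `percentile` and `summarizeByOperation` are hypothetical, and the nearest-rank percentile method is one common choice among several.

```javascript
// Nearest-rank percentile: the smallest value such that at least p percent
// of the samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Group parsed log records by operation name and compute summary stats.
function summarizeByOperation(logRecords) {
  const groups = new Map();
  for (const { operation, duration, hasErrors } of logRecords) {
    const group = groups.get(operation) ?? { durations: [], errors: 0 };
    group.durations.push(duration);
    if (hasErrors) group.errors += 1;
    groups.set(operation, group);
  }
  const summary = {};
  for (const [operation, group] of groups) {
    summary[operation] = {
      count: group.durations.length,
      errorRate: group.errors / group.durations.length,
      p95: percentile(group.durations, 95),
      p99: percentile(group.durations, 99),
    };
  }
  return summary;
}

const sampleRecords = [
  { operation: 'GetUser', duration: 10, hasErrors: false },
  { operation: 'GetUser', duration: 40, hasErrors: true },
  { operation: 'GetUser', duration: 20, hasErrors: false },
  { operation: 'GetUser', duration: 30, hasErrors: false },
];

console.log(summarizeByOperation(sampleRecords));
// { GetUser: { count: 4, errorRate: 0.25, p95: 40, p99: 40 } }
```

In practice a metrics backend would usually compute these figures, but the same grouping by operation name applies.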
Deployment Architecture
Production deployment pattern:
───────────────────────────────
                Internet
                    │
CDN (caches GET persisted queries, serves static assets)
                    │
              Load Balancer
                    │
   ┌──────────────────────────────────┐
   │  GraphQL Server instances (×N)   │
   │  (Node.js / Apollo)              │
   └─────────────────┬────────────────┘
                     │
          ┌──────────┼──────────────┐
          ▼          ▼              ▼
      Database    Redis Cache    Other Services
      (Primary    (Sessions,     (Email, Payments,
      + Replica)  Rate limits,   Search, etc.)
                  DataLoader
                  cache)
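The diagram places rate-limit counters in Redis so that every server instance shares the same counts. The sketch below illustrates the fixed-window counting idea with an in-memory `Map` standing in for Redis. The name `createRateLimiter` and its options are hypothetical. With Redis, the same logic is typically built from an `INCR` on the bucket key plus an `EXPIRE` so that old windows are cleaned up, which this in-memory version does not do.

```javascript
// Fixed-window rate limiter. Each key (user ID or IP) gets one counter per
// time window; a request is allowed while the counter is within the limit.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const counters = new Map(); // stand-in for a shared Redis instance
  return function allowRequest(key) {
    const windowStart = Math.floor(now() / windowMs) * windowMs;
    const bucket = `${key}:${windowStart}`;
    const count = (counters.get(bucket) ?? 0) + 1;
    counters.set(bucket, count);
    return count <= limit;
  };
}

// Example: at most 100 requests per user per minute.
const allowRequestForUser = createRateLimiter({ limit: 100, windowMs: 60_000 });
console.log(allowRequestForUser('user-42')); // true
```

Because the counter lives in one shared store, the limit holds no matter which instance behind the load balancer handles a given request.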
Health Checks and Graceful Shutdown
// Health check endpoint: used by load balancers
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1'); // Verify DB is reachable
    res.json({ status: 'healthy', uptime: process.uptime() });
  } catch {
    res.status(503).json({ status: 'unhealthy' });
  }
});

// Graceful shutdown: finish in-flight requests before stopping
process.on('SIGTERM', async () => {
  await server.stop();
  process.exit(0);
});
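One gap in a plain SIGTERM handler is that a hung request can keep the process alive indefinitely, and a second signal can start a second shutdown. The following sketch wraps any stop function so that it runs at most once and gives up after a deadline. The name `createShutdown` and the `timeoutMs` option are hypothetical, and the 10-second default is only an illustrative value.

```javascript
// Wrap a stop function so repeated calls share one shutdown attempt and
// the attempt rejects if it takes longer than timeoutMs.
function createShutdown(stop, { timeoutMs = 10_000 } = {}) {
  let inFlight = null;
  return function shutdown() {
    if (inFlight) return inFlight; // already shutting down
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`Shutdown exceeded ${timeoutMs} ms`)),
        timeoutMs
      );
    });
    // Whichever settles first wins; the timer is always cleared afterwards.
    inFlight = Promise.race([stop(), timeout]).finally(() => clearTimeout(timer));
    return inFlight;
  };
}

// Possible wiring, assuming `server` is the ApolloServer instance:
//   const shutdown = createShutdown(() => server.stop());
//   process.on('SIGTERM', () =>
//     shutdown().then(() => process.exit(0), () => process.exit(1)));
```

Exiting with a non-zero code on timeout lets the orchestrator tell a clean stop from a forced one.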
Key Points
- Disable introspection, enforce depth and complexity limits, and use persisted queries before launching to production.
- Evolve your schema backward-compatibly by adding new fields and deprecating old ones — never delete without notice.
- Log every operation by name and duration so you can identify slow and failing queries quickly.
- Use DataLoader, Redis caching, and query timeouts together for a layered performance strategy.
- Run multiple server instances behind a load balancer with a health check endpoint for high availability.
