GraphQL in Production – Best Practices

Running GraphQL in production is different from running it in development. Security, performance, observability, and schema management all require deliberate decisions. This topic brings together the practices that keep a GraphQL API reliable, fast, and safe under real-world traffic.

Production Checklist

  Security
  ─────────
  □ Disable introspection in production
  □ Enforce depth limiting (max 7–10 levels)
  □ Enforce query complexity limits
  □ Add rate limiting by user and IP
  □ Use persisted queries (allow-list mode when you control every client)
  □ Validate and sanitize all input arguments
  □ Mask error details — log real errors server-side only
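Depth limiting is the easiest of these checks to reason about. The sketch below walks an already-parsed document (the object graphql-js `parse()` returns) and rejects any operation whose fields nest deeper than a limit. It follows fragment spreads, so a fragment cannot hide extra levels. The function names are hypothetical. In practice you would more likely install a maintained validation rule (packages such as graphql-depth-limit exist) than hand-roll this.

```javascript
// Hypothetical depth-limit check. Expects the AST shape produced by graphql-js
// `parse()`; plain object literals of the same shape work too.
function selectionDepth(selectionSet, fragments, visiting) {
  if (!selectionSet) return 0; // Leaf field: nothing nested below it
  let max = 0;
  for (const selection of selectionSet.selections) {
    let depth = 0;
    if (selection.kind === 'Field') {
      // Each field adds one level on top of its children.
      depth = 1 + selectionDepth(selection.selectionSet, fragments, visiting);
    } else if (selection.kind === 'InlineFragment') {
      // Fragments add no level themselves; only their fields do.
      depth = selectionDepth(selection.selectionSet, fragments, visiting);
    } else if (selection.kind === 'FragmentSpread') {
      const name = selection.name.value;
      const fragment = fragments.get(name);
      // Skip unknown fragments and cycles (validation rejects both anyway).
      if (fragment && !visiting.has(name)) {
        visiting.add(name);
        depth = selectionDepth(fragment.selectionSet, fragments, visiting);
        visiting.delete(name);
      }
    }
    max = Math.max(max, depth);
  }
  return max;
}

// Throws if any operation in the document nests fields deeper than maxDepth.
function assertMaxDepth(document, maxDepth) {
  const fragments = new Map();
  for (const def of document.definitions) {
    if (def.kind === 'FragmentDefinition') fragments.set(def.name.value, def);
  }
  for (const def of document.definitions) {
    if (def.kind !== 'OperationDefinition') continue;
    const depth = selectionDepth(def.selectionSet, fragments, new Set());
    if (depth > maxDepth) {
      throw new Error(`Query depth ${depth} exceeds the limit of ${maxDepth}`);
    }
  }
}
```

Call it before execution, for example `assertMaxDepth(parse(query), 7)` with `parse` from the graphql package. A query such as `{ user { posts { title } } }` has depth 3.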

  Performance
  ────────────
  □ Use DataLoader for every N+1-prone resolver
  □ Add Redis cache for frequently read, rarely changed data
  □ Set cacheControl hints on stable types
  □ Enable query timeout (10–30 seconds max)
  □ Profile slow resolvers with tracing
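DataLoader fixes N+1 queries through batching. The hypothetical `BatchLoader` class below is a stripped-down sketch of that idea, not the `dataloader` package. It collects every `load()` call made before `process.nextTick` fires and answers all of them with a single call to the batch function. Repeated keys share one cached promise for the life of the loader. As with DataLoader, you would create one instance per request so the cache never leaks data between users.

```javascript
// Sketch of DataLoader-style batching (not the real `dataloader` package).
class BatchLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values in the same order as keys
    this.queue = [];        // Pending { key, resolve, reject } entries
    this.cache = new Map(); // Per-request memoization: key -> Promise
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a batch schedules the flush; later loads join it.
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = []; // New loads after this point start a fresh batch
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }
}
```

A resolver such as `author: (post, _args, ctx) => ctx.userLoader.load(post.authorId)` then issues one batched lookup for all posts on the page instead of one query per post.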

  Observability
  ─────────────
  □ Log every operation by name, duration, and user
  □ Track error rates per operation name
  □ Monitor P95/P99 latency per operation
  □ Set alerts on error rate spikes
  □ Use Apollo Studio or similar for operation analytics
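The sketch below illustrates what tracking error rates and P95/P99 latency "per operation" means, using a small in-memory aggregator. `OperationMetrics` is hypothetical. In production these numbers usually come from a metrics backend or Apollo Studio. It uses the nearest-rank definition of a percentile: sort the durations and take the value at rank ceil(p * n / 100).

```javascript
// Hypothetical per-operation metrics store (in memory, for illustration only).
class OperationMetrics {
  constructor() {
    this.byOperation = new Map(); // operationName -> { durations: number[], errors: number }
  }

  record(operationName, durationMs, hasErrors) {
    const name = operationName || 'anonymous';
    if (!this.byOperation.has(name)) {
      this.byOperation.set(name, { durations: [], errors: 0 });
    }
    const stats = this.byOperation.get(name);
    stats.durations.push(durationMs);
    if (hasErrors) stats.errors += 1;
  }

  // Nearest-rank percentile: smallest duration with at least p% of samples at or below it.
  percentile(operationName, p) {
    const stats = this.byOperation.get(operationName);
    if (!stats || stats.durations.length === 0) return undefined;
    const sorted = [...stats.durations].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(rank, 1) - 1];
  }

  // Fraction of recorded requests for this operation that returned errors.
  errorRate(operationName) {
    const stats = this.byOperation.get(operationName);
    if (!stats || stats.durations.length === 0) return 0;
    return stats.errors / stats.durations.length;
  }
}
```

An alert on error-rate spikes is then a comparison such as `metrics.errorRate('GetUser') > 0.05` evaluated per time window. The 5% threshold is only an example.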

  Schema management
  ──────────────────
  □ Version the schema in Git
  □ Run schema checks before every deployment
  □ Never remove or rename fields without a deprecation period
  □ Use @deprecated and monitor usage before deleting fields

Schema Versioning Without /v2

GraphQL avoids URL versioning. Instead, you evolve the schema backward-compatibly. Add new fields freely. Deprecate old fields before removing them. This gives clients time to migrate without breaking anything.

  Backward-compatible changes:      Breaking changes (avoid):
  ─────────────────────────────     ────────────────────────
  Add a new field                   Remove or rename a field
  Add a new type                    Change a field's type
  Add an optional argument          Add a required argument
  Deprecate an old field            Remove an enum value
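  Add a new enum value*             Change argument type

  * Non-breaking for the schema, but clients that switch exhaustively over
    the enum need a fallback for values they do not recognize.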
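Schema checks amount to diffing the old and new schemas against the right-hand column of this table. A real check should use `findBreakingChanges` from graphql-js or your schema registry's checks. The sketch below instead works on a simplified, hypothetical schema model (type → field → `{ type, args }`) to show the idea. It flags removed types and fields, changed field types, and new required arguments. It conservatively treats every field type change as breaking, although a few are safe, such as making an output field non-null.

```javascript
// Simplified, hypothetical schema model (not graphql-js):
// { TypeName: { fieldName: { type: 'String!', args: { argName: 'Int!' } } } }
function listBreakingChanges(oldSchema, newSchema) {
  const breaking = [];
  for (const [typeName, oldFields] of Object.entries(oldSchema)) {
    const newFields = newSchema[typeName];
    if (!newFields) {
      breaking.push(`Type ${typeName} was removed`);
      continue;
    }
    for (const [fieldName, oldField] of Object.entries(oldFields)) {
      const newField = newFields[fieldName];
      if (!newField) {
        breaking.push(`${typeName}.${fieldName} was removed`);
        continue;
      }
      // Conservative: any type change is reported.
      if (newField.type !== oldField.type) {
        breaking.push(`${typeName}.${fieldName} changed type from ${oldField.type} to ${newField.type}`);
      }
      const oldArgs = oldField.args || {};
      for (const [argName, argType] of Object.entries(newField.args || {})) {
        // A new non-null argument breaks every existing query that omits it.
        if (!(argName in oldArgs) && argType.endsWith('!')) {
          breaking.push(`${typeName}.${fieldName} added required argument ${argName}`);
        }
      }
    }
  }
  return breaking;
}
```

A CI step can fail the deployment whenever the returned list is non-empty. Added fields, added types, and optional arguments pass through silently, matching the left-hand column.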

Schema Deprecation Workflow

  Month 1: Add new field, keep old one
  ──────────────────────────────────────
  type User {
    email:        String @deprecated(reason: "Use emailAddress")
    emailAddress: String!   ← New canonical field
  }

  Month 2: Track usage in Apollo Studio
  ──────────────────────────────────────
  See which clients still query "email"
  Notify those teams to migrate to "emailAddress"

  Month 3: Remove the deprecated field
  ──────────────────────────────────────
  type User {
    emailAddress: String!   ← "email" removed once no client still queries it
  }
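During months 1 and 2 both fields must resolve. Below is a minimal resolver map, assuming the data model already stores `emailAddress`. The deprecated `email` field returns the same value and increments a hypothetical per-client counter (`deprecatedFieldHits`), which is the kind of signal Apollo Studio's field usage reports provide. The `clientName` on the context is also an assumption; your context function would have to populate it, for example from a client-identification header.

```javascript
// Hypothetical per-client counter for the deprecated field.
const deprecatedFieldHits = new Map(); // clientName -> number of resolutions

const resolvers = {
  User: {
    // New canonical field reads straight from the data model.
    emailAddress: (user) => user.emailAddress,

    // Deprecated field keeps working during the migration window by returning
    // the same value, and records which client still selects it.
    email: (user, _args, context) => {
      const client = context.clientName || 'unknown'; // Assumed to be set by your context function
      deprecatedFieldHits.set(client, (deprecatedFieldHits.get(client) || 0) + 1);
      return user.emailAddress;
    },
  },
};
```

When the counter stops growing for every client, month 3's removal is safe.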

Observability – Logging Every Operation

  Apollo Server plugin for operation logging:
  ─────────────────────────────────────────────
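  import { ApolloServer } from '@apollo/server';

  const loggingPlugin = {
    async requestDidStart() {
      const startTime = Date.now();   // Captured per request; no context setup needed

      return {
        async willSendResponse({ operationName, response, contextValue }) {
          const duration = Date.now() - startTime;
          // Incremental (@defer/@stream) responses have no singleResult
          const hasErrors =
            response.body.kind === 'single' && !!response.body.singleResult.errors;

          console.log(JSON.stringify({
            operation: operationName || 'anonymous',
            userId:    contextValue.user?.id,   // Assumes your context function sets `user`
            duration,
            hasErrors,
          }));
        },
      };
    },
  };

  const server = new ApolloServer({
    typeDefs,
    resolvers,
    plugins: [loggingPlugin],
  });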
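Logging the real error is only half of error masking; clients should also see a generic message. Apollo Server 4 accepts a `formatError(formattedError, error)` option for this purpose. The function below is a sketch that assumes errors intended for clients carry an `extensions.code` other than `INTERNAL_SERVER_ERROR` (for example `BAD_USER_INPUT`). Every other error is logged in full and replaced with a generic one.

```javascript
// Sketch of a formatError hook: pass client-safe errors through, mask the rest.
function maskError(formattedError, originalError) {
  const code = formattedError.extensions?.code;
  if (code && code !== 'INTERNAL_SERVER_ERROR') {
    return formattedError; // Expected, client-facing error: keep its message
  }
  console.error(originalError); // Full details stay in server-side logs
  return {
    message: 'Internal server error',
    path: formattedError.path, // Still tells the client which field failed
    extensions: { code: 'INTERNAL_SERVER_ERROR' },
  };
}
```

It would be wired in alongside the plugin as `new ApolloServer({ typeDefs, resolvers, plugins: [loggingPlugin], formatError: maskError })`.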

Deployment Architecture

  Production deployment pattern:
  ───────────────────────────────
  Internet
    │
  CDN (caches GET persisted queries, serves static assets)
    │
  Load Balancer
    │
  ┌──────────────────────────────────┐
  │  GraphQL Server instances (×N)   │
  │  (Node.js / Apollo)              │
  └─────────────────┬────────────────┘
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
      Database   Redis Cache  Other Services
      (Primary   (Sessions,   (Email, Payments,
       + Replica) Rate limits, Search, etc.)
                  Shared data
                  cache)
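Per-user and per-IP rate-limit counters belong in the Redis box because every server instance must see the same count. Below is a sketch of a fixed-window limiter with a `Map` standing in for Redis; the class name and key format are illustrative. With real Redis you would typically `INCR` a per-window key such as `rl:user:42:<window>` and set `EXPIRE` on it, so old windows clean themselves up.

```javascript
// Fixed-window rate limiter. The Map is a stand-in for Redis: with N server
// instances, counters in local memory would give each instance its own limit.
class FixedWindowRateLimiter {
  constructor({ limit, windowMs, now = Date.now }) {
    this.limit = limit;       // Max requests per key per window
    this.windowMs = windowMs; // Window length in milliseconds
    this.now = now;           // Injectable clock, handy for tests
    this.windows = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request may proceed, false if it should be rejected.
  allow(key) {
    const t = this.now();
    const windowStart = t - (t % this.windowMs);
    const current = this.windows.get(key);
    if (!current || current.windowStart !== windowStart) {
      this.windows.set(key, { windowStart, count: 1 }); // First request in a new window
      return true;
    }
    current.count += 1;
    return current.count <= this.limit;
  }
}
```

Keying by `user:<id>` for authenticated requests and `ip:<address>` otherwise covers both "by user and IP" items from the checklist.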

Health Checks and Graceful Shutdown

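  import express from 'express';

  const app = express();

  // Health check endpoint, polled by load balancers.
  // `db` is your database client (e.g. a pg Pool); any cheap round trip works.
  app.get('/health', async (req, res) => {
    try {
      await db.query('SELECT 1');   // Verify DB is reachable
      res.json({ status: 'healthy', uptime: process.uptime() });
    } catch {
      res.status(503).json({ status: 'unhealthy' });
    }
  });

  // Graceful shutdown: finish in-flight requests before stopping.
  // server.stop() drains open HTTP connections only if the server was created
  // with ApolloServerPluginDrainHttpServer({ httpServer }). Apollo Server also
  // installs its own SIGTERM/SIGINT handler unless stopOnTerminationSignals is false.
  process.on('SIGTERM', async () => {
    await server.stop();
    process.exit(0);
  });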
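Nothing in these snippets tells the load balancer that shutdown has begun, so it may keep routing new requests to an instance that is about to stop. One approach, sketched here with hypothetical names and no web framework, is to make the health check return 503 as soon as SIGTERM arrives. `pingDb` stands in for `db.query('SELECT 1')`.

```javascript
// Health check that starts failing once shutdown begins, so the load balancer
// drains traffic away while in-flight requests finish.
function createHealthCheck(pingDb) {
  let shuttingDown = false;
  return {
    markShuttingDown() {
      shuttingDown = true;
    },
    async check() {
      if (shuttingDown) return { statusCode: 503, body: { status: 'shutting_down' } };
      try {
        await pingDb(); // Any cheap round trip to the database
        return { statusCode: 200, body: { status: 'healthy' } };
      } catch {
        return { statusCode: 503, body: { status: 'unhealthy' } };
      }
    },
  };
}
```

In the SIGTERM handler you would call `health.markShuttingDown()` first, wait about one health-check interval so the load balancer notices, and only then `await server.stop()`.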

Key Points

Disable introspection, enforce depth and complexity limits, and use persisted queries before launching to production.
Evolve your schema backward-compatibly by adding new fields and deprecating old ones — never delete without notice.
Log every operation by name and duration so you can identify slow and failing queries quickly.
Use DataLoader, Redis caching, and query timeouts together for a layered performance strategy.
Run multiple server instances behind a load balancer with a health check endpoint for high availability.
