API Security: Rate Limiting and Throttling

APIs that allow unlimited requests are vulnerable to both technical attacks, such as Denial of Service, and business logic abuse, such as automated scraping, credential stuffing, and fraud. Rate limiting controls how many requests a client can make in a given time window. Throttling slows down clients that exceed limits rather than blocking them entirely. Both are essential controls for production APIs.

Why Unlimited API Access Is Dangerous

Without rate limiting, an API is exposed to:

Denial of Service (DoS):
  Single attacker floods the API with millions of requests.
  Server resources (CPU, memory, DB connections) are exhausted.
  Legitimate users cannot connect.

Distributed DoS (DDoS):
  Thousands of compromised devices (botnet) each send requests.
  Individual request rates look normal per device.
  Combined load crushes the server.

Credential Stuffing:
  Attacker has 10 million stolen username/password pairs from other breaches.
  Tests all of them against your login API.
  Without rate limiting: 10 million tests in under an hour (about 2,800 requests per second).
  With rate limiting (10 attempts per minute per IP): about 694 days, nearly 2 years, from a single IP.

Automated Scraping:
  Competitor scrapes your entire product catalog, pricing, and customer reviews.
  Costs you bandwidth, server resources, and competitive advantage.

Brute Force:
  Password guessing, OTP guessing, coupon code guessing — all enabled
  by unlimited request rates.

API Key Enumeration:
  Attacker generates random API keys and tests them against your API.
  Without rate limiting, can test millions per hour.
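
The credential stuffing estimate above can be checked with a few lines of arithmetic. This is a rough sketch: the 1,000-IP botnet size is a hypothetical figure added for illustration, and it shows why a per-IP limit alone may not be enough against a distributed attacker.

```python
CREDENTIALS = 10_000_000   # stolen pairs to test (from the text)
ATTEMPTS_PER_MINUTE = 10   # per-IP limit (from the text)


def days_to_test(credentials: int, attempts_per_minute: int, ip_count: int = 1) -> float:
    """Days needed to try every credential when each IP is capped."""
    minutes = credentials / (attempts_per_minute * ip_count)
    return minutes / (60 * 24)


single_ip_days = days_to_test(CREDENTIALS, ATTEMPTS_PER_MINUTE)
# Hypothetical botnet of 1,000 IPs, each staying under the per-IP limit.
botnet_days = days_to_test(CREDENTIALS, ATTEMPTS_PER_MINUTE, ip_count=1_000)

print(f"1 IP: {single_ip_days:.0f} days (~{single_ip_days / 365:.1f} years)")
print(f"1,000 IPs: {botnet_days:.2f} days")
# 1 IP: 694 days (~1.9 years)
# 1,000 IPs: 0.69 days
```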

Rate Limiting Strategies

Strategy 1: Fixed Window Counter

  Window: 1 minute
  Limit:  100 requests per window

  Minute 1 (12:00:00 - 12:00:59):
    Requests 1-100:  Allowed
    Request 101:     Blocked
  
  Minute 2 (12:01:00 - 12:01:59):
    Counter resets to 0.
    Requests 1-100:  Allowed again

  Problem: At 12:00:55, client sends 100 requests.
           At 12:01:00, the counter resets; at 12:01:05, client sends 100 more.
           200 requests in 10 seconds. The spike doubles the intended limit.
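
A minimal in-memory sketch of a fixed window counter for a single client, reproducing the boundary problem described above. The class name and the choice to pass timestamps in explicitly (seconds since 12:00:00) are assumptions made to keep the example deterministic.

```python
class FixedWindowLimiter:
    """Fixed window counter for one client; time is passed in explicitly."""

    def __init__(self, limit: int = 100, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
late_burst = sum(limiter.allow(55.0) for _ in range(150))   # sent at 12:00:55
early_burst = sum(limiter.allow(65.0) for _ in range(150))  # sent at 12:01:05
print(late_burst, early_burst)  # 100 100
```

Both bursts are accepted in full, so 200 requests pass within 10 seconds even though the limit is 100 per minute.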

──────────────────────────────────────────────────────────

Strategy 2: Sliding Window Counter

  Window: 1 minute (rolling)
  Limit:  100 requests per 60 seconds

  At any point, counts requests in the last 60 seconds.
  No burst spike at window boundaries.
  Smoother protection, but more memory-intensive: an exact count needs a timestamp per request instead of a single counter.
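
One way to sketch the rolling window is to keep a log of accepted request timestamps and evict entries older than 60 seconds. The class below is an illustrative in-memory version for one client, not a production implementation; it only records requests that were allowed.

```python
from collections import deque


class SlidingWindowLimiter:
    """Sliding window log for one client; time is passed in explicitly."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # accepted requests, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Evict requests that have fallen out of the rolling window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100, window_seconds=60.0)
first = sum(limiter.allow(55.0) for _ in range(150))    # burst at 12:00:55
second = sum(limiter.allow(65.0) for _ in range(150))   # burst at 12:01:05
third = sum(limiter.allow(115.0) for _ in range(150))   # burst at 12:01:55
print(first, second, third)  # 100 0 100
```

The second burst is rejected entirely because the first 100 requests are still inside the last 60 seconds.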

──────────────────────────────────────────────────────────

Strategy 3: Token Bucket (Most Flexible)

  Imagine a bucket that holds tokens.
  Tokens refill at a constant rate: e.g., 2 tokens per second.
  Bucket capacity: 100 tokens (allows bursts).
  Each request costs 1 token.

  Behavior:
    Client has been idle → bucket is full (100 tokens)
    Client sends 80 requests at once → allowed (burst capacity)
    Client continues sending → must wait for refill (2/sec)
  
  Real-world analogy: A prepaid balance that tops up continuously. You can
  spend it in one burst or spread it out, but once it is empty you can only
  go as fast as it refills.
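
The bucket behavior above can be sketched as follows, using the same numbers (capacity 100, refill 2 tokens per second). The class and method names are illustrative, and time is passed in explicitly so the result is deterministic.

```python
class TokenBucket:
    """Token bucket for one client; time is passed in explicitly."""

    def __init__(self, capacity: float = 100.0, refill_rate: float = 2.0, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # an idle client starts with a full bucket
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=100, refill_rate=2, now=0.0)
burst = sum(bucket.allow(0.0) for _ in range(80))      # 80 requests at once
more = sum(bucket.allow(0.0) for _ in range(80))       # only 20 tokens remain
after_5s = sum(bucket.allow(5.0) for _ in range(80))   # 5 s * 2 tokens/s refilled
print(burst, more, after_5s)  # 80 20 10
```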

──────────────────────────────────────────────────────────

Strategy 4: Leaky Bucket

  Requests fill a queue. Queue drains at a fixed rate.
  Smooths out spikes. Excess requests are queued, not dropped.
  If queue overflows, new requests are rejected.
  Good for: APIs that need consistent, predictable processing rate.
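
A small sketch of the queue-based leaky bucket described above, assuming a queue of 5 and a drain rate of 1 request per second (both illustrative values). A worker would call drain() periodically; here it is called by hand.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue drained at a fixed rate; time is passed in explicitly."""

    def __init__(self, queue_size: int = 5, drain_rate: float = 1.0, now: float = 0.0):
        self.queue_size = queue_size
        self.drain_rate = drain_rate  # requests processed per second
        self.queue = deque()
        self.last_drain = now

    def offer(self, request) -> bool:
        """Queue a request, or reject it if the queue is already full."""
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def drain(self, now: float) -> list:
        """Release the requests the fixed rate allows since the last drain."""
        slots = int((now - self.last_drain) * self.drain_rate)
        if slots <= 0:
            return []
        self.last_drain += slots / self.drain_rate
        return [self.queue.popleft() for _ in range(min(slots, len(self.queue)))]


bucket = LeakyBucket(queue_size=5, drain_rate=1.0)
accepted = [bucket.offer(i) for i in range(8)]  # burst of 8 at t=0
processed = bucket.drain(now=3.0)               # 3 seconds later
print(accepted.count(True), processed)          # 5 [0, 1, 2]
```

The burst is cut to the queue size, and the queued requests leave at a steady one per second regardless of how fast they arrived.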

Rate Limiting Dimensions

Rate limits should be applied across multiple dimensions:

By IP Address:
  100 requests per minute per IP.
  Catches unauthenticated abuse and bots.
  Problem: Multiple users behind same corporate/ISP NAT share one IP.

By API Key or User ID:
  1000 requests per hour per authenticated user.
  Catches abuse by authenticated clients.
  More accurate than IP-based for authenticated APIs.

By Endpoint:
  Different limits for different endpoints based on sensitivity.
  
  POST /api/login          → 5 attempts per minute per IP
  POST /api/password/reset → 3 attempts per hour per IP
  GET  /api/products       → 500 requests per minute per user
  GET  /api/search         → 60 requests per minute per user
  POST /api/payments       → 10 requests per minute per user

By Geographic Region:
  If your service only serves India, unusual traffic from Eastern Europe
  at 3 AM may warrant stricter limits or CAPTCHA challenges.

By Account Tier:
  Free tier:       100 requests per day
  Standard plan:   10,000 requests per day
  Premium plan:    100,000 requests per day
  Enterprise:      Custom limits

HTTP Response Headers for Rate Limits

Communicate rate limit status to legitimate clients via headers:

X-RateLimit-Limit: 100
  → Total requests allowed in the current window

X-RateLimit-Remaining: 73
  → Requests remaining in current window

X-RateLimit-Reset: 1699901400
  → Unix timestamp when the window resets

Retry-After: 30
  → Seconds until the client can try again (sent with 429 responses)

When limit is exceeded, return:
  HTTP 429 Too Many Requests
  Retry-After: 60
  { "error": "rate_limit_exceeded",
    "message": "Too many requests. Please try again in 60 seconds.",
    "retry_after": 60 }

Well-behaved clients use these headers to implement backoff.
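
A framework-neutral sketch of how a server might assemble these headers and the 429 body. The function names and the (status, headers, body) return shape are assumptions; a real application would hand these values to its web framework's response object.

```python
import json
import math


def rate_limit_headers(limit: int, remaining: int, reset_epoch: float) -> dict:
    """Headers describing the client's current rate limit state."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }


def too_many_requests(limit: int, reset_epoch: float, now: float):
    """Build a 429 response as (status, headers, body)."""
    retry_after = max(1, math.ceil(reset_epoch - now))
    headers = rate_limit_headers(limit, 0, reset_epoch)
    headers["Retry-After"] = str(retry_after)
    headers["Content-Type"] = "application/json"
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Too many requests. Please try again in {retry_after} seconds.",
        "retry_after": retry_after,
    })
    return 429, headers, body


status, headers, body = too_many_requests(limit=100, reset_epoch=1699901400, now=1699901340)
print(status, headers["Retry-After"], headers["X-RateLimit-Remaining"])  # 429 60 0
```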

Implementing Rate Limiting

Option 1: API Gateway Level (Recommended)
  AWS API Gateway, Kong, Nginx, Cloudflare, Apigee
  Rate limiting handled before requests reach your application.
  No application code changes needed.
  Centralised management.

  Nginx rate limiting:
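    # http context: track clients by IP in a 10 MB shared zone, 10 requests/second each
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    # http, server, or location context: allow bursts of 20 extra requests without delay
    limit_req zone=api burst=20 nodelay;
    # return 429 instead of the default 503 when the limit is exceeded
    limit_req_status 429;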

Option 2: Redis-Based (Application Level)
  Store request counts in Redis with TTL.
  Works across multiple application server instances.

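  Python example (Redis sliding window, using the redis-py client):
  import time
  import uuid

  import redis

  r = redis.Redis()  # assumes a Redis server on localhost:6379

  def is_rate_limited(user_id, limit=100, window=60):
      key = f"rate:{user_id}"
      now = time.time()
      pipe = r.pipeline()  # commands run together as one transaction
      pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than the window
      pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})  # unique member per request
      pipe.zcard(key)  # count requests still inside the window
      pipe.expire(key, window)  # let idle keys expire
      results = pipe.execute()
      request_count = results[2]
      return request_count > limit  # note: rejected requests are counted too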

Option 3: In-Memory (Single Server Only)
  Simple counter per IP/user stored in application memory.
  Does NOT work correctly if you have multiple server instances.
  Only suitable for single-server development environments.

Throttling vs Rate Limiting

Rate Limiting: Hard stop — blocked requests return 429.
  Client: "Give me data."
  Server: "You have exceeded your limit. Come back in 60 seconds."
  Use when: Protecting system stability, enforcing paid tiers.

Throttling: Slow down — requests are delayed, not rejected.
  Client: Sends 1,000 requests in a burst.
  Server: Processes them slowly, adding artificial delay.
  Client sees slow responses but no errors.
  Use when: Degrading gracefully, protecting from burst abuse
  without completely blocking the client.

Progressive Throttling (Combining Both):
  Requests 1-100:  Normal speed (0ms delay)
  Requests 101-150: Throttled (500ms delay per request)
  Requests 151+:  Rate limited (429 error)
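
The three tiers above reduce to a small decision function. This is a sketch with the thresholds from the example; the function name and the (action, delay) return shape are assumptions, and the caller would be responsible for actually sleeping or returning the 429.

```python
def progressive_decision(request_number: int,
                         soft_limit: int = 100,
                         hard_limit: int = 150,
                         delay_seconds: float = 0.5):
    """Map the Nth request in the current window to (action, delay in seconds)."""
    if request_number <= soft_limit:
        return ("allow", 0.0)
    if request_number <= hard_limit:
        return ("throttle", delay_seconds)
    return ("reject", 0.0)  # caller responds with HTTP 429


for n in (100, 101, 150, 151):
    print(n, progressive_decision(n))
# 100 ('allow', 0.0)
# 101 ('throttle', 0.5)
# 150 ('throttle', 0.5)
# 151 ('reject', 0.0)
```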

Detecting Specific Abuse Patterns

Credential Stuffing Detection:
  Normal login failure rate: < 5% of login attempts
  Attack signature: > 30% failure rate from single IP or user-agent
  Response: Progressive delays, CAPTCHA challenge, IP block

OTP Brute Force Detection:
  OTPs are 4-6 digits (10,000 - 1,000,000 possibilities).
  Normal user tries 1-3 times.
  Attack: Rapid sequential attempts.
  Response: Lock OTP after 3-5 wrong attempts. Generate new OTP.

Content Scraping Detection:
  Normal user: views 5-10 pages per minute, with varied timing.
  Scraper: requests at machine speed (100+ per second), sequential IDs,
           no images or CSS loaded (API-only calls), no JavaScript execution.
  Response: Require a JavaScript challenge, CAPTCHA, or bot fingerprinting.

API Key Enumeration:
  Attacker tests many random API key values.
  Signature: High 401 rate from same IP, rapid requests.
  Response: Rate limit 401 responses. After N consecutive failures,
            block the IP temporarily.
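
As an illustration of the failure-rate signature used for credential stuffing, the sketch below tracks login outcomes per source and flags sources above the 30% threshold. The class name and the minimum sample size of 20 attempts are assumptions; a production version would also count over a time window rather than forever.

```python
from collections import defaultdict


class LoginFailureMonitor:
    """Flags sources whose login failure rate exceeds a threshold."""

    def __init__(self, threshold: float = 0.30, min_attempts: int = 20):
        self.threshold = threshold
        self.min_attempts = min_attempts  # assumption: avoid judging tiny samples
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, source: str, success: bool) -> None:
        self.attempts[source] += 1
        if not success:
            self.failures[source] += 1

    def is_suspicious(self, source: str) -> bool:
        total = self.attempts.get(source, 0)
        if total < self.min_attempts:
            return False
        return self.failures.get(source, 0) / total > self.threshold


monitor = LoginFailureMonitor()
for i in range(40):
    monitor.record("198.51.100.10", success=(i != 0))  # 1 failure in 40 attempts
    monitor.record("203.0.113.99", success=(i < 2))    # 38 failures in 40 attempts
print(monitor.is_suspicious("198.51.100.10"), monitor.is_suspicious("203.0.113.99"))
# False True
```

A flagged source would then be handed to the responses listed above, such as progressive delays or a CAPTCHA challenge.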

CAPTCHA Integration for High-Risk Endpoints

For endpoints where human verification adds meaningful security:
  POST /api/login
  POST /api/register
  POST /api/forgot-password
  POST /api/contact

Options:
  Google reCAPTCHA v3: Invisible. Scores user behavior 0.0-1.0.
  hCaptcha: Privacy-focused alternative to reCAPTCHA.
  Cloudflare Turnstile: Invisible challenge, no image puzzles.

Implementation flow:
  1. Client completes the CAPTCHA challenge in the browser (invisible options run in the background).
  2. Client receives CAPTCHA token.
  3. Client sends token with API request.
  4. Server verifies token with CAPTCHA provider's API.
  5. If verification fails, or a score-based provider returns too low a score → reject request.
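
Steps 4 and 5 can be sketched on the server side as below, using the reCAPTCHA verification endpoint as the example provider. The function names, the 0.5 minimum score, and the injectable post argument are assumptions; the demo at the end substitutes a fake provider response so it runs without network access.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url: str, fields: dict) -> dict:
    """POST form fields and decode the provider's JSON reply."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(url, data=data, timeout=5) as resp:
        return json.loads(resp.read().decode())


def captcha_passes(token: str, secret: str, min_score: float = 0.5, post=post_form) -> bool:
    """Verify a client-supplied CAPTCHA token with the provider."""
    if not token:
        return False
    result = post(VERIFY_URL, {"secret": secret, "response": token})
    if not result.get("success"):
        return False
    # Only score-based providers (such as reCAPTCHA v3) include a score.
    score = result.get("score")
    return score is None or score >= min_score


# Demo with a stand-in for the provider call (hypothetical responses).
def fake_post(url, fields):
    return {"success": True, "score": 0.2 if fields["response"] == "bot" else 0.9}


print(captcha_passes("human", "test-secret", post=fake_post))  # True
print(captcha_passes("bot", "test-secret", post=fake_post))    # False
```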

CAPTCHA on login only applies to web/browser clients.
For mobile apps and M2M: use rate limiting, device fingerprinting,
and behavioral analytics instead.

Key Points

  • Rate limiting controls how many requests a client can make per time window. Without it, APIs are open to DoS, credential stuffing, scraping, and brute force.
  • The token bucket strategy is the most flexible: allows bursts up to a limit, then enforces a refill rate.
  • Apply rate limits across multiple dimensions: by IP, by user, by endpoint, and by account tier.
  • Return HTTP 429 with a Retry-After header so legitimate clients know when to retry.
  • Implement rate limiting at the API gateway level for centralized, scalable enforcement.
  • Use Redis for distributed rate limiting across multiple application server instances — in-memory counters only work on single servers.
