API Security: Sensitive Data Handling

Encryption protects data while it moves between a client and server. But sensitive data can also leak in other ways — through logs, caches, error messages, headers, and response bodies. Handling sensitive data correctly means controlling where it appears, how long it lives, and who can access it at every stage of its journey.

What Counts as Sensitive Data

Categories of Sensitive Data in APIs:

Personally Identifiable Information (PII):
  Full name combined with government ID numbers
  Date of birth, home address, phone number
  Biometric data, medical records, financial account details

Authentication Credentials:
  Passwords, password hashes, salts
  API keys, OAuth tokens, session IDs
  OTP codes, backup codes, recovery tokens

Financial Data:
  Full payment card numbers, CVV, expiry dates
  Bank account numbers (especially when paired with IFSC codes)
  Transaction amounts and merchant details

Regulated Data (varies by region):
  Aadhaar numbers (India: DPDP Act)
  Health records (India: DPDP Act; US: HIPAA)
  Children's data (COPPA in the US, GDPR Article 8 in the EU)
  Credit information (RBI regulations in India)

Data Leakage Points in an API System

┌─────────────────────────────────────────────────────────┐
│                   API Data Flow                         │
│                                                         │
│  Client Request                                         │
│     → URL parameters  ← Log leakage risk                │
│     → Request headers ← Log leakage risk                │
│     → Request body    ← Log leakage risk if logged      │
│         ↓                                               │
│  API Server                                             │
│     → Application logs ← Sensitive data often here      │
│     → Error messages   ← Stack traces, DB queries       │
│     → Response body    ← Over-exposure risk             │
│         ↓                                               │
│  Infrastructure                                         │
│     → Load balancer logs  ← URL logs                    │
│     → CDN/proxy cache     ← Response caching risk       │
│     → Database            ← Raw sensitive data at rest  │
│     → Monitoring tools    ← Metrics may contain PII     │
└─────────────────────────────────────────────────────────┘

Sensitive Data in URLs

Never put sensitive data in URLs. URLs appear in:
  - Server access logs
  - Browser history
  - Referrer headers sent to third-party sites
  - Analytics tools
  - CDN and proxy logs

Wrong:
  GET /api/reset-password?token=abc123secrettoken
  GET /api/users?api_key=sk_live_abc123xyz
  GET /api/search?ssn=123-45-6789

Right:
  POST /api/reset-password
  Body: { "token": "abc123secrettoken", "new_password": "..." }

  GET /api/users
  Authorization: Bearer sk_live_abc123xyz  (in a header, not the URL)

  POST /api/identity/verify
  Body: { "ssn": "123-45-6789" }  (in the body; HTTPS encrypts it in transit, and default access-log formats omit request bodies)
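When legacy clients still send secrets in query strings, one defense-in-depth step is to scrub known parameter names before a URL reaches your own application logs. The sketch below uses Python's standard urllib.parse; the parameter list and the scrub_url name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters treated as secrets; adjust for your API.
SENSITIVE_PARAMS = {"token", "api_key", "ssn", "password"}


def scrub_url(url: str) -> str:
    """Return url with the values of sensitive query parameters replaced."""
    parts = urlsplit(url)
    pairs = [
        (name, "REDACTED" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode re-encodes every value; the encoding may differ slightly
    # from the original (e.g. %20 vs +), but the meaning is the same.
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

This only protects logs you control. Browser history, Referer headers, and third-party proxies still see the original URL, which is why moving the secret into a header or body remains the real fix.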

Log Sanitization

Logs are essential for debugging. They are also dangerous if they
contain sensitive data — logs are often stored for months and
accessed by many team members.

What should never appear in logs:
  - Passwords (even for failed login attempts, log "password provided",
    not the actual password)
  - Full payment card numbers (last 4 digits maximum)
  - Full government IDs (mask: "XXXXXXXX1234")
  - API keys and tokens (log only first 8 characters + "***")
  - Session IDs and JWTs
  - Personal health information

Log masking example (Node.js):

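// Returns a shallow copy of the request body with sensitive top-level fields masked.
// Nested objects are not inspected; extend this for deeply nested payloads.
function sanitizeForLog(requestBody) {
  const masked = { ...requestBody };
  if (masked.password)    masked.password    = '[REDACTED]';
  // String() guards against card numbers or keys sent as non-string values.
  if (masked.card_number) masked.card_number = '****' + String(masked.card_number).slice(-4);
  if (masked.api_key)     masked.api_key     = String(masked.api_key).substring(0, 8) + '***';
  if (masked.token)       masked.token       = '[TOKEN_REDACTED]';
  return masked;
}

// Assumes a Winston-style logger (message first, metadata second) and an Express-style req.
logger.info('Request received', sanitizeForLog(req.body));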

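Structured logging libraries can apply masking globally instead of at
every call site: Pino has a built-in redact option, while Winston (custom
formats) and Logback (custom layouts or converters) need a small amount
of configuration to do the same.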
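Python's standard logging module has no built-in redaction option, but a logging.Filter attached to a handler can enforce one masking policy for every record that handler emits. A minimal sketch, assuming payloads are logged as dicts passed as formatting arguments; SENSITIVE_KEYS, redact, and RedactingFilter are illustrative names, and this version fully redacts values rather than keeping the last four card digits.

```python
import logging

# Hypothetical set of field names to mask; extend it to match your payloads.
SENSITIVE_KEYS = {"password", "token", "api_key", "card_number", "session_id"}


def redact(value):
    """Return a copy of value with sensitive dict keys masked at any nesting depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


class RedactingFilter(logging.Filter):
    """Masks the arguments of every record passing through the handler it is attached to."""

    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.args, dict):      # logger.info("... %s", some_dict)
            record.args = redact(record.args)
        elif isinstance(record.args, tuple):   # logger.info("... %s %s", a, b)
            record.args = tuple(redact(arg) for arg in record.args)
        return True  # keep the record; only its contents change
```

Attaching the filter to the handler rather than to a single logger matters: filters on a logger are not consulted for records propagated from its child loggers, while a handler filter sees everything routed through that handler.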

Caching and Sensitive Responses

HTTP caches (CDN, proxy servers, browser cache) store responses
to serve subsequent requests faster. Sensitive API responses
must never be cached.

Dangerous: a CDN rule caches the response because no Cache-Control header forbids it:
  GET /api/users/101/profile
  Response: { "ssn": "...", "medical_records": [...] }
  Stored in the CDN cache for 1 hour.
  The next client that requests the same URL receives this user's cached sensitive data.

Fix — Cache-Control headers on sensitive responses:
  Cache-Control: no-store
    → Never store this response anywhere.
  Cache-Control: no-cache, private
    → May cache but must revalidate with server before each use.
    → "private" means CDN/shared caches must not store it.

Apply no-store to:
  Any endpoint returning personal data
  Authentication endpoints
  Financial transaction data
  Health and medical data
  Any admin or privileged data endpoint

Static resources (CSS, images, public content) can and should
be cached aggressively for performance.
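One way to make no-store the default is a small WSGI middleware that adds Cache-Control to every response and relaxes it only for known public assets. The sketch assumes a WSGI application and hypothetical path prefixes; it leaves any Cache-Control header the application sets itself untouched.

```python
# Hypothetical prefixes for public, versioned static assets.
PUBLIC_STATIC_PREFIXES = ("/static/", "/images/", "/css/")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value, denying caching unless the path is a public asset."""
    if path.startswith(PUBLIC_STATIC_PREFIXES):
        # Assumes versioned file names, so browsers and CDNs may keep them for a year.
        return "public, max-age=31536000, immutable"
    # Everything else may contain personal data: never store it anywhere.
    return "no-store"


def cache_control_middleware(app):
    """Wrap a WSGI app so every response carries a Cache-Control header."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_response_with_cache_control(status, headers, exc_info=None):
            # Respect a header the application chose explicitly.
            if not any(name.lower() == "cache-control" for name, _ in headers):
                headers = list(headers) + [("Cache-Control", cache_control_for(path))]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_cache_control)

    return wrapped
```

Denying caching by default means a newly added endpoint that returns personal data is safe even if nobody remembers to classify it; the cost is that public content must be opted in explicitly.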

Masking Data in Responses

Even in API responses, sensitive data should be masked unless
the specific use case requires the full value.

Payment card numbers:
  Full number (never return): 4111111111111111
  Masked (return to client):  **** **** **** 1111
  PAN token (for processing): tok_visa_1a2b3c (not real card number)

Phone numbers:
  Full (for admin/owner viewing own profile): +91-9876543210
  Masked (for public profile or list view):   +91-98*****210

Bank account numbers:
  Full (only for owner in high-security context): 123456789012
  Masked (for general display):                  XXXXXXXX9012

Email (for privacy in shared contexts):
  Full (for profile owner): meera@example.com
  Masked (for other users): m****@example.com

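Masking at response layer (Python example):
  def mask_card(number):
      # Keep only the last 4 digits, in the grouped display format shown above.
      return '**** **** **** ' + str(number)[-4:]

  def mask_email(email):
      # 'meera@example.com' -> 'm****@example.com'
      local, _, domain = email.partition('@')
      return local[:1] + '****@' + domain

  def format_user_response(user, requesting_user_id):
      is_owner = user.id == requesting_user_id
      return {
          "id": user.id,
          "name": user.name,
          # Owners see their own email in full; everyone else sees a masked form.
          "email": user.email if is_owner else mask_email(user.email),
          "card": mask_card(user.card_number) if user.card_number else None
      }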
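The phone and bank-account formats listed above could be produced by helpers along these lines. They are hypothetical, and mask_phone assumes numbers are stored as a country code and a national number separated by a single hyphen, as in the example.

```python
def mask_phone(phone: str) -> str:
    """'+91-9876543210' -> '+91-98*****210': keep country code, first 2 and last 3 digits."""
    prefix, sep, number = phone.rpartition("-")
    if len(number) <= 5:
        # Too short to reveal any digits safely.
        return prefix + sep + "*" * len(number)
    return prefix + sep + number[:2] + "*" * (len(number) - 5) + number[-3:]


def mask_account(account: str) -> str:
    """'123456789012' -> 'XXXXXXXX9012': show only the last 4 digits."""
    if len(account) <= 4:
        return "X" * len(account)
    return "X" * (len(account) - 4) + account[-4:]
```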

Storing Sensitive Data Securely

Passwords — Never store plain text:
  Wrong:  password_column = "MyPassword123"
  Wrong:  password_column = MD5("MyPassword123") (MD5 is broken)
  Right:  password_column = bcrypt("MyPassword123", cost=12)
                         OR argon2id("MyPassword123", memory=65536, iterations=3)

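  Bcrypt, Argon2, and scrypt are deliberately slow (Argon2 and scrypt are
  also memory-hard), so brute-force guessing is expensive.
  MD5, SHA-1, and SHA-256 are fast general-purpose hashes. Unsalted, they can
  be cracked with precomputed rainbow tables; even salted, they let attackers
  test enormous numbers of guesses per second on GPUs.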
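Argon2 and bcrypt need third-party packages in Python, but the standard library ships scrypt through hashlib.scrypt, so the sketch below uses scrypt as the slow, salted hash. The cost parameters (n=2**14, r=8, p=1) and the salt$hash storage format are assumptions; a production format would usually also record the parameters so they can be raised later.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost parameters; memory use is roughly 128 * r * n bytes (16 MiB here).
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' for storage in the password column."""
    salt = os.urandom(16)  # a fresh random salt for every password
    return salt.hex() + "$" + _scrypt(password, salt).hex()


def verify_password(password: str, stored: str) -> bool:
    salt_hex, hash_hex = stored.split("$")
    candidate = _scrypt(password, bytes.fromhex(salt_hex))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(hash_hex))
```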

Payment Cards — Use tokenization:
  Never store full card numbers.
  Use a payment processor (Stripe, Razorpay, Paytm).
  Store only the payment processor's token reference.
  The processor takes on most of the PCI-DSS burden; your own compliance
  scope shrinks but does not disappear.

API Keys — Hash them:
  Store the SHA-256 hash of the API key, not the key itself.
  When verifying, hash the submitted key and compare it to the stored hash.
  If the database is breached, the hashed keys are useless to the attacker.
  (Unlike passwords, API keys do not need slow hash — SHA-256 is fine
   because they are randomly generated and not dictionary-guessable.)
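A sketch of the generate-once, store-the-hash flow, using Python's secrets, hashlib, and hmac modules. The sk_live_ prefix and the function names are illustrative, not any particular provider's format.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str]:
    """Return (key_for_client, hash_for_database). Show the key once; store only the hash."""
    key = "sk_live_" + secrets.token_urlsafe(32)  # illustrative prefix + 256 bits of randomness
    return key, hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_api_key(submitted: str, stored_hash: str) -> bool:
    candidate = hashlib.sha256(submitted.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(candidate, stored_hash)
```

Because an unsalted SHA-256 hash is deterministic, the hash column can be indexed and used to look the key up directly, which is not possible with salted password hashes.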

Encryption Keys and Secrets:
  Use a dedicated secrets manager:
    AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, GCP Secret Manager
  Never store secrets in:
    Source code
    Config files committed to Git
    Environment variables that are printed to logs
    Mobile apps (anything hardcoded in a shipped app can be extracted)

Data Minimization in API Design

Data minimization principle: Do not collect or expose data
that is not necessary for the specific function.

Applies to:
  Inputs: Do not accept fields you do not need.
    If your API never uses a user's date of birth, do not collect it.
    
  Outputs: Do not return data that the client function does not need.
    A friend-search API needs name and avatar.
    It does not need email, phone, or account balance.
    
  Retention: Do not keep data longer than necessary.
    Session tokens: expire after idle timeout.
    Password reset tokens: expire after 15-30 minutes.
    Logs: retain for 90 days, then delete or archive encrypted.
    Deleted user data: purge from all systems within regulatory deadline.
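Input and output minimization are often enforced with explicit allowlists, so a field is rejected or dropped unless someone deliberately permits it. A minimal sketch; the field names and helper functions are hypothetical.

```python
# Hypothetical allowlists; derive them from what each endpoint actually uses.
FRIEND_SEARCH_FIELDS = ("id", "name", "avatar_url")
SIGNUP_FIELDS = frozenset({"name", "email", "password"})


def project(record: dict, allowed_fields) -> dict:
    """Output minimization: copy only the allowlisted fields into the response."""
    return {field: record[field] for field in allowed_fields if field in record}


def check_input(body: dict, allowed_fields: frozenset) -> dict:
    """Input minimization: reject any field the endpoint does not need."""
    unexpected = set(body) - allowed_fields
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return body
```

An allowlist is safer than a denylist here: when a new column such as account_balance is added to the user model, it stays out of search results until someone decides it belongs there.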

Data minimization benefits beyond security:
  Smaller attack surface if breached.
  Reduced regulatory compliance burden.
  Faster API responses (less data to transfer).
  Lower storage and processing costs.

Sensitive Data in Error Responses

Error messages should help legitimate users fix the problem without revealing internal details that help attackers.

Too much information (dangerous):
  {
    "error": "InvalidCardException",
    "message": "Card number 4111111111111111 failed CVV validation",
    "user_email": "meera@example.com",
    "db_query": "SELECT * FROM payments WHERE card = '4111111111111111'",
    "stack_trace": "at PaymentService.java:142..."
  }

Correct error response (safe):
  {
    "error": "payment_failed",
    "message": "Payment could not be processed. Please check your card details.",
    "request_id": "req_abc123"    ← Use request_id to look up details server-side
  }

The request_id allows developers to correlate the error with
detailed server-side logs without exposing details to the client.
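A sketch of the request_id pattern using Python's logging and uuid modules: full details, including the stack trace, go to server-side logs, and the client receives only a generic code, message, and ID. The function name, the req_ prefix, and the 12-character ID length are assumptions.

```python
import logging
import uuid

logger = logging.getLogger("payments")


def safe_error_response(exc: Exception, code: str, message: str) -> dict:
    """Log full details server-side and return only a generic body with a request_id."""
    request_id = "req_" + uuid.uuid4().hex[:12]
    # The stack trace and exception text stay in server logs. Exception messages can
    # themselves contain sensitive values, so those logs still need the sanitization
    # described under Log Sanitization.
    logger.error("request %s failed", request_id, exc_info=exc)
    return {"error": code, "message": message, "request_id": request_id}
```

In a web framework this function would typically sit inside a global exception handler, so no code path can return a raw exception to the client.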

Compliance Considerations

Key regulations affecting API data handling:

India — DPDP Act 2023:
  Personal data must be used only for stated purpose.
  Data must be deleted when no longer needed.
  Users have the right to access, correct, and erase their personal data.
  Significant financial penalties for violations.

PCI-DSS (Payment Cards):
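  Stored card numbers must be rendered unreadable (for example by encryption, truncation, or tokenization), and strong cryptography is required when transmitting them over open networks.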
  Card numbers must be masked in displays and logs.
  Strict access control to cardholder data environments.
  Regular penetration testing required.

GDPR (EU, applies to any business with EU customers):
  Lawful basis required for data collection.
  Data minimization required.
  Right to erasure (must be able to delete user data from all systems).
  72-hour breach notification requirement.

HIPAA (US Health Data):
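  Electronic Protected Health Information (PHI) must be protected by technical safeguards; encryption at rest and in transit is an "addressable" specification, meaning it must be implemented unless a documented, equivalent alternative is used.
  Audit logs required for all access to health data.
  Business Associate Agreements are required with API partners that handle PHI.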

Key Points

Sensitive data leaks through logs, URLs, caches, error messages, and response bodies — not only through network interception.
Never put sensitive data in URLs. Use request headers or body instead.
Sanitize logs to remove passwords, tokens, card numbers, and government IDs before they are written.
Set Cache-Control: no-store on all responses containing personal or sensitive data.
Mask sensitive fields in responses — show only the minimum needed (last 4 card digits, first letter of email).
Store passwords using bcrypt or Argon2. Never store full payment card numbers — use tokenization through a payment processor.
Collect and return only the data actually needed for each specific function. Data minimization reduces breach impact and regulatory risk.
