API Security: Excessive Data Exposure

Excessive data exposure happens when an API returns more data than the client actually needs. The server sends full database objects without filtering out sensitive fields, trusting the client application to display only what is appropriate. Attackers bypass the client entirely and read everything the API sends.

How Excessive Data Exposure Happens

The Root Cause:

Developer mindset (lazy but common):
  "The mobile app only shows the user's name and email.
   Let me just return the whole user object from the database.
   The app will just display the fields it needs."

Server code:
  router.get('/api/users/:id', async (req, res) => {
    const user = await db.findUserById(req.params.id);
    res.json(user);   // Returns EVERYTHING in the database row
  });

Database row for user:
{
  "id": 101,
  "name": "Ananya Kapoor",
  "email": "ananya@example.com",
  "phone": "+91-9876543210",
  "dob": "1990-05-15",
  "password_hash": "$2b$12$...",          ← Never needed by client
  "password_reset_token": "abc123xyz",    ← Security token
  "credit_card_number": "4111111111111111", ← Never needed
  "ssn": "XXX-XX-1234",                   ← Highly sensitive
  "internal_risk_score": 720,             ← Internal analytics
  "admin_notes": "Flagged for review",    ← Confidential
  "full_address": "...",                  ← Privacy concern
  "stripe_customer_id": "cus_abc123",    ← Third-party account
  "is_admin": false,
  "created_at": "2022-03-10"
}

Mobile app displays: name, email
Attacker receives: EVERYTHING above

The Attack Is Simple

No exploitation needed — just call the API and read the response.

1. Attacker inspects the app's traffic using browser developer tools
   or an intercepting proxy such as Burp Suite.

2. Sees the full JSON response with all fields.

3. Notes fields the app does not display:
   password_hash, credit_card_number, ssn, admin_notes...

4. Collects this data for every user account they can access.

There is no injection, no bypass, no clever technique.
The API simply gives away data it should never have sent.
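
The steps above can be sketched in a few lines. This is a minimal illustration, assuming a trimmed copy of the example row as the API response; the helper name undisplayedFields is hypothetical and not part of any real tool.

```javascript
// Hypothetical helper: returns the response fields the client UI never displays.
function undisplayedFields(apiResponse, displayedFields) {
  const shown = new Set(displayedFields);
  return Object.keys(apiResponse).filter((key) => !shown.has(key));
}

// Trimmed copy of the example database row, as the API would return it.
const response = {
  id: 101,
  name: "Ananya Kapoor",
  email: "ananya@example.com",
  password_hash: "$2b$12$...",
  credit_card_number: "4111111111111111",
  admin_notes: "Flagged for review",
};

// The mobile app displays only name and email.
const leaked = undisplayedFields(response, ["name", "email"]);
console.log(leaked);
// [ 'id', 'password_hash', 'credit_card_number', 'admin_notes' ]
```

Everything in that list reached the attacker's machine even though the app never showed it.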

Real-World Excessive Data Exposure Incidents

Case 1: Venmo Public API
  Venmo's API returned full transaction objects.
  Transactions set to "public" included sender name, recipient name,
  transaction description, and timestamps.
  Researchers scraped 207 million transactions using the API.
  Data revealed: who paid whom, for what, political donations,
  drug references, personal relationships.
  No hacking required — just systematic API calls.

Case 2: Social Media API User Data
  Multiple social media platforms exposed full user objects through
  friend-list or mention lookup APIs.
  Fields included email addresses, phone numbers, and dates of birth
  that users never intended to make searchable.
  Attackers collected these via automated bulk queries.

Case 3: Healthcare API Data Leak
  A healthcare provider's API returned full patient records including
  medical diagnoses, prescription details, and insurance information.
  The mobile app only showed appointment dates and doctor names.
  Anyone who intercepted API traffic received the complete medical record.

Sensitive Field Categories to Filter

Always filter these categories from API responses:

Authentication Data:
  password, password_hash, salt
  password_reset_token, email_verification_token
  mfa_secret, backup_codes
  session_tokens, api_keys

Financial Data:
  full_credit_card_number (return only last 4 digits max)
  bank_account_number (return only masked version)
  cvv, pin
  full_routing_number

Government IDs:
  ssn, national_id, passport_number, aadhaar_number
  (if needed, return only last 4 digits)

Internal Business Data:
  internal_scores, risk_ratings, fraud_flags
  admin_notes, support_tickets (for non-support contexts)
  cost_price (return only retail price to customers)
  third_party_ids (Stripe IDs, internal tracking IDs)
  created_by_ip, registration_ip

Other Users' Data:
  Other users' emails, phones, addresses
  Other users' account details returned in relationship queries
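
One way to catch these fields before release is an automated check on response bodies. The sketch below is an assumption-laden example: the key list is a small sample drawn from the categories above, and findSensitiveKeys is a hypothetical helper. A denylist like this is only a safety net for tests; it does not replace shaping responses explicitly.

```javascript
// Illustrative denylist taken from the categories above; extend it for your schema.
const SENSITIVE_KEYS = new Set([
  "password", "password_hash", "salt", "password_reset_token",
  "mfa_secret", "backup_codes", "api_keys", "cvv", "pin", "ssn",
  "credit_card_number", "admin_notes", "registration_ip",
]);

// Recursively collects the paths of any denylisted keys in a response body.
function findSensitiveKeys(value, path = "") {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findSensitiveKeys(item, `${path}[${i}]`));
  }
  if (value === null || typeof value !== "object") {
    return [];
  }
  return Object.entries(value).flatMap(([key, child]) => {
    const childPath = path ? `${path}.${key}` : key;
    const hits = SENSITIVE_KEYS.has(key.toLowerCase()) ? [childPath] : [];
    return hits.concat(findSensitiveKeys(child, childPath));
  });
}

// Nested data matters: relationship queries often leak other users' fields.
const body = {
  id: 101,
  name: "Ananya",
  friends: [{ id: 102, password_hash: "..." }],
  billing: { cvv: "123" },
};
console.log(findSensitiveKeys(body));
// [ 'friends[0].password_hash', 'billing.cvv' ]
```

A test suite could call this on every endpoint's response and fail when the result is not empty.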

The Right Fix: Response Shaping

Never return raw database objects. Always shape the response to include only the fields appropriate for the requesting user and context.

Three approaches to response shaping:

Approach 1: Explicit field selection (simple, effective)
  Instead of: res.json(user)
  Use:
  res.json({
    id:    user.id,
    name:  user.name,
    email: user.email
  });
  
  Explicitly list only what the client should receive.
  Because fields are listed explicitly, a newly added database column
  is never exposed by accident.
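
When many endpoints need the same treatment, the explicit list can live in one place. The following is a minimal sketch, assuming a hypothetical pick helper and a made-up loyalty_tier column standing in for a field added to the table later.

```javascript
// Hypothetical allowlist helper: copies only the named fields present on the source.
function pick(source, allowedFields) {
  const result = {};
  for (const field of allowedFields) {
    if (Object.prototype.hasOwnProperty.call(source, field)) {
      result[field] = source[field];
    }
  }
  return result;
}

// The only fields this endpoint is allowed to return.
const USER_RESPONSE_FIELDS = ["id", "name", "email"];

const row = {
  id: 101,
  name: "Ananya Kapoor",
  email: "ananya@example.com",
  password_hash: "$2b$12$...",
  loyalty_tier: "gold", // assumed new column, added after the endpoint was written
};

const safeUser = pick(row, USER_RESPONSE_FIELDS);
console.log(safeUser);
// { id: 101, name: 'Ananya Kapoor', email: 'ananya@example.com' }
```

In a route handler, res.json(safeUser) would take the place of res.json(user).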

Approach 2: Serialization layer / DTOs (Data Transfer Objects)
  Define a separate "view model" or DTO class for each response context.
  
  UserPublicView:   id, name, avatar_url
  UserProfileView:  id, name, email, phone, created_at
  UserAdminView:    id, name, email, phone, created_at, role, flags, notes
  
  Each view explicitly includes only appropriate fields.
  Use the correct view based on the requesting user's role.
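
The three views could be written as small classes. This is a sketch rather than a prescribed design; it assumes the database row uses the same column names as the view lists above (avatar_url, role, flags, notes).

```javascript
// Each view class copies only the fields its audience may see.
class UserPublicView {
  constructor(user) {
    this.id = user.id;
    this.name = user.name;
    this.avatar_url = user.avatar_url;
  }
}

class UserProfileView {
  constructor(user) {
    this.id = user.id;
    this.name = user.name;
    this.email = user.email;
    this.phone = user.phone;
    this.created_at = user.created_at;
  }
}

// The admin view adds role, flags, and notes on top of the profile fields.
class UserAdminView extends UserProfileView {
  constructor(user) {
    super(user);
    this.role = user.role;
    this.flags = user.flags;
    this.notes = user.notes;
  }
}

// Sample row with assumed values; password_hash appears in no view.
const row = {
  id: 101,
  name: "Ananya Kapoor",
  email: "ananya@example.com",
  phone: "+91-9876543210",
  created_at: "2022-03-10",
  avatar_url: "https://example.com/avatars/101.png",
  role: "user",
  flags: [],
  notes: "Flagged for review",
  password_hash: "$2b$12$...",
};

console.log(JSON.stringify(new UserPublicView(row)));
// {"id":101,"name":"Ananya Kapoor","avatar_url":"https://example.com/avatars/101.png"}
```

Since a view only ever assigns the fields it names, no serialization path can include password_hash.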

Approach 3: Field masking
  Return the field but obscure the sensitive part.
  
  credit_card: "**** **** **** 1111" (show only last 4)
  phone:       "+91 ***** 43210"     (show only last 5)
  email:       "a****@example.com"   (show only first char + domain)
  
  Useful when the user needs to confirm their data without
  exposing it fully to the network.
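
Masking helpers can be short. The functions below are an illustrative sketch with hypothetical names: the card helper keeps only the last four digits, the email helper keeps the first character and the domain, and the phone helper keeps the last five digits. Country-code formatting is left out to keep the example small.

```javascript
// Keeps only the last four digits of a card number.
function maskCard(number) {
  const digits = String(number).replace(/\D/g, "");
  return `**** **** **** ${digits.slice(-4)}`;
}

// Keeps the first character of the local part and the full domain.
function maskEmail(email) {
  const at = email.lastIndexOf("@");
  if (at < 1) return "****"; // no usable local part, so reveal nothing
  return `${email[0]}****${email.slice(at)}`;
}

// Replaces every digit except the last `visible` ones with an asterisk.
// Inputs with `visible` digits or fewer come back unmasked; reject those upstream.
function maskPhone(phone, visible = 5) {
  const digits = String(phone).replace(/\D/g, "");
  const hidden = Math.max(digits.length - visible, 0);
  return "*".repeat(hidden) + digits.slice(-visible);
}

console.log(maskCard("4111111111111111"));    // **** **** **** 1111
console.log(maskEmail("ananya@example.com")); // a****@example.com
console.log(maskPhone("+91-9876543210"));     // *******43210
```

Mask on the server before the response is built, so the full value never crosses the network.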

Context-Aware Response Shaping

The same resource should return different data to different roles.

GET /api/users/101 accessed by different roles:

Regular user accessing their OWN profile:
{
  "id": 101,
  "name": "Ananya",
  "email": "ananya@example.com",
  "phone": "+91-98*****210",
  "created_at": "2022-03-10"
}

Admin accessing any user's profile:
{
  "id": 101,
  "name": "Ananya",
  "email": "ananya@example.com",
  "phone": "+91-9876543210",     ← Full phone for admin
  "created_at": "2022-03-10",
  "internal_risk_score": 720,   ← Visible to admin only
  "admin_notes": "...",         ← Visible to admin only
  "is_active": true
}

Another regular user accessing Ananya's profile:
{
  "id": 101,
  "name": "Ananya",
  "avatar_url": "https://..."   ← Public info only
}
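
The role-based selection shown above might look like this in code. It is a sketch under stated assumptions: the requester object (id and role) is assumed to come from a verified session, never from client input, and the masking of the phone number in the self view is omitted for brevity.

```javascript
// Allowed fields per audience, mirroring the three example responses above.
const VIEW_FIELDS = {
  public: ["id", "name", "avatar_url"],
  self: ["id", "name", "email", "phone", "created_at"],
  admin: ["id", "name", "email", "phone", "created_at",
          "internal_risk_score", "admin_notes", "is_active"],
};

// Decides which view applies. `requester` must be derived from the
// authenticated session on the server (an assumption of this sketch).
function selectView(requester, targetUser) {
  if (requester.role === "admin") return "admin";
  if (requester.id === targetUser.id) return "self";
  return "public";
}

// Builds the response from the allowlist of the selected view.
function shapeUser(requester, targetUser) {
  const shaped = {};
  for (const field of VIEW_FIELDS[selectView(requester, targetUser)]) {
    if (field in targetUser) shaped[field] = targetUser[field];
  }
  return shaped;
}

// Sample row with assumed values.
const ananya = {
  id: 101,
  name: "Ananya",
  email: "ananya@example.com",
  phone: "+91-9876543210",
  created_at: "2022-03-10",
  avatar_url: "https://example.com/avatars/101.png",
  internal_risk_score: 720,
  admin_notes: "Flagged for review",
  is_active: true,
  password_hash: "$2b$12$...",
};

console.log(shapeUser({ id: 202, role: "user" }, ananya));
// { id: 101, name: 'Ananya', avatar_url: 'https://example.com/avatars/101.png' }
```

The fallback is the public view, so a requester who matches no rule gets the least data.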

Excessive Data Exposure in Lists and Paginated Results

Single object leaks are bad. Bulk list endpoint leaks are catastrophic.

GET /api/admin/users?page=1&limit=100

If this returns 100 full user objects with all database fields:
  100 users per page × N pages = All user PII in a few requests.
  
  Attacker paginates through all pages and collects the entire database.

Fix:
  1. Apply field filtering to list responses (same as single objects)
  2. Enforce authorization: only admins may call this endpoint
  3. Implement rate limiting on list endpoints
  4. Limit maximum page size (e.g., max 50 per page)
  5. Log bulk data access and alert on unusual patterns
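
Items 1 and 4 of this fix can be sketched together. The example below assumes an in-memory array in place of a database query and uses hypothetical names throughout; authorization, rate limiting, and logging are separate concerns and are not shown.

```javascript
const MAX_PAGE_SIZE = 50;     // hard upper bound, per item 4 above
const DEFAULT_PAGE_SIZE = 20; // assumed default when the client sends nothing usable

// Turns the raw ?limit= value into a safe page size.
function clampPageSize(rawLimit) {
  const parsed = Number.parseInt(rawLimit, 10);
  if (Number.isNaN(parsed) || parsed < 1) return DEFAULT_PAGE_SIZE;
  return Math.min(parsed, MAX_PAGE_SIZE);
}

// List items get the same explicit shaping as single objects.
function toListItem(user) {
  return { id: user.id, name: user.name, email: user.email };
}

// In a real service the slice would be a LIMIT/OFFSET (or cursor) query.
function buildUserPage(allUsers, query) {
  const limit = clampPageSize(query.limit);
  const page = Math.max(Number.parseInt(query.page, 10) || 1, 1);
  const start = (page - 1) * limit;
  return {
    page,
    limit,
    items: allUsers.slice(start, start + limit).map(toListItem),
  };
}

// 120 fake rows, each carrying a field that must never leave the server.
const users = Array.from({ length: 120 }, (_, i) => ({
  id: i + 1,
  name: `User ${i + 1}`,
  email: `user${i + 1}@example.com`,
  password_hash: "$2b$12$...",
}));

const result = buildUserPage(users, { page: "1", limit: "100" });
console.log(result.limit, result.items.length); // 50 50
```

The client asked for 100 rows and received 50, each reduced to three fields.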

GraphQL and Excessive Data Exposure

GraphQL lets clients specify exactly which fields they want.
This design was meant to prevent over-fetching.
But it does not prevent exposure if field-level authorization is missing.

A GraphQL type with all fields defined:
  type User {
    id: ID!
    name: String!
    email: String!
    password_hash: String   ← Should be excluded from schema entirely
    internal_score: Int     ← Should require admin role to query
    admin_notes: String     ← Should require admin role to query
  }

If an attacker knows field names (via introspection):
  { user(id: 101) { name password_hash internal_score admin_notes } }
  → Returns everything asked for if field-level auth is missing.

Fix for GraphQL:
  1. Exclude sensitive fields from the schema entirely if not needed
  2. Implement field-level resolvers with authorization checks
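  3. Disable introspection in production to make schema discovery harder
     (defense in depth only: field names can still be guessed, so this
     does not replace the authorization checks in step 2)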
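
Field-level authorization can be sketched as a wrapper around resolvers. This example uses plain functions with the conventional (parent, args, context, info) resolver signature and does not depend on a particular GraphQL server. The context.user shape and the requireRole helper are assumptions made for illustration.

```javascript
// Wraps a resolver so it only runs for callers with the required role.
// Assumes the server puts the authenticated user on context.user.
function requireRole(role, resolve) {
  return (parent, args, context, info) => {
    if (!context.user || context.user.role !== role) {
      throw new Error("Not authorized to read this field");
    }
    return resolve(parent, args, context, info);
  };
}

const userResolvers = {
  User: {
    name: (user) => user.name,
    email: (user) => user.email,
    internal_score: requireRole("admin", (user) => user.internal_score),
    admin_notes: requireRole("admin", (user) => user.admin_notes),
    // password_hash has no resolver and should not appear in the schema at all.
  },
};

// Sample parent object with assumed values.
const sampleUser = {
  name: "Ananya",
  email: "ananya@example.com",
  internal_score: 720,
  admin_notes: "Flagged for review",
};

// An admin can resolve the protected field.
console.log(userResolvers.User.internal_score(sampleUser, {}, { user: { role: "admin" } }));
// 720

// A regular user gets an error for the same field.
try {
  userResolvers.User.internal_score(sampleUser, {}, { user: { role: "user" } });
} catch (err) {
  console.log(err.message); // Not authorized to read this field
}
```

The check lives on the field, so it applies however the query reaches that field.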

API Documentation and Accidental Disclosure

Swagger/OpenAPI documentation often shows example responses.
If these examples include real data or reveal sensitive field names,
attackers learn what to look for even before calling the API.

Dangerous example in Swagger docs:
  Response example:
  {
    "id": 101,
    "name": "Test User",
    "password_hash": "$2b$12$...",   ← Should not be in examples
    "stripe_customer_id": "cus_abc" ← Reveals third-party integration
  }

Fix: Review all API documentation examples.
Remove sensitive fields from example responses.
Better: auto-generate documentation from filtered DTOs.

Key Points

Excessive data exposure happens when APIs return full database objects without filtering, trusting clients to display only what is needed.
Attackers bypass client applications and read the raw API response, seeing all fields including sensitive ones the app never displays.
Always shape responses explicitly — list only the fields that should be returned for each endpoint and role combination.
Use DTOs or view models to define separate response structures for different roles (public, self, admin).
GraphQL requires field-level authorization. Defining a field in the schema does not automatically protect it from unauthorized access.
Review API documentation examples for sensitive field names and remove any that reveal internal data structures.
