API Security: Excessive Data Exposure
Excessive data exposure happens when an API returns more data than the client actually needs. The server sends full database objects without filtering out sensitive fields, trusting the client application to display only what is appropriate. Attackers bypass the client entirely and read everything the API sends.
How Excessive Data Exposure Happens
The Root Cause:
Developer mindset (lazy but common):
"The mobile app only shows the user's name and email.
Let me just return the whole user object from the database.
The app will just display the fields it needs."
Server code:
router.get('/api/users/:id', async (req, res) => {
  const user = await db.findUserById(req.params.id);
  res.json(user); // Returns EVERYTHING in the database row
});
Database row for user:
{
  "id": 101,
  "name": "Ananya Kapoor",
  "email": "ananya@example.com",
  "phone": "+91-9876543210",
  "dob": "1990-05-15",
  "password_hash": "$2b$12$...",            ← Never needed by client
  "password_reset_token": "abc123xyz",      ← Security token
  "credit_card_number": "4111111111111111", ← Never needed
  "ssn": "XXX-XX-1234",                     ← Highly sensitive
  "internal_risk_score": 720,               ← Internal analytics
  "admin_notes": "Flagged for review",      ← Confidential
  "full_address": "...",                    ← Privacy concern
  "stripe_customer_id": "cus_abc123",       ← Third-party account
  "is_admin": false,
  "created_at": "2022-03-10"
}
Mobile app displays: name, email
Attacker receives: EVERYTHING above
The Attack Is Simple
No exploitation is needed: the attacker just calls the API and reads the response.
1. Attacker intercepts traffic using browser developer tools or a Burp Suite proxy.
2. Sees the full JSON response with all fields.
3. Notes fields the app does not display: password_hash, credit_card_number, ssn, admin_notes...
4. Collects this data for every user account they can access.
There is no injection, no bypass, no clever technique. The API simply gives away data it should never have sent.
Real-World Excessive Data Exposure Incidents
Case 1: Venmo Public API
Venmo's API returned full transaction objects. Transactions set to "public" included sender name, recipient name, transaction description, amounts, and timestamps. Researchers scraped 207 million transactions using the API. The data revealed who paid whom and for what, including political donations, drug references, and personal relationships. No hacking was required, only systematic API calls.
Case 2: Social Media API User Data
Multiple social media platforms exposed full user objects through friend-list or mention lookup APIs. Fields included email addresses, phone numbers, and dates of birth that users never intended to make searchable. Attackers collected these via automated bulk queries.
Case 3: Healthcare API Data Leak
A healthcare provider's API returned full patient records including medical diagnoses, prescription details, and insurance information. The mobile app only showed appointment dates and doctor names. Anyone who intercepted API traffic received the complete medical record.
Sensitive Field Categories to Filter
Always filter these categories from API responses:
Authentication Data:
- password, password_hash, salt
- password_reset_token, email_verification_token
- mfa_secret, backup_codes
- session_tokens, api_keys
Financial Data:
- full_credit_card_number (return only last 4 digits max)
- bank_account_number (return only masked version)
- cvv, pin
- full_routing_number
Government IDs:
- ssn, national_id, passport_number, aadhaar_number (if needed, return only last 4 digits)
Internal Business Data:
- internal_scores, risk_ratings, fraud_flags
- admin_notes, support_tickets (for non-support contexts)
- cost_price (return only retail price to customers)
- third_party_ids (Stripe IDs, internal tracking IDs)
- created_by_ip, registration_ip
Other Users' Data:
- Other users' emails, phones, addresses
- Other users' account details returned in relationship queries
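A list like this works best as a backstop rather than as the primary control. As a rough sketch, a test suite could walk each response body and report any key that appears on a denylist. The helper name `findSensitiveKeys` and the `SENSITIVE_KEYS` set are hypothetical, and the set only samples the categories listed here.

```javascript
// Hypothetical denylist sampled from the categories above; extend it for your schema.
const SENSITIVE_KEYS = new Set([
  'password', 'password_hash', 'salt', 'password_reset_token',
  'email_verification_token', 'mfa_secret', 'backup_codes',
  'credit_card_number', 'cvv', 'pin', 'ssn', 'admin_notes',
  'internal_risk_score', 'stripe_customer_id', 'registration_ip',
]);

// Recursively walk a JSON-like value and return the path of every key
// whose name is on the denylist.
function findSensitiveKeys(value, path = '') {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findSensitiveKeys(item, `${path}[${i}]`));
  }
  if (value === null || typeof value !== 'object') {
    return [];
  }
  return Object.entries(value).flatMap(([key, child]) => {
    const childPath = path ? `${path}.${key}` : key;
    const hit = SENSITIVE_KEYS.has(key.toLowerCase()) ? [childPath] : [];
    return hit.concat(findSensitiveKeys(child, childPath));
  });
}
```

A test could then fail whenever `findSensitiveKeys(responseBody)` returns a non-empty array. A denylist only catches names it already knows, so it complements explicit response shaping and does not replace it.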
The Right Fix: Response Shaping
Never return raw database objects. Always shape the response to include only the fields appropriate for the requesting user and context.
Three approaches to response shaping:
Approach 1: Explicit field selection (simple, effective)
Instead of: res.json(user)
Use:
res.json({
  id: user.id,
  name: user.name,
  email: user.email
});
Explicitly list only what the client should receive.
Adding a new database column never accidentally exposes it.
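When many endpoints need the same treatment, the explicit listing can be factored into a small allowlist helper. The following is a minimal sketch; `pick`, `USER_RESPONSE_FIELDS`, and `toUserResponse` are illustrative names, not part of any library.

```javascript
// Copy only the allowlisted fields; anything not named is dropped.
function pick(source, fields) {
  const out = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(source, field)) {
      out[field] = source[field];
    }
  }
  return out;
}

// Hypothetical allowlist for the user endpoint.
const USER_RESPONSE_FIELDS = ['id', 'name', 'email'];

function toUserResponse(user) {
  return pick(user, USER_RESPONSE_FIELDS);
}
```

With this in place the handler would call `res.json(toUserResponse(user))`, and a column added to the table later stays out of the response until someone adds it to the allowlist.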
Approach 2: Serialization layer / DTOs (Data Transfer Objects)
Define a separate "view model" or DTO class for each response context.
UserPublicView: id, name, avatar_url
UserProfileView: id, name, email, phone, created_at
UserAdminView: id, name, email, phone, created_at, role, flags, notes
Each view explicitly includes only appropriate fields.
Use the correct view based on the requesting user's role.
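One way to express these views in plain JavaScript is a map of serializer functions, one per view. This is only a sketch: the field lists come from the three views named here, while `userViews` and `serializeUser` are assumed names.

```javascript
// One serializer per response context; each lists its fields explicitly.
const userViews = {
  public: (u) => ({ id: u.id, name: u.name, avatar_url: u.avatar_url }),
  profile: (u) => ({
    id: u.id, name: u.name, email: u.email, phone: u.phone, created_at: u.created_at,
  }),
  admin: (u) => ({
    id: u.id, name: u.name, email: u.email, phone: u.phone, created_at: u.created_at,
    role: u.role, flags: u.flags, notes: u.notes,
  }),
};

// Look up the view by name and fail loudly on an unknown one, so a typo
// can never fall through to returning the raw row.
function serializeUser(user, viewName) {
  if (!Object.prototype.hasOwnProperty.call(userViews, viewName)) {
    throw new Error(`Unknown user view: ${viewName}`);
  }
  return userViews[viewName](user);
}
```

Keeping the views side by side makes it easy to review exactly what each audience receives.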
Approach 3: Field masking
Return the field but obscure the sensitive part.
credit_card: "**** **** **** 1111" (show only last 4)
phone: "+91 ***** 43210" (show only last 5)
email: "a****@example.com" (show only first char + domain)
Useful when the user needs to confirm their data without
exposing it fully to the network.
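The masking rules above could be sketched as three small helpers. The function names are hypothetical, and the phone helper masks the country code as well, which is a simplification of the format shown in the example.

```javascript
// Keep only the last four digits of a card number.
function maskCardNumber(cardNumber) {
  const digits = String(cardNumber).replace(/\D/g, '');
  if (digits.length < 4) return '****';
  return `**** **** **** ${digits.slice(-4)}`;
}

// Keep only the last `visible` digits of a phone number.
function maskPhone(phone, visible = 5) {
  const digits = String(phone).replace(/\D/g, '');
  if (digits.length <= visible) return '*'.repeat(digits.length);
  return '*'.repeat(digits.length - visible) + digits.slice(-visible);
}

// Keep the first character of the local part and the full domain.
function maskEmail(email) {
  const at = String(email).indexOf('@');
  if (at < 1) return '****';
  return `${email[0]}****${email.slice(at)}`;
}
```

Masking should happen on the server before the response is serialized, so that the full value never reaches the network.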
Context-Aware Response Shaping
The same resource should return different data to different roles.
GET /api/users/101 accessed by different roles:
Regular user accessing their OWN profile:
{
  "id": 101,
  "name": "Ananya",
  "email": "ananya@example.com",
  "phone": "+91-98*****210",
  "created_at": "2022-03-10"
}
Admin accessing any user's profile:
{
  "id": 101,
  "name": "Ananya",
  "email": "ananya@example.com",
  "phone": "+91-9876543210",       ← Full phone for admin
  "created_at": "2022-03-10",
  "internal_risk_score": 720,      ← Visible to admin only
  "admin_notes": "...",            ← Visible to admin only
  "is_active": true
}
Another regular user accessing Ananya's profile:
{
  "id": 101,
  "name": "Ananya",
  "avatar_url": "https://..."      ← Public info only
}
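The three responses above can be produced by one shaping function that inspects who is asking. The sketch below assumes a `requester` object with `id` and `role` properties (for example, populated by authentication middleware); that shape and the function names are assumptions, and the phone mask keeps the last three digits rather than reproducing the exact format shown.

```javascript
// Replace every digit except the last `visible` ones with '*'.
function maskPhone(phone, visible = 3) {
  const total = (phone.match(/\d/g) || []).length;
  let seen = 0;
  return phone.replace(/\d/g, (d) => (++seen > total - visible ? d : '*'));
}

// Choose the response shape from the requester's relationship to the user.
function shapeUserFor(requester, user) {
  if (requester.role === 'admin') {
    return {
      id: user.id, name: user.name, email: user.email, phone: user.phone,
      created_at: user.created_at, internal_risk_score: user.internal_risk_score,
      admin_notes: user.admin_notes, is_active: user.is_active,
    };
  }
  if (requester.id === user.id) {
    return {
      id: user.id, name: user.name, email: user.email,
      phone: maskPhone(user.phone), created_at: user.created_at,
    };
  }
  // Default: anyone else gets only the public view.
  return { id: user.id, name: user.name, avatar_url: user.avatar_url };
}
```

The last branch is the fallback on purpose: if a new role appears and nobody updates this function, it receives the smallest view rather than the largest.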
Excessive Data Exposure in Lists and Paginated Results
Single object leaks are bad. Bulk list endpoint leaks are catastrophic.
GET /api/admin/users?page=1&limit=100
If this returns 100 full user objects with all database fields:
100 users per page × N pages = All user PII in a few requests.
Attacker paginates through all pages and collects the entire database.
Fix:
1. Apply field filtering to list responses (same as single objects)
2. Implement authorization: only admin can call this endpoint
3. Implement rate limiting on list endpoints
4. Limit maximum page size (e.g., max 50 per page)
5. Log bulk data access and alert on unusual patterns
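Two of these fixes, field filtering and a page-size cap, can be sketched without any framework. The constants and function names below are illustrative, the default of 20 is an arbitrary assumption, and authorization, rate limiting, and logging would still need to be handled separately.

```javascript
const DEFAULT_PAGE_SIZE = 20; // assumed default; choose what suits the API
const MAX_PAGE_SIZE = 50;     // the cap suggested in the list above

// Parse ?page= and ?limit= defensively and clamp the page size.
function parsePagination(query) {
  const page = Math.max(1, Number.parseInt(query.page, 10) || 1);
  const requested = Number.parseInt(query.limit, 10) || DEFAULT_PAGE_SIZE;
  const limit = Math.min(MAX_PAGE_SIZE, Math.max(1, requested));
  return { page, limit, offset: (page - 1) * limit };
}

// Shape every row in the list exactly as a single-object response would be.
function toUserListItem(row) {
  return { id: row.id, name: row.name, email: row.email };
}

function shapeUserPage(rows, query) {
  const { page, limit } = parsePagination(query);
  return { page, limit, items: rows.slice(0, limit).map(toUserListItem) };
}
```

A request for `limit=100` is silently reduced to 50 here; rejecting it with a 400 response would be an equally reasonable choice.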
GraphQL and Excessive Data Exposure
GraphQL lets clients specify exactly which fields they want.
This design was meant to prevent over-fetching.
But it does not prevent exposure if field-level authorization is missing.
A GraphQL type with all fields defined:
type User {
  id: ID!
  name: String!
  email: String!
  password_hash: String   ← Should be excluded from schema entirely
  internal_score: Int     ← Should require admin role to query
  admin_notes: String     ← Should require admin role to query
}
If an attacker knows field names (via introspection):
{ user(id: 101) { name password_hash internal_score admin_notes } }
→ Returns everything asked for if field-level auth is missing.
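Fix for GraphQL:
1. Exclude sensitive fields from the schema entirely if not needed
2. Implement field-level resolvers with authorization checks
3. Disable introspection in production to hide schema structure. Treat this as defense in depth: field names can still be guessed, so it does not replace the checks in step 2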
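A field-level check can be sketched as a wrapper around individual resolvers. The example uses the conventional `(parent, args, context, info)` resolver signature and assumes the server places the authenticated user on `context.user` with a `role` property; that context shape and the `requireRole` helper are assumptions, not part of any particular GraphQL library.

```javascript
// Wrap a resolver so it only runs when the requester has the given role.
function requireRole(role, resolve) {
  return (parent, args, context, info) => {
    if (!context || !context.user || context.user.role !== role) {
      throw new Error('Not authorized');
    }
    return resolve(parent, args, context, info);
  };
}

// Resolver map for the User type. password_hash is deliberately absent,
// and it should also be removed from the schema itself.
const resolvers = {
  User: {
    internal_score: requireRole('admin', (user) => user.internal_score),
    admin_notes: requireRole('admin', (user) => user.admin_notes),
  },
};
```

Fields without an entry in the map (id, name, email) fall back to the default resolver in most GraphQL servers, so only the restricted fields need wrapping.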
API Documentation and Accidental Disclosure
Swagger/OpenAPI documentation often shows example responses.
If these examples include real data or reveal sensitive field names,
attackers learn what to look for even before calling the API.
Dangerous example in Swagger docs:
Response example:
{
  "id": 101,
  "name": "Test User",
  "password_hash": "$2b$12$...",      ← Should not be in examples
  "stripe_customer_id": "cus_abc"     ← Reveals third-party integration
}
Fix: Review all API documentation examples.
Remove sensitive fields from example responses.
Better: auto-generate documentation from filtered DTOs.
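One possible way to generate documentation from the filtered shape is to declare the allowed fields once and derive the response, the schema, and the example from that single declaration. The sketch below is an assumption about how this could be wired up; `USER_PROFILE_FIELDS` and the three functions are hypothetical names, and the schema object follows the general layout of an OpenAPI schema (type, properties, example).

```javascript
// Hypothetical single source of truth: each allowed field carries its
// documented type and a synthetic example value.
const USER_PROFILE_FIELDS = {
  id: { type: 'integer', example: 101 },
  name: { type: 'string', example: 'Test User' },
  email: { type: 'string', example: 'test.user@example.com' },
};

// Shape a database row for the API using only the declared fields.
function toUserProfile(row) {
  const out = {};
  for (const field of Object.keys(USER_PROFILE_FIELDS)) {
    out[field] = row[field];
  }
  return out;
}

// Build a schema object for the documentation from the same declaration.
function userProfileSchema() {
  return { type: 'object', properties: { ...USER_PROFILE_FIELDS } };
}

// Build the documentation example from the same declaration.
function userProfileExample() {
  return Object.fromEntries(
    Object.entries(USER_PROFILE_FIELDS).map(([name, spec]) => [name, spec.example])
  );
}
```

Because the example is built from the allowlist, it cannot mention a field the endpoint does not return, and it never contains real data.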
Key Points
- Excessive data exposure happens when APIs return full database objects without filtering, trusting clients to display only what is needed.
- Attackers bypass client applications and read the raw API response, seeing all fields including sensitive ones the app never displays.
- Always shape responses explicitly: list only the fields that should be returned for each endpoint and role combination.
- Use DTOs or view models to define separate response structures for different roles (public, self, admin).
- GraphQL requires field-level authorization. Defining a field in the schema does not automatically protect it from unauthorized access.
- Review API documentation examples for sensitive field names and remove any that reveal internal data structures.
