API Security Input Validation
Every piece of data that enters your API from the outside world is untrusted. A user can send anything, not just what your application expects. Without strict input validation, attackers can craft input that breaks your logic, corrupts your data, crashes your server, or executes arbitrary commands.
Input validation is the practice of checking all incoming data before processing it. It is one of the most fundamental and impactful security controls in API development.
The Trust Boundary
A trust boundary is the line between data you control and data you do not. Everything inside the server — your database, your code, your configuration — is within the trust boundary. Everything coming from outside — user input, API parameters, uploaded files, headers from clients — is outside the trust boundary and must be treated as potentially hostile.
Trust Boundary Diagram:
OUTSIDE (Untrusted Zone) │ INSIDE (Trusted Zone)
─────────────────────────────────┼─────────────────────────────
Mobile app input │ Database queries
Browser form data │ Business logic
API parameters │ Internal services
File uploads │ Configuration
HTTP headers from clients │ Server memory
Third-party webhook data │ Authenticated session data
│
┌─────────────────────┤
│ VALIDATION LAYER │
│ Check everything │
│ before it crosses │
└─────────────────────┘
Nothing from the untrusted zone should reach the trusted zone
without being validated and sanitized.
What Can Go Wrong Without Validation
Scenario: An API endpoint that looks up a product by ID
Expected input: product_id = 42 (a positive integer)
Without validation, attacker sends:
  product_id = "1; DROP TABLE products--"     → SQL Injection
  product_id = "../../etc/passwd"             → Path traversal
  product_id = "<script>alert(1)</script>"    → XSS injection
  product_id = -1                             → Negative ID may break logic
  product_id = 999999999999999999999          → Integer overflow
  product_id = ""                             → Empty value causes null error
  product_id = [1, 2, 3, 4, 5, 6... ×100k]    → DoS via oversized array
Each of these causes a different type of failure. Validation catches all of them before any processing occurs.
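A minimal sketch of a validator for this scenario follows. The function names and the upper bound (a 32-bit signed integer, as many database ID columns use) are assumptions made for illustration, not part of any particular framework.

```python
MAX_PRODUCT_ID = 2_147_483_647  # assumed upper bound: 32-bit signed ID column


def parse_product_id(raw: object) -> int:
    """Return a validated product ID, or raise ValueError."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool):
        raise ValueError("product_id must be an integer")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str) and len(raw) <= 10 and raw.isascii() and raw.isdigit():
        # Only short runs of ASCII digits pass. This rejects "", "-1",
        # SQL fragments, paths, and markup before anything is parsed.
        value = int(raw)
    else:
        raise ValueError("product_id must be an integer")
    if not 1 <= value <= MAX_PRODUCT_ID:
        raise ValueError("product_id out of range")
    return value


def is_valid_product_id(raw: object) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        parse_product_id(raw)
    except ValueError:
        return False
    return True
```

Every hostile value in the scenario above is rejected by one of these checks, while 42 and "42" are accepted.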
The Five Dimensions of Input Validation
Dimension 1: Type Validation
Check that the value is the right data type. A field expecting a number should never receive a string. A field expecting a boolean should never receive an object.
Type Validation Examples:
Field: user_age
Expected: integer
  Received: "twenty-five"   → Reject: not an integer
  Received: 25.5            → Reject: not a whole number
  Received: 25              → Accept
Field: is_active
Expected: boolean (true or false)
  Received: "yes"           → Reject: not a boolean
  Received: 1               → May accept (depends on strictness)
  Received: true            → Accept
Field: order_date
Expected: ISO 8601 date string (YYYY-MM-DD)
  Received: "March 15 2024" → Reject: wrong format
  Received: "2024-03-15"    → Accept (then parse and validate further)
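The checks above might look like this for a JSON body that has already been parsed into a Python dict. This is a sketch in strict mode (1 is not accepted as a boolean); the function name check_types is an assumption.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_types(payload: dict) -> list[str]:
    """Return a list of type errors for the three example fields."""
    errors = []

    age = payload.get("user_age")
    # bool is a subclass of int, so True would otherwise pass as an integer.
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("user_age must be an integer")

    # Strict mode: only JSON true/false is accepted, not 1 or "yes".
    if not isinstance(payload.get("is_active"), bool):
        errors.append("is_active must be a boolean")

    order_date = payload.get("order_date")
    if isinstance(order_date, str) and ISO_DATE_RE.fullmatch(order_date):
        try:
            date.fromisoformat(order_date)  # rejects dates like 2024-02-30
        except ValueError:
            errors.append("order_date is not a real calendar date")
    else:
        errors.append("order_date must be YYYY-MM-DD")

    return errors
```

Returning a list of errors rather than stopping at the first one lets the API report every problem in a single 400 response.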
Dimension 2: Range and Length Validation
Values may be the right type but fall outside acceptable ranges. Set explicit minimum and maximum values for every field.
Range and Length Examples:
Field: quantity (in an order)
Type: integer
Min: 1 (cannot order 0 or negative items)
Max: 1000 (business rule: max 1000 units per order)
  Received: 0      → Reject (below minimum)
  Received: 9999   → Reject (above maximum)
  Received: 5      → Accept
Field: username
Type: string
Min length: 3 characters
Max length: 30 characters
  Received: "ab"           → Reject (too short)
  Received: "a" × 10,000   → Reject (too long, possible DoS)
  Received: "meera_dev"    → Accept
Field: rating
Type: integer
Min: 1, Max: 5
  Received: 0   → Reject
  Received: 6   → Reject
  Received: 3   → Accept
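One way to keep these limits in a single place is a small table of bounds per field. The sketch below uses the limits from the examples above; the table names and check_ranges are assumptions.

```python
# Inclusive (min, max) bounds per field, taken from the examples above.
INT_RANGES = {"quantity": (1, 1000), "rating": (1, 5)}
LENGTH_RANGES = {"username": (3, 30)}


def check_ranges(payload: dict) -> list[str]:
    """Return a list of range and length errors for the example fields."""
    errors = []
    for field, (low, high) in INT_RANGES.items():
        value = payload.get(field)
        # The type is re-checked here so the comparison can never raise.
        if not isinstance(value, int) or isinstance(value, bool) or not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}")
    for field, (low, high) in LENGTH_RANGES.items():
        value = payload.get(field)
        if not isinstance(value, str) or not low <= len(value) <= high:
            errors.append(f"{field} must be {low} to {high} characters")
    return errors
```

A per-field length limit only helps once the body has been read, so it is usually paired with a cap on total request size at the web server or framework level.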
Dimension 3: Format Validation
Strings may be the right type and length but in the wrong format. Use regular expressions or format parsers to verify structure.
Format Validation Examples:
Field: email
Pattern: must contain @ and a valid domain
  Received: "not-an-email"      → Reject
  Received: "user@domain.com"   → Accept
Field: phone_number (India)
Pattern: +91 followed by 10 digits
  Received: "12345"            → Reject (too short)
  Received: "+91-9876543210"   → Reject (wrong format)
  Received: "+919876543210"    → Accept
Field: product_code
Pattern: 3 uppercase letters + 4 digits (e.g., ABC1234)
  Received: "abc1234"   → Reject (lowercase)
  Received: "AB12345"   → Reject (wrong structure)
  Received: "ABC1234"   → Accept
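These three patterns can be sketched with Python's re module. The email pattern here is a deliberately loose assumption (something, an @, a domain with a dot); real email validation usually ends with a confirmation message rather than a stricter regex.

```python
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # loose shape check only
PHONE_IN_RE = re.compile(r"\+91[0-9]{10}")           # +91 followed by 10 digits
PRODUCT_CODE_RE = re.compile(r"[A-Z]{3}[0-9]{4}")    # e.g. ABC1234


def matches(pattern: re.Pattern, value: object) -> bool:
    """True only if value is a string and the WHOLE string fits the pattern."""
    return isinstance(value, str) and pattern.fullmatch(value) is not None
```

The sketch uses fullmatch on purpose. With re.match and a trailing `$`, a value such as "ABC1234\n" would still pass, because `$` also matches just before a final newline.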
Dimension 4: Business Rule Validation
Data may be technically valid but violate business logic. These checks require knowledge of the application context.
Business Rule Validation Examples:
Rule: End date must be after start date
  start_date: 2024-06-01
  end_date: 2024-05-01 → Reject (end before start)
  end_date: 2024-07-01 → Accept
Rule: Cannot order more than available stock
  product_id: 501
  quantity: 100   In stock: 30 → Reject (insufficient stock)
  quantity: 20                 → Accept
Rule: Discount code applies once per user
  User has already used code "SUMMER20"
  Request includes code "SUMMER20" → Reject (already used)
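Business rules compare the request against application state, so they run after the type, range, and format checks. In this sketch the stock level and the set of used codes are passed in as plain values; in a real service they would come from the database, and the function names are assumptions.

```python
from datetime import date
from typing import Optional


def check_dates(start_date: date, end_date: date) -> list[str]:
    """End date must be strictly after start date."""
    return [] if end_date > start_date else ["end_date must be after start_date"]


def check_order(
    quantity: int,
    in_stock: int,
    discount_code: Optional[str],
    used_codes: set[str],
) -> list[str]:
    """Apply the stock rule and the once-per-user discount rule."""
    errors = []
    if quantity > in_stock:
        errors.append("insufficient stock")
    if discount_code is not None and discount_code in used_codes:
        errors.append("discount code already used")
    return errors
```

A check like this only reports what was true at the moment it ran. To stop two concurrent orders from both passing, the stock check and the stock update normally need to happen in the same database transaction.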
Dimension 5: Allowlist vs Denylist Validation
Allowlisting specifies exactly what is permitted. Denylisting specifies what is forbidden. Allowlisting is significantly stronger because it rejects anything not explicitly expected. Denylisting fails when attackers find new harmful inputs not on the list.
Allowlist vs Denylist Comparison:
Denylist approach (weak):
Allowed: any string except those containing: ' " ; -- DROP SELECT
Problem: Attacker uses: ' ʼ ;ˀ (lookalike Unicode characters)
Or URL encoding: %27 %3B
Or alternate SQL syntax not in the list
Allowlist approach (strong):
Allowed: only alphanumeric characters and spaces [A-Za-z0-9 ]
Problem: Legitimate inputs with special characters are also blocked
Solution: Define the exact character set needed for each field
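A per-field allowlist can be sketched as a table that maps each field name to the exact pattern it may contain. The field names, character sets, and length limits below are assumptions for illustration; in particular, the ASCII-only name pattern would be too narrow for many real user bases.

```python
import re

# One allowlist per field: each pattern lists exactly what is permitted.
FIELD_ALLOWLISTS = {
    "search_query": re.compile(r"[A-Za-z0-9 -]{1,100}"),
    # Names legitimately contain apostrophes, dots, and hyphens.
    "full_name": re.compile(r"[A-Za-z][A-Za-z '.-]{0,99}"),
}


def is_allowed(field: str, value: object) -> bool:
    """Accept a value only if its field is known and the whole value matches."""
    pattern = FIELD_ALLOWLISTS.get(field)
    # Unknown field names are rejected too: the allowlist covers names as well.
    return (
        pattern is not None
        and isinstance(value, str)
        and pattern.fullmatch(value) is not None
    )
```

Because full_name has to allow the apostrophe, the allowlist alone does not make that value safe to concatenate into SQL. It narrows the input; parameterized queries still do the protecting.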
Example – product search query:
Allowlist: letters, numbers, spaces, hyphens only
"Blue t-shirt size M" → Accept
"t-shirt' OR '1'='1" → Reject (contains ' character)
Where to Validate: Client-Side vs Server-Side
Client-Side Validation:
Runs in the user's browser or mobile app.
Purpose: Improve user experience (instant feedback).
Security value: none.
Why it provides no security:
An attacker does not have to use your client at all.
Tools like Postman, Burp Suite, or curl send requests
directly to the API, so client-side checks never run.
POST /api/order with raw HTTP:
{ "quantity": -999999, "product_id": "'; DROP TABLE--" }
No browser validation runs. The server receives it directly.
Server-Side Validation:
Runs on the server before any processing.
Purpose: Security and data integrity.
Security value: ESSENTIAL.
Always validate on the server, regardless of client-side validation.
Client-side validation is a UX convenience, not a security control.
Schema Validation
Schema validation checks the entire structure of a request body against a defined contract. Instead of validating each field individually, schema validation checks all fields at once.
JSON Schema for a "Create Order" endpoint:
{
  "type": "object",
  "required": ["product_id", "quantity", "shipping_address"],
  "additionalProperties": false,      ← Reject unknown fields
  "properties": {
    "product_id": {
      "type": "integer",
      "minimum": 1
    },
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "maximum": 100
    },
    "shipping_address": {
      "type": "object",
      "required": ["street", "city", "pincode"],
      "properties": {
        "street":  { "type": "string", "maxLength": 200 },
        "city":    { "type": "string", "maxLength": 100 },
        "pincode": { "type": "string", "pattern": "^[1-9][0-9]{5}$" }
      }
    }
  }
}
Benefits of schema validation:
Rejects extra fields attackers might inject
Enforces types, ranges, and formats in one pass
Self-documents API expectations
Easy to test and maintain
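In practice a JSON Schema library would evaluate the schema above directly. To stay dependency-free, the sketch below hand-codes the same constraints for the "Create Order" body so the behaviour is visible; validate_order and the error messages are assumptions, not the output of any library.

```python
import re

ALLOWED_FIELDS = {"product_id", "quantity", "shipping_address"}
PINCODE_RE = re.compile(r"[1-9][0-9]{5}")


def _is_int(value: object) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def validate_order(body: object) -> list[str]:
    """Check a parsed request body against the Create Order contract."""
    if not isinstance(body, dict):
        return ["body must be an object"]
    # additionalProperties: false, then required.
    errors = [f"unknown field: {k}" for k in sorted(set(body) - ALLOWED_FIELDS)]
    errors += [f"missing field: {k}" for k in sorted(ALLOWED_FIELDS - set(body))]

    if "product_id" in body:
        pid = body["product_id"]
        if not (_is_int(pid) and pid >= 1):
            errors.append("product_id must be an integer >= 1")
    if "quantity" in body:
        qty = body["quantity"]
        if not (_is_int(qty) and 1 <= qty <= 100):
            errors.append("quantity must be an integer from 1 to 100")
    if "shipping_address" in body:
        address = body["shipping_address"]
        if not isinstance(address, dict):
            errors.append("shipping_address must be an object")
        else:
            for field, max_len in (("street", 200), ("city", 100)):
                value = address.get(field)
                if not isinstance(value, str) or len(value) > max_len:
                    errors.append(f"{field} must be a string of at most {max_len} characters")
            pincode = address.get("pincode")
            if not isinstance(pincode, str) or not PINCODE_RE.fullmatch(pincode):
                errors.append("pincode must be 6 digits and not start with 0")
    return errors
```

The unknown-field check is what stops a request from smuggling in something like "is_admin": true alongside otherwise valid data.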
Input Sanitization vs Validation
Validation checks whether input is acceptable. Sanitization modifies input to make it safe. Both are needed but for different purposes.
Validation (Accept or Reject):
Input: "Hello World"
Check: Is this 50 chars or fewer? Yes.
Check: Does it contain only letters and spaces? Yes.
Decision: Accept → use as-is.
Input: "Hello<script>"
Check: Does it contain only letters and spaces? No.
Decision: Reject → return 400 Bad Request.
Sanitization (Transform to make safe):
Context: Displaying user-submitted text in HTML
Input: "Hello <World>"
Sanitize: Convert < to &lt; and > to &gt;
Output: "Hello &lt;World&gt;" (displays as text, not HTML)
Context: Inserting into SQL (use parameterized queries instead)
Input: "user@example.com"
Parameterized: query("SELECT * FROM users WHERE email = ?", [email])
Never: "SELECT * FROM users WHERE email = '" + email + "'"
Important: Sanitization is not a substitute for parameterized queries
in SQL. Always use parameterized queries for database operations.
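Both contexts can be sketched with the Python standard library: html.escape for HTML output and sqlite3 placeholders for SQL. The users table and the function names are assumptions for illustration.

```python
import html
import sqlite3
from typing import Optional


def render_comment(text: str) -> str:
    """Escape user text for an HTML context so it displays as text."""
    return f"<p>{html.escape(text)}</p>"


def find_user_by_email(conn: sqlite3.Connection, email: str) -> Optional[tuple]:
    """Look up a user with a parameterized query.

    The driver sends the value separately from the SQL text, so quotes
    inside `email` can never change the structure of the statement.
    """
    cursor = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchone()
```

A short usage example against an in-memory database:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("user@example.com",))
print(find_user_by_email(conn, "user@example.com"))
print(find_user_by_email(conn, "' OR '1'='1"))
```

The injection attempt in the second lookup is treated as an ordinary string that matches no row.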
Special Input Validation Cases
File Upload Validation
File uploads require multiple validation layers:
Layer 1: File size limit
  Reject files larger than necessary (e.g., max 5 MB for profile photos)
Layer 2: File type validation
  Check MIME type from Content-Type header (can be spoofed)
  AND check the file's magic bytes (actual file header)
  AND check the file extension
  All three should agree
Layer 3: Content inspection
  For images: try to re-render using an image library
  Files that are not real images will fail to decode
Layer 4: Storage location
  Store uploaded files outside the web root
  Never serve uploaded files directly with executable permissions
  Use cloud storage (S3, GCS) with separate serving domain
Layer 5: Filename sanitization
  Never use user-supplied filenames directly
  Generate random names: save as uuid4 + extension
  "../../etc/passwd" as a filename must never reach the filesystem
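Layers 1, 2, and 5 can be sketched with the standard library alone. The function below assumes a profile-photo endpoint that accepts only PNG and JPEG; the name safe_upload_name and the tables are illustrative. Content inspection (Layer 3) and storage (Layer 4) are outside its scope.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional

MAX_BYTES = 5 * 1024 * 1024  # 5 MB limit from Layer 1

# Leading "magic bytes" and expected Content-Type per allowed extension.
MAGIC_BYTES = {".png": b"\x89PNG\r\n\x1a\n", ".jpg": b"\xff\xd8\xff"}
CONTENT_TYPES = {".png": "image/png", ".jpg": "image/jpeg"}


def safe_upload_name(filename: str, content_type: str, data: bytes) -> Optional[str]:
    """Return a generated storage name, or None if the upload is rejected."""
    if len(data) > MAX_BYTES:
        return None
    # The client's filename is used only to read the extension.
    extension = PurePosixPath(filename).suffix.lower()
    if extension == ".jpeg":
        extension = ".jpg"
    if extension not in MAGIC_BYTES:
        return None
    # Extension, declared Content-Type, and file header must all agree.
    if CONTENT_TYPES[extension] != content_type:
        return None
    if not data.startswith(MAGIC_BYTES[extension]):
        return None
    # The stored name is random, so no client-chosen path reaches the filesystem.
    return f"{uuid.uuid4().hex}{extension}"
```

A filename such as "../../etc/passwd" has no allowed extension and is rejected outright, and even an accepted upload is stored under a name the client never chose.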
Unicode and Encoding Attacks
Attackers use encoding tricks to bypass simple string checks:
URL encoding:
"admin" can be: %61%64%6D%69%6E
Server decodes before processing → same string after decode
Unicode normalization:
"admin" can use lookalike Unicode chars:
ɑdmin (ɑ = Unicode character 0x0251, looks like 'a')
аdmin (а = Cyrillic 'a', looks identical to Latin 'a')
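Double encoding:
"%" itself encodes as "%25", so "/" (normally "%2F") can be sent as "%252F"
A filter that decodes once sees "%2F"; a second decode later turns it into "/"
"/api/admin" can become "%252Fapi%252Fadmin"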
Fix: Normalize and decode all input BEFORE validation.
Never validate on encoded forms.
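The fix can be sketched as a fixed pipeline: decode once, refuse anything that still looks encoded, normalize, then apply the allowlist. The sketch assumes the value arrives still URL-encoded; many frameworks have already decoded query parameters, in which case the decode step must be skipped to avoid decoding twice. The function name and the username rules are assumptions.

```python
import re
import unicodedata
from typing import Optional
from urllib.parse import unquote

USERNAME_RE = re.compile(r"[a-z0-9_]{3,30}")
ENCODED_BYTE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def canonical_username(raw: str) -> Optional[str]:
    """Return the canonical username, or None if the input is rejected."""
    decoded = unquote(raw)  # decode exactly once
    # Anything still percent-encoded after one decode was double-encoded.
    if ENCODED_BYTE_RE.search(decoded):
        return None
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    normalized = unicodedata.normalize("NFKC", decoded).lower()
    # NFKC does not map lookalikes such as Cyrillic 'а' to Latin 'a';
    # the ASCII allowlist is what rejects those.
    return normalized if USERNAME_RE.fullmatch(normalized) else None
```

Validation runs on the final canonical form only, and that same form is what should be stored and compared later.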
API Request Validation with OpenAPI Specification
OpenAPI (formerly Swagger) lets you define your API's expected inputs formally. Middleware libraries can automatically validate every incoming request against this specification, rejecting any request that does not conform — before your business logic ever sees it.
OpenAPI Validation Flow:
Incoming Request
  ↓
OpenAPI Validation Middleware
  Checks: method allowed? path exists?
  Checks: required parameters present?
  Checks: parameter types match schema?
  Checks: body matches defined schema?
  Invalid → Return 400 Bad Request immediately
  Valid   → Pass to business logic handler
  ↓
Business Logic
  Receives only validated, conforming input.
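The shape of that flow can be sketched without any framework. The route table below is a hand-written stand-in for what a middleware library would derive from an OpenAPI document; the /reviews endpoint, the handler, and the 404 and 405 responses for unknown paths and methods are assumptions for illustration.

```python
from typing import Callable

Validator = Callable[[dict], list]
Handler = Callable[[dict], dict]


def validate_review(body: dict) -> list:
    """Body contract for the hypothetical POST /reviews endpoint."""
    rating = body.get("rating")
    if not isinstance(rating, int) or isinstance(rating, bool) or not 1 <= rating <= 5:
        return ["rating must be an integer from 1 to 5"]
    return []


def create_review(body: dict) -> dict:
    """Business logic: only ever called with a validated body."""
    return {"saved": True, "rating": body["rating"]}


# (method, path) -> (validator, handler)
ROUTES: dict = {("POST", "/reviews"): (validate_review, create_review)}


def dispatch(method: str, path: str, body: object) -> tuple:
    """Validate the request first; call the handler only if it conforms."""
    methods_for_path = {m for (m, p) in ROUTES if p == path}
    if not methods_for_path:
        return 404, {"error": "unknown path"}
    if method not in methods_for_path:
        return 405, {"error": "method not allowed"}
    if not isinstance(body, dict):
        return 400, {"errors": ["body must be an object"]}
    validator, handler = ROUTES[(method, path)]
    errors = validator(body)
    if errors:
        return 400, {"errors": errors}
    return 200, handler(body)
```

Keeping validation in the dispatcher means a handler cannot be reached by a request that skipped its checks.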
Key Points
- Never trust any data from outside the server. Everything from clients is untrusted until validated.
- Input validation has five dimensions: type, range/length, format, business rules, and allowlist/denylist.
- Client-side validation is a UX feature only — it provides zero security. All security validation must happen server-side.
- Allowlisting (specifying what is allowed) is stronger than denylisting (specifying what is forbidden).
- Schema validation checks the entire request structure at once, rejects unknown fields, and enforces all constraints in one pass.
- Sanitize output for the context where data will be used (HTML, filesystem) using the correct escaping for that context; for SQL, use parameterized queries rather than escaping.
- File uploads need multi-layer validation: size, type (magic bytes), content inspection, and secure storage with generated filenames.
