API Security Input Validation

Every piece of data that enters your API from the outside world is untrusted. A client can send anything, not just what your application expects. Without strict input validation, attackers can craft input that breaks your logic, corrupts your data, crashes your server, or executes commands of their choosing.

Input validation is the practice of checking all incoming data before processing it. It is one of the most fundamental and impactful security controls in API development.

The Trust Boundary

A trust boundary is the line between data you control and data you do not. Everything inside the server — your database, your code, your configuration — is within the trust boundary. Everything coming from outside — user input, API parameters, uploaded files, headers from clients — is outside the trust boundary and must be treated as potentially hostile.

Trust Boundary Diagram:

OUTSIDE (Untrusted Zone)         │  INSIDE (Trusted Zone)
─────────────────────────────────┼─────────────────────────────
Mobile app input                 │  Database queries
Browser form data                │  Business logic
API parameters                   │  Internal services
File uploads                     │  Configuration
HTTP headers from clients        │  Server memory
Third-party webhook data         │  Authenticated session data
                                 │
           ┌─────────────────────┤
           │  VALIDATION LAYER   │
           │  Check everything   │
           │  before it crosses  │
           └─────────────────────┘

Nothing from the untrusted zone should reach the trusted zone
without being validated and sanitized.

What Can Go Wrong Without Validation

Scenario: An API endpoint that looks up a product by ID

Expected input: product_id = 42 (a positive integer)

Without validation, attacker sends:
  product_id = "1; DROP TABLE products--"    → SQL Injection
  product_id = "../../etc/passwd"            → Path traversal
  product_id = "<script>alert(1)</script>"   → Cross-site scripting (XSS)
  product_id = -1                            → Negative ID may break logic
  product_id = 999999999999999999999         → Integer overflow
  product_id = ""                            → Empty value causes null error
  product_id = [1, 2, 3, 4, 5, 6... ×100k]  → DoS via oversized array

Each of these causes a different type of failure.
A single check that product_id is a positive integer within a sane range
rejects every one of them before any processing occurs.
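
The scenario above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name parse_product_id and the upper bound MAX_PRODUCT_ID are assumptions made for this example.

```python
MAX_PRODUCT_ID = 2_147_483_647  # assumed upper bound (32-bit signed column)


def parse_product_id(raw: object) -> int:
    """Return a validated product ID or raise ValueError."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool):
        raise ValueError("product_id must be an integer")
    if isinstance(raw, int):
        value = raw
    elif (
        isinstance(raw, str)
        and len(raw) <= 10
        and raw.isascii()
        and raw.isdigit()
    ):
        # Only plain ASCII digits: no signs, spaces, quotes, or slashes.
        value = int(raw)
    else:
        raise ValueError("product_id must be an integer")
    if not 1 <= value <= MAX_PRODUCT_ID:
        raise ValueError("product_id is out of range")
    return value
```

Every hostile value in the list fails one of these checks: the injection strings and the empty string are not digit-only, the array is not an integer, and -1 and the oversized number fall outside the range.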

The Five Dimensions of Input Validation

Dimension 1: Type Validation

Check that the value is the right data type. A field expecting a number should never receive a string. A field expecting a boolean should never receive an object.

Type Validation Examples:

Field: user_age
  Expected: integer
  Received: "twenty-five"  → Reject: not an integer
  Received: 25.5           → Reject: not a whole number
  Received: 25             → Accept

Field: is_active
  Expected: boolean (true or false)
  Received: "yes"          → Reject: not a boolean
  Received: 1              → May accept (depends on strictness)
  Received: true           → Accept

Field: order_date
  Expected: ISO 8601 date string (YYYY-MM-DD)
  Received: "March 15 2024" → Reject: wrong format
  Received: "2024-03-15"    → Accept (then parse and validate further)
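
A minimal Python sketch of the three type checks above. The helper names are hypothetical, and the sketch assumes the values have already been parsed from JSON into Python objects.

```python
import re
from datetime import date

# ASCII digits only; \d would also match non-ASCII digit characters.
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_strict_int(value: object) -> bool:
    # bool is a subclass of int, so True/False must be excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_strict_bool(value: object) -> bool:
    # Strict mode: only real booleans, not "yes" or 1.
    return isinstance(value, bool)


def parse_iso_date(value: object) -> date:
    """Accept only YYYY-MM-DD and return a date, else raise ValueError."""
    if not isinstance(value, str) or not _DATE_RE.fullmatch(value):
        raise ValueError("order_date must be in YYYY-MM-DD format")
    year, month, day = map(int, value.split("-"))
    # date() itself raises ValueError for impossible dates such as 2024-02-30.
    return date(year, month, day)
```

This sketch takes the strict position on is_active and rejects 1. A lenient API would add an explicit conversion step instead of loosening the check.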

Dimension 2: Range and Length Validation

Values may be the right type but fall outside acceptable ranges. Set explicit minimum and maximum values for every field.

Range and Length Examples:

Field: quantity (in an order)
  Type: integer
  Min: 1 (cannot order 0 or negative items)
  Max: 1000 (business rule: max 1000 units per order)
  Received: 0      → Reject (below minimum)
  Received: 9999   → Reject (above maximum)
  Received: 5      → Accept

Field: username
  Type: string
  Min length: 3 characters
  Max length: 30 characters
  Received: "ab"                  → Reject (too short)
  Received: "a" × 10,000         → Reject (too long, possible DoS)
  Received: "meera_dev"          → Accept

Field: rating
  Type: integer
  Min: 1, Max: 5
  Received: 0   → Reject
  Received: 6   → Reject
  Received: 3   → Accept
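
The range and length rules above might be expressed as two small helpers. The names check_int_range and check_length are assumptions for this sketch. Note that len() counts Unicode code points, not bytes, so a byte-based limit would need a separate check.

```python
def check_int_range(name: str, value: object, minimum: int, maximum: int) -> int:
    """Return value if it is an integer within [minimum, maximum]."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not minimum <= value <= maximum:
        raise ValueError(f"{name} must be between {minimum} and {maximum}")
    return value


def check_length(name: str, value: object, min_len: int, max_len: int) -> str:
    """Return value if it is a string whose length is within the limits."""
    if not isinstance(value, str):
        raise ValueError(f"{name} must be a string")
    if not min_len <= len(value) <= max_len:
        raise ValueError(
            f"{name} must be {min_len} to {max_len} characters long"
        )
    return value


# Usage with the limits from the examples above.
quantity = check_int_range("quantity", 5, 1, 1000)
username = check_length("username", "meera_dev", 3, 30)
rating = check_int_range("rating", 3, 1, 5)
```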

Dimension 3: Format Validation

Strings may be the right type and length but in the wrong format. Use regular expressions or format parsers to verify structure.

Format Validation Examples:

Field: email
  Pattern: must contain @ and a valid domain
  Received: "not-an-email"       → Reject
  Received: "user@domain.com"    → Accept

Field: phone_number (India)
  Pattern: +91 followed by 10 digits
  Received: "12345"              → Reject (too short)
  Received: "+91-9876543210"     → Reject (wrong format)
  Received: "+919876543210"      → Accept

Field: product_code
  Pattern: 3 uppercase letters + 4 digits (e.g., ABC1234)
  Received: "abc1234"    → Reject (lowercase)
  Received: "AB12345"    → Reject (wrong structure)
  Received: "ABC1234"    → Accept
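
The three formats above can be sketched with Python's re module. The email pattern is deliberately loose (something@something.tld) and is an assumption of this example, not a complete email grammar. The sketch uses fullmatch() so that the whole string must match, not just a prefix.

```python
import re

# Loose email shape: one "@", no whitespace, at least one dot in the domain.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# +91 followed by exactly 10 ASCII digits.
PHONE_IN_RE = re.compile(r"\+91[0-9]{10}")
# 3 uppercase letters followed by 4 digits, e.g. ABC1234.
PRODUCT_CODE_RE = re.compile(r"[A-Z]{3}[0-9]{4}")


def matches(pattern: re.Pattern, value: object) -> bool:
    """True only if value is a string that matches the pattern in full."""
    return isinstance(value, str) and pattern.fullmatch(value) is not None
```

With search() or match(), a value such as "ABC1234; DROP TABLE" could pass because only part of the string is checked. fullmatch() closes that gap.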

Dimension 4: Business Rule Validation

Data may be technically valid but violate business logic. These checks require knowledge of the application context.

Business Rule Validation Examples:

Rule: End date must be after start date
  start_date: 2024-06-01
  end_date:   2024-05-01   → Reject (end before start)
  end_date:   2024-07-01   → Accept

Rule: Cannot order more than available stock
  product_id: 501
  quantity:   100
  In stock:   30           → Reject (insufficient stock)
  quantity:   20           → Accept

Rule: Discount code applies once per user
  User has already used code "SUMMER20"
  Request includes code "SUMMER20"  → Reject (already used)
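
A rough sketch of the three rules above. The function names and the BusinessRuleError class are hypothetical, and the stock level and used-code set are passed in as plain values. A real service would read them from its data store, ideally inside the same transaction that places the order.

```python
from datetime import date


class BusinessRuleError(ValueError):
    """Raised when well-formed input violates a business rule."""


def check_date_order(start: date, end: date) -> None:
    if end <= start:
        raise BusinessRuleError("end_date must be after start_date")


def check_stock(quantity: int, in_stock: int) -> None:
    if quantity > in_stock:
        raise BusinessRuleError("insufficient stock")


def check_discount_unused(code: str, codes_used_by_user: set[str]) -> None:
    if code in codes_used_by_user:
        raise BusinessRuleError("discount code already used")
```

These checks run after type, range, and format validation, so each function can assume its arguments already have the right shape.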

Dimension 5: Allowlist vs Denylist Validation

Allowlisting specifies exactly what is permitted. Denylisting specifies what is forbidden. Allowlisting is significantly stronger because it rejects anything not explicitly expected. Denylisting fails when attackers find new harmful inputs not on the list.

Allowlist vs Denylist Comparison:

Denylist approach (weak):
  Allowed: any string except those containing: ' " ; -- DROP SELECT
  Problem: Attacker uses: ' ʼ ；ˀ (lookalike Unicode characters)
           Or URL encoding: %27 %3B
           Or alternate SQL syntax not in the list

Allowlist approach (strong):
  Allowed: only alphanumeric characters and spaces [A-Za-z0-9 ]
  Problem: Legitimate inputs with special characters are also blocked
  Solution: Define the exact character set needed for each field

Example – product search query:
  Allowlist: letters, numbers, spaces, hyphens only
  "Blue t-shirt size M"   → Accept
  "t-shirt' OR '1'='1"   → Reject (contains ' character)

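The product search allowlist above could look like this in Python. The 100-character limit and the function name are assumptions added for the sketch.

```python
import re

# Allowlist: ASCII letters, digits, spaces, and hyphens; 1 to 100 characters.
SEARCH_QUERY_RE = re.compile(r"[A-Za-z0-9 \-]{1,100}")


def is_allowed_search_query(query: object) -> bool:
    """True only if every character of the query is on the allowlist."""
    return isinstance(query, str) and SEARCH_QUERY_RE.fullmatch(query) is not None
```

Because the pattern names what is permitted, lookalike quotes, encoded characters, and SQL keywords with punctuation are all rejected without being listed anywhere.
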
Where to Validate: Client-Side vs Server-Side

Client-Side Validation:
  Runs in the user's browser or mobile app.
  Purpose: Improve user experience (instant feedback).
  Security value: ZERO.

  Why it provides no security:
  An attacker never has to run your client code.
  Tools like Postman, Burp Suite, or curl send requests
  directly to the API, so client-side checks never execute.

  POST /api/order with raw HTTP:
  { "quantity": -999999, "product_id": "'; DROP TABLE--" }
  No browser validation runs. The server receives it directly.

Server-Side Validation:
  Runs on the server before any processing.
  Purpose: Security and data integrity.
  Security value: ESSENTIAL.

  Always validate on the server, regardless of client-side validation.
  Client-side validation is a UX convenience, not a security control.

Schema Validation

Schema validation checks the entire structure of a request body against a defined contract. Instead of validating each field individually, schema validation checks all fields at once.

JSON Schema for a "Create Order" endpoint
("additionalProperties": false rejects unknown top-level fields):

{
  "type": "object",
  "required": ["product_id", "quantity", "shipping_address"],
  "additionalProperties": false,
  "properties": {
    "product_id": {
      "type": "integer",
      "minimum": 1
    },
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "maximum": 100
    },
    "shipping_address": {
      "type": "object",
      "required": ["street", "city", "pincode"],
      "properties": {
        "street": { "type": "string", "maxLength": 200 },
        "city":   { "type": "string", "maxLength": 100 },
        "pincode":{ "type": "string", "pattern": "^[1-9][0-9]{5}$" }
      }
    }
  }
}

Benefits of schema validation:
  Rejects extra fields attackers might inject
  Enforces types, ranges, and formats in one pass
  Self-documents API expectations
  Easy to test and maintain
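
To show what a schema validator does with the "Create Order" schema above, here is a hand-written Python equivalent using only the standard library. It is a sketch for illustration: in practice a JSON Schema validator library would evaluate the schema document directly. The function names are assumptions, and the body is assumed to come from json.loads.

```python
import re

PINCODE_RE = re.compile(r"[1-9][0-9]{5}")
ALLOWED_FIELDS = {"product_id", "quantity", "shipping_address"}


def _is_int(value: object) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def _validate_address(addr: object) -> list[str]:
    if not isinstance(addr, dict):
        return ["shipping_address must be an object"]
    errors = []
    for field, max_len in {"street": 200, "city": 100}.items():
        value = addr.get(field)
        if not isinstance(value, str) or len(value) > max_len:
            errors.append(f"{field} must be a string of at most {max_len} characters")
    pincode = addr.get("pincode")
    if not isinstance(pincode, str) or not PINCODE_RE.fullmatch(pincode):
        errors.append("pincode must be 6 digits and must not start with 0")
    # Like the schema, this does not reject extra fields inside the address.
    return errors


def validate_create_order(body: object) -> list[str]:
    """Return a list of errors; an empty list means the body conforms."""
    if not isinstance(body, dict):
        return ["body must be an object"]
    errors = [f"unknown field: {k}" for k in sorted(set(body) - ALLOWED_FIELDS)]
    errors += [f"missing field: {k}" for k in sorted(ALLOWED_FIELDS - set(body))]
    if "product_id" in body:
        pid = body["product_id"]
        if not (_is_int(pid) and pid >= 1):
            errors.append("product_id must be an integer >= 1")
    if "quantity" in body:
        qty = body["quantity"]
        if not (_is_int(qty) and 1 <= qty <= 100):
            errors.append("quantity must be an integer from 1 to 100")
    if "shipping_address" in body:
        errors += _validate_address(body["shipping_address"])
    return errors
```

Collecting every error instead of stopping at the first lets the API return one 400 response that lists all problems with the request.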

Input Sanitization vs Validation

Validation checks whether input is acceptable. Sanitization modifies input to make it safe. Both are needed but for different purposes.

Validation (Accept or Reject):
  Input: "Hello World"
  Check: Is this 50 chars or fewer? Yes.
  Check: Does it contain only letters and spaces? Yes.
  Decision: Accept → use as-is.

  Input: "Hello<script>"
  Check: Does it contain only letters and spaces? No.
  Decision: Reject → return 400 Bad Request.

Sanitization (Transform to make safe):
  Context: Displaying user-submitted text in HTML
  Input: "Hello <World>"
  Sanitize: Convert < to &lt; and > to &gt;
  Output: "Hello &lt;World&gt;" (displays as text, not HTML)

  Context: Inserting into SQL (use parameterized queries instead)
  Input: "user@example.com"
  Parameterized: query("SELECT * FROM users WHERE email = ?", [email])
  Never: "SELECT * FROM users WHERE email = '" + email + "'"

Important: Sanitization is not a substitute for parameterized queries
in SQL. Always use parameterized queries for database operations.
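
Both contexts above can be sketched with Python's standard library: html.escape for HTML output and a sqlite3 parameterized query for the database. The users table, its columns, and the function names are assumptions made for this example.

```python
import html
import sqlite3
from typing import Optional


def render_comment(text: str) -> str:
    """Escape user text so the browser shows it as text, not markup."""
    return "<p>" + html.escape(text) + "</p>"


def find_user_by_email(conn: sqlite3.Connection, email: str) -> Optional[tuple]:
    """Look up a user with a parameterized query; email is never concatenated."""
    cursor = conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    )
    return cursor.fetchone()


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("user@example.com",))

print(render_comment("Hello <World>"))
print(find_user_by_email(conn, "user@example.com"))
print(find_user_by_email(conn, "' OR '1'='1"))
```

The injection string in the last call is treated as a literal email value that matches no row, because the driver sends it separately from the SQL text.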

Special Input Validation Cases

File Upload Validation

File uploads require multiple validation layers:

Layer 1: File size limit
  Reject files larger than necessary (e.g., max 5 MB for profile photos)

Layer 2: File type validation
  Check MIME type from Content-Type header (can be spoofed)
  AND check the file's magic bytes (actual file header)
  AND check the file extension
  All three should agree

Layer 3: Content inspection
  For images: decode and re-encode the file with an image library
  Files that are not real images fail to decode, and re-encoding
  typically discards extra data hidden inside a valid image

Layer 4: Storage location
  Store uploaded files outside the web root
  Never serve uploaded files directly with executable permissions
  Use cloud storage (S3, GCS) with separate serving domain

Layer 5: Filename sanitization
  Never use user-supplied filenames directly
  Generate random names: save as uuid4 + an extension chosen from the validated file type
  "../../etc/passwd" as a filename must never reach the filesystem

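A minimal sketch of Layers 1, 2, and 5 for image uploads. It assumes only PNG and JPEG are accepted, and the function name is hypothetical. Content inspection (Layer 3) and storage (Layer 4) are outside the scope of the sketch.

```python
import uuid

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB, as in the profile photo example

# Magic-byte prefixes and matching MIME types for the accepted formats.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
}
CONTENT_TYPES = {".png": "image/png", ".jpg": "image/jpeg"}


def safe_upload_name(data: bytes, declared_type: str) -> str:
    """Validate an upload and return a generated filename for storage."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    for extension, magic in SIGNATURES.items():
        if data.startswith(magic):
            # The spoofable Content-Type header must agree with the bytes.
            if CONTENT_TYPES[extension] != declared_type:
                raise ValueError("declared type does not match file contents")
            # The client-supplied filename is never used.
            return uuid.uuid4().hex + extension
    raise ValueError("unsupported file type")
```

The extension comes from the detected file type, not from the client, so a name such as "../../etc/passwd" has no way to influence where the file is written.
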
Unicode and Encoding Attacks

Attackers use encoding tricks to bypass simple string checks:

URL encoding:
  "admin" can be: %61%64%6D%69%6E
  Server decodes before processing → same string after decode

Unicode lookalikes (homoglyphs):
  "admin" can be written with lookalike Unicode characters:
  ɑdmin (ɑ = U+0251, Latin small letter alpha, looks like 'a')
  аdmin (а = U+0430, Cyrillic 'a', looks identical to Latin 'a')

Double encoding:
  "/" encoded once is "%2F"; encoding the "%" again gives "%252F"
  A filter that decodes once sees "%2F"; a second decode later yields "/"
  "/api/admin" can become "%252Fapi%252Fadmin"

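Fix: Decode and normalize all input BEFORE validation.
    Never validate on encoded forms.
    Then apply an allowlist: normalization does not turn lookalike
    letters (such as Cyrillic 'а') into their Latin counterparts.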
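
One way to sketch this fix in Python, assuming the raw, still-encoded value is available (many frameworks decode query parameters once before your code sees them). The username allowlist and function names are assumptions for the example. Rejecting anything that still changes on a second decode is a conservative choice that may also reject a legitimate literal such as "%41".

```python
import re
import unicodedata
from urllib.parse import unquote

USERNAME_RE = re.compile(r"[a-z0-9_]{3,30}")  # assumed allowlist


def canonicalize(raw: str) -> str:
    """Percent-decode once, reject multiply-encoded input, apply NFKC."""
    decoded = unquote(raw)
    # If a second decode still changes the value, it was encoded twice.
    if unquote(decoded) != decoded:
        raise ValueError("input appears to be encoded more than once")
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    return unicodedata.normalize("NFKC", decoded)


def is_valid_username(raw: str) -> bool:
    try:
        value = canonicalize(raw)
    except ValueError:
        return False
    # The allowlist runs on the canonical form and rejects homoglyphs,
    # which normalization leaves unchanged.
    return USERNAME_RE.fullmatch(value) is not None
```

The order matters: validating "%61%64%6D%69%6E" before decoding would check a string of percent signs and hex digits, not the "admin" the rest of the system will see.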

API Request Validation with OpenAPI Specification

OpenAPI (formerly Swagger) lets you define your API's expected inputs formally. Middleware libraries can automatically validate every incoming request against this specification, rejecting any request that does not conform — before your business logic ever sees it.

OpenAPI Validation Flow:

Incoming Request
  ↓
OpenAPI Validation Middleware
  Checks: method allowed? path exists?
  Checks: required parameters present?
  Checks: parameter types match schema?
  Checks: body matches defined schema?
  Invalid → Return 400 Bad Request immediately
  Valid → Pass to business logic handler
  ↓
Business Logic
  Receives only validated, conforming input.
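
The flow above can be illustrated without any framework. The sketch below is not an OpenAPI library: with_validation, validate_order, and create_order are hypothetical names, and the validator is hand-written where real middleware would derive its checks from the specification.

```python
from typing import Callable

Validator = Callable[[dict], list[str]]
Handler = Callable[[dict], tuple[int, dict]]


def with_validation(validator: Validator, handler: Handler) -> Handler:
    """Wrap a handler so it only ever runs on validated input."""
    def wrapped(body: dict) -> tuple[int, dict]:
        errors = validator(body)
        if errors:
            # Invalid: answer 400 immediately, business logic never runs.
            return 400, {"errors": errors}
        return handler(body)
    return wrapped


def validate_order(body: dict) -> list[str]:
    quantity = body.get("quantity")
    if (
        isinstance(quantity, bool)
        or not isinstance(quantity, int)
        or not 1 <= quantity <= 100
    ):
        return ["quantity must be an integer from 1 to 100"]
    return []


def create_order(body: dict) -> tuple[int, dict]:
    # Business logic: can rely on quantity being a valid integer.
    return 201, {"ordered": body["quantity"]}


handle_create_order = with_validation(validate_order, create_order)
```

Keeping validation in a wrapper means no handler can be registered without it, which is the main benefit of the middleware approach.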

Key Points

Never trust any data from outside the server. Everything from clients is untrusted until validated.
Input validation has five dimensions: type, range/length, format, business rules, and allowlist/denylist.
Client-side validation is a UX feature only — it provides zero security. All security validation must happen server-side.
Allowlisting (specifying what is allowed) is stronger than denylisting (specifying what is forbidden).
Schema validation checks the entire request structure at once, rejects unknown fields, and enforces all constraints in one pass.
Encode or escape data for the context where it will be used: HTML escaping for HTML output, parameterized queries for SQL, and generated filenames for the filesystem.
File uploads need multi-layer validation: size, type (magic bytes), content inspection, and secure storage with generated filenames.
