XML Validation
Writing XML that merely follows syntax rules (well-formed XML) is the first step. But in real-world applications, it is equally important that the XML conforms to a specific structure and follows business rules. This is what XML validation is about — verifying that an XML document matches a predefined structure and set of constraints.
Two Levels of Correctness in XML
Level 1: Well-Formed XML
A well-formed XML document correctly follows all XML syntax rules:
- Has exactly one root element.
- All tags are properly opened and closed.
- Tags are correctly nested.
- Attribute values are quoted.
- Special characters such as `<` and `&` are escaped in text and attribute values (as `&lt;` and `&amp;`).
A well-formed document can be parsed, but it may still contain incorrect or unexpected structure.
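Checking the first level only needs a parser: give it the text and see whether it reports a syntax error. This sketch uses Python's standard-library `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects its syntax."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "well-formed": "<order><item>Keyboard</item></order>",
    "overlapping tags": "<order><item>Keyboard</order></item>",
    "two root elements": "<item/><item/>",
    "unquoted attribute": "<order id=O001/>",
    "unescaped ampersand": "<name>Salt & Pepper</name>",
    # Parses fine: the syntax is correct even though the structure is wrong.
    "wrong structure": "<order><product/></order>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The parser accepts the last sample without complaint. Only a schema can say that `<product>` is the wrong element, and that is the job of the second level.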
Level 2: Valid XML
A valid XML document is well-formed AND conforms to the rules defined in a schema, such as a DTD (Document Type Definition) or an XSD (XML Schema Definition). Validation checks:
- Whether the correct elements are present.
- Whether elements appear in the required order and frequency.
- Whether attribute values are of the correct type.
- Whether any required elements or attributes are missing.
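The two levels can be checked one after the other. This sketch assumes the third-party `lxml` package, which is also used later in this section. It uses an inline copy of the `order.dtd` rules introduced below. The function name `classify` and its three result strings are hypothetical.

```python
import io

from lxml import etree

# Inline copy of the order.dtd rules, so the sketch needs no files on disk.
ORDER_DTD = """<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>
"""


def classify(xml_text, dtd):
    """Return 'not well-formed', 'invalid', or 'valid' for xml_text."""
    try:
        root = etree.fromstring(xml_text)  # Level 1: syntax rules
    except etree.XMLSyntaxError:
        return "not well-formed"
    # Level 2: structure rules from the DTD
    return "valid" if dtd.validate(root) else "invalid"


dtd = etree.DTD(io.StringIO(ORDER_DTD))

VALID_ORDER = (
    "<order id='O001'><item><name>Keyboard</name>"
    "<quantity>2</quantity><price>39.99</price></item></order>"
)
INCOMPLETE_ORDER = "<order id='O002'><item><name>Mouse</name></item></order>"
BROKEN_ORDER = "<order id='O003'><item></order>"

for text in (VALID_ORDER, INCOMPLETE_ORDER, BROKEN_ORDER):
    print(classify(text, dtd))
```

The loop prints `valid`, `invalid` and `not well-formed`, in that order. The second document parses but is missing required children. The third never reaches the DTD check because its tags are mismatched.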
Validation Against a DTD
DTD validation checks the document against rules defined in a Document Type Definition. The parser verifies element names, nesting, and attribute types according to the DTD.
Example: Valid XML Against a DTD
DTD file (order.dtd):
```
<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>
```
Valid XML:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE order SYSTEM "order.dtd">
<order id="O001">
  <item>
    <name>Keyboard</name>
    <quantity>2</quantity>
    <price>39.99</price>
  </item>
</order>
```
Example: Invalid XML Against the Same DTD
```xml
<order id="O002">
  <product>
    <name>Mouse</name>
  </product>
</order>
```
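This is invalid because:
- It uses `<product>`, which the DTD does not declare, where `<order>` requires `<item>`.
- Even with the element renamed to `<item>`, the required child elements `<quantity>` and `<price>` would still be missing.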
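You can confirm both reasons by validating three versions of the document: the original, a copy with only the element renamed, and a copy with the missing children added. This sketch assumes `lxml` and uses an inline copy of `order.dtd`. The `quantity` and `price` values in the last version are made up for illustration.

```python
import io

from lxml import etree

# Inline copy of order.dtd.
dtd = etree.DTD(io.StringIO("""<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>
"""))

attempts = {
    "as written": "<order id='O002'><product><name>Mouse</name></product></order>",
    "renamed to item": "<order id='O002'><item><name>Mouse</name></item></order>",
    "children added": (
        "<order id='O002'><item><name>Mouse</name>"
        "<quantity>1</quantity><price>19.99</price></item></order>"
    ),
}

results = {}
for label, text in attempts.items():
    results[label] = dtd.validate(etree.fromstring(text))
    print(f"{label}: {'valid' if results[label] else 'invalid'}")
    # error_log reports the errors from the most recent validate() call.
    for error in dtd.error_log:
        print(f"  {error.message}")
```

The renamed version still fails, so fixing the element name alone is not enough.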
Validation Against an XSD Schema
XSD validation is more powerful and precise than DTD validation. It checks data types, value ranges, and complex structural rules.
Example: XSD Schema
File: contact.xsd
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fullName" type="xs:string" />
        <xs:element name="email" type="xs:string" />
        <xs:element name="age" type="xs:integer" />
        <xs:element name="phone" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
```
Valid XML:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<contact id="C001"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="contact.xsd">
  <fullName>Diana Prince</fullName>
  <email>diana@example.com</email>
  <age>34</age>
</contact>
```
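Two details of `contact.xsd` are easy to overlook. First, `xs:sequence` fixes the order of the children. Second, `minOccurs="0"` makes `phone` optional. A quick check with `lxml` (an assumption; any XSD validator would do) uses an inline copy of the schema. The phone number is a made-up sample.

```python
from lxml import etree

# Inline copy of contact.xsd (XML declaration omitted so the string can be
# passed straight to etree.fromstring).
CONTACT_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fullName" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
        <xs:element name="phone" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(CONTACT_XSD))

BASE = "<fullName>Diana Prince</fullName><email>diana@example.com</email>"
cases = {
    "without phone": f"<contact id='C001'>{BASE}<age>34</age></contact>",
    "with phone": f"<contact id='C001'>{BASE}<age>34</age><phone>555-0100</phone></contact>",
    "phone before age": f"<contact id='C001'>{BASE}<phone>555-0100</phone><age>34</age></contact>",
    "missing id": f"<contact>{BASE}<age>34</age></contact>",
}

results = {label: schema.validate(etree.fromstring(xml)) for label, xml in cases.items()}
for label, ok in results.items():
    print(f"{label}: {'valid' if ok else 'invalid'}")
```

Leaving `phone` out is fine. Putting it in the wrong position, or dropping the required `id` attribute, makes the document invalid.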
Invalid XML — Wrong Data Type:
```xml
<contact id="C002">
  <fullName>Bruce Wayne</fullName>
  <email>bruce@example.com</email>
  <age>thirty-two</age>
</contact>
```
This is invalid because `thirty-two` is not a legal value of type `xs:integer`. XSD's data type checking catches this error, whereas a DTD would accept any text inside `<age>`.
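`contact.xsd` only says that `age` is an integer, so a value like `-5` would still pass. Value ranges, mentioned above as one of XSD's strengths, are expressed by restricting a built-in type. The sketch below is a hypothetical variant of the `age` declaration. The 0 to 150 bounds are an arbitrary choice for illustration, and the check assumes `lxml`.

```python
from lxml import etree

# Hypothetical variant of the age declaration: whole numbers from 0 to 150.
AGE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="150"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(AGE_XSD))

results = {}
for value in ["34", "thirty-two", "-5", "200"]:
    results[value] = schema.validate(etree.fromstring(f"<age>{value}</age>"))
    print(f"{value}: {'valid' if results[value] else 'invalid'}")
```

Only `34` passes. `thirty-two` fails the type check, while `-5` and `200` are integers that fall outside the allowed range.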
Validating XML with Python
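Python's standard-library XML parsers do not validate against DTDs or XSDs, but the third-party lxml library (installed with `pip install lxml`) supports both.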
XSD Validation with lxml
```python
from lxml import etree

# Load the XSD schema
with open("contact.xsd", "rb") as f:
    schema_doc = etree.parse(f)
schema = etree.XMLSchema(schema_doc)

# Load and validate the XML document (a syntax error raises etree.XMLSyntaxError)
with open("contact.xml", "rb") as f:
    xml_doc = etree.parse(f)

if schema.validate(xml_doc):
    print("XML is valid!")
else:
    print("Validation errors:")
    # error_log lists the problems found by the last validate() call
    for error in schema.error_log:
        print(f"  Line {error.line}: {error.message}")
```
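`lxml` can also validate while parsing. A parser built with `etree.XMLParser(schema=...)` raises `etree.XMLSyntaxError` for a schema violation just as it does for a syntax error, so one `try` block covers both levels. This sketch uses a trimmed-down, hypothetical version of the contact schema and in-memory strings. The helper `parse_contact` is illustrative.

```python
from lxml import etree

# Trimmed-down, hypothetical version of contact.xsd for this sketch.
SCHEMA = etree.XMLSchema(etree.fromstring("""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fullName" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))

# Every document parsed with this parser is validated against SCHEMA.
PARSER = etree.XMLParser(schema=SCHEMA)


def parse_contact(xml_text):
    """Return the root element, or None if the text is malformed or invalid."""
    try:
        return etree.fromstring(xml_text, PARSER)
    except etree.XMLSyntaxError as exc:
        print(f"Rejected: {exc}")
        return None


good = parse_contact("<contact><fullName>Diana Prince</fullName><age>34</age></contact>")
bad = parse_contact("<contact><fullName>Bruce Wayne</fullName><age>thirty-two</age></contact>")
```

There is a trade-off between the two approaches. `schema.validate()` leaves a full error log to inspect, while the validating parser rejects the document with an exception.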
DTD Validation with lxml
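```python
from lxml import etree

# Load the DTD from its file
with open("order.dtd", "rb") as f:
    dtd = etree.DTD(f)

# Parse the XML document (a syntax error raises etree.XMLSyntaxError)
with open("order.xml", "rb") as f:
    xml_doc = etree.parse(f)

if dtd.validate(xml_doc):
    print("XML is valid!")
else:
    print("Validation errors:")
    for error in dtd.error_log:
        print(f"  Line {error.line}: {error.message}")
```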
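Sometimes an invalid document should stop the program rather than produce a report. In that case, `assertValid()` raises `etree.DocumentInvalid` instead of returning `False`. Here is a minimal sketch with an inline copy of `order.dtd`, where `load_order` is a hypothetical helper.

```python
import io

from lxml import etree

# Inline copy of order.dtd.
ORDER_DTD = etree.DTD(io.StringIO("""<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>
"""))


def load_order(xml_text):
    """Parse xml_text and return its root, raising if it breaks the DTD."""
    root = etree.fromstring(xml_text)  # raises etree.XMLSyntaxError if malformed
    ORDER_DTD.assertValid(root)  # raises etree.DocumentInvalid if invalid
    return root


try:
    load_order("<order id='O002'><product><name>Mouse</name></product></order>")
except etree.DocumentInvalid as exc:
    print(f"Rejected: {exc}")
```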
Online XML Validation Tools
Several free online tools can validate XML against a DTD or XSD without writing code:
- XMLValidation.com: paste the XML and its DTD or XSD to validate.
- FreeFormatter.com: validates and formats XML.
- W3C Markup Validation Service: W3C's own validator, aimed mainly at web formats such as HTML and XHTML that declare a DTD, rather than at XSD-based documents.
- Beyond online tools, editors such as Oxygen XML Editor, and Visual Studio Code with an XML extension, validate as you type.
Common Validation Errors and What They Mean
| Typical error (wording varies by validator) | Cause | Fix |
|---|---|---|
| Element 'X' is not expected | Element name or position doesn't match the schema | Check the element's spelling, case, and order |
| The value 'abc' is not valid for type integer | Value doesn't match the declared XSD data type | Provide a valid integer value |
| Missing required attribute 'id' | A required attribute was omitted | Add the required attribute |
| Element 'X' must appear at least once | A required child element is missing | Add the missing element |
| Element 'X' appeared too many times | Exceeds maxOccurs limit | Remove extra elements or increase maxOccurs |
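To see how a real validator words these problems, trigger one and print every field of the log entry. This sketch assumes `lxml` and a hypothetical `cart` schema that allows at most three `item` elements. `lxml`'s messages come from libxml2, so their wording will not match the table word for word.

```python
from lxml import etree

# Hypothetical schema: a cart may hold at most three <item> elements.
CART_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="cart">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(CART_XSD))

# One <item> per line, so the reported line number points at the offender.
doc = etree.fromstring("""<cart>
<item>a</item>
<item>b</item>
<item>c</item>
<item>d</item>
</cart>""")

is_valid = schema.validate(doc)
print("valid" if is_valid else "invalid")
for error in schema.error_log:
    # Each entry records where the problem is and what the validator said.
    print(f"line {error.line}, column {error.column}: {error.message}")
```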
Key Points
- Well-formed XML follows syntax rules; valid XML also conforms to a schema.
- Validation ensures the correct structure, element names, attribute types, and data values.
- DTD validation is simpler but lacks data types: element content is just text, and attributes have only a few types, such as ID and CDATA.
- XSD validation is more powerful, with full data type checking and complex constraints.
- Python's `lxml` library provides robust validation support for both DTD and XSD.
- Online tools and IDE extensions make XML validation accessible without writing code.
- Reading validation error messages carefully helps pinpoint and fix structural issues quickly.
