XML Validation

Writing XML that follows the syntax rules (well-formed XML) is only the first step. In real-world applications, the document must also conform to a specific structure and to business rules. XML validation checks exactly this: it verifies that an XML document matches a predefined structure and set of constraints.

Two Levels of Correctness in XML

Level 1: Well-Formed XML

A well-formed XML document correctly follows all XML syntax rules:

  • Has exactly one root element.
  • All tags are properly opened and closed.
  • Tags are correctly nested.
  • Attribute values are quoted.
  • Special characters such as < and & are escaped (for example, as &lt; and &amp;).

A well-formed document can be parsed, but it may still contain incorrect or unexpected structure.
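
The distinction can be seen with nothing more than a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is illustrative only. The parser rejects syntax errors but has no opinion about which elements a document should contain.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested and closed: well-formed.
print(is_well_formed("<order><item>Keyboard</item></order>"))   # True
# Tags closed in the wrong order: not well-formed.
print(is_well_formed("<order><item>Keyboard</order></item>"))   # False
# Unescaped ampersand in text: not well-formed.
print(is_well_formed("<name>Keys & Mice</name>"))               # False
# Well-formed, even if <product> is not what the application expects.
print(is_well_formed("<order><product/></order>"))              # True
```

The last call returns True because structural expectations are outside the scope of a well-formedness check.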

Level 2: Valid XML

A valid XML document is both well-formed AND conforms to the rules defined in a schema (DTD or XSD). Validation checks:

  • Whether the correct elements are present.
  • Whether elements appear in the required order and frequency.
  • Whether attribute values are of the correct type.
  • Whether all required elements and attributes are present.

Validation Against a DTD

DTD validation checks the document against rules defined in a Document Type Definition. The parser verifies element names, nesting, and attribute types according to the DTD.

Example: Valid XML Against a DTD

DTD file (order.dtd):

<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>

Valid XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE order SYSTEM "order.dtd">
<order id="O001">
  <item>
    <name>Keyboard</name>
    <quantity>2</quantity>
    <price>39.99</price>
  </item>
</order>

Example: Invalid XML Against the Same DTD

<order id="O002">
  <product>
    <name>Mouse</name>
  </product>
</order>

This is invalid because:

  • It uses <product> instead of the required <item>.
  • The required child elements <quantity> and <price> are missing.
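
The two documents above can also be checked programmatically. The following is a minimal sketch, assuming the third-party lxml library is installed. The DTD and both documents are held as in-memory strings rather than files, and the helper name check_order is hypothetical.

```python
from io import StringIO

from lxml import etree

# Contents of order.dtd, kept in memory for this sketch.
ORDER_DTD = """<!ELEMENT order (item+)>
<!ELEMENT item (name, quantity, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST order id ID #REQUIRED>
"""

VALID_ORDER = """<order id="O001">
  <item><name>Keyboard</name><quantity>2</quantity><price>39.99</price></item>
</order>"""

INVALID_ORDER = """<order id="O002">
  <product><name>Mouse</name></product>
</order>"""

dtd = etree.DTD(StringIO(ORDER_DTD))


def check_order(xml_text):
    """Return (is_valid, messages) for xml_text checked against the order DTD."""
    root = etree.fromstring(xml_text)
    if dtd.validate(root):
        return True, []
    return False, [entry.message for entry in dtd.error_log]


for label, text in (("valid", VALID_ORDER), ("invalid", INVALID_ORDER)):
    is_valid, messages = check_order(text)
    print(f"{label} example -> valid: {is_valid}")
    for message in messages:
        print(f"  {message}")
```

The exact wording of the reported messages comes from the underlying libxml2 library and may differ between versions.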

Validation Against an XSD Schema

XSD validation is more powerful and precise than DTD validation. It checks data types, value ranges, and complex structural rules.

Example: XSD Schema

File: contact.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fullName" type="xs:string" />
        <xs:element name="email" type="xs:string" />
        <xs:element name="age" type="xs:integer" />
        <xs:element name="phone" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required" />
    </xs:complexType>
  </xs:element>

</xs:schema>

Valid XML:

<?xml version="1.0" encoding="UTF-8"?>
<contact id="C001"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="contact.xsd">
  <fullName>Diana Prince</fullName>
  <email>diana@example.com</email>
  <age>34</age>
</contact>

Invalid XML — Wrong Data Type:

<contact id="C002">
  <fullName>Bruce Wayne</fullName>
  <email>bruce@example.com</email>
  <age>thirty-two</age>
</contact>

This is invalid because the text thirty-two is not a valid xs:integer value; the schema declares <age> as an integer. XSD's data type checking catches this error.
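
To see the type check in action, the schema and both contact documents can be validated in memory. This is a minimal sketch, assuming the third-party lxml library is installed; the helper name validate_contact is hypothetical, and the xsi:noNamespaceSchemaLocation hint is omitted because the schema is passed to the validator directly.

```python
from lxml import etree

# Contents of contact.xsd, kept in memory for this sketch.
CONTACT_XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fullName" type="xs:string" />
        <xs:element name="email" type="xs:string" />
        <xs:element name="age" type="xs:integer" />
        <xs:element name="phone" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>"""

VALID_CONTACT = b"""<contact id="C001">
  <fullName>Diana Prince</fullName>
  <email>diana@example.com</email>
  <age>34</age>
</contact>"""

INVALID_CONTACT = b"""<contact id="C002">
  <fullName>Bruce Wayne</fullName>
  <email>bruce@example.com</email>
  <age>thirty-two</age>
</contact>"""

schema = etree.XMLSchema(etree.fromstring(CONTACT_XSD))


def validate_contact(xml_bytes):
    """Return a list of error messages; an empty list means the document is valid."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    return [entry.message for entry in schema.error_log]


print(validate_contact(VALID_CONTACT))    # an empty list: no errors
for message in validate_contact(INVALID_CONTACT):
    print(message)                        # complains about the <age> value
```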

Validating XML with Python

The third-party lxml library for Python (installable with pip install lxml) supports both DTD and XSD validation.

XSD Validation with lxml

from lxml import etree

# Load the XSD schema
with open("contact.xsd", "rb") as f:
    schema_doc = etree.parse(f)
schema = etree.XMLSchema(schema_doc)

# Load and validate the XML document
with open("contact.xml", "rb") as f:
    xml_doc = etree.parse(f)

if schema.validate(xml_doc):
    print("XML is valid!")
else:
    print("Validation errors:")
    for error in schema.error_log:
        print(f"  Line {error.line}: {error.message}")
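
Validation can also happen while the document is being parsed, so that invalid input never reaches the rest of the program. The sketch below assumes lxml and attaches a schema to an etree.XMLParser; the one-element schema and the helper name parse_age are hypothetical and exist only to keep the example short. When a schema is attached, lxml reports validation failures from the parser as etree.XMLSyntaxError.

```python
from lxml import etree

# A deliberately tiny, hypothetical schema: <age> must hold an integer.
AGE_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age" type="xs:integer"/>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(AGE_XSD))

# Every document parsed with this parser is validated against the schema.
validating_parser = etree.XMLParser(schema=schema)


def parse_age(xml_bytes):
    """Parse and validate in one step; return the age as int, or None if rejected."""
    try:
        root = etree.fromstring(xml_bytes, validating_parser)
    except etree.XMLSyntaxError as exc:
        print(f"Rejected: {exc}")
        return None
    return int(root.text)


print(parse_age(b"<age>34</age>"))          # 34
print(parse_age(b"<age>thirty-two</age>"))  # None, after a "Rejected: ..." line
```

Whether to validate during parsing or afterwards with schema.validate() is a design choice: the parser approach fails fast, while a separate validate() call makes it easier to collect and report every error in the log.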

DTD Validation with lxml

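from lxml import etree

# Load the DTD
with open("order.dtd", "rb") as f:
    dtd = etree.DTD(f)

# Load the XML document (by default, parsing only checks well-formedness)
with open("order.xml", "rb") as f:
    xml_doc = etree.parse(f)

# Validate against the DTD and report any errors
if dtd.validate(xml_doc):
    print("XML is valid!")
else:
    for error in dtd.error_log:
        print(f"Line {error.line}: {error.message}")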

Online XML Validation Tools

Several free online tools can validate XML against a DTD or XSD without writing code:

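  • XMLValidation.com: paste XML together with a DTD or XSD to validate it.
  • FreeFormatter.com: validates and formats XML.
  • W3C Markup Validation Service: the W3C's own validator, aimed mainly at HTML and XHTML documents rather than arbitrary XML schemas.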
  • XML editors like Oxygen XML and Visual Studio Code (with XML extensions) validate as you type.

Common Validation Errors and What They Mean

Error Message | Cause | Fix
Element 'X' is not expected | Element name doesn't match schema | Check element name spelling and case
The value 'abc' is not valid for type integer | Wrong data type in XSD | Provide a valid integer value
Missing required attribute 'id' | A required attribute was omitted | Add the required attribute
Element 'X' must appear at least once | A required child element is missing | Add the missing element
Element 'X' appeared too many times | Exceeds maxOccurs limit | Remove extra elements or increase maxOccurs
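
The exact wording of these messages depends on the validator in use. As a rough illustration, the sketch below assumes lxml and a small hypothetical <team> schema (not one of the examples in this article) to trigger several of these error kinds and print what the validator reports. The helper name explain and the test documents are made up for the demonstration.

```python
from lxml import etree

# Hypothetical schema: a <team> needs an id attribute and one or two <member> children.
TEAM_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="team">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="member" type="xs:string" maxOccurs="2"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(TEAM_XSD))

CASES = {
    "valid": b'<team id="t1"><member>Ann</member></team>',
    "unexpected element": b'<team id="t1"><person>Ann</person></team>',
    "missing attribute": b"<team><member>Ann</member></team>",
    "missing child": b'<team id="t1"/>',
    "too many": b'<team id="t1"><member>A</member><member>B</member><member>C</member></team>',
}


def explain(xml_bytes):
    """Return (line, type_name, message) tuples for each reported validation error."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    return [(e.line, e.type_name, e.message) for e in schema.error_log]


for label, xml_bytes in CASES.items():
    problems = explain(xml_bytes)
    print(f"{label}: {'valid' if not problems else 'invalid'}")
    for line, type_name, message in problems:
        print(f"  Line {line} [{type_name}]: {message}")
```

Matching on the error's type_name rather than on the message text tends to be more stable if a program needs to react to specific error kinds.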

Key Points

  • Well-formed XML follows syntax rules; valid XML also conforms to a schema.
  • Validation checks structure, element names, attribute types, and data values against the schema.
  • DTD validation is simpler, but it offers only a few attribute types and no data types for element content.
  • XSD validation is more powerful, with full data type checking and complex constraints.
  • Python's lxml library provides robust validation support for both DTD and XSD.
  • Online tools and IDE extensions make XML validation accessible without writing code.
  • Reading validation error messages carefully helps pinpoint and fix structural issues quickly.
