XML DTD
A well-formed XML document follows basic syntax rules. But what if a project requires more control — ensuring that specific elements always appear, or that certain attributes are mandatory? This is where a Document Type Definition (DTD) comes in. A DTD defines the structure and legal building blocks of an XML document, acting as a blueprint that the document must follow.
When an XML document conforms to its DTD, it is called a valid XML document (as opposed to merely well-formed).
What Does a DTD Define?
A DTD specifies:
- Which elements are allowed and what they can contain.
- What attributes each element can have and whether they are required or optional.
- The order and number of child elements allowed within a parent.
- Any special entities (shortcuts for commonly used text or characters).
Types of DTD
A DTD can be placed in two locations:
1. Internal DTD
The DTD rules are written directly inside the XML document, within the square brackets of the <!DOCTYPE> declaration. That declaration sits between the XML declaration and the root element.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, subject, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Sarah</to>
  <from>James</from>
  <subject>Meeting Tomorrow</subject>
  <body>Please confirm your attendance.</body>
</note>
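To make the difference between well-formed and valid concrete, the sketch below reuses the same internal DTD with a document that breaks its rules. A non-validating parser would typically accept it, while a validating parser should report errors.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, subject, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested. -->
<!-- Not valid: <from> comes before <to>, and <subject> is missing. -->
<note>
  <from>James</from>
  <to>Sarah</to>
  <body>Please confirm your attendance.</body>
</note>
```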
2. External DTD
The DTD rules are stored in a separate file with a .dtd extension and referenced from the XML document. This is the preferred approach when multiple XML files share the same structure.
File: note.dtd
<!ELEMENT note (to, from, subject, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>
XML File referencing the external DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Sarah</to>
  <from>James</from>
  <subject>Meeting Tomorrow</subject>
  <body>Please confirm your attendance.</body>
</note>
Declaring Elements in a DTD
Elements are declared using the <!ELEMENT> declaration. The general syntax is:
<!ELEMENT elementname (content)>
Content Types
| Content Type | DTD Syntax | Meaning |
|---|---|---|
| Text only | (#PCDATA) | Element contains plain text |
| Specific child elements | (child1, child2) | Must appear in that exact order |
| Empty element | EMPTY | Element has no content |
| Any content | ANY | Element can have any content |
| Choice | (a \| b) | Either a or b |
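The content types in the table can be combined in one document. The following is a minimal sketch; the element names (profile, name, email, phone, separator, notes) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE profile [
  <!-- Sequence with a nested choice: name, then email OR phone, then the rest -->
  <!ELEMENT profile (name, (email | phone), separator, notes)>
  <!-- Text only -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!-- Empty element: no text and no children -->
  <!ELEMENT separator EMPTY>
  <!-- Any content: text mixed with any declared elements -->
  <!ELEMENT notes ANY>
]>
<profile>
  <name>Sarah</name>
  <phone>555-0100</phone>
  <separator/>
  <notes>Prefers calls. <name>James</name> is the backup contact.</notes>
</profile>
```

Supplying both an email and a phone element here would make the document invalid, because the choice group allows exactly one of them.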
Occurrence Indicators
Symbols placed after a child element name control how many times it can appear:
| Symbol | Meaning |
|---|---|
| (none) | Exactly once |
| ? | Zero or one time (optional) |
| * | Zero or more times |
| + | One or more times |
Element Declaration Examples
<!-- Library must contain one or more book elements -->
<!ELEMENT library (book+)>
<!-- A book must have title and author, optionally a summary -->
<!ELEMENT book (title, author, summary?)>
<!-- title, author, and summary are text-only -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT summary (#PCDATA)>
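A document that satisfies these declarations might look like the sketch below. The book titles, authors, and summary text are placeholder values, and the DTD is embedded as an internal subset so the example is self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author, summary?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT summary (#PCDATA)>
]>
<library>
  <!-- book+ : at least one book is required; two are given here -->
  <book>
    <title>The Midnight Garden</title>
    <author>Laura Chen</author>
    <!-- summary? : present in this book -->
    <summary>A placeholder summary.</summary>
  </book>
  <book>
    <title>Data Structures Simplified</title>
    <author>Mark Osei</author>
    <!-- summary? : omitted in this book, which is still valid -->
  </book>
</library>
```

An empty library element would be invalid, since book+ requires at least one book.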
Declaring Attributes in a DTD
Attributes for an element are declared using <!ATTLIST>. The syntax is:
<!ATTLIST elementname attributename type default>
Common Attribute Types
| Type | Description |
|---|---|
| CDATA | Any text string |
| ID | A unique identifier within the document |
| IDREF | A reference to the ID value of another element in the same document |
| (a \| b \| c) | Enumerated list: the value must be one of the listed options |
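The sketch below shows ID, IDREF, and an enumerated type working together. The team element and the manager and level attributes are hypothetical names used for illustration; the #REQUIRED and #IMPLIED keywords are explained in the Default Value Keywords table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE team [
  <!ELEMENT team (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee
    id      ID                #REQUIRED
    manager IDREF             #IMPLIED
    level   (junior | senior) #IMPLIED
  >
]>
<team>
  <!-- id values must be unique across the whole document -->
  <employee id="E1" level="senior">Sarah</employee>
  <!-- manager must match the id of some element in this document -->
  <employee id="E2" manager="E1" level="junior">James</employee>
</team>
```

A validating parser should reject manager="E9" here, because no element carries that ID, and level="intern", because it is not in the enumerated list.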
Default Value Keywords
| Keyword | Meaning |
|---|---|
| #REQUIRED | Attribute must always be provided |
| #IMPLIED | Attribute is optional with no default |
| #FIXED "value" | Attribute always has this fixed value |
| "defaultvalue" | Attribute defaults to this value if not specified |
Attribute Declaration Examples
<!-- id is required, status has a default of "active", type is optional -->
<!ATTLIST employee
  id     ID                                 #REQUIRED
  status CDATA                              "active"
  type   (full-time | part-time | contract) #IMPLIED
>
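To see how these defaults behave in practice, the following sketch wraps the same attribute list in a small document. The staff root element and the text content of employee are assumptions added to make the example complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee
    id     ID                                 #REQUIRED
    status CDATA                              "active"
    type   (full-time | part-time | contract) #IMPLIED
  >
]>
<staff>
  <!-- status is omitted: a validating parser supplies the default "active" -->
  <employee id="E101" type="full-time">Sarah</employee>
  <!-- status is overridden; type is omitted, so no value is supplied for it -->
  <employee id="E102" status="on-leave">James</employee>
</staff>
```

Leaving out id on either employee would make the document invalid, since that attribute is declared #REQUIRED.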
Declaring Entities in a DTD
Entities in a DTD are like shortcuts — they define a name that stands for a piece of text. When the entity name is used in the XML document, the parser replaces it with the defined text.
<!ENTITY companyName "Bright Solutions Ltd.">
Using the entity in the XML document:
<footer>Copyright 2024 &companyName;</footer>
The parser replaces &companyName; with Bright Solutions Ltd. when processing the document.
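Putting the two pieces together, a complete document with the entity declared in an internal DTD could look like this minimal sketch. The element declaration for footer is an addition needed to make the document valid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA)>
  <!-- Declare the entity once; reference it as &companyName; in the content -->
  <!ENTITY companyName "Bright Solutions Ltd.">
]>
<footer>Copyright 2024 &companyName;</footer>
```

Changing the text in the <!ENTITY> declaration updates every place where &companyName; is referenced.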
Complete DTD and XML Example
File: library.dtd
<!ELEMENT library (book+)>
<!ELEMENT book (title, author, year, genre?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ATTLIST book id ID #REQUIRED>
XML File: library.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="B001">
    <title>The Midnight Garden</title>
    <author>Laura Chen</author>
    <year>2019</year>
    <genre>Fiction</genre>
  </book>
  <book id="B002">
    <title>Data Structures Simplified</title>
    <author>Mark Osei</author>
    <year>2021</year>
  </book>
</library>
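For contrast, here is a sketch of a document that is well-formed but fails validation against library.dtd in three different ways. The book data is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <!-- Invalid: the required id attribute is missing -->
  <book>
    <title>First Example</title>
    <author>Author One</author>
    <year>2020</year>
  </book>
  <!-- Invalid: genre appears before year, but the DTD fixes the order -->
  <book id="B003">
    <title>Second Example</title>
    <author>Author Two</author>
    <genre>Fiction</genre>
    <year>2022</year>
  </book>
  <!-- Invalid: the id value B003 is already used above -->
  <book id="B003">
    <title>Third Example</title>
    <author>Author Three</author>
    <year>2023</year>
  </book>
</library>
```

ID values must also be valid XML names, so an id such as "001" that starts with a digit would be rejected as well.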
Limitations of DTD
- DTD does not support data types for content: it cannot specify that an element's text or an attribute's value must be a number or a date.
- DTD uses its own syntax, not XML syntax.
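- DTD has limited support for namespaces: it is not namespace-aware, so a prefixed name such as inv:book is matched literally, prefix included.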
- DTD cannot define complex constraints on element values.
These limitations led to the development of XML Schema (XSD), which is a more powerful and flexible validation system covered in the next topic.
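The lack of data types is easy to demonstrate. In the sketch below, which assumes the library.dtd file from the complete example, the year element holds text that is clearly not a year, yet the document is still valid because year is declared only as (#PCDATA). The book data is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="B003">
    <title>Untitled Draft</title>
    <author>Unknown</author>
    <!-- Valid against library.dtd: any text satisfies (#PCDATA) -->
    <year>sometime last spring</year>
  </book>
</library>
```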
Key Points
- A DTD defines the allowed structure of an XML document.
- A document that follows its DTD is called a valid XML document.
- DTDs can be internal (written inside the XML file) or external (stored in a separate .dtd file).
- Elements are declared with <!ELEMENT> and attributes with <!ATTLIST>.
- Occurrence indicators (?, *, +) control how many times a child element appears.
- Entities (<!ENTITY>) define reusable text shortcuts.
- DTD has limitations in data typing and namespace support, which XML Schema addresses.
