XML DTD

A well-formed XML document follows the basic syntax rules of XML. Some projects need more control, such as ensuring that specific elements always appear or that certain attributes are mandatory. This is where a Document Type Definition (DTD) comes in. A DTD defines the structure and legal building blocks of an XML document, acting as a blueprint that the document must follow.

When an XML document conforms to its DTD, it is called a valid XML document (as opposed to merely well-formed).

What Does a DTD Define?

A DTD specifies:

  • Which elements are allowed and what they can contain.
  • What attributes each element can have and whether they are required or optional.
  • The order and number of child elements allowed within a parent.
  • Any special entities (shortcuts for commonly used text or characters).

Types of DTD

A DTD can be placed in two locations:

1. Internal DTD

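The DTD rules are written directly inside the XML document, in a <!DOCTYPE> declaration placed between the XML declaration and the root element. The name after <!DOCTYPE must match the root element, and the rules go between the square brackets.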

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, subject, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Sarah</to>
  <from>James</from>
  <subject>Meeting Tomorrow</subject>
  <body>Please confirm your attendance.</body>
</note>
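
To see what validation adds, here is a sketch of a document that is well-formed but not valid against the same rules. It is an illustrative variation on the note above, not part of the original example: every tag is properly opened and closed, yet a validating parser would be expected to reject it because the children do not match the declared sequence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, subject, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <!-- Not valid: from appears before to, and subject is missing -->
  <from>James</from>
  <to>Sarah</to>
  <body>Please confirm your attendance.</body>
</note>
```

A non-validating parser would typically accept this document, because it checks only well-formedness.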

2. External DTD

The DTD rules are stored in a separate file with a .dtd extension and referenced from the XML document. This is the preferred approach when multiple XML files share the same structure.

File: note.dtd

<!ELEMENT note (to, from, subject, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML File referencing the external DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Sarah</to>
  <from>James</from>
  <subject>Meeting Tomorrow</subject>
  <body>Please confirm your attendance.</body>
</note>

Declaring Elements in a DTD

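Elements are declared using the <!ELEMENT> declaration. The general syntax is:

<!ELEMENT elementname content>

Here content is either a parenthesized content model, such as (#PCDATA) or (child1, child2), or one of the keywords EMPTY and ANY, which are written without parentheses.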

Content Types

Content Type              DTD Syntax          Meaning
Text only                 (#PCDATA)           Element contains plain text
Specific child elements   (child1, child2)    Must appear in that exact order
Empty element             EMPTY               Element has no content
Any content               ANY                 Element can have any content
Choice                    (a | b)             Either a or b
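
As a rough illustration of the content types listed above, the declarations below show one of each. The element names (caption, address, linebreak, scratch, contact and their children) are hypothetical and chosen only for this sketch.

```xml
<!-- Text only -->
<!ELEMENT caption (#PCDATA)>

<!-- Specific child elements: street, then city, in that order -->
<!ELEMENT address (street, city)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>

<!-- Empty element: written as <linebreak/> in the document -->
<!ELEMENT linebreak EMPTY>

<!-- Any content: text and any declared elements, in any order -->
<!ELEMENT scratch ANY>

<!-- Choice: exactly one of phone or email -->
<!ELEMENT contact (phone | email)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
```

Note that EMPTY and ANY are keywords and take no parentheses, while the other content types are always parenthesized.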

Occurrence Indicators

Symbols placed after a child element name, or after a parenthesized group, control how many times it can appear:

Symbol    Meaning
(none)    Exactly once
?         Zero or one time (optional)
*         Zero or more times
+         One or more times

Element Declaration Examples

<!-- Library must contain one or more book elements -->
<!ELEMENT library (book+)>

<!-- A book must have title and author, optionally a summary -->
<!ELEMENT book (title, author, summary?)>

<!-- title, author, and summary are text-only -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT summary (#PCDATA)>
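
The declarations above can be tried out with a small document such as the following sketch. The book titles and author names are made-up sample values. The first book includes the optional summary and the second omits it, and both forms should satisfy the (title, author, summary?) content model.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author, summary?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT summary (#PCDATA)>
]>
<library>
  <!-- summary is present -->
  <book>
    <title>The Midnight Garden</title>
    <author>Laura Chen</author>
    <summary>A family rediscovers an abandoned greenhouse.</summary>
  </book>
  <!-- summary is omitted, which the ? indicator allows -->
  <book>
    <title>Data Structures Simplified</title>
    <author>Mark Osei</author>
  </book>
</library>
```

An empty library element would not be valid here, because book+ requires at least one book.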

Declaring Attributes in a DTD

Attributes for an element are declared using <!ATTLIST>. The syntax is:

<!ATTLIST elementname attributename type default>

Common Attribute Types

Type       Description
CDATA      Any text string
ID         A unique identifier within the document
IDREF      A reference to an ID of another element
(a|b|c)    Enumerated list: value must be one from the list
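
The ID and IDREF types are easiest to understand together. The sketch below uses hypothetical team and member elements: each member carries a unique id, and the optional manager attribute must point at an id that exists somewhere in the same document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE team [
  <!ELEMENT team (member+)>
  <!ELEMENT member (#PCDATA)>
  <!ATTLIST member
    id       ID     #REQUIRED
    manager  IDREF  #IMPLIED
  >
]>
<team>
  <member id="m1">Priya</member>
  <!-- manager="m1" refers to the id of the first member -->
  <member id="m2" manager="m1">Tom</member>
</team>
```

ID values must be valid XML names, so they cannot start with a digit. That is why the sketch uses m1 rather than 1.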

Default Value Keywords

Keyword           Meaning
#REQUIRED         Attribute must always be provided
#IMPLIED          Attribute is optional with no default
#FIXED "value"    Attribute always has this fixed value
"defaultvalue"    Attribute defaults to this value if not specified

Attribute Declaration Examples

<!-- id is required, status has a default of "active", type is optional -->
<!ATTLIST employee
  id       ID       #REQUIRED
  status   CDATA    "active"
  type     (full-time | part-time | contract) #IMPLIED
>
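
To show how these attribute rules play out in a document, the sketch below wraps the employee declaration in a hypothetical staff root element and assumes employee holds plain text. Neither assumption comes from the declaration above. The first two employees should validate, and the third illustrates two typical errors.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee
    id       ID       #REQUIRED
    status   CDATA    "active"
    type     (full-time | part-time | contract) #IMPLIED
  >
]>
<staff>
  <!-- Valid: status is omitted, so the default "active" applies -->
  <employee id="E001" type="full-time">Sarah</employee>
  <!-- Valid: type is #IMPLIED and may be left out -->
  <employee id="E002" status="on leave">James</employee>
  <!-- Not valid: id is missing, and "intern" is not in the enumerated list -->
  <employee type="intern">Alex</employee>
</staff>
```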

Declaring Entities in a DTD

Entities in a DTD are like shortcuts: they define a name that stands for a piece of text. When the XML document refers to an entity as &name;, the parser replaces that reference with the defined text.

<!ENTITY companyName "Bright Solutions Ltd.">

Using the entity in the XML document:

<footer>Copyright 2024 &companyName;</footer>

The parser replaces &companyName; with Bright Solutions Ltd. when processing the document.
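
Putting the declaration and the reference together, a complete document might look like the sketch below. The second entity, copy, is a hypothetical addition that maps a name to the copyright sign through a character reference. It is included only to show that an entity can also stand for a single special character.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA)>
  <!ENTITY companyName "Bright Solutions Ltd.">
  <!-- &#169; is the character reference for the copyright sign -->
  <!ENTITY copy "&#169;">
]>
<footer>&copy; 2024 &companyName;</footer>
```

Changing the entity value in the DTD updates every place in the document where the reference is used.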

Complete DTD and XML Example

File: library.dtd

<!ELEMENT library (book+)>
<!ELEMENT book (title, author, year, genre?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ATTLIST book id ID #REQUIRED>

XML File: library.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="B001">
    <title>The Midnight Garden</title>
    <author>Laura Chen</author>
    <year>2019</year>
    <genre>Fiction</genre>
  </book>
  <book id="B002">
    <title>Data Structures Simplified</title>
    <author>Mark Osei</author>
    <year>2021</year>
  </book>
</library>
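
For contrast, here is a sketch of a document that references the same library.dtd but should fail validation. It is an illustrative counterexample built from the sample data above, with comments marking each expected error. The third book's values are made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="B001">
    <title>The Midnight Garden</title>
    <author>Laura Chen</author>
    <year>2019</year>
  </book>
  <!-- Error 1: id "B001" is reused, but ID values must be unique -->
  <book id="B001">
    <title>Data Structures Simplified</title>
    <!-- Error 2: author is missing between title and year -->
    <year>2021</year>
  </book>
  <!-- Error 3: no id attribute, although it is declared #REQUIRED -->
  <book>
    <title>Untitled Draft</title>
    <author>Unknown</author>
    <year>2023</year>
  </book>
</library>
```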

Limitations of DTD

  • DTD does not support data types: it cannot specify that a value must be a number or a date.
  • DTD uses its own syntax, not XML syntax.
  • DTD is not namespace-aware: a prefixed name such as x:book is treated as one plain name.
  • DTD cannot define complex constraints on element values, such as ranges or patterns.
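
The lack of data types can be seen in a short sketch against library.dtd. The book below uses made-up values. Because year is declared as (#PCDATA), a validating parser has no basis for rejecting text that is clearly not a year.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="B003">
    <title>Untitled Draft</title>
    <author>Unknown</author>
    <!-- Valid against the DTD: any text is accepted as #PCDATA -->
    <year>sometime next spring</year>
  </book>
</library>
```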

These limitations led to the development of XML Schema (XSD), which is a more powerful and flexible validation system covered in the next topic.

Key Points

  • A DTD defines the allowed structure of an XML document.
  • A document that follows its DTD is called a valid XML document.
  • DTDs can be internal (written inside the XML file) or external (stored in a separate .dtd file).
  • Elements are declared with <!ELEMENT> and attributes with <!ATTLIST>.
  • Occurrence indicators (?, *, +) control how many times a child element appears.
  • Entities (<!ENTITY>) define reusable text shortcuts.
  • DTD has limitations in data typing and namespace support, which XML Schema addresses.
