XML XPath

XPath is a query language used to navigate through and select parts of an XML document. Just as SQL is used to query databases, XPath is used to query XML trees. It provides a concise and powerful syntax to locate elements, attributes, and text based on their position, name, or value.

XPath is an essential tool for working with XML, and it forms the foundation of both XSLT (transformation) and XQuery (data querying).

The XML Document Used in This Topic

All examples in this topic use the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <year>2020</year>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <year>2018</year>
    <price>22.50</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Jardin Secret</title>
    <author>Claire Dupont</author>
    <year>2022</year>
    <price>11.00</price>
  </book>
</bookstore>

XPath Expressions: Selecting Nodes

XPath expressions describe a path through the XML tree to reach a specific node or set of nodes. The forward slash / separates levels of the tree, similar to a file path on a computer.

XPath Expression         Selects
/bookstore               The root element <bookstore>
/bookstore/book          All <book> elements directly under <bookstore>
//book                   All <book> elements anywhere in the document
//title                  All <title> elements anywhere in the document
/bookstore/book/title    All <title> elements inside <book> inside <bookstore>
.                        The current node
..                       The parent of the current node
@category                The category attribute of the current node
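The path expressions above can be tried with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not a full XPath engine: ElementTree supports only a subset of XPath, and its paths are relative to the element they are called on, so /bookstore/book is written as ./book from the root. The names XML, books, titles, and categories are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# "./book" plays the role of /bookstore/book: direct <book> children of the root.
books = root.findall("./book")

# ".//title" plays the role of //title: every <title> below the root.
titles = [t.text for t in root.findall(".//title")]

# ElementTree paths return elements only, so @category is read with .get().
categories = [b.get("category") for b in books]

print(root.tag, len(books))
print(titles)
print(categories)
```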

Selecting Specific Nodes with Predicates

A predicate filters nodes based on a condition. Predicates are written inside square brackets [ ] after a node name.

Selecting by Position

/bookstore/book[1]

Selects the first <book> element. XPath uses 1-based indexing.

/bookstore/book[last()]

Selects the last <book> element.

/bookstore/book[position() < 3]

Selects the first two <book> elements.
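As a rough illustration, the positional predicates can be approximated with Python's standard ElementTree module. It accepts [1] and [last()], but (as far as its documented subset goes) not position() comparisons, so the "first two" case is sketched here with a list slice instead. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# book[1]: the first <book> child (XPath positions start at 1).
first = root.find("./book[1]")

# book[last()]: the last <book> child.
last = root.find("./book[last()]")

# book[position() < 3]: emulated with a slice over all <book> children.
first_two = root.findall("./book")[:2]

print(first.findtext("title"))
print(last.findtext("title"))
print([b.findtext("title") for b in first_two])
```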

Selecting by Attribute Value

/bookstore/book[@category='fiction']

Selects all <book> elements where the category attribute equals fiction.

//title[@lang='en']

Selects all <title> elements where the lang attribute is en.
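Attribute predicates are part of the XPath subset that Python's ElementTree supports, so the two expressions above can be sketched almost verbatim. The only change is the relative form of the paths (./ and .// instead of a leading / or //); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# /bookstore/book[@category='fiction']
fiction_titles = [
    book.findtext("title")
    for book in root.findall("./book[@category='fiction']")
]

# //title[@lang='en']
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

print(fiction_titles)
print(english_titles)
```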

Selecting by Child Element Value

/bookstore/book[price > 20]

Selects all <book> elements where the child <price> value is greater than 20.

/bookstore/book[year=2022]/title

Selects the <title> of the book published in 2022.
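A hedged sketch of the same two selections with Python's ElementTree: it can compare a child's text for equality when the value is quoted, but it does not evaluate numeric comparisons such as price > 20, so that filter is done in Python here. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book><title>The Silent Forest</title><year>2020</year><price>14.99</price></book>
  <book><title>Cosmos Explained</title><year>2018</year><price>22.50</price></book>
  <book><title>Le Jardin Secret</title><year>2022</year><price>11.00</price></book>
</bookstore>"""
root = ET.fromstring(XML)

# /bookstore/book[year=2022]/title
# ElementTree compares text, so the value is written as a quoted string.
titles_2022 = [t.text for t in root.findall("./book[year='2022']/title")]

# /bookstore/book[price > 20]
# Emulated in Python: convert each <price> to a number and filter.
expensive = [
    book.findtext("title")
    for book in root.findall("./book")
    if float(book.findtext("price")) > 20
]

print(titles_2022)
print(expensive)
```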

XPath Wildcards

Wildcards make it possible to select multiple nodes without knowing their exact names.

Wildcard    Meaning                     Example
*           Matches any element node    /bookstore/* (all child elements of bookstore)
@*          Matches any attribute       //book/@* (all attributes of any book)
node()      Matches any type of node    /bookstore/book/node() (every child node of book, including text nodes)
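The * wildcard is available in Python's ElementTree; @* and node() are not, so this sketch reads all attributes from the element's attrib dictionary instead. It assumes a trimmed two-book copy of the document, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
  </book>
</bookstore>"""
root = ET.fromstring(XML)

# /bookstore/* : every child element of the root, whatever its name.
child_tags = [child.tag for child in root.findall("./*")]

# /bookstore/book[1]/* : every child element of the first book.
first_book_tags = [child.tag for child in root.findall("./book[1]/*")]

# //book/@* : emulated by reading each book's attribute dictionary.
book_attributes = [dict(book.attrib) for book in root.findall(".//book")]

print(child_tags)
print(first_book_tags)
print(book_attributes)
```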

XPath Axes

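XPath axes define the direction of navigation relative to the current node. The abbreviated syntax used so far is shorthand for axes: book means child::book, @category means attribute::category, and .. means parent::node(). The full axis syntax is needed for directions that have no abbreviation, such as ancestors and siblings.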

Axis                  Description                                        Example
child::               Direct children of the current node                child::book
parent::              Parent of the current node                         parent::bookstore
ancestor::            All ancestors (parent, grandparent, etc.)          ancestor::bookstore
descendant::          All descendants (children, grandchildren, etc.)    descendant::title
following-sibling::   All siblings after the current node                following-sibling::book
preceding-sibling::   All siblings before the current node               preceding-sibling::book
attribute::           Attributes of the current node                     attribute::category
self::                The current node itself                            self::book
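Python's ElementTree does not accept the axis:: syntax, so the following is only an approximation of a few axes from the table: the parent axis through the .. step, and the sibling axes through ordinary list operations on the parent's children. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# parent:: is reached with the ".." step: the <book> that owns the French title.
french_book = root.find(".//title[@lang='fr']/..")

# The sibling axes are emulated by slicing the parent's list of children
# around the current node (here, the second <book>).
books = list(root)
current = books[1]
index = books.index(current)
following = [b.findtext("title") for b in books[index + 1:]]  # following-sibling::book
preceding = [b.findtext("title") for b in books[:index]]      # preceding-sibling::book

print(french_book.tag, french_book.get("category"))
print(following)
print(preceding)
```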

XPath Functions

XPath includes built-in functions for working with strings, numbers, and node sets.

Commonly Used XPath Functions

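Function            Purpose                                             Example
count()             Counts nodes                                        count(/bookstore/book) → 3
sum()               Sums numeric values                                 sum(/bookstore/book/price) → 48.49
text()              Selects text nodes (a node test, not a function)    //author/text()
contains()          Checks if a string contains a substring             //title[contains(., 'Secret')]
starts-with()       Checks if a string starts with a prefix             //book[starts-with(@category, 'f')]
string-length()     Returns the length of a string                      string-length((//author)[1])
normalize-space()   Trims and collapses whitespace                      normalize-space((//title)[1])
not()               Negates a condition                                 //book[not(@category='fiction')]

Note that //author[1] selects every <author> that is the first author child of its parent, which here is all three authors. The parenthesized form (//author)[1] selects only the first <author> in the document.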
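Python's ElementTree does not evaluate XPath functions, so the sketch below reproduces several of the table's results with plain Python over the selected elements. Decimal is used for the price total to avoid floating-point rounding; that choice and the variable names are illustrative, not part of XPath.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the bookstore document from this topic.
XML = """<bookstore>
  <book category="fiction"><title>The Silent Forest</title><author>Elena Marsh</author><price>14.99</price></book>
  <book category="science"><title>Cosmos Explained</title><author>Ray Obi</author><price>22.50</price></book>
  <book category="fiction"><title>Le Jardin Secret</title><author>Claire Dupont</author><price>11.00</price></book>
</bookstore>"""
root = ET.fromstring(XML)
books = root.findall("./book")

# count(/bookstore/book)
book_count = len(books)

# sum(/bookstore/book/price)
total = sum(Decimal(p.text) for p in root.findall("./book/price"))

# //title[contains(., 'Secret')]
secret_titles = [t.text for t in root.findall(".//title") if "Secret" in t.text]

# //book[starts-with(@category, 'f')]
f_books = [b.findtext("title") for b in books if b.get("category", "").startswith("f")]

# string-length((//author)[1]): length of the first author in document order.
first_author_length = len(root.findtext(".//author"))

# //book[not(@category='fiction')]
non_fiction = [b.findtext("title") for b in books if b.get("category") != "fiction"]

print(book_count, total, first_author_length)
print(secret_titles, f_books, non_fiction)
```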

Using XPath in Python

import xml.etree.ElementTree as ET

xml_data = """
<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <price>22.50</price>
  </book>
</bookstore>
"""

root = ET.fromstring(xml_data)

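# Find all fiction books.
# Note: ElementTree supports only a limited subset of XPath, and paths
# are relative to the element they are called on (here, <bookstore>).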
for book in root.findall("./book[@category='fiction']"):
    print(book.find("title").text)

# Output: The Silent Forest

Key Points

  • XPath is used to navigate and select nodes in an XML document.
  • A leading / starts at the document root; // selects matching nodes at any depth.
  • Predicates [ ] filter results by position, attribute value, or child element value.
  • Wildcards * and @* match any element or attribute.
  • XPath axes allow navigation in any direction through the tree (parent, ancestor, sibling, etc.).
  • Built-in functions like count(), contains(), and sum() enable powerful data operations.
  • XPath is the query language behind both XSLT and XQuery.
