XML XPath

XPath is a query language used to navigate through and select parts of an XML document. Just as SQL is used to query databases, XPath is used to query XML trees. It provides a concise and powerful syntax to locate elements, attributes, and text based on their position, name, or value.

XPath is an essential tool for working with XML, and it forms the foundation of both XSLT (transformation) and XQuery (data querying).

The XML Document Used in This Topic

All examples in this topic use the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <year>2020</year>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <year>2018</year>
    <price>22.50</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Jardin Secret</title>
    <author>Claire Dupont</author>
    <year>2022</year>
    <price>11.00</price>
  </book>
</bookstore>

XPath Expressions: Selecting Nodes

XPath expressions describe a path through the XML tree to reach a specific node or set of nodes. The forward slash / separates levels of the tree, similar to a file path on a computer.

XPath Expression	Selects
`/bookstore`	The root element `<bookstore>`
`/bookstore/book`	All `<book>` elements directly under `<bookstore>`
`//book`	All `<book>` elements anywhere in the document
`//title`	All `<title>` elements anywhere in the document
`/bookstore/book/title`	All `<title>` elements inside `<book>` inside `<bookstore>`
`.`	The current node
`..`	The parent of the current node
`@category`	The `category` attribute of the current node
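
As a rough sketch, the expressions in the table above can be tried with Python's standard `xml.etree.ElementTree` module. That module implements only a subset of XPath and evaluates paths relative to the element they are called on, so `/bookstore/book` is written as `./book` from the root element. The XML is a trimmed copy of the bookstore document, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (titles only).
xml_data = """
<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>
"""

root = ET.fromstring(xml_data)  # root is the <bookstore> element

# /bookstore/book  ->  ./book (relative to root)
books = root.findall("./book")

# //title  ->  .//title (any depth below root)
titles = [t.text for t in root.findall(".//title")]

# /bookstore/book/title  ->  ./book/title
direct_titles = [t.text for t in root.findall("./book/title")]

# @category of the first book: ElementTree reads attributes with .get()
first_category = books[0].get("category")

print(len(books))      # 3
print(titles)          # ['The Silent Forest', 'Cosmos Explained', 'Le Jardin Secret']
print(first_category)  # fiction
```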

Selecting Specific Nodes with Predicates

A predicate filters nodes based on a condition. Predicates are written inside square brackets [ ] after a node name.

Selecting by Position

/bookstore/book[1]

Selects the first <book> element. XPath uses 1-based indexing.

/bookstore/book[last()]

Selects the last <book> element.

/bookstore/book[position() < 3]

Selects the first two <book> elements.
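
The position predicates above can be sketched with `xml.etree.ElementTree`, assuming a trimmed copy of the bookstore document. ElementTree accepts integer positions and `last()`, but not the `position()` function, so the last case is approximated here with a Python list slice rather than an XPath predicate.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (titles only).
xml_data = """
<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>
"""

root = ET.fromstring(xml_data)

# /bookstore/book[1]: positions start at 1, not 0
first = root.find("./book[1]")

# /bookstore/book[last()]
last = root.find("./book[last()]")

# /bookstore/book[position() < 3] is outside ElementTree's XPath subset,
# so the first two matches are taken with a slice instead.
first_two = root.findall("./book")[:2]

print(first.find("title").text)                    # The Silent Forest
print(last.find("title").text)                     # Le Jardin Secret
print([b.find("title").text for b in first_two])   # ['The Silent Forest', 'Cosmos Explained']
```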

Selecting by Attribute Value

/bookstore/book[@category='fiction']

Selects all <book> elements where the category attribute equals fiction.

//title[@lang='en']

Selects all <title> elements where the lang attribute is en.

Selecting by Child Element Value

/bookstore/book[price > 20]

Selects all <book> elements where the child <price> value is greater than 20.

/bookstore/book[year=2022]/title

Selects the <title> of the book published in 2022.
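
A minimal sketch of both child-value filters, again with `xml.etree.ElementTree` and a trimmed copy of the bookstore document. ElementTree can test a child's text for equality, but it compares strings, so the year is quoted. It has no numeric comparison operators, so the `price > 20` filter is done in Python here instead of in the path.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (title, year, price).
xml_data = """
<bookstore>
  <book><title>The Silent Forest</title><year>2020</year><price>14.99</price></book>
  <book><title>Cosmos Explained</title><year>2018</year><price>22.50</price></book>
  <book><title>Le Jardin Secret</title><year>2022</year><price>11.00</price></book>
</bookstore>
"""

root = ET.fromstring(xml_data)

# /bookstore/book[year=2022]/title
# ElementTree matches the child's text as a string, hence the quotes.
titles_2022 = [t.text for t in root.findall("./book[year='2022']/title")]

# /bookstore/book[price > 20]
# Not expressible in ElementTree's subset: convert the text and compare in Python.
expensive = [
    book.find("title").text
    for book in root.findall("./book")
    if float(book.find("price").text) > 20
]

print(titles_2022)  # ['Le Jardin Secret']
print(expensive)    # ['Cosmos Explained']
```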

XPath Wildcards

Wildcards make it possible to select multiple nodes without knowing their exact names.

Wildcard	Meaning	Example
`*`	Matches any element node	`/bookstore/*`: all child elements of bookstore
`@*`	Matches any attribute	`//book/@*`: all attributes of any book
`node()`	Matches any type of node	`/bookstore/book/node()`: every child node of a book (elements, text, comments)
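
The element wildcard can be tried directly in `xml.etree.ElementTree`, as sketched here on a trimmed copy of the bookstore document. ElementTree has no `@*` or `node()` support, so the attribute wildcard is approximated with each element's `.attrib` dictionary.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (title and price only).
xml_data = """
<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title><price>14.99</price></book>
  <book category="science"><title lang="en">Cosmos Explained</title><price>22.50</price></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title><price>11.00</price></book>
</bookstore>
"""

root = ET.fromstring(xml_data)

# /bookstore/*  ->  ./*  (every child element of <bookstore>)
child_tags = [el.tag for el in root.findall("./*")]

# /bookstore/book[1]/*  ->  every child element of the first book
first_book_tags = [el.tag for el in root.findall("./book[1]/*")]

# //book/@*  has no ElementTree equivalent; .attrib holds all attributes as a dict
book_attrs = [book.attrib for book in root.findall(".//book")]

print(child_tags)       # ['book', 'book', 'book']
print(first_book_tags)  # ['title', 'price']
print(book_attrs)       # [{'category': 'fiction'}, {'category': 'science'}, {'category': 'fiction'}]
```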

XPath Axes

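XPath axes define the direction of navigation relative to the current node. The abbreviated syntax used so far is shorthand for axes: book means child::book, @category means attribute::category, and .. means parent::node(). Explicit axes are needed for directions that have no abbreviation, such as ancestors and siblings.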

Axis	Description	Example
`child::`	Direct children of the current node	`child::book`
`parent::`	Parent of the current node	`parent::bookstore`
`ancestor::`	All ancestors (parent, grandparent, etc.)	`ancestor::bookstore`
`descendant::`	All descendants (children, grandchildren, etc.)	`descendant::title`
`following-sibling::`	All siblings after the current node	`following-sibling::book`
`preceding-sibling::`	All siblings before the current node	`preceding-sibling::book`
`attribute::`	Attributes of the current node	`attribute::category`
`self::`	The current node itself	`self::book`
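
`xml.etree.ElementTree` does not implement explicit axes, so the following is only an emulation of what several axes in the table return, not XPath itself. It assumes a trimmed copy of the bookstore document and builds a child-to-parent map, because ElementTree elements do not store a reference to their parent. All names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (titles only).
xml_data = """
<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>
"""

root = ET.fromstring(xml_data)

# Map every element to its parent so the tree can be walked upwards.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("./book[2]/title")  # context node: title of the second book
book = parent_of[title]               # parent::book
store = parent_of[book]               # ancestor::bookstore (the grandparent here)

siblings = list(store)                # child::book, seen from <bookstore>
i = siblings.index(book)
following = siblings[i + 1:]          # following-sibling::book
preceding = siblings[:i]              # preceding-sibling::book

descendant_titles = [t.text for t in root.iter("title")]  # descendant::title
category = book.get("category")                           # attribute::category

print(store.tag)                                   # bookstore
print([b.find("title").text for b in preceding])   # ['The Silent Forest']
print([b.find("title").text for b in following])   # ['Le Jardin Secret']
print(category)                                    # science
```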

XPath Functions

XPath includes built-in functions for working with strings, numbers, and node sets.

Commonly Used XPath Functions

Function	Purpose	Example
`count()`	Counts nodes	`count(/bookstore/book)` → 3
`sum()`	Sums numeric values	`sum(/bookstore/book/price)` → 48.49
`text()`	Node test (not strictly a function) that selects text nodes	`//author/text()`
`contains()`	Checks if a string contains a substring	`//title[contains(., 'Secret')]`
`starts-with()`	Checks if string starts with prefix	`//book[starts-with(@category, 'f')]`
`string-length()`	Returns the length of a string	`string-length((//author)[1])` → 11
`normalize-space()`	Trims leading and trailing whitespace and collapses internal runs to a single space	`normalize-space((//title)[1])`
`not()`	Negates a condition	`//book[not(@category='fiction')]`
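
The XPath subset in `xml.etree.ElementTree` does not include these functions, so the sketch below reproduces several of the table's examples with plain Python on a trimmed copy of the bookstore document. The helper `normalize_space` is a hypothetical stand-in that only approximates the XPath function (Python's `split()` treats a few more characters as whitespace).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document (title and price only).
xml_data = """
<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title><price>14.99</price></book>
  <book category="science"><title lang="en">Cosmos Explained</title><price>22.50</price></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title><price>11.00</price></book>
</bookstore>
"""

root = ET.fromstring(xml_data)
books = root.findall("./book")

# count(/bookstore/book)
book_count = len(books)

# sum(/bookstore/book/price): element text is a string, so convert first
total = sum(float(p.text) for p in root.findall("./book/price"))

# //title[contains(., 'Secret')]
secret = [t.text for t in root.iter("title") if "Secret" in t.text]

# //book[starts-with(@category, 'f')]
f_books = [b.find("title").text for b in books if b.get("category", "").startswith("f")]

# //book[not(@category='fiction')]
non_fiction = [b.find("title").text for b in books if b.get("category") != "fiction"]

def normalize_space(s):
    """Hypothetical helper: trim the ends and collapse whitespace runs."""
    return " ".join(s.split())

print(book_count)        # 3
print(round(total, 2))   # 48.49
print(secret)            # ['Le Jardin Secret']
print(f_books)           # ['The Silent Forest', 'Le Jardin Secret']
print(non_fiction)       # ['Cosmos Explained']
print(normalize_space("  The   Silent\n Forest "))  # The Silent Forest
```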

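Using XPath in Python

Python's standard xml.etree.ElementTree module supports a limited subset of XPath through methods such as find() and findall(). Paths are evaluated relative to the element they are called on, so the example below starts from the root <bookstore> element and uses a shortened copy of the document.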

import xml.etree.ElementTree as ET

xml_data = """
<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <price>22.50</price>
  </book>
</bookstore>
"""

root = ET.fromstring(xml_data)

# Find all fiction books
for book in root.findall("./book[@category='fiction']"):
    print(book.find("title").text)

# Output: The Silent Forest

Key Points

XPath is used to navigate and select nodes in an XML document.
A leading / starts at the document root; // selects matching nodes at any depth.
Predicates [ ] filter results by position, attribute value, or child element value.
Wildcards * and @* match any element or attribute.
XPath axes allow navigation in any direction through the tree (parent, ancestor, sibling, etc.).
Built-in functions like count(), contains(), and sum() enable powerful data operations.
XPath is the query language behind both XSLT and XQuery.
