XML XPath
XPath is a query language used to navigate through and select parts of an XML document. Just as SQL is used to query databases, XPath is used to query XML trees. It provides a concise and powerful syntax to locate elements, attributes, and text based on their position, name, or value.
XPath is an essential tool for working with XML, and it forms the foundation of both XSLT (transformation) and XQuery (data querying).
The XML Document Used in This Topic
All examples in this topic use the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <year>2020</year>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <year>2018</year>
    <price>22.50</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Jardin Secret</title>
    <author>Claire Dupont</author>
    <year>2022</year>
    <price>11.00</price>
  </book>
</bookstore>
XPath Expressions: Selecting Nodes
XPath expressions describe a path through the XML tree to reach a specific node or set of nodes. The forward slash / separates levels of the tree, similar to a file path on a computer.
| XPath Expression | Selects |
|---|---|
| /bookstore | The root element <bookstore> |
| /bookstore/book | All <book> elements directly under <bookstore> |
| //book | All <book> elements anywhere in the document |
| //title | All <title> elements anywhere in the document |
| /bookstore/book/title | All <title> elements inside <book> inside <bookstore> |
| . | The current node |
| .. | The parent of the current node |
| @category | The category attribute of the current node |
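The path expressions in the table can be tried out with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not a full XPath engine: ElementTree implements only a subset of XPath and evaluates paths relative to the element they are called on, so /bookstore/book is written as ./book starting from the root element. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <year>2020</year>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <year>2018</year>
    <price>22.50</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Jardin Secret</title>
    <author>Claire Dupont</author>
    <year>2022</year>
    <price>11.00</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book, written relative to the root element
books = root.findall("./book")

# //title: every <title> at any depth below the root
titles_anywhere = [t.text for t in root.findall(".//title")]

# /bookstore/book/title: the same titles, reached by an explicit path
titles_by_path = [t.text for t in root.findall("./book/title")]

# @category on the first book, read through the element's attribute API
first_category = books[0].get("category")

print(len(books))
print(titles_anywhere)
print(first_category)
```

Because every <title> in this document sits directly inside a <book>, the // form and the explicit path return the same elements.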
Selecting Specific Nodes with Predicates
A predicate filters nodes based on a condition. Predicates are written inside square brackets [ ] after a node name.
Selecting by Position
/bookstore/book[1]
Selects the first <book> element. XPath uses 1-based indexing.
/bookstore/book[last()]
Selects the last <book> element.
/bookstore/book[position() < 3]
Selects the first two <book> elements.
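As a rough illustration, the positional predicates map onto ElementTree as follows. ElementTree understands [1] and [last()], but position() < 3 falls outside its XPath subset, so the sketch substitutes a Python list slice for that case (an assumption of this example, not XPath behavior).

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/book[1]: positions start at 1, not 0
first = root.find("./book[1]")

# /bookstore/book[last()]
last = root.find("./book[last()]")

# /bookstore/book[position() < 3]: emulated with a slice,
# since ElementTree does not support the position() function
first_two = root.findall("./book")[:2]

print(first.find("title").text)
print(last.find("title").text)
print([b.find("title").text for b in first_two])
```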
Selecting by Attribute Value
/bookstore/book[@category='fiction']
Selects all <book> elements where the category attribute equals fiction.
//title[@lang='en']
Selects all <title> elements where the lang attribute is en.
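Attribute predicates are part of the XPath subset that ElementTree supports, so both expressions above can be run almost unchanged. The sketch below assumes the same bookstore data, trimmed to the elements it needs.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title></book>
  <book category="science"><title lang="en">Cosmos Explained</title></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/book[@category='fiction']
fiction = root.findall("./book[@category='fiction']")
fiction_titles = [b.find("title").text for b in fiction]

# //title[@lang='en']
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

print(fiction_titles)
print(english_titles)
```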
Selecting by Child Element Value
/bookstore/book[price > 20]
Selects all <book> elements where the child <price> value is greater than 20.
/bookstore/book[year=2022]/title
Selects the <title> of the book published in 2022.
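A sketch of the same two selections in Python follows. ElementTree can match a child's text for equality (the value must be quoted, since it is compared as text), but it has no numeric comparison operators, so price > 20 is emulated here with an ordinary Python filter. That filter is a stand-in chosen for this example, not something XPath itself requires.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title>The Silent Forest</title><year>2020</year><price>14.99</price></book>
  <book><title>Cosmos Explained</title><year>2018</year><price>22.50</price></book>
  <book><title>Le Jardin Secret</title><year>2022</year><price>11.00</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/book[year=2022]/title, with the year compared as text
titles_2022 = [t.text for t in root.findall("./book[year='2022']/title")]

# /bookstore/book[price > 20]: convert each price to a number in Python
expensive = [
    b.findtext("title")
    for b in root.findall("./book")
    if float(b.findtext("price")) > 20
]

print(titles_2022)
print(expensive)
```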
XPath Wildcards
Wildcards make it possible to select multiple nodes without knowing their exact names.
| Wildcard | Meaning | Example |
|---|---|---|
| * | Matches any element node | /bookstore/* (all child elements of bookstore) |
| @* | Matches any attribute | //book/@* (all attributes of any book) |
| node() | Matches a node of any kind (element, text, comment, and so on) | /bookstore/book/node() (every child node of book, including text nodes) |
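The element wildcard can be sketched with ElementTree, which supports * but not @* or node(). In this example the attrib dictionary of each element stands in for @*; ElementTree does not expose text as separate nodes, so node() has no direct counterpart here.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <year>2020</year>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <year>2018</year>
    <price>22.50</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/*: every child element of the root, whatever its name
children = [c.tag for c in root.findall("./*")]

# /bookstore/book[1]/*: every child element of the first book
first_book_parts = [c.tag for c in root.findall("./book[1]/*")]

# //book/@*: emulated by reading each book's attribute dictionary
book_attributes = [dict(b.attrib) for b in root.findall(".//book")]

print(children)
print(first_book_parts)
print(book_attributes)
```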
XPath Axes
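XPath axes define the direction of navigation relative to the current node. Every step in a path uses an axis: the abbreviated syntax shown earlier defaults to child::, while the explicit axis names are needed to move in other directions, such as toward ancestors or siblings.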
| Axis | Description | Example |
|---|---|---|
| child:: | Direct children of the current node | child::book |
| parent:: | Parent of the current node | parent::bookstore |
| ancestor:: | All ancestors (parent, grandparent, etc.) | ancestor::bookstore |
| descendant:: | All descendants (children, grandchildren, etc.) | descendant::title |
| following-sibling:: | All siblings after the current node | following-sibling::book |
| preceding-sibling:: | All siblings before the current node | preceding-sibling::book |
| attribute:: | Attributes of the current node | attribute::category |
| self:: | The current node itself | self::book |
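ElementTree does not accept explicit axis names, but several axes can be approximated to show what they select. The sketch below takes the science book as the current node (an arbitrary choice for illustration) and emulates the sibling axes by slicing the parent's list of children; the helper variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction"><title lang="en">The Silent Forest</title><author>Elena Marsh</author></book>
  <book category="science"><title lang="en">Cosmos Explained</title><author>Ray Obi</author></book>
  <book category="fiction"><title lang="fr">Le Jardin Secret</title><author>Claire Dupont</author></book>
</bookstore>"""

root = ET.fromstring(XML)
current = root.find("./book[@category='science']")  # the context node

# descendant::title from the root: iter() walks the whole subtree
descendant_titles = [t.text for t in root.iter("title")]

# parent:: step, using the abbreviated .. form that ElementTree supports
parent_of_french = root.find(".//title[@lang='fr']/..")

# following-sibling::book and preceding-sibling::book, emulated by
# slicing the parent's children around the current node
siblings = list(root)
i = siblings.index(current)
following = [e.findtext("title") for e in siblings[i + 1:] if e.tag == "book"]
preceding = [e.findtext("title") for e in siblings[:i] if e.tag == "book"]

# attribute::category on the current node
category = current.get("category")

print(descendant_titles)
print(parent_of_french.findtext("author"))
print(preceding, following, category)
```

A library with full XPath support would accept the axis expressions from the table directly; the slicing here only mirrors their results for this document.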
XPath Functions
XPath includes built-in functions for working with strings, numbers, and node sets.
Commonly Used XPath Functions
| Function | Purpose | Example |
|---|---|---|
| count() | Counts the nodes in a node set | count(/bookstore/book) → 3 |
| sum() | Sums the numeric values of a node set | sum(/bookstore/book/price) → 48.49 |
| text() | Selects text nodes (strictly a node test, not a function, but written the same way) | //author/text() |
| contains() | Checks whether a string contains a substring | //title[contains(., 'Secret')] |
| starts-with() | Checks whether a string starts with a prefix | //book[starts-with(@category, 'f')] |
| string-length() | Returns the length of a string | string-length(//author[1]) |
| normalize-space() | Trims leading and trailing whitespace and collapses internal runs to a single space | normalize-space(//title[1]) |
| not() | Negates a condition | //book[not(@category='fiction')] |
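ElementTree does not implement XPath functions, so the sketch below reproduces the effect of each table entry with plain Python over the selected elements. It is meant to show what each function computes on the sample document, not how an XPath engine evaluates it; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction">
    <title lang="en">The Silent Forest</title>
    <author>Elena Marsh</author>
    <price>14.99</price>
  </book>
  <book category="science">
    <title lang="en">Cosmos Explained</title>
    <author>Ray Obi</author>
    <price>22.50</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Jardin Secret</title>
    <author>Claire Dupont</author>
    <price>11.00</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
books = root.findall("./book")

# count(/bookstore/book)
book_count = len(books)

# sum(/bookstore/book/price): floating-point, so round for display
price_total = sum(float(p.text) for p in root.findall("./book/price"))

# //title[contains(., 'Secret')]
secret_titles = [t.text for t in root.iter("title") if "Secret" in t.text]

# //book[starts-with(@category, 'f')]
f_books = [b for b in books if b.get("category", "").startswith("f")]

# string-length(//author[1]): length of the first author's text
first_author_len = len(root.find(".//author").text)

# normalize-space(): trim the ends and collapse internal whitespace
normalized = " ".join("  The   Silent  Forest ".split())

# //book[not(@category='fiction')]
non_fiction = [b.findtext("title") for b in books if b.get("category") != "fiction"]

print(book_count, round(price_total, 2))
print(secret_titles, len(f_books), first_author_len)
print(normalized, non_fiction)
```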
Using XPath in Python
import xml.etree.ElementTree as ET
xml_data = """
<bookstore>
<book category="fiction">
<title lang="en">The Silent Forest</title>
<author>Elena Marsh</author>
<price>14.99</price>
</book>
<book category="science">
<title lang="en">Cosmos Explained</title>
<author>Ray Obi</author>
<price>22.50</price>
</book>
</bookstore>
"""
root = ET.fromstring(xml_data)
# Find all fiction books (ElementTree supports only a subset of XPath)
for book in root.findall("./book[@category='fiction']"):
    print(book.find("title").text)
# Output: The Silent Forest
Key Points
- XPath is used to navigate and select nodes in an XML document.
- / selects from the root; // selects anywhere in the document.
- Predicates [ ] filter results by position, attribute value, or child element value.
- Wildcards * and @* match any element or attribute.
- XPath axes allow navigation in any direction through the tree (parent, ancestor, sibling, etc.).
- Built-in functions like count(), contains(), and sum() enable powerful data operations.
- XPath is the query language behind both XSLT and XQuery.
