Question

I'm trying to learn XPath queries using command-line tools on Linux (I'm taking Stanford's Class2Go course Introduction to Databases). Given an XML file, BookstoreQ.xml, describing a bookstore that contains both books and magazines, I can run the following query at the command line:

$ java -cp Saxon-HE-9.4.0.6.jar net.sf.saxon.Query -s:"BookstoreQ.xml" \
       -qs:'<results>{/Bookstore/(Book|Magazine)/Title}</results>'

and it will return the following result:

<?xml version="1.0" encoding="UTF-8"?>
<results>
  <Title>A First Course in Database Systems</Title>
  <Title>Database Systems: The Complete Book</Title>
  <Title>Hector and Jeff's Database Hints</Title>
  <Title>Jennifer's Economical Database Hints</Title>
  <Title>National Geographic</Title>
  <Title>National Geographic</Title>
  <Title>Newsweek</Title>
  <Title>Hector and Jeff's Database Hints</Title>
</results>

I get the same Title elements with xmllint if I write the union out in full:

$ xmllint --xpath '/Bookstore/Book/Title | /Bookstore/Magazine/Title' BookstoreQ.xml

However, if I use exactly the same XPath expression as in the Saxon example, I get an error:

$ xmllint --xpath '/Bookstore/(Book|Magazine)/Title' BookstoreQ.xml
XPath error: Invalid Expression
/Bookstore/(Book|Magazine)/Title
           ^
xmlXPathEval: evaluation failed
XPath evaluation failure

Why?

UPDATE:

Thanks to Francis and Michael for helping me understand the issue. A workaround for experimenting with XPath 2.0 at the command line on Linux is a small script like the one below, which runs the expression through Saxon:

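#!/bin/bash
# Run an XPath/XQuery expression (passed as $1) through Saxon's query processor,
# wrapping the result in a <results> element and indenting the output.
java -cp Saxon-HE-9.4.0.6.jar net.sf.saxon.Query -qs:"<results>{$1}</results>" \
  \!indent=yes
echo   # print a newline after the query output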

It assumes that Saxon-HE-9.4.0.6.jar is in the current directory (otherwise, change the -cp argument to the jar's full path). The following command then outputs the results shown above, correctly indented:

$ xpath.sh "doc('BookstoreQ.xml')/Bookstore/(Book|Magazine)/Title"

Solution

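libxml2 (the library used by xmllint) only supports XPath 1.0, and XPath 1.0 does not allow a parenthesized expression such as (Book|Magazine) to be used as a step in the middle of a path. Each step after a / must be an axis step: an optional axis, a node test, and optional predicates.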

In XPath 1.0 you can write either (/Bookstore/Book/Title | /Bookstore/Magazine/Title) or /Bookstore/*[name()='Book' or name()='Magazine']/Title.
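To check that these XPath 1.0 forms select the same nodes, here is a minimal sketch. It assumes xmllint (from libxml2) is on the PATH and uses a small hypothetical sample document written to a temporary file, not the course's BookstoreQ.xml. It also tries a third XPath 1.0 form, /Bookstore/*[self::Book or self::Magazine]/Title, which tests element names with the self axis instead of comparing strings returned by name(). The count_titles function is a hypothetical helper introduced only for this illustration.

```bash
#!/bin/sh
# Compare XPath 1.0 equivalents of /Bookstore/(Book|Magazine)/Title with xmllint.
# The sample document is hypothetical; substitute BookstoreQ.xml if you have it.
set -eu

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
<Bookstore>
  <Book><Title>Book One</Title></Book>
  <Magazine><Title>Magazine One</Title></Magazine>
  <Book><Title>Book Two</Title></Book>
  <Other><Title>Not selected</Title></Other>
</Bookstore>
EOF

# count_titles (hypothetical helper): print how many <Title> elements xmllint
# returns for the XPath 1.0 expression in $1. Counting matches rather than lines
# avoids depending on how a given libxml2 version separates the printed nodes.
count_titles() {
  xmllint --xpath "$1" "$sample" | grep -o '<Title>' | wc -l
}

count_titles '/Bookstore/Book/Title | /Bookstore/Magazine/Title'
count_titles "/Bookstore/*[name()='Book' or name()='Magazine']/Title"
count_titles '/Bookstore/*[self::Book or self::Magazine]/Title'
```

With this sample, each of the three queries should report 3 titles, because the Title inside Other is excluded by all of them.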

The underlying reason for this limitation is that XPath 1.0 has no notion of sequences, only node-sets; the sequence data type was introduced in XPath 2.0 and XQuery. In XPath 2.0, /Bookstore/(Book|Magazine)/Title passes a sequence along each path step: first the document node, then its Bookstore child element, then the union of the Book and Magazine child elements (sorted in document order), and finally the Title children of those. In XPath 1.0 the union operator only combines two complete node-set expressions into another node-set, so it cannot appear as a step after a path separator. A parenthesized union can still begin a path, as in (/Bookstore/Book | /Bookstore/Magazine)/Title.
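In the XPath 1.0 grammar, a parenthesized expression may start a path (a filter expression followed by /), but every step after a / must be an axis step. A minimal sketch can check where xmllint accepts the union. It assumes xmllint is on the PATH, relies on xmllint exiting with a nonzero status when an expression cannot be evaluated, and uses a hypothetical one-line sample document. The try_xpath function is a hypothetical helper for this illustration only.

```bash
#!/bin/sh
# Show where XPath 1.0 (as implemented by libxml2's xmllint) accepts a
# parenthesized union. The sample document is hypothetical.
set -eu

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '<Bookstore><Book><Title>B</Title></Book><Magazine><Title>M</Title></Magazine></Bookstore>' > "$sample"

# try_xpath (hypothetical helper): report whether xmllint can evaluate the
# expression in $1; xmllint exits nonzero when evaluation fails.
try_xpath() {
  if xmllint --xpath "$1" "$sample" > /dev/null 2>&1; then
    echo "accepted: $1"
  else
    echo "rejected: $1"
  fi
}

# A parenthesized union may begin a path (filter expression, then a step)...
try_xpath '(/Bookstore/Book | /Bookstore/Magazine)/Title'
# ...but may not be used as a step after "/"; that requires XPath 2.0.
try_xpath '/Bookstore/(Book|Magazine)/Title'
```

The first expression should be reported as accepted and the second as rejected, which matches the "Invalid Expression" error in the question.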

Licensed under: CC-BY-SA with attribution