Get nodes that don't have a specific ancestor (XML, XPath)

https://stackoverflow.com/questions/6012439

14-11-2019

Question

I've been struggling for a few days with a fairly complex XPath expression that I'm not able to formulate. I have a syntax tree from a parser for a C++-like language, and I would like an XPath query that selects all names that are not function names.

To be specific, I have an XML document like this.

(The whole XML document is at the end of the question. It is quite large, so here is a simple overview of its structure.) There are four element types:
a - represents one node of the tree
b - contains the node type (e.g. "CALL_EXPRESSION")
c - contains the actual source text (e.g. "printf", variable names...)
d - contains the child nodes of the current node (a elements)

CALL_EXPRESSION
  DOT_EXPRESSION
    NAME_EXPRESSION
      NAME
    NAME_EXPRESSION
      NAME
  PARAMS
    NAME_EXPRESSION
      NAME

CALL_EXPRESSION
  NAME_EXPRESSION
    NAME
  PARAMS
    NAME_EXPRESSION
      NAME

ASSIGNMENT_EXPRESSION
  NAME_EXPRESSION
    NAME
  NAME_EXPRESSION
    NAME

I would like to formulate an XPath query that selects all NAMEs that are not descendants of CALL_EXPRESSION/*[1]. (That is, I want to select all the variables but not the function names.)
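To pin down exactly which nodes should be selected, here is a minimal Python sketch that computes the intended result by walking the tree directly, with no XPath involved. It assumes the standard library's xml.etree.ElementTree, rebuilds the sample document from the end of the question with hypothetical node() and name_expr() helpers, and wraps the three expressions in a hypothetical <root> element because they are not one well-formed document on their own. A correct XPath query should return the same NAMEs: a, b, c and d.

```python
import xml.etree.ElementTree as ET


def node(kind, text, *children):
    """Hypothetical helper: build <a><b>kind</b><c>text</c><d>children</d></a>."""
    a = ET.Element("a")
    ET.SubElement(a, "b").text = kind
    ET.SubElement(a, "c").text = text
    ET.SubElement(a, "d").extend(children)
    return a


def name_expr(ident):
    """Hypothetical helper: a NAME_EXPRESSION wrapping a single NAME."""
    return node("NAME_EXPRESSION", ident, node("NAME", ident))


def sample_document():
    # Hypothetical <root> wrapper: the three expressions are separate top-level elements.
    root = ET.Element("root")
    root.extend([
        node("CALL_EXPRESSION", "object.method(a)",
             node("DOT_EXPRESSION", "object.method",
                  name_expr("object"), name_expr("method")),
             node("PARAMS", "(a)", name_expr("a"))),
        node("CALL_EXPRESSION", "puts(b)",
             name_expr("puts"),
             node("PARAMS", "(b)", name_expr("b"))),
        node("ASSIGNMENT_EXPRESSION", "c=d;", name_expr("c"), name_expr("d")),
    ])
    return root


def is_kind(a, kind):
    """True if the <a> element's <b> child holds the given node type."""
    return a.findtext("b") == kind


def variable_names(root):
    """NAMEs that are not descendants of any CALL_EXPRESSION/d/a[1] (the callee)."""
    excluded = set()
    for call in root.iter("a"):
        if is_kind(call, "CALL_EXPRESSION"):
            callee = call.find("d").find("a")  # first <a> child of <d>
            excluded.update(id(e) for e in callee.iter("a") if e is not callee)
    # root.iter() walks in document order, like an XPath node-set.
    return [a.findtext("c") for a in root.iter("a")
            if is_kind(a, "NAME") and id(a) not in excluded]


print(variable_names(sample_document()))  # ['a', 'b', 'c', 'd']
```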

To select all the function names, I can use an XPath expression like this:

//a[b="CALL_EXPRESSION"]/d/a[1]

No problem there. Now, to select all nodes that are not descendants of these nodes, I would use not(ancestor::X).

But here is the problem: if I formulate the XPath expression like this:

//*[b="NAME"][not(ancestor::a[b="CALL_EXPRESSION"]/d/a[1])]

it selects only the nodes that don't have any ancestor a with a child b="CALL_EXPRESSION" at all. In our example, it selects only the NAMEs from the ASSIGNMENT_EXPRESSION subtree.
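This behaviour follows from how the path inside not() is evaluated: ancestor::a[b="CALL_EXPRESSION"]/d/a[1] starts from every CALL_EXPRESSION ancestor and returns that call's first child, which always exists. The node-set is therefore non-empty for every NAME anywhere inside a call, so not() is false; nothing checks whether the NAME actually sits inside that first child. The minimal Python sketch below emulates the predicate by hand, assuming the standard library's xml.etree.ElementTree (whose XPath subset has no ancestor axis or not()); the node() and name_expr() helpers and the <root> wrapper around the three sample expressions are hypothetical.

```python
import xml.etree.ElementTree as ET


def node(kind, text, *children):
    """Hypothetical helper: build <a><b>kind</b><c>text</c><d>children</d></a>."""
    a = ET.Element("a")
    ET.SubElement(a, "b").text = kind
    ET.SubElement(a, "c").text = text
    ET.SubElement(a, "d").extend(children)
    return a


def name_expr(ident):
    """Hypothetical helper: a NAME_EXPRESSION wrapping a single NAME."""
    return node("NAME_EXPRESSION", ident, node("NAME", ident))


def sample_document():
    # Hypothetical <root> wrapper around the three sample expressions.
    root = ET.Element("root")
    root.extend([
        node("CALL_EXPRESSION", "object.method(a)",
             node("DOT_EXPRESSION", "object.method",
                  name_expr("object"), name_expr("method")),
             node("PARAMS", "(a)", name_expr("a"))),
        node("CALL_EXPRESSION", "puts(b)",
             name_expr("puts"),
             node("PARAMS", "(b)", name_expr("b"))),
        node("ASSIGNMENT_EXPRESSION", "c=d;", name_expr("c"), name_expr("d")),
    ])
    return root


def is_kind(a, kind):
    return a.findtext("b") == kind


def first_query(root):
    """Emulate //*[b="NAME"][not(ancestor::a[b="CALL_EXPRESSION"]/d/a[1])]."""
    parents = {child: parent for parent in root.iter() for child in parent}
    selected = []
    for a in root.iter("a"):
        if not is_kind(a, "NAME"):
            continue
        node_set = []
        anc = parents.get(a)
        while anc is not None:  # ancestor::a[b="CALL_EXPRESSION"]
            if anc.tag == "a" and is_kind(anc, "CALL_EXPRESSION"):
                # /d/a[1] is evaluated from that ancestor: it is the call's callee.
                node_set.append(anc.find("d").find("a"))
            anc = parents.get(anc)
        if not node_set:  # not(node-set) is true only for an empty node-set
            selected.append(a.findtext("c"))
    return selected


print(first_query(sample_document()))  # ['c', 'd']
```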

I suspected that the problem is that ancestor:: applies only to the first step (in our case a[b="CALL_EXPRESSION"]) and filters by its predicate, and that the following / steps are discarded. So I modified the XPath query like this:

//*[b="NAME"][not(ancestor::a[../../b="CALL_EXPRESSION" and position()=1])]

This seems to work only on the simpler CALL_EXPRESSION (the one without the DOT_EXPRESSION). I suspected that the path inside [] might be relative only to the current node, not to the candidate ancestors. But when I used the query

//*[b="NAME"][not(ancestor::a[b="CALL_EXPRESSION"])]

it worked as one would expect (all NAMEs that don't have a CALL_EXPRESSION ancestor were selected).
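The second attempt trips over position(): in a predicate on the ancestor:: axis, position() counts along that reverse axis (1 is the nearest ancestor a), not the element's place among its siblings. So ancestor::a[../../b="CALL_EXPRESSION" and position()=1] only ever tests the innermost enclosing a, which is the callee in puts(b) but is a NAME_EXPRESSION inside the DOT_EXPRESSION. Testing the sibling position explicitly, as in //*[b="NAME"][not(ancestor::a[../../b="CALL_EXPRESSION"][not(preceding-sibling::a)])], should select only a, b, c and d; that expression is a suggestion, not part of the original thread. The Python sketch below emulates both predicates by hand with the standard library's xml.etree.ElementTree; node(), name_expr() and the <root> wrapper are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def node(kind, text, *children):
    """Hypothetical helper: build <a><b>kind</b><c>text</c><d>children</d></a>."""
    a = ET.Element("a")
    ET.SubElement(a, "b").text = kind
    ET.SubElement(a, "c").text = text
    ET.SubElement(a, "d").extend(children)
    return a


def name_expr(ident):
    """Hypothetical helper: a NAME_EXPRESSION wrapping a single NAME."""
    return node("NAME_EXPRESSION", ident, node("NAME", ident))


def sample_document():
    root = ET.Element("root")  # hypothetical wrapper element
    root.extend([
        node("CALL_EXPRESSION", "object.method(a)",
             node("DOT_EXPRESSION", "object.method",
                  name_expr("object"), name_expr("method")),
             node("PARAMS", "(a)", name_expr("a"))),
        node("CALL_EXPRESSION", "puts(b)",
             name_expr("puts"),
             node("PARAMS", "(b)", name_expr("b"))),
        node("ASSIGNMENT_EXPRESSION", "c=d;", name_expr("c"), name_expr("d")),
    ])
    return root


def is_kind(a, kind):
    return a.findtext("b") == kind


def select_names(root, has_bad_ancestor):
    """Keep NAMEs whose ancestor::a list (nearest first) fails the given check."""
    parents = {child: parent for parent in root.iter() for child in parent}
    result = []
    for a in root.iter("a"):
        if not is_kind(a, "NAME"):
            continue
        ancestors, anc = [], parents.get(a)
        while anc is not None:  # reverse axis: proximity order, nearest first
            if anc.tag == "a":
                ancestors.append(anc)
            anc = parents.get(anc)
        if not has_bad_ancestor(ancestors, parents):
            result.append(a.findtext("c"))
    return result


def under_call(anc, parents):
    """../../b = "CALL_EXPRESSION", evaluated from a candidate ancestor."""
    grandparent = parents.get(parents.get(anc))
    return grandparent is not None and is_kind(grandparent, "CALL_EXPRESSION")


def attempt_2(ancestors, parents):
    # ancestor::a[../../b="CALL_EXPRESSION" and position()=1]
    return any(under_call(anc, parents) and pos == 1
               for pos, anc in enumerate(ancestors, start=1))


def fixed(ancestors, parents):
    # ancestor::a[../../b="CALL_EXPRESSION"][not(preceding-sibling::a)]
    return any(under_call(anc, parents) and parents[anc].find("a") is anc
               for anc in ancestors)


doc = sample_document()
print(select_names(doc, attempt_2))  # ['object', 'method', 'a', 'b', 'c', 'd']
print(select_names(doc, fixed))      # ['a', 'b', 'c', 'd']
```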

Is there any way to formulate the query I need? And why don't the queries work?

Thanks in advance :)

The XML

<a>
 <b>CALL_EXPRESSION</b>
 <c>object.method(a)</c>
 <d>
   <a>
     <b>DOT_EXPRESSION</b>
     <c>object.method</c>
     <d>
       <a>
         <b>NAME_EXPRESSION</b>
         <c>object</c>
         <d>
           <a>
             <b>NAME</b>
             <c>object</c>
             <d>
             </d>
           </a>
         </d>
       </a>
       <a>
         <b>NAME_EXPRESSION</b>
         <c>method</c>
         <d>
           <a>
             <b>NAME</b>
             <c>method</c>
             <d>
             </d>
           </a>
         </d>
       </a>
     </d>
   </a>
   <a>
     <b>PARAMS</b>
     <c>(a)</c>
     <d>
       <a>
         <b>NAME_EXPRESSION</b>
         <c>a</c>
         <d>
           <a>
             <b>NAME</b>
             <c>a</c>
             <d>
             </d>
           </a>
         </d>
       </a>
     </d>
   </a>
 </d>
</a>

<a>
 <b>CALL_EXPRESSION</b>
 <c>puts(b)</c>
 <d>
   <a>
     <b>NAME_EXPRESSION</b>
     <c>puts</c>
     <d>
       <a>
         <b>NAME</b>
         <c>puts</c>
         <d>
         </d>
       </a>
     </d>
   </a>
   <a>
     <b>PARAMS</b>
     <c>(b)</c>
     <d>
       <a>
         <b>NAME_EXPRESSION</b>
         <c>b</c>
         <d>
           <a>
             <b>NAME</b>
             <c>b</c>
             <d>
             </d>
           </a>
         </d>
       </a>
     </d>
   </a>
 </d>
</a>

<a>
 <b>ASSIGNMENT_EXPRESSION</b>
 <c>c=d;</c>
 <d>
   <a>
     <b>NAME_EXPRESSION</b>
     <c>c</c>
     <d>
       <a>
         <b>NAME</b>
         <c>c</c>
         <d>
         </d>
       </a>
     </d>
   </a>
   <a>
     <b>NAME_EXPRESSION</b>
     <c>d</c>
     <d>
       <a>
         <b>NAME</b>
         <c>d</c>
         <d>
         </d>
       </a>
     </d>
   </a>
 </d>
</a>

Solution

You didn't say whether this is XPath 1.0 or 2.0. In XPath 2.0 you can use the except operator: for example

//* except //x//*

to select all elements that don't have x as an ancestor.
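As an illustration of what except does, here is a minimal Python sketch that reproduces //* except //x//* as a difference on node identity, keeping the left operand's document order. The small document is made up, and because the standard library's xml.etree.ElementTree has no except operator, both operands are built by hand. For the question's markup, the same idea should read //a[b="NAME"] except //a[b="CALL_EXPRESSION"]/d/a[1]//a, which leaves the NAMEs a, b, c and d.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the second <y> is not inside any <x>.
root = ET.fromstring("<r><x><y><w/></y></x><y/></r>")


def except_(e1, e2):
    """E1 except E2: nodes of e1 (in e1's order) whose identity is not in e2."""
    e2_ids = {id(n) for n in e2}
    return [n for n in e1 if id(n) not in e2_ids]


all_elements = list(root.iter())  # //*  (document order, includes the root element)
inside_x = [d for x in root.iter("x") for d in x.iter() if d is not x]  # //x//*
print([n.tag for n in except_(all_elements, inside_x)])  # ['r', 'x', 'y']
```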

The except operator can also be simulated in XPath 1.0 using the equivalence

E1 except E2 ==> E1[count(.|E2)!=count(E2)]

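(but taking care over the context for evaluation of E2: inside the predicate the context node is the candidate from E1, so a relative E2 has to be rewritten, typically as an absolute path).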
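The equivalence works because the union . | E2 has exactly the same size as E2 when the context node is already a member of E2, and one more node otherwise. The minimal Python sketch below applies that count test literally, treating node-sets as sets of node identities over a made-up document; it should give the same result as a direct except. The function names are hypothetical, and E2 is computed once up front, which corresponds to writing it as an absolute path.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<r><x><y><w/></y></x><y/></r>")  # made-up sample


def count_union(nodes_a, nodes_b):
    """count(A | B): union by node identity, so shared nodes are counted once."""
    return len({id(n) for n in nodes_a} | {id(n) for n in nodes_b})


def except_by_count(e1, e2):
    """E1[count(. | E2) != count(E2)], evaluated for each candidate from E1."""
    count_e2 = count_union([], e2)
    return [n for n in e1 if count_union([n], e2) != count_e2]


all_elements = list(root.iter())                                        # //*
inside_x = [d for x in root.iter("x") for d in x.iter() if d is not x]  # //x//*
print([n.tag for n in except_by_count(all_elements, inside_x)])         # ['r', 'x', 'y']
```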

OTHER TIPS

The question is not very clear, and the XML provided isn't a well-formed XML document.

Anyway, here is my attempt at an answer, based on my understanding of the question.

Take the following simple XML document:

<t>
 <x>
   <y>
     <z>Text 1</z>
   </y>
 </x>
 <x>
  <y>
    <z> Text 2</z>
  </y>
 </x>
</t>

We want to select all z elements that are not descendants of /t/x[1].

Use either this XPath expression:

/t/z | /t/x[position() > 1]//z

or this one:

//z[not(ancestor::x
             [count(ancestor::*) = 1
            and
              not(preceding-sibling::x)
             ]
        )
    ]

I'd certainly recommend the first XPath expression as it is obviously much simpler, shorter and easier to understand.

It means: select all z children of the top element t of the XML document, and all z descendants of any x child of t that is not the first such x child (that is, whose position among the x children of t is greater than 1).

The second expression means: select all z elements in the XML document that don't have as an ancestor an element x that has exactly one element ancestor (so it is a child of the top element) and has no preceding sibling named x (in other words, it is the first x child of its parent).
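For readers without an XSLT processor at hand, the two readings above can also be cross-checked with a minimal Python sketch. It hand-emulates each expression over the same small document using the standard library's xml.etree.ElementTree, whose XPath subset lacks the ancestor and preceding-sibling axes; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

t = ET.fromstring(
    "<t><x><y><z>Text 1</z></y></x><x><y><z> Text 2</z></y></x></t>")


def first_expression(t):
    """/t/z | /t/x[position() > 1]//z, returned in document order."""
    picked = {id(z) for z in t.findall("z")}
    for x in t.findall("x")[1:]:  # x children of t except the first
        picked.update(id(z) for z in x.iter("z"))
    return [z for z in t.iter("z") if id(z) in picked]


def second_expression(t):
    """//z[not(ancestor::x[count(ancestor::*) = 1 and not(preceding-sibling::x)])]"""
    parents = {child: parent for parent in t.iter() for child in parent}

    def depth(el):  # count(ancestor::*)
        n = 0
        while el in parents:
            el, n = parents[el], n + 1
        return n

    def excluded(z):
        anc = parents.get(z)
        while anc is not None:
            # first x child of the top element t
            if anc.tag == "x" and depth(anc) == 1 and parents[anc].find("x") is anc:
                return True
            anc = parents.get(anc)
        return False

    return [z for z in t.iter("z") if not excluded(z)]


print([z.text for z in first_expression(t)])   # [' Text 2']
print([z.text for z in second_expression(t)])  # [' Text 2']
```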

Finally, here is a quick verification of the correctness of the two XPath expressions:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//z[not(ancestor::x
             [count(ancestor::*) = 1
            and
              not(preceding-sibling::x)
             ]
          )
      ]
  "/>

-------------------

 <xsl:copy-of select="/t/z | /t/x[position() > 1]//z"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the simple XML document (shown above), we see that both expressions select exactly the wanted z element. The result of the transformation is:

<z> Text 2</z>

-------------------

 <z> Text 2</z>

Licensed under: CC-BY-SA with attribution
