Question

I want to select an element using the text() node test in XPath, given a structure like the one shown below.

<root>
    <child>
        <lowerchild>
            <lowestchild>
                My text
            </lowestchild>
        </lowerchild>
    </child>
</root>


//child[contains(text(), 'My text')]

should return the child element, and

//lowerchild[contains(text(), 'My text')] 

should return the lowerchild element.

I tried out the XPath-commands with HTMLAgilityPack, but they were not able to find those elements.

The end goal of my little project is a small XPath searcher: the user gives the element name, the attribute, and the value, so it would be great if you could give me a solution that uses only that information. The document could have any structure. If element names repeat, for example if there were two lowestchild elements, then I would like to pick the "lower" one of the lowest. Hope you can help me.


Solution

Instead of

//child[contains(text(), 'My text')]

it looks like you want

//child[contains(., 'My text')]

The XPath expression text() (with the implicit child:: axis) selects any text node that is a child of the context node. In the above example, it selects only text nodes that are immediate children of the child element. In the XML you showed, the child element has two child text nodes, with the lowerchild element in between them. Both text nodes contain only whitespace, and for this reason they may be stripped by some processors, depending on settings.

If you pass a node-set as the first parameter to contains(a, b) in XPath 1.0, it takes the first node in document order and converts it to a string. (In XPath 2.0, passing a sequence of more than one item there is a type error instead.) So your parameter is getting converted to a string containing only whitespace, or else an empty string (if the whitespace-only text nodes got stripped).

But if instead of text() you pass . as the first argument to contains(), then the context node (here, a child element) gets converted to a string. This means concatenating the values of all text node descendants of child, not just its immediate text node children. (It's sort of like DOM innerText, which your question title mentions, but it does not include the start/end tags of elements, nor attributes.) For this reason, //child[contains(., 'My text')] will return the child element.
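To make the difference concrete, here is a minimal sketch in Python using only the standard library. It does not run the XPath expressions themselves (as far as I know, the XPath subset in xml.etree.ElementTree has no contains() function); it only imitates what each expression ends up comparing against. The variable names first_text_child and string_value are mine, chosen for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <child>
        <lowerchild>
            <lowestchild>
                My text
            </lowestchild>
        </lowerchild>
    </child>
</root>"""

root = ET.fromstring(XML)
child = root.find("child")

# Rough analogue of contains(text(), 'My text'):
# only the first text node that is a direct child of <child>.
# Here that is just the whitespace before <lowerchild>.
first_text_child = child.text or ""

# Rough analogue of contains(., 'My text'):
# the string-value of <child>, i.e. every descendant text node
# concatenated in document order.
string_value = "".join(child.itertext())

print("My text" in first_text_child)  # False
print("My text" in string_value)      # True
```

The first check fails for the same reason your original expression found nothing: the only text directly inside child is whitespace. The second succeeds because the text of lowestchild is part of the concatenated string.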

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow