Retrieve all text nodes of element including children using HtmlAgilityPack in C#

https://stackoverflow.com/questions/19349826

30-06-2022

Question

I am trying to get all the text nodes of an element, including those of its children, but for some reason it is giving me the entire document's HTML.

This is what I've come up with:

HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    foreach (HtmlNode node in el.SelectNodes("//text()"))
    {
        Debug.WriteLine("text=" + node.InnerText.Replace("&#160;", " "));
    }
}

It prints a "text=" line for every text node in the whole document. I'm sure there's something wrong with the //text(), which is a snippet I found here on SO, but I don't know another way of doing it and I've been going crazy with it.

Solution

You should use a relative XPath expression, that is, one relative to your el context node:

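HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    // The leading "." anchors the search at el instead of the document root.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection textNodes = el.SelectNodes(".//text()");
    if (textNodes != null)
    {
        foreach (HtmlNode node in textNodes)
        {
            Debug.WriteLine("text=" + node.InnerText.Replace("&#160;", " "));
        }
    }
}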

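"//text()" is an absolute path: it selects all descendant text nodes of the document root node, no matter which node you call SelectNodes on. ".//text()" starts at the context node (el here) and selects only the text nodes beneath it.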

See the Location Paths and Abbreviated Syntax sections of the XPath specification for details:

//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

.//para selects the para element descendants of the context node
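
The difference between the two forms can be checked with a small self-contained sketch. It uses the built-in System.Xml classes as a stand-in for HtmlAgilityPack so that it runs without any NuGet package; the sample markup, the class name XPathContextDemo, and the helper TextsFrom are hypothetical and not part of the original question. The XPath rule being shown is the same, but note that XmlNode.SelectNodes returns an empty list when nothing matches, whereas HtmlAgilityPack returns null.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathContextDemo
{
    // Hypothetical sample markup, not taken from the question.
    private const string Sample =
        "<html><body>" +
        "<p>outside</p>" +
        "<div class='TableContainer'><span>first</span><b>second</b></div>" +
        "</body></html>";

    // Evaluates xpath with the first TableContainer div as the context node
    // and returns the values of the matched text nodes.
    public static List<string> TextsFrom(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);

        XmlNode el = doc.SelectSingleNode("(//div[@class='TableContainer'])[1]");
        var texts = new List<string>();
        foreach (XmlNode node in el.SelectNodes(xpath))
        {
            texts.Add(node.Value);
        }
        return texts;
    }

    public static void Main()
    {
        // Absolute path: searches from the document root, so "outside" is included.
        Console.WriteLine(string.Join(", ", TextsFrom("//text()")));   // outside, first, second

        // Relative path: searches only below the div.
        Console.WriteLine(string.Join(", ", TextsFrom(".//text()")));  // first, second
    }
}
```

Even though both calls are made on the div node, only the expression that starts with "." is limited to that node's subtree.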

Licensed under: CC-BY-SA with attribution
