XPath to get values using AgilityPack from HTML page

https://stackoverflow.com/questions/22616254

20-06-2023

Question

I need to get numeric values from a web page into two variables.

A snippet from the page is shown below:

<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />

The strings "Downloads (current version):" and "Downloads (total):" each appear only once in the page.

I need to get "123" and "253" into variables.

Edit: Thanks to har07, I ended up with:

var downloadscurrentversion = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (current version):']/following-sibling::text()[1]");
var downloadsallversions = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (total):']/following-sibling::text()[1]");

Console.WriteLine("Total: " + downloadsallversions.InnerText.Trim());
Console.WriteLine("Current: " + downloadscurrentversion.InnerText.Trim());

Solution

Consider this example:

var html = @"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />
</div>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
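var result = htmlDoc.DocumentNode.SelectNodes("/div/text()[normalize-space(.)]");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (result != null)
{
    foreach (var r in result)
    {
        Console.WriteLine(r.InnerText.Trim());
    }
}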

This part of the XPath in the example above:

/div/text()

selects all text nodes that are direct children of the <div> element. The last part:

[normalize-space(.)]

filters out text nodes that are empty or contain only whitespace.
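To see the effect of the predicate, the following minimal sketch compares the node sets selected with and without it. It assumes the HtmlAgilityPack NuGet package is referenced. The class name PredicateDemo and the method CountTextNodes are hypothetical names introduced only for this illustration.

```csharp
using System;
using HtmlAgilityPack;

static class PredicateDemo
{
    // Hypothetical helper: counts the text nodes directly under the root <div>,
    // first without and then with the normalize-space(.) predicate.
    public static (int All, int NonEmpty) CountTextNodes(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Every text node that is a direct child of <div>, whitespace-only ones included.
        var all = htmlDoc.DocumentNode.SelectNodes("/div/text()");
        // Only text nodes that are non-empty after whitespace normalization.
        var nonEmpty = htmlDoc.DocumentNode.SelectNodes("/div/text()[normalize-space(.)]");

        // SelectNodes returns null when nothing matches, so fall back to 0.
        return (all?.Count ?? 0, nonEmpty?.Count ?? 0);
    }

    static void Main()
    {
        var html = @"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />
</div>";
        var counts = CountTextNodes(html);
        Console.WriteLine("All text nodes: " + counts.All);
        Console.WriteLine("Non-empty text nodes: " + counts.NonEmpty);
    }
}
```

The second count should be the smaller one, because the line breaks between elements are text nodes too and only the predicate removes them.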

UPDATE:

In response to your comment, you can try this instead:

var result = 
        htmlDoc.DocumentNode
               .SelectNodes(@"/div/b[.='Downloads (current version):' 
                                        or 
                                     .='Downloads (total):']/following-sibling::text()[1]");

The XPath above selects the first text node that follows each of the specified <b> elements.
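Since the question asks for numeric values in variables, one possible way to finish the job is sketched below. It assumes the HtmlAgilityPack NuGet package is referenced and that the labels contain no single quote, because they are spliced into an XPath string literal. DownloadCounts and ReadCount are hypothetical names, not part of the library.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

static class DownloadCounts
{
    // Hypothetical helper: returns the integer that follows <b>label</b>,
    // or null if the label is not found or the text is not an integer.
    // Assumption: label contains no single quote (it is placed inside '...').
    public static int? ReadCount(HtmlDocument doc, string label)
    {
        var node = doc.DocumentNode.SelectSingleNode(
            "//b[.='" + label + "']/following-sibling::text()[1]");

        // SelectSingleNode returns null when nothing matches.
        if (node == null)
        {
            return null;
        }

        // Decode entities such as &nbsp; and strip surrounding whitespace.
        var text = HtmlEntity.DeEntitize(node.InnerText).Trim();

        return int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value)
            ? value
            : (int?)null;
    }

    static void Main()
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(@"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />
</div>");

        int? current = ReadCount(htmlDoc, "Downloads (current version):");
        int? total = ReadCount(htmlDoc, "Downloads (total):");

        Console.WriteLine("Current: " + current);
        Console.WriteLine("Total: " + total);
    }
}
```

Returning int? keeps "label not found" distinct from a real count of 0. If the page formats numbers with thousands separators, NumberStyles.AllowThousands would need to be added to the parse call.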

Licensed under: CC-BY-SA with attribution