HTML Agility Pack: questions about using XPath

https://stackoverflow.com/questions/23172834

06-07-2023

Question

I'm using this code to get all tables in my html document:

var tables = document.DocumentNode.SelectNodes("table[@class='something']");

Inside each table I have multiple rows and multiple columns. This is what I have so far:

HtmlNodeCollection rows = tables[0].SelectNodes(".//TR");
for (int i = 0; i < rows.Count; ++i)
{
    HtmlNodeCollection cols = rows[i].SelectNodes(".//TD");

    for (int j = 0; j < cols.Count; ++j)
    {
        string value = cols[j].InnerText;
    }
}
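The loop above can be sketched as a self-contained program. To run without the HtmlAgilityPack package, this version loads a small hypothetical sample into System.Xml's XmlDocument. As far as I know, HAP evaluates XPath through .NET's own XPathNavigator, so the expressions behave the same way, but XmlDocument needs well-formed XML, which is why the sample is written as XHTML. The sample markup and the `rowsText` name are illustrative assumptions. Two HAP-specific points it reflects: HAP stores element names in lower case, so an uppercase name test such as `.//TR` generally matches nothing; and HAP's `SelectNodes` returns null by default (rather than an empty collection) when nothing matches, so with HAP you would null-check each result before looping.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical stand-in for the asker's page, written as well-formed XML
// with lower-case tag names (HAP normalises element names to lower case).
const string Sample =
    "<html><body>" +
    "<table class='something'>" +
    "<tr><td>a1</td><td>a2</td></tr>" +
    "<tr><td>b1</td><td>b2</td></tr>" +
    "</table>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Sample);

var rowsText = new List<string>();

// Leading // so the table is found anywhere in the document.
var tables = doc.SelectNodes("//table[@class='something']");
foreach (XmlNode table in tables)
{
    // XPath name tests are case-sensitive, so ".//tr", not ".//TR".
    foreach (XmlNode row in table.SelectNodes(".//tr"))
    {
        var cells = new List<string>();
        // "td" (no //) takes only the row's direct td children.
        foreach (XmlNode cell in row.SelectNodes("td"))
            cells.Add(cell.InnerText);
        rowsText.Add(string.Join(",", cells));
    }
}

foreach (string line in rowsText)
    Console.WriteLine(line); // a1,a2 then b1,b2
```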

I need help understanding how to use XPath, since I can't find documentation online. For example, how would I get the content if my HTML document looks like this:

<table class="something">
  <colgroup>...</colgroup>
  <thead>
    <tr>
      <td>...</td>
    </tr>
  </thead>
  <thead>...</thead>
  <tbody>
    <tr>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <td>...</td>
      <td>...</td>
    </tr>
  </tbody>
</table>

I only want the content of the td elements.

Solution

The XPath query to get the td tags located inside a table with class "something" is:

var nodes = document.DocumentNode.SelectNodes(@"//table[@class=""something""]//td");
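A runnable sketch of that query, applied to the question's markup. The `...` placeholders are replaced with hypothetical cell text (h1, r1c1 and so on), the second thead is left empty, and html/body wrappers are added; all of that is assumption, not the asker's real page. It uses System.Xml's XmlDocument instead of HtmlAgilityPack so it runs without the package, with the same XPath string. Note that the result includes the td in the thead as well as the tbody cells.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// The question's table, with "..." replaced by hypothetical cell text.
const string Html = @"<html><body>
<table class=""something"">
  <colgroup><col/></colgroup>
  <thead>
    <tr><td>h1</td></tr>
  </thead>
  <thead></thead>
  <tbody>
    <tr><td>r1c1</td><td>r1c2</td></tr>
    <tr><td>r2c1</td><td>r2c2</td></tr>
  </tbody>
</table>
</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Html);

// Same XPath as the answer: every td anywhere below the matching table.
var values = new List<string>();
foreach (XmlNode td in doc.SelectNodes(@"//table[@class=""something""]//td"))
    values.Add(td.InnerText);

Console.WriteLine(string.Join(" ", values)); // prints: h1 r1c1 r1c2 r2c1 r2c2
```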

This means:

// selects matching nodes at any depth. At the start of an expression it searches the whole document; inside an expression (or written as .//) it searches below the current node.
//table[@class="something"] selects table tags whose class attribute equals 'something', anywhere in the document.
//table[@class="something"]//td selects td tags that have such a table as a parent, grandparent, or any more distant ancestor.
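To make the difference between a plain step, / and // concrete, the sketch below counts what a few expressions match on a trimmed-down, hypothetical version of the question's table. It again uses System.Xml's XmlDocument as a stand-in for HtmlAgilityPack. The first expression is the one from the question's first line: without a leading //, it only looks at the direct children of the document node (here the html element), so it finds no table; with HAP that empty result would come back as null.

```csharp
using System;
using System.Xml;

// Hypothetical, trimmed-down version of the question's table.
const string Html =
    "<html><body>" +
    "<table class='something'>" +
    "<thead><tr><td>h1</td></tr></thead>" +
    "<tbody><tr><td>c1</td><td>c2</td></tr></tbody>" +
    "</table>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Html);

// No leading slash: child step from the document node only, so the table
// (nested under html/body) is not found.
int relative = doc.SelectNodes("table[@class='something']").Count;

// Leading //: search the whole document.
int anywhere = doc.SelectNodes("//table[@class='something']").Count;

// //td: td at any depth below the table, including the thead cell.
int allCells = doc.SelectNodes("//table[@class='something']//td").Count;

// /tbody/tr/td: only this exact child path, so the header cell is skipped.
int bodyCells = doc.SelectNodes("//table[@class='something']/tbody/tr/td").Count;

Console.WriteLine($"{relative} {anywhere} {allCells} {bodyCells}"); // prints: 0 1 3 2
```

The `/tbody/tr/td` form only works when the tbody is actually present in the HTML source; as far as I know, HAP does not insert one the way browsers do, so XPaths copied from browser developer tools may need the tbody step removed.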

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow