I am trying to select elements from a table in this layout:

<tbody>
<tr class="header">
      <th colspan="4">Tier 1</th>
 </tr>
 <tr>
          <td><a>First Thing</a></td>
          <td><a>Second Thing</a></td>
          <td><a>Third Thing</a></td>
          <td></td>
 </tr>
 <tr>
          <td><a>Fourth Thing</a></td>
          <td><a>Fifth Thing</a></td>
          <td><a>Sixth Thing</a></td>
          <td></td>
      </tr>


<tr class="header">
      <th colspan="4">Tier 2</th>
 </tr>
 <tr>
          <td><a>First Thing</a></td>
          <td><a>Second Thing</a></td>
          <td><a>Third Thing</a></td>
          <td></td>
 </tr>
 <tr>
          <td><a>Fourth Thing</a></td>
          <td><a>Fifth Thing</a></td>
          <td><a>Sixth Thing</a></td>
          <td></td>
      </tr>

I want to select all the rows that sit between consecutive "tr class=header" rows. I need to do this five times (the real table has six tiers; I left most of them out because the listing would be too long), and finally I need to select everything from the last header to the bottom of the table.
I should specify that I am using HTML Agility Pack in C# MVC, so XPath seems like the way to go.
So far I have been able to isolate the headers using "//tr[@class='header']//th".
The main issue is that the rows I want are siblings of the header rows rather than children of them, which would have made the traversal easier.
The end goal is to give all tier 1 elements a value of 1 in my data structure, all tier 2 elements a value of 2, and so on, for later comparison.

Solution

First, you will need an extension method that splits the rows into tiers:

public static IEnumerable<IEnumerable<T>> SplitBy<T>(
    this IEnumerable<T> source, Func<T, bool> separator)
{
    List<T> batch = new List<T>();

    using (var iterator = source.GetEnumerator())
    {
        while (iterator.MoveNext())
        {
            if (separator(iterator.Current) && batch.Any())
            {
                yield return batch;
                batch = new List<T>();
            }

            batch.Add(iterator.Current);
        }
    }

    if (batch.Any())
        yield return batch;
}
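
To make the batching behaviour easier to see, here is a minimal sketch that runs SplitBy over plain strings instead of HTML nodes. The "#" prefix and the sample values are made up for illustration; they only stand in for header rows and data rows. SplitBy is repeated (written with foreach, same behaviour) so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Same logic as the SplitBy above, repeated so this sample is self-contained.
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(
        this IEnumerable<T> source, Func<T, bool> separator)
    {
        var batch = new List<T>();

        foreach (var item in source)
        {
            // A separator item starts a new batch; emit the previous one first.
            if (separator(item) && batch.Any())
            {
                yield return batch;
                batch = new List<T>();
            }

            batch.Add(item);
        }

        // Emit whatever is left after the last separator.
        if (batch.Any())
            yield return batch;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical stand-ins for table rows: "#" marks a header row.
        var rows = new[] { "#Tier 1", "a", "b", "#Tier 2", "c" };

        foreach (var group in rows.SplitBy(r => r.StartsWith("#")))
            Console.WriteLine(string.Join(", ", group));

        // prints:
        // #Tier 1, a, b
        // #Tier 2, c
    }
}
```

Note that each batch begins with its separator item, and that any items appearing before the first separator would come back as a leading batch with no header in it.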

The first step is to query the tiers (each will contain several tr nodes):

HtmlDocument doc = new HtmlDocument();
doc.Load(path_to_html);

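// GetAttributeValue returns the default ("") when a row has no class attribute,
// so rows that carry other attributes but no class do not throw.
var tiers = doc.DocumentNode.SelectNodes("//tr")
               .SplitBy(tr => tr.GetAttributeValue("class", "") == "header");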

The second step is to extract the data from each tier:

var result = from t in tiers
             let tier = t.First().SelectSingleNode("th").InnerText
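             // SelectNodes returns null when nothing matches, so fall back to an empty sequence
             from a in t.Skip(1).SelectMany(tr => tr.SelectNodes("td/a") ?? Enumerable.Empty<HtmlNode>())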
             select new {
                 Tier = tier,
                 Value = a.InnerText
             };

The result is:

[
  { Tier: "Tier 1", Value: "First Thing" },
  { Tier: "Tier 1", Value: "Second Thing" },
  { Tier: "Tier 1", Value: "Third Thing" },
  { Tier: "Tier 1", Value: "Fourth Thing" },
  { Tier: "Tier 1", Value: "Fifth Thing" },
  { Tier: "Tier 1", Value: "Sixth Thing" },
  { Tier: "Tier 2", Value: "First Thing" },
  { Tier: "Tier 2", Value: "Second Thing" },
  { Tier: "Tier 2", Value: "Third Thing" },
  { Tier: "Tier 2", Value: "Fourth Thing" },
  { Tier: "Tier 2", Value: "Fifth Thing" },
  { Tier: "Tier 2", Value: "Sixth Thing" }
]
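
The question asks for a numeric value per tier (1 for tier 1, 2 for tier 2, and so on) rather than the header text. One possible way to get that, sketched here under a few assumptions, is to use the position of each batch as its tier number via the indexed overload of SelectMany. The inline HTML is a shortened, made-up version of the table from the question, the HtmlAgilityPack NuGet package is assumed to be installed, and the numbering assumes the first row of the table is a header row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public static class EnumerableExtensions
{
    // Same SplitBy as in the answer, repeated so this sample is self-contained.
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(
        this IEnumerable<T> source, Func<T, bool> separator)
    {
        var batch = new List<T>();
        foreach (var item in source)
        {
            if (separator(item) && batch.Any())
            {
                yield return batch;
                batch = new List<T>();
            }
            batch.Add(item);
        }
        if (batch.Any())
            yield return batch;
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened, hypothetical version of the table from the question.
        const string html = @"<table><tbody>
            <tr class=""header""><th colspan=""4"">Tier 1</th></tr>
            <tr><td><a>First Thing</a></td><td></td></tr>
            <tr class=""header""><th colspan=""4"">Tier 2</th></tr>
            <tr><td><a>Second Thing</a></td><td></td></tr>
            </tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var numbered = doc.DocumentNode.SelectNodes("//tr")
            .SplitBy(tr => tr.GetAttributeValue("class", "") == "header")
            // index is the zero-based position of the batch, so tier = index + 1
            .SelectMany((rows, index) => rows
                .Skip(1) // skip the header row itself
                .SelectMany(tr => tr.SelectNodes("td/a") ?? Enumerable.Empty<HtmlNode>())
                .Select(a => new { Tier = index + 1, Value = a.InnerText }));

        foreach (var item in numbered)
            Console.WriteLine($"{item.Tier}: {item.Value}");

        // prints:
        // 1: First Thing
        // 2: Second Thing
    }
}
```

If the real headers are not guaranteed to appear in order, parsing the number out of the th text would be the safer choice than relying on position.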
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow