Group by parsed value with HTML Agility Pack in C#

https://stackoverflow.com//questions/23028312

21-12-2019

Question

I am grouping data in C#. I have parsed an HTML file and extracted all of its data, and now I want to group the results as shown below:

enter image description here

The highlighted rows are parent rows, and each one contains the child rows listed beneath it. The code I am working with is here:

var uricontent = File.ReadAllText("TestHtml/Bew.html");
var doc = new HtmlDocument(); // with HTML Agility Pack
doc.LoadHtml(uricontent);

var rooms = doc.DocumentNode.SelectNodes("//table[@class='rates']").SelectMany(
    detail =>
    {
        return doc.DocumentNode.SelectNodes("//td[@class='rate-description'] | //table[@class='rooms']//h2 | //table[@class='rooms']//td[@class='room-price room-price-total']").Select(
            r => new
            {
                RoomType = r.InnerText.CleanInnerText(),
            });
    }).ToArray();

RoomType contains the data parsed by HTML Agility Pack. How can I group the entries by names such as Pay & Save, Best Available Room Only, and so on?

The HTML file is here: http://notepad.cc/share/g0zh0TcyaG

Thanks!

Solution

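Instead of taking the union of three XPath queries and then trying to group the results by "rate description" (that is, by the element <td class="rate-description">), you can approach it the other way around.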

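Select the "rate description" rows first, and then, in the LINQ projection, use relative XPath to get all room types and room rates under the current "rate description":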

var rooms =
    doc.DocumentNode
       .SelectNodes("//table[@class='rates']//tr[@class='rate']")
       .Select(r => new
         {
            // Name of the rate, taken from the parent row itself
            RateType = r.SelectSingleNode("./td[@class='rate-description']")
                        .InnerText.CleanInnerText(),
            // Room headings in the first 'rooms' row that follows this rate row
            RoomTypes = r.SelectNodes("./following-sibling::tr[@class='rooms'][1]//table[@class='rooms']//h2")
                         .Select(s => new
                         {
                            RoomType = s.InnerText.CleanInnerText(),
                            // Total price cell that follows the cell holding the heading
                            Rate = s.SelectSingleNode(".//parent::td/following-sibling::td[@class='room-price room-price-total'][1]")
                                    .InnerText.CleanInnerText()
                         }).ToArray()
         }).ToArray();

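Notice the period at the beginning of some of the XPath queries above. It tells HtmlAgilityPack that the query is relative to the current HtmlNode. The result looks roughly like this: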

enter image description here
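
To illustrate why the leading period matters, here is a minimal sketch. The markup, the class name RelativeXPathDemo, and the helper CountFromDivB are hypothetical and made up for this illustration; they are not taken from the asker's page. It evaluates two XPath expressions from the same HtmlNode, one with a leading period and one without:

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class RelativeXPathDemo
{
    // Hypothetical markup for illustration only; not the asker's page.
    public const string Html =
        "<div id='a'><span>one</span></div>" +
        "<div id='b'><span>two</span><span>three</span></div>";

    // Counts the nodes matched by `xpath` when evaluated from <div id='b'>.
    public static int CountFromDivB(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode divB = doc.DocumentNode.SelectSingleNode("//div[@id='b']");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection matches = divB.SelectNodes(xpath);
        return matches == null ? 0 : matches.Count;
    }

    public static void Main()
    {
        // Leading period: only the <span> elements inside <div id='b'>.
        Console.WriteLine(CountFromDivB(".//span"));

        // No leading period: every <span> in the document,
        // even though the query is issued from divB.
        Console.WriteLine(CountFromDivB("//span"));
    }
}
```

With this sample markup the first call should count 2 nodes and the second 3. This also suggests why the code in the question returns one flat list: its inner queries start with // and run on doc.DocumentNode, so they match across the whole document for each rates table rather than within the current row.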

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow