Question

Group data in C#

I have parsed an HTML file and extracted all the data from it. Now I want to group the data as follows:

[image: screenshot of the parsed lines, with the parent lines selected]

The selected lines are the parents, and each one contains the child lines that follow it. The code I'm working on is here:

var uricontent = File.ReadAllText("TestHtml/Bew.html");
var doc = new HtmlDocument(); // with HTML Agility pack
doc.LoadHtml(uricontent);

var rooms = doc.DocumentNode.SelectNodes("//table[@class='rates']").SelectMany(
    detail =>
    {
        return doc.DocumentNode.SelectNodes("//td[@class='rate-description'] | //table[@class='rooms']//h2 | //table[@class='rooms']//td[@class='room-price room-price-total']").Select(
            r => new
            {
                RoomType = r.InnerText.CleanInnerText(),
            });
    }).ToArray();

RoomType contains the data parsed by HTML Agility Pack. How can I group the entries by rate name, such as "Pay & Save", "Best Available Room Only", and so on?

The HTML file is here: http://notepad.cc/share/g0zh0TcyaG

Thank you


Solution

Instead of taking the union of three XPath queries and then trying to group the results back by "Rate Description" (that is, by the element <td class="rate-description">), you can approach it the other way around.

Base the LINQ selection on the "Rate Description" rows. Then, in the projection, use relative XPath to get all the room types and room rates that belong to the current "Rate Description":

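// Note: SelectNodes returns null (not an empty collection) when nothing matches.
var rooms =
    doc.DocumentNode
       .SelectNodes("//table[@class='rates']//tr[@class='rate']")
       .Select(r => new
         {
            // The rate name cell inside the current <tr class="rate">
            RateType = r.SelectSingleNode("./td[@class='rate-description']")
                        .InnerText.CleanInnerText(),
            // Room headings in the first <tr class="rooms"> that follows this rate row
            RoomTypes = r.SelectNodes("./following-sibling::tr[@class='rooms'][1]//table[@class='rooms']//h2")
                         .Select(s => new
                         {
                            RoomType = s.InnerText.CleanInnerText(),
                            // The total price cell next to the cell that holds this heading
                            Rate = s.SelectSingleNode(".//parent::td/following-sibling::td[@class='room-price room-price-total'][1]")
                                    .InnerText.CleanInnerText()
                         }).ToArray()
         }).ToArray();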

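Notice the period at the beginning of some of the XPath queries above. It tells HtmlAgilityPack that the query is relative to the current HtmlNode; a query that starts with // is evaluated from the document root, no matter which node it is called on. The result is an array with one entry per rate description, each holding its RateType and a RoomTypes array of RoomType and Rate pairs, roughly like this:

[image: screenshot of the grouped result]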
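
To see the relative and absolute forms side by side, here is a minimal sketch. It deliberately uses the built-in System.Xml.XmlDocument instead of HtmlAgilityPack so that it runs without any package; its SelectNodes and SelectSingleNode accept the same XPath strings, although XmlDocument needs well-formed XML and returns an empty list rather than null when nothing matches. The markup, room names and prices are made up for illustration and are only assumed to resemble the structure of the real page.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical markup, reduced to the shape the queries above assume.
const string html = @"
<table class='rates'>
  <tr class='rate'><td class='rate-description'>Pay &amp; Save</td></tr>
  <tr class='rooms'><td><table class='rooms'>
    <tr><td><h2>Double Room</h2></td><td class='room-price room-price-total'>80</td></tr>
    <tr><td><h2>Twin Room</h2></td><td class='room-price room-price-total'>85</td></tr>
  </table></td></tr>
  <tr class='rate'><td class='rate-description'>Best Available Room Only</td></tr>
  <tr class='rooms'><td><table class='rooms'>
    <tr><td><h2>Double Room</h2></td><td class='room-price room-price-total'>95</td></tr>
  </table></td></tr>
</table>";

var doc = new XmlDocument();
doc.LoadXml(html);

var firstRate = doc.SelectSingleNode("//tr[@class='rate']")!;

// Absolute query: starts at the document root even though it is called on firstRate,
// so it finds every <h2> in the document.
int absoluteCount = firstRate.SelectNodes("//h2")!.Count;

// Relative query: the leading period anchors it at firstRate,
// so it finds only the headings that belong to the first rate.
int relativeCount = firstRate
    .SelectNodes("./following-sibling::tr[@class='rooms'][1]//h2")!.Count;

Console.WriteLine($"absolute: {absoluteCount}, relative: {relativeCount}");

// Group rooms under each rate with relative queries, as in the answer.
var groups = doc.SelectNodes("//tr[@class='rate']")!
    .Cast<XmlNode>()
    .Select(r => new
    {
        RateType = r.SelectSingleNode("./td[@class='rate-description']")!.InnerText.Trim(),
        Rooms = r.SelectNodes("./following-sibling::tr[@class='rooms'][1]//h2")!
                 .Cast<XmlNode>()
                 .Select(h => h.InnerText.Trim() + ": " +
                              h.SelectSingleNode("./parent::td/following-sibling::td[1]")!.InnerText.Trim())
                 .ToArray()
    })
    .ToArray();

foreach (var g in groups)
    Console.WriteLine($"{g.RateType} -> {string.Join(", ", g.Rooms)}");
```

With this sample markup the absolute query counts three headings and the relative one counts two, which is why the original union of absolute queries returned every room for every rate table.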

Licensed under: CC-BY-SA with attribution