Group parsed values with HTML Agility Pack in C#
21-12-2019
Question
I have parsed an HTML file in C# and extracted all of its data. Now I want to group it as follows:
The selected lines are the parents, and each one contains the child rows that follow it. The code I'm working on is here:
var uricontent = File.ReadAllText("TestHtml/Bew.html");
var doc = new HtmlDocument(); // with HTML Agility pack
doc.LoadHtml(uricontent);
var rooms = doc.DocumentNode.SelectNodes("//table[@class='rates']").SelectMany(
    detail =>
    {
        return doc.DocumentNode.SelectNodes("//td[@class='rate-description'] | //table[@class='rooms']//h2 | //table[@class='rooms']//td[@class='room-price room-price-total']").Select(
            r => new
            {
                RoomType = r.InnerText.CleanInnerText(),
            });
    }).ToArray();
RoomType contains the data parsed by HTML Agility Pack. How can I group the entries by name, such as Pay & Save, Best Available Room Only, ...?
The HTML file is here: http://notepad.cc/share/g0zh0TcyaG
Thank you
Solution
Instead of taking the union of three XPath queries and then trying to group the results back by "Rate Description" (that is, by the <td class="rate-description"> element), you can do it the other way around.
Base the LINQ selection on "Rate Description", and in the projection collect all room types and room rates under the current "Rate Description" using relative XPath:
var rooms =
    doc.DocumentNode
       .SelectNodes("//table[@class='rates']//tr[@class='rate']")
       .Select(r => new
       {
           // One entry per rate row; CleanInnerText() is the asker's own helper and must be called as a method
           RateType = r.SelectSingleNode("./td[@class='rate-description']")
                       .InnerText.CleanInnerText(),
           // Room headings inside the first "rooms" row that follows this rate row
           RoomTypes = r.SelectNodes("./following-sibling::tr[@class='rooms'][1]//table[@class='rooms']//h2")
                        .Select(s => new
                        {
                            RoomType = s.InnerText.CleanInnerText(),
                            // Price cell next to the cell that holds the heading
                            Rate = s.SelectSingleNode(".//parent::td/following-sibling::td[@class='room-price room-price-total'][1]")
                                    .InnerText.CleanInnerText()
                        }).ToArray()
       }).ToArray();
Notice the period at the beginning of some of the XPath queries above. It tells HtmlAgilityPack that the query is relative to the current HtmlNode. The result looks roughly like this:
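As a rough illustration of that grouped shape and of the leading period, here is a minimal, self-contained sketch. It makes several assumptions: the markup is a small, well-formed, hypothetical stand-in (the real Bew.html may be structured differently), the standard library's System.Xml.Linq XPath extensions are swapped in for HtmlAgilityPack so that no extra package is needed, and CleanInnerText is a guess at what the asker's helper does (collapse whitespace and trim). The names RelativeXPathDemo, SampleMarkup, Group, and CountHeadings are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RelativeXPathDemo
{
    // Hypothetical stand-in markup; the structure of the real page is only assumed.
    public const string SampleMarkup = @"
<table class='rates'>
  <tr class='rate'><td class='rate-description'>  Rate A  </td></tr>
  <tr class='rooms'><td><table class='rooms'>
    <tr><td><h2>Room 1</h2></td><td class='room-price room-price-total'>100</td></tr>
    <tr><td><h2>Room 2</h2></td><td class='room-price room-price-total'>120</td></tr>
  </table></td></tr>
  <tr class='rate'><td class='rate-description'>Rate B</td></tr>
  <tr class='rooms'><td><table class='rooms'>
    <tr><td><h2>Room 1</h2></td><td class='room-price room-price-total'>90</td></tr>
  </table></td></tr>
</table>";

    // Assumed behaviour of the asker's helper: collapse runs of whitespace and trim.
    public static string CleanInnerText(this string text) =>
        Regex.Replace(text, @"\s+", " ").Trim();

    // One entry per rate row, each carrying the rooms listed under it.
    public static List<(string RateType, List<(string RoomType, string Rate)> Rooms)> Group(string markup)
    {
        XDocument doc = XDocument.Parse(markup.Trim());
        return doc.XPathSelectElements("//table[@class='rates']//tr[@class='rate']")
            .Select(r => (
                RateType: r.XPathSelectElement("./td[@class='rate-description']").Value.CleanInnerText(),
                // Relative query: only the first "rooms" row after the current rate row
                Rooms: r.XPathSelectElements("./following-sibling::tr[@class='rooms'][1]//table[@class='rooms']//h2")
                    .Select(h => (
                        RoomType: h.Value.CleanInnerText(),
                        Rate: h.XPathSelectElement("./parent::td/following-sibling::td[@class='room-price room-price-total'][1]")
                               .Value.CleanInnerText()))
                    .ToList()))
            .ToList();
    }

    // Compares a relative query with an absolute one, both issued from the first rate row.
    public static (int Relative, int Absolute) CountHeadings(string markup)
    {
        XDocument doc = XDocument.Parse(markup.Trim());
        XElement firstRate = doc.XPathSelectElements("//tr[@class='rate']").First();
        // Starts at firstRate because of the leading period
        int relative = firstRate.XPathSelectElements("./following-sibling::tr[@class='rooms'][1]//h2").Count();
        // Starts at the document root even though it is called on firstRate
        int absolute = firstRate.XPathSelectElements("//h2").Count();
        return (relative, absolute);
    }
}
```

On the sample markup, Group returns two entries ("Rate A" with two rooms, "Rate B" with one), and CountHeadings returns 2 for the relative query and 3 for the absolute one, since "//h2" searches the whole document regardless of the node it is called on. HtmlAgilityPack's SelectNodes treats a leading "//" the same way, which is why the queries inside the projection start with a period.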