Is there an object in C# that allows for easy management of HTML DOM?
28-09-2019
Question
If I have a string that contains the HTML of a page that was just returned from an HTTP POST, how can I turn it into something that lets me easily traverse the DOM?
I figured the HtmlDocument object would make sense, but it has no constructor. Are there any types that allow for easy management of the HTML DOM?
Thanks,
Matt
Solution
HtmlDocument (System.Windows.Forms.HtmlDocument) represents a document that has already been loaded by a WebBrowser control, which is why it has no public constructor.
Html Agility Pack is by far the best library I have used for this purpose.
An example, adapted from the project's CodePlex wiki:
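using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is your own helper, not part of the library
    }
}
doc.Save("file.htm");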
The example loads a file, but Load also has overloads that take a Stream or a TextReader, and LoadHtml parses HTML that is already held in a string, which is the situation in the question.
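Since the question starts from a string rather than a file, here is a minimal sketch of that case. It assumes the HtmlAgilityPack NuGet package is referenced; the names HtmlHelpers and GetLinks are hypothetical and only for illustration. It parses the string with LoadHtml and then walks the DOM with an XPath query, collecting every link target:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// HtmlHelpers and GetLinks are hypothetical names used for illustration.
public static class HtmlHelpers
{
    // Parses an HTML string (e.g. the body of an HTTP POST response)
    // and returns the href value of every <a> element that has one.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses markup; Load(string) would treat it as a file path

        var hrefs = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue returns the supplied default if the attribute is absent.
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><body><a href=\"/a\">A</a><a>no link</a><a href=\"/b\">B</a></body></html>";
        foreach (string href in HtmlHelpers.GetLinks(html))
        {
            Console.WriteLine(href); // prints /a, then /b
        }
    }
}
```

The `<a>` without an href is skipped because the XPath predicate `[@href]` only matches elements that carry the attribute; any other query (for example `//form//input`) can be run against the same DocumentNode without re-parsing.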
Licensed under: CC-BY-SA with attribution