Is there an object in C# that allows for easy management of HTML DOM?

https://stackoverflow.com/questions/2672480

28-09-2019
|

Question

If I have a string that contains the html from a page I just got returned from an HTTP Post, how can I turn that into something that will let me easily traverse the DOM?

I figured HtmlDocument object would make sense, but it has no constructor. Are there any types that allow for easy management of HTML DOM?

Thanks,
Matt

Solution

The HtmlDocument is an instance of a document that is already loaded by a WebBrowser control. Thus no ctor.

Html Agility Pack is by far the best library I have used to this purpose

An example from the codeplex wiki

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link["href"];
    att.Value = FixLink(att);
}
doc.Save("file.htm");

The example shows loading of a file but there are overloads that let you load a string or a stream.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow