Building HtmlElement object trees

https://stackoverflow.com/questions/631955

08-07-2019

Question

I'm using the MSIE WebBrowser control in a C# desktop application and am looking for a way to build and maintain trees of HtmlElement objects outside of this control. I am trying to quickly switch between multiple complex pages without incurring the overhead of re-parsing the HTML each time (and I don't want to maintain multiple controls that are shown/hidden as needed). I discovered that a) I can only create HtmlElement objects via the control's HtmlDocument and b) once I remove a "trunk" of HtmlElement objects from the control's HtmlDocument, it "dies off," even though I keep maintaining a strong reference to the root element. How can I do this?

P.S. I am willing to consider alternative browser controls (e.g. Gecko) if they allow me to accomplish the above.

Solution

This will do it

// Requires: using System.Windows.Forms; and a reference to the MSHTML interop assembly (Microsoft.mshtml)

// On-screen WebBrowser control
webBrowserControl.Navigate("about:blank");
// Navigation is asynchronous: if Document is still null at this point,
// run the following code from a DocumentCompleted handler instead.
webBrowserControl.Document.Write("<div id=\"div1\">This will change</div>");
var elementToReplace = webBrowserControl.Document.GetElementById("div1");
var nodeToReplace = elementToReplace.DomElement as mshtml.IHTMLDOMNode;

// In-memory WebBrowser control to load the fragment into.
// It is needed as a base object because MSHTML is a COM control:
// the fragment's elements have to be created by an MSHTML document.
var webBrowserFragment = new WebBrowser();
webBrowserFragment.Navigate("about:blank");
webBrowserFragment.Document.Write("<div id=\"div1\">Hello World!</div>");
var elementReplacement = webBrowserFragment.Document.GetElementById("div1");
var nodeReplacement = elementReplacement.DomElement as mshtml.IHTMLDOMNode;

// The magic happens here: swap the on-screen node for the one built off-screen
nodeToReplace.replaceNode(nodeReplacement);

I doubt this will improve performance, though: the renderer is fast, and the memory consumed will be about the same whether you keep one large page with hidden divs or keep multiple divs in memory in other objects.

OTHER TIPS

You can use the MSHTML library (mshtml.dll) to achieve this. Basically you would use a single about:blank page and then dynamically write and remove content from it.

See this blog post on this subject
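The single about:blank page approach from this tip might look roughly like the sketch below. It is an assumption-laden illustration, not the blog post's code: `SingleBlankPage` and its methods are hypothetical names, the markup for each page is cached as a string, and content is written or removed through the WinForms `HtmlElement.InnerHtml` property on the document body. Note that MSHTML re-parses the markup every time `InnerHtml` is set, so this trades the parsing cost the question wants to avoid for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper (not part of WinForms): caches each page's body markup
// and swaps it into one already-loaded about:blank document on demand.
public sealed class SingleBlankPage
{
    private readonly Dictionary<string, string> _markup =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Name of the page currently written into the document, or null if none.
    public string CurrentPage { get; private set; }

    // Register (or overwrite) the body markup for a named page.
    public void Register(string name, string bodyHtml)
    {
        _markup[name] = bodyHtml;
    }

    public bool IsRegistered(string name)
    {
        return _markup.ContainsKey(name);
    }

    // Replace whatever is in the body with the stored markup.
    // MSHTML parses the string again on every call.
    public void Show(HtmlDocument document, string name)
    {
        RequireBody(document).InnerHtml = _markup[name];
        CurrentPage = name;
    }

    // Remove all content from the body.
    public void Clear(HtmlDocument document)
    {
        RequireBody(document).InnerHtml = string.Empty;
        CurrentPage = null;
    }

    // Assumes the control has finished loading about:blank, so Body exists.
    private static HtmlElement RequireBody(HtmlDocument document)
    {
        if (document == null || document.Body == null)
            throw new InvalidOperationException("about:blank has not finished loading yet.");
        return document.Body;
    }
}
```

Typical use would be to call `Register` once per page, then `Show(webBrowserControl.Document, name)` after the control's `DocumentCompleted` event has fired for about:blank.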

You can also write a custom interface wrapper that exposes only the functionality you need from MSHTML, rather than referencing the whole interop assembly (nearly 8 MB). This is really easy to do using F12 (Go To Definition) in Visual Studio.
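A related way to avoid the large interop reference, swapped in here instead of the hand-written interface wrapper this tip describes, is late binding through C# `dynamic`: the call is dispatched by name through the COM object's `IDispatch` at run time. The sketch below assumes C# 6 or later and a reference to `Microsoft.CSharp.dll`; `LateBoundDom` is a hypothetical name, and the `replaceNode` member is the same one the accepted answer calls through `mshtml.IHTMLDOMNode`.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: calls MSHTML's replaceNode through late binding,
// so the project needs no compile-time reference to the mshtml interop assembly.
public static class LateBoundDom
{
    public static void ReplaceNode(HtmlElement target, HtmlElement replacement)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (replacement == null) throw new ArgumentNullException(nameof(replacement));

        // DomElement returns the underlying MSHTML COM object; 'dynamic'
        // resolves the replaceNode call by name when it executes.
        dynamic targetNode = target.DomElement;
        targetNode.replaceNode(replacement.DomElement);
    }
}
```

With this, `LateBoundDom.ReplaceNode(elementToReplace, elementReplacement)` would stand in for the two `as mshtml.IHTMLDOMNode` casts in the solution. The trade-off is that a misspelled member name only fails at run time.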

Do you really need to remove them entirely? How about leaving your "branch" in the DOM as the child of a DIV with style="display:none"? That way the elements stay real, live DOM objects but are not visible.
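The hidden-DIV idea can be sketched as follows, assuming each "page" is wrapped in its own top-level DIV with a unique id. `HiddenPageHost`, `BuildMarkup` and `Show` are hypothetical names; the WinForms calls used are `HtmlDocument.GetElementById` and the `HtmlElement.Style` property, which gets or sets the element's whole inline style string.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper: every page stays in the DOM inside its own wrapper DIV;
// switching pages only flips the CSS display property.
public static class HiddenPageHost
{
    // Builds one wrapper DIV per (id, markup) pair. Only the first starts visible.
    // Ids are attribute-encoded; the page markup is inserted as-is.
    public static string BuildMarkup(IList<KeyValuePair<string, string>> pages)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < pages.Count; i++)
        {
            string display = i == 0 ? "block" : "none";
            sb.Append("<div id=\"").Append(WebUtility.HtmlEncode(pages[i].Key))
              .Append("\" style=\"display:").Append(display).Append("\">")
              .Append(pages[i].Value)
              .Append("</div>");
        }
        return sb.ToString();
    }

    // Shows the wrapper with visibleId and hides the others.
    // Setting Style replaces the wrapper's entire inline style.
    public static void Show(HtmlDocument document, IEnumerable<string> pageIds, string visibleId)
    {
        foreach (string id in pageIds)
        {
            HtmlElement page = document.GetElementById(id);
            if (page != null)
                page.Style = id == visibleId ? "display:block" : "display:none";
        }
    }
}
```

The markup would be written once (for example with `Document.Write`), after which switching pages never re-parses anything; the cost is that every page's elements stay in memory, as the solution above also points out.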

I think you could also use the HTML Agility Pack. It lets you parse once, query the HTML tree using XPath or iterators, and write the tree back out with a Save method when done. Depending on your structure, you might need to create an adapter around its classes, because it works on an entire HTML document and you want to work on individual elements, but this should not be too hard.
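Such an adapter might start out like the sketch below, assuming the HtmlAgilityPack NuGet package is referenced. `ParsedPageCache` and its methods are hypothetical; the library calls are HAP's `HtmlDocument.LoadHtml`, `HtmlNode.SelectSingleNode` (XPath) and `HtmlDocument.Save(TextWriter)`. Note that these trees are HAP objects, not live WebBrowser DOM nodes, so showing a subtree on screen would still mean handing its markup to the control.

```csharp
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

// Hypothetical adapter: parses each page once and exposes individual
// element subtrees instead of whole documents.
public sealed class ParsedPageCache
{
    private readonly Dictionary<string, HtmlDocument> _docs =
        new Dictionary<string, HtmlDocument>();

    // Parse the markup once and keep the resulting tree.
    public void Load(string name, string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        _docs[name] = doc;
    }

    // Look up one element by XPath; returns null when nothing matches.
    public HtmlNode Find(string name, string xpath)
    {
        return _docs[name].DocumentNode.SelectSingleNode(xpath);
    }

    // Serialize just that element and its children back to markup.
    public string ElementHtml(string name, string xpath)
    {
        return Find(name, xpath)?.OuterHtml;
    }

    // Write the (possibly modified) whole tree back out.
    public void Save(string name, TextWriter writer)
    {
        _docs[name].Save(writer);
    }
}
```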

Licensed under: CC-BY-SA with attribution
