Question

The WebBrowser control seems to rearrange attributes within HTML tags when setting webBrowser1.DocumentText.

I'm wondering if there is some kind of render mode or document encoding that I am missing. The problem can be reproduced by adding a RichTextBox control (txt_htmlBody) and a WebBrowser control (webBrowser1) to a Windows Form.

Add the webBrowser1 WebBrowser control, and add a handler for its DocumentCompleted event: webBrowser1_DocumentCompleted.

I used this to add my mouse click event to the web browser control.

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Attach an event to handle mouse clicks on the web browser
        this.webBrowser1.Document.Body.MouseDown += new HtmlElementEventHandler(Body_MouseDown);
    }

In the mouse click event, we find out which element was clicked, like so:

    private void Body_MouseDown(Object sender, HtmlElementEventArgs e)
    {
        // Get the clicked HTML element
        HtmlElement elem = webBrowser1.Document.GetElementFromPoint(e.ClientMousePosition);

        if (elem != null)
        {
            highLightElement(elem);

        }
    }

    private void highLightElement(HtmlElement elem)
    {

        int len = this.txt_htmlBody.TextLength;
        int index = 0;

        string textToSearch = this.txt_htmlBody.Text.ToLower(); // lower-case the text box contents so the search is not case sensitive
        string textToFind = elem.OuterHtml.ToLower();
        int lastIndex = textToSearch.LastIndexOf(textToFind);
        // The text cannot be found, because the WebBrowser control has rearranged the attributes in the <img> tag.
        // What the WebBrowser renders: "<img border=0 alt=\"\" src=\"images/promo-green2_01_04.jpg\" width=393 height=30>"
        // What was passed to the WebBrowser from the text box: <img src="images/PROMO-GREEN2_01_04.jpg" width="393" height="30" border="0" alt=""/>
        // As you can see, I will never find my data in the source, because the WebBrowser has changed it.

    }

Add the txt_htmlBody RichTextBox to the form, and handle its TextChanged event so that webBrowser1.DocumentText is updated whenever the text in the RichTextBox (txt_htmlBody) changes.

    private void txt_htmlBody_TextChanged(object sender, EventArgs e)
    {
        try
        {
            webBrowser1.DocumentText = txt_htmlBody.Text.Replace("\n", String.Empty);
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
    }

When you run the program, copy the example HTML below into txt_htmlBody, click the image on the right, and debug highLightElement. You will see from my comments why I cannot find the specified text in my search string: the WebBrowser control rearranges the attributes.

<img src="images/PROMO-GREEN2_01_04.jpg" width="393" height="30" border="0" alt=""/>

Does anyone know how to make WebBrowser control render my HTML as-is?

Thank you for your time.


Solution

You cannot expect the processed HTML to match the original source 1:1 when you read it back via element.OuterHtml. The control parses your markup into a DOM, and OuterHtml is a re-serialization of that DOM, so it is almost never identical to the input, regardless of the rendering mode.

However, although the attributes may have been rearranged, their names and values are still the same, so you just need to improve your search logic (e.g., by walking the DOM tree, or simply enumerating elements via HtmlDocument.All and checking their attributes via HtmlElement.GetAttribute).
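As a rough illustration of the attribute-based search, here is a minimal sketch. ElementMatcher, AttributeMatches and FindByAttributes are hypothetical names made up for this example; only HtmlDocument.GetElementsByTagName and HtmlElement.GetAttribute are actual WinForms members. It also assumes that URL-valued attributes such as src may come back resolved against the document's base, so those are compared by suffix rather than for equality.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper: matches elements by attribute values instead of by raw markup.
static class ElementMatcher
{
    // Compares one attribute value, ignoring case.
    // Assumption: URL attributes may be returned in resolved form,
    // so they only need to end with the value written in the source.
    public static bool AttributeMatches(string actual, string expected, bool isUrl)
    {
        if (actual == null || expected == null)
        {
            return false;
        }
        return isUrl
            ? actual.EndsWith(expected, StringComparison.OrdinalIgnoreCase)
            : string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the first element with the given tag name whose attributes
    // all match the expected name/value pairs, or null if none does.
    public static HtmlElement FindByAttributes(
        HtmlDocument doc, string tagName, IDictionary<string, string> expected)
    {
        foreach (HtmlElement candidate in doc.GetElementsByTagName(tagName))
        {
            bool allMatch = true;
            foreach (KeyValuePair<string, string> pair in expected)
            {
                bool isUrl =
                    pair.Key.Equals("src", StringComparison.OrdinalIgnoreCase) ||
                    pair.Key.Equals("href", StringComparison.OrdinalIgnoreCase);

                if (!AttributeMatches(candidate.GetAttribute(pair.Key), pair.Value, isUrl))
                {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch)
            {
                return candidate;
            }
        }
        return null;
    }
}
```

For the image in the question, you would pass webBrowser1.Document, "img", and a dictionary such as src = "images/PROMO-GREEN2_01_04.jpg", width = "393", height = "30". The same comparison can be run the other way around: read the attributes of the clicked element with GetAttribute and look for a tag in the text box that carries those values, rather than searching for the OuterHtml string.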

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow