Question

I hope the title was clear enough, but I will try to explain...

I'm using C# WinForms (.NET 4.5).

The problem is that I create a WebBrowser control and try to set its content with wb.DocumentText. But when I try to loop through the elements, the document is reported as empty (wb.Document is null).

Here's my code:

WebBrowser wb = new WebBrowser();
wb.DocumentText = leMessage;

HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
    // Do Some Stuff
}

leMessage holds an HTML newsletter message and there are some a tags in it.

I've already tried this: wb.Document.Body.InnerHtml = leMessage; but that didn't work either...

What did I miss or do wrong?

Solution

Setting WebBrowser.DocumentText is asynchronous. You need to handle the DocumentCompleted event before you can access the DOM, and keep pumping Windows messages in the meantime. Here's a complete example of web scraping that uses async/await to keep the convenient linear code flow. Just alter the navigation part:

await NavigateAsync(ct, () => this.webBrowser.DocumentText = leMessage, timeout);
HtmlElementCollection elems = this.webBrowser.Document.GetElementsByTagName("a");

This way you could do it in a loop. In a nutshell:

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WinformsApp2
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }

        const string leMessage = "<a href='http://example.com'>Go there</a>";

        private async void MainForm_Load(object sender, EventArgs e)
        {
            var wb = new WebBrowser();

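            // Completed by the DocumentCompleted handler once the document has loaded.
            TaskCompletionSource<bool> tcs = null;
            WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender2, e2) => tcs.TrySetResult(true);

            for (int i = 0; i < 3; i++)
            {
                // Use a fresh TaskCompletionSource for each load.
                tcs = new TaskCompletionSource<bool>();
                wb.DocumentCompleted += documentCompletedHandler;
                try
                {
                    // Setting DocumentText starts an asynchronous load.
                    wb.DocumentText = leMessage;
                    // Yield to the message loop until DocumentCompleted fires.
                    await tcs.Task;
                }
                finally
                {
                    wb.DocumentCompleted -= documentCompletedHandler;
                }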
                HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
                foreach (HtmlElement elem in elems)
                {
                    Debug.Print(elem.OuterHtml);
                }
            }
        }
    }
}
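
The NavigateAsync call mentioned earlier belongs to a separate example whose code is not reproduced here. As a rough sketch of the same idea, the subscribe/await/unsubscribe pattern can be wrapped in a small helper. The class name WebBrowserExtensions and the method name LoadAsync are hypothetical (they are not part of WinForms), and the sketch assumes it is called on the UI thread of a running WinForms application:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper, not part of the WinForms API.
static class WebBrowserExtensions
{
    // Runs startNavigation, then completes when DocumentCompleted fires.
    // The returned task is canceled if the token is canceled first.
    public static async Task LoadAsync(this WebBrowser wb, Action startNavigation, CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = (s, e) => tcs.TrySetResult(true);

        wb.DocumentCompleted += handler;
        try
        {
            // Cancel the pending task if the token fires (for example, on timeout).
            using (ct.Register(() => tcs.TrySetCanceled()))
            {
                startNavigation();
                await tcs.Task;
            }
        }
        finally
        {
            wb.DocumentCompleted -= handler;
        }
    }
}

// Possible usage inside an async event handler, with a 10-second timeout:
//
//     using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
//     {
//         await wb.LoadAsync(() => wb.DocumentText = leMessage, cts.Token);
//         HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
//     }
```

If the token is canceled before the document finishes loading, awaiting LoadAsync throws a TaskCanceledException, so callers that use a timeout may want to catch it.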

OTHER TIPS

You need to loop through the elements only after the DocumentCompleted event has fired. Therefore, you need to subscribe to it in your code:

webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Here you can loop through your elements
}

Try this:

WebBrowser wb;
private void Form1_Load(object sender, EventArgs e)
{
    wb = new WebBrowser();
    wb.DocumentCompleted += wb_DocumentCompleted;
    wb.DocumentText = "<html><body><a href='#'>Test</a></body></html>";
}

void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlElementCollection elems = ((WebBrowser)sender)
        .Document.GetElementsByTagName("a");
    foreach (HtmlElement elem in elems)
    {
        // Do Some Stuff
    }
}
Licensed under: CC-BY-SA with attribution