Question

I'm using .NET WebBrowser control. How do I know when a web page is fully loaded?

I want to know when the browser is not fetching any more data. (The moment when IE writes 'Done' in its status bar...).

Notes:

  • The DocumentComplete/NavigateComplete events might occur multiple times for a web site containing multiple frames.
  • The browser ready state doesn't solve the problem either.
  • I have tried checking the number of frames in the frame collection and then counting the number of times I get the DocumentComplete event, but this doesn't work either.
  • this.WebBrowser.IsBusy doesn't work either. It is always 'false' when checking it in the Document Complete handler.

Solution 4

Here's what finally worked for me:

    public bool WebPageLoaded
    {
        get
        {
            if (this.WebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
                return false;

            if (this.HtmlDomDocument == null)
                return false;

            // iterate over all the Html elements. Find all frame elements and check their ready state
            foreach (IHTMLDOMNode node in this.HtmlDomDocument.all)
            {
                IHTMLFrameBase2 frame = node as IHTMLFrameBase2;
                if (frame != null)
                {
                    if (!frame.readyState.Equals("complete", StringComparison.OrdinalIgnoreCase))
                        return false;

                }
            }

            Debug.Print(this.Name + " - I think it's loaded");
            return true;
        }
    }

On each DocumentComplete event I iterate over all the HTML elements and check every frame I find (I know it can be optimized). For each frame I check its ready state. It's pretty reliable, but just like jeffamaphone said, I have already seen sites that trigger some internal refreshes. Still, the above code satisfies my needs.

Edit: every frame can contain frames within it, so I think this code should be updated to check the state of every frame recursively.
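The recursive check mentioned in the edit could look roughly like the sketch below. To keep it self-contained it does not touch mshtml at all: FrameSnapshot and FrameReadiness are hypothetical names, and the snapshot is assumed to be filled in by your own code from each frame's readyState and its child frames.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of one frame: its readyState string plus its nested frames.
// Not part of the WebBrowser or mshtml API; you would populate it yourself.
public sealed class FrameSnapshot
{
    public string ReadyState { get; private set; }
    public IList<FrameSnapshot> Children { get; private set; }

    public FrameSnapshot(string readyState, params FrameSnapshot[] children)
    {
        ReadyState = readyState;
        Children = new List<FrameSnapshot>(children ?? new FrameSnapshot[0]);
    }
}

public static class FrameReadiness
{
    // Returns true only if this frame and every frame nested below it report "complete".
    public static bool IsFullyLoaded(FrameSnapshot frame)
    {
        if (frame == null)
            return false;

        if (!string.Equals(frame.ReadyState, "complete", StringComparison.OrdinalIgnoreCase))
            return false;

        foreach (FrameSnapshot child in frame.Children)
        {
            // A single unfinished descendant means the page is not done yet.
            if (!IsFullyLoaded(child))
                return false;
        }

        return true;
    }
}
```

The same depth-first walk should carry over to the real frame objects, as long as each frame's document can be reached; frames from other domains may refuse access, which this sketch does not handle.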

OTHER TIPS

My approach to doing something when page is completely loaded (including frames) is something like this:

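using System.Windows.Forms;

    protected delegate void Procedure();

    // Runs doNext once DocumentCompleted has fired and ReadyState reports Complete.
    // 'ie' is assumed to be the WebBrowser control on this form.
    private void executeAfterLoadingComplete(Procedure doNext) {
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = delegate(object o, WebBrowserDocumentCompletedEventArgs e)
        {
            // one-shot: unsubscribe so later frame completions don't start more timers
            ie.DocumentCompleted -= handler;
            Timer timer = new Timer();
            EventHandler checker = delegate(object o1, EventArgs e1)
            {
                // poll every 200 ms until the whole control reports Complete
                if (WebBrowserReadyState.Complete == ie.ReadyState)
                {
                    timer.Dispose();
                    doNext();
                }
            };
            timer.Tick += checker;
            timer.Interval = 200;
            timer.Start();
        };
        ie.DocumentCompleted += handler;
    }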

From my other approaches I learned some "don't"-s:

  • don't try to bend the spoon ... ;-)
  • don't try to build an elaborate construct using DocumentComplete, Frames, and HtmlWindow.Load events. Your solution will be fragile, if it works at all.
  • don't use System.Timers.Timer instead of System.Windows.Forms.Timer. Strange errors will begin to occur in strange places if you do, because the timer runs on a different thread than the rest of your app.
  • don't use just a Timer without DocumentComplete, because it may fire before your page even begins to load and will execute your code prematurely.

Here's how I solved the problem in my application:

private void wbPost_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url != wbPost.Url)
        return;
    /* Document now loaded */
}

Here's my tested version. Just make this your DocumentCompleted event handler and place the code that you only want to be called once into the method OnWebpageReallyLoaded(). Effectively, this approach determines when the page has been stable for 200ms and then does its thing.

// event handler for when a document (or frame) has completed its download
Timer m_pageHasntChangedTimer = null;
private void webBrowser_DocumentCompleted( object sender, WebBrowserDocumentCompletedEventArgs e ) {
    // dynamic pages will often be loaded in parts e.g. multiple frames
    // need to check the page has remained static for a while before safely saying it is 'loaded'
    // use a timer to do this

    // destroy the old timer if it exists
    if ( m_pageHasntChangedTimer != null ) {
        m_pageHasntChangedTimer.Dispose();
    }

    // create a new timer which calls the 'OnWebpageReallyLoaded' method after 200ms
    // if an additional frame or other content downloads in the meantime, this timer will be destroyed
    // and the process repeated
    m_pageHasntChangedTimer = new Timer();
    EventHandler checker = delegate( object o1, EventArgs e1 ) {
        // only if the page has been stable for 200ms already
        // check the official browser state flag, (euphemistically called) 'Ready'
        // and call our 'OnWebpageReallyLoaded' method
        if ( WebBrowserReadyState.Complete == webBrowser.ReadyState ) {
            m_pageHasntChangedTimer.Dispose();
            OnWebpageReallyLoaded();
        }
    };
    m_pageHasntChangedTimer.Tick += checker;
    m_pageHasntChangedTimer.Interval = 200;
    m_pageHasntChangedTimer.Start();
}

private void OnWebpageReallyLoaded() {
    /* place your harvester code here */
}

How about using javascript in each frame to set a flag when the frame is complete, and then have C# look at the flags?

I don't have an alternative for you, but I wonder if the IsBusy property being true during the Document Complete handler is because the handler is still running and therefore the WebBrowser control is technically still 'busy'.

The simplest solution would be to have a loop that executes every 100 ms or so until the IsBusy flag is reset (with a max execution time in case of errors). That of course assumes that IsBusy will not be set to false at any point during page loading.
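A minimal sketch of such a polling loop with a maximum execution time follows. BusyWait and WaitUntilIdle are hypothetical names, and the busy check and the "pump" step are passed in as delegates so the helper itself makes no assumptions about the WebBrowser control.

```csharp
using System;
using System.Diagnostics;

public static class BusyWait
{
    // Polls isBusy until it returns false or the timeout elapses.
    // pump is called between polls; it is where the caller sleeps and/or
    // lets the UI process messages. Returns true if idle was reached in time.
    public static bool WaitUntilIdle(Func<bool> isBusy, TimeSpan timeout, Action pump)
    {
        Stopwatch clock = Stopwatch.StartNew();

        while (isBusy())
        {
            if (clock.Elapsed >= timeout)
                return false; // give up: still busy after the maximum wait

            pump();
        }

        return true;
    }
}
```

With a WebBrowser this might be called as BusyWait.WaitUntilIdle(() => webBrowser.IsBusy, TimeSpan.FromSeconds(30), () => { Application.DoEvents(); Thread.Sleep(100); }), assuming it runs on the UI thread. Whether IsBusy is actually trustworthy here is exactly the open question above.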

If the Document Complete handler executes on another thread, you could use a lock to send your main thread to sleep and wake it up from the Document Complete thread. Then check the IsBusy flag, re-locking the main thread if it's still true.

I'm not sure it'll work, but try adding a JavaScript "onload" event to your frameset like this:

function everythingIsLoaded() { alert("everything is loaded"); }
var frameset = document.getElementById("idOfYourFrameset");
if (frameset.addEventListener)
    frameset.addEventListener('load',everythingIsLoaded,false); 
else
    frameset.attachEvent('onload',everythingIsLoaded); 

Can you use jQuery? Then you could easily bind frame ready events on the target frames. See this answer for directions. This blog post also has a discussion about it. Finally there is a plug-in that you could use.

The idea is that you count the number of frames in the web page using:

$("iframe").size()

and then you count how many times the iframe ready event has been fired.

You will get a BeforeNavigate and DocumentComplete event for the outer web page, as well as for each frame. You know you're done when you get the DocumentComplete event for the outer web page. You should be able to use the managed equivalent of IWebBrowser2::TopLevelContainer() to determine this.

Beware, however, the website itself can trigger more frame navigations anytime it wants, so you never know if a page is truly done forever. The best you can do is keep a count of all the BeforeNavigates you see and decrement the count when you get a DocumentComplete.
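The counting idea can be sketched as a small helper like the one below. PendingNavigationCounter is a hypothetical name, and how you hook its two methods up to the BeforeNavigate and DocumentComplete notifications of your control is left out.

```csharp
using System;

// Tracks navigations that have started but not yet completed.
// Hypothetical helper; call OnBeforeNavigate / OnDocumentComplete from your own event handlers.
public sealed class PendingNavigationCounter
{
    private int pending;

    // Raised each time the count drops back to zero.
    public event EventHandler AllCompleted;

    public int Pending
    {
        get { return pending; }
    }

    public void OnBeforeNavigate()
    {
        pending++;
    }

    public void OnDocumentComplete()
    {
        if (pending == 0)
            return; // ignore a completion with no matching navigation

        pending--;

        if (pending == 0)
        {
            EventHandler handler = AllCompleted;
            if (handler != null)
                handler(this, EventArgs.Empty);
        }
    }
}
```

Because the site can start new frame navigations at any time, AllCompleted may fire more than once; treat it as "nothing is pending right now" rather than "done forever".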

Edit: Here's the managed docs: TopLevelContainer.

I just use the webBrowser.StatusText property. When it says "Done" everything is loaded! Or am I missing something?

Checking for IE.readyState = READYSTATE_COMPLETE should work, but if that's not proving reliable for you and you literally want to know "the moment when IE writes 'Done' in its status bar", then you can do a loop until IE.StatusText contains "Done".

Have you tried WebBrowser.IsBusy property?

Licensed under: CC-BY-SA with attribution