Question

I have seen several threads on StackOverflow concerning this topic, however none of them seem to provide an answer.

I have a button that, when clicked, opens up an invisible web page, navigates to a URL, enters information into a box, presses a button, and then scrapes the screen for information.

The bones of my code, which runs in the click handler, are basically:

    WebBrowser wb = new WebBrowser();
    wb.Visibility = System.Windows.Visibility.Hidden;
    wb.Navigate("http://somepage.com");

And this is where it gets tricky.

I am looking for a way to ensure that the page is loaded before trying to enter data or scrape the screen. I have seen several threads that talk about Navigated, IsLoaded, and LoadCompleted, as well as BackgroundWorker approaches, but I cannot get any of these to work.

Which is the best option to use to determine that the page has fully loaded? How would you get the chosen method to work?

I also cannot get the data from the screen as WPF does not use the same GetElementByID.

Edit:

Per the comment below, here are the errors I run into:

  • Navigated fires as soon as the page has been navigated to and does not necessarily wait until all objects are loaded. This works as intended, but cannot be used for my purposes.
  • IsLoaded never returns true

    private void GetData_Click(object sender, RoutedEventArgs e)
    {
      int x=0;
      HTMLDocument doc;
    
      wb = new WebBrowser();
      wb.Visibility = System.Windows.Visibility.Visible;
      wb.Navigate("somesite.com");
    
      doc = wb.Document as mshtml.HTMLDocument;
    
      while(!wb.IsLoaded)
      {
        //Wait
      }
    
      doc.getElementById("txt_One").innerText = "It Worked";
    
    }
    

Puts it in an infinite loop as wb does not ever seem to load.

  • This is the version with LoadCompleted

The event 'System.Windows.Controls.WebBrowser.LoadCompleted' can only appear on the left hand side of += or -=

    private void GetData_Click(object sender, RoutedEventArgs e)
    {
      int x=0;
      HTMLDocument doc;

      wb = new WebBrowser();
      wb.Visibility = System.Windows.Visibility.Visible;
      wb.Navigate("somesite.com");

      doc = wb.Document as mshtml.HTMLDocument;

      wb.LoadCompleted += wb_LoadCompleted;

      doc.getElementById("txt_One").innerText = "It Worked";

    }

    void wb_LoadCompleted(object sender, NavigationEventArgs e)
    {

    }

Produces the error

An unhandled exception of type 'System.NullReferenceException' occurred in {filename}

Additional information: Object reference not set to an instance of an object.


Solution

The WebBrowser control has an event for this, which you are already using: LoadCompleted. It fires when the document has finished loading.

Subscribe to the event, and get the document inside the event handler rather than immediately after calling Navigate.

    //root is a grid element identified in the XAML
    public WebBrowser webb;

    public MainWindow()
    {
        InitializeComponent();

        webb = new WebBrowser();
        webb.Visibility = System.Windows.Visibility.Hidden;
        root.Children.Add(webb);
        webb.LoadCompleted += webb_LoadCompleted;
        webb.Navigate("http://www.google.com");
    }

    void webb_LoadCompleted(object sender, NavigationEventArgs e)
    {
        MessageBox.Show("Completed loading the page");

        mshtml.HTMLDocument doc = webb.Document as mshtml.HTMLDocument;
        mshtml.HTMLInputElement obj = doc.getElementById("gs_taif0") as mshtml.HTMLInputElement;
        mshtml.HTMLFormElement form = doc.forms.item(Type.Missing, 0) as mshtml.HTMLFormElement;

        webb.LoadCompleted -= webb_LoadCompleted; //REMOVE THE OLD EVENT METHOD BINDING
        webb.LoadCompleted += webb_LoadCompleted2; //BIND TO A NEW METHOD FOR THE EVENT
        obj.value = "test search";
        form.submit(); //PERFORM THE POST ON THE FORM OR SEARCH
    }

    //SECOND EVENT TO FIRE AFTER YOU POST INFORMATION
    void webb_LoadCompleted2(object sender, NavigationEventArgs e)
    {
        MessageBox.Show("Completed loading the page second time after post"); 
    }

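You need to run doc = wb.Document as mshtml.HTMLDocument; inside the LoadCompleted handler, because the document is not available until the load is complete. In your LoadCompleted version, wb.Document is still null immediately after Navigate, so doc is null and the call to doc.getElementById throws the NullReferenceException.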
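If chaining one handler per step becomes unwieldy, the same LoadCompleted event can be wrapped in a task so that each step is awaited in order. The following is only a minimal sketch, not part of the WPF API: NavigateAsync and ScrapeExample are hypothetical names, and it assumes C# 6 or later, a project reference to Microsoft.mshtml, and a WebBrowser that has already been added to the window. The element id txt_One is taken from the question.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Controls;
using System.Windows.Navigation;

public static class WebBrowserExtensions
{
    // Hypothetical helper (not a WPF method): starts a navigation and
    // returns a task that completes when LoadCompleted next fires.
    public static Task NavigateAsync(this WebBrowser browser, Uri uri)
    {
        var tcs = new TaskCompletionSource<bool>();
        LoadCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe so the handler runs only once per call.
            browser.LoadCompleted -= handler;
            tcs.TrySetResult(true);
        };

        // Subscribe before navigating so the event cannot be missed.
        browser.LoadCompleted += handler;
        browser.Navigate(uri);
        return tcs.Task;
    }
}

public static class ScrapeExample
{
    // Call from the UI thread, e.g. from an async click handler.
    public static async Task<string> FillAndReadAsync(WebBrowser wb)
    {
        await wb.NavigateAsync(new Uri("http://somepage.com"));

        // The document is only read after the load has completed.
        var doc = wb.Document as mshtml.HTMLDocument;
        var box = doc?.getElementById("txt_One");
        if (box == null)
        {
            return null; // page did not contain the expected element
        }

        box.innerText = "It Worked";
        return box.innerText;
    }
}
```

A click handler declared as private async void GetData_Click(...) could then call await ScrapeExample.FillAndReadAsync(wb); without blocking the UI thread, which the while(!wb.IsLoaded) loop in the question does.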

Licensed under: CC-BY-SA with attribution