Using VBA in Excel to Google Search in IE and return the hyperlink of the first result

StackOverflow https://stackoverflow.com/questions/17495644

  •  02-06-2022
  •  | 
  •  

سؤال

I have been attempting to use IE automation to google search a string of text in Excel. I want to return the hyperlink for the website of the first result in another cell in excel. Is this possible? I have a list of 60,000 records that I need to google search and return the hyperlink for the website in the first result. Is there another approach to this that you would reccomend? I appreciate the help in advance.

هل كانت مفيدة؟

المحلول

As its 60,000 records i recommend use xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster

Download the sample file

Sub XMLHTTP()

    Dim url As String, lastRow As Long, i As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date

    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    Dim cookie As String
    Dim result_cookie As String

    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow

        url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")

        Set objH3 = objResultDiv.getelementsbytagname("h3")


        For Each link In objH3

            If link.className = "r" Then

                Cells(i, 2) = link.innerText
                Cells(i, 3) = link.getelementsbytagname("a")(0).href
                DoEvents
            End If
        Next
    Next

    end_time = Time
    Debug.Print "end_time:" & end_time

    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time) & " :minutes"
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

Using CSS3 Selector

 Sub XMLHTTP1()

        Dim url As String, i As Long, lastRow As Long
        Dim XMLHTTP As Object, html As New HTMLDocument, objResultDiv As HTMLAnchorElement


        lastRow = Range("A" & Rows.Count).End(xlUp).Row
        For i = 2 To lastRow

            url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", url, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set html = New HTMLDocument
            html.body.innerHTML = XMLHTTP.ResponseText
            Set objResultDiv = html.querySelector("div#rso h3.r a")

            Cells(i, 2) = objResultDiv.innerText
            Cells(i, 3) = objResultDiv.href

            DoEvents
        Next

    End Sub

Output

enter image description here

HTH
Santosh

نصائح أخرى

The links seem to be consistently within within H3 tags. Normally you might use something like the following to check until the page has loaded:

Private Declare Sub Sleep Lib "kernel32" (ByVal nMilliseconds As Long)

Sub UseIE()
    Dim ie As Object
    Dim thePage As Object
    Dim strTextOfPage As String

    Set ie = CreateObject("InternetExplorer.Application")
    'ie.FullScreen = True
    With ie
        '.Visible = True
        .Navigate "http://www.bbc.co.uk"
        While Not .ReadyState = READYSTATE_COMPLETE '4
            Sleep 500      'wait 1/2 sec before trying again
        Wend
    End With

    Set thePage = ie.Document
    'more code here
End Sub

However, I would, instead, repeatedly try to reference the A element within the first H3 using getElementsByTagName("H3"), get the first of these elements, then look within this for the A-link and its href-attribute.

In JavaScript the attempts to reference non-existent elements would return undefined but from VBA it will probably need error-handling code.

Once I had obtained the href I would stop the navigation (not sure of the command for this, probably ie.Stop) or navigate to the next page immediately.

The first link(s) will, however, often be sponsored links and the href returned is a little garbled. The text of these sponsored links appear to include em tags. I might use this information to discard these links and look further down the page.

I don't know if there is a better way to do this.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top