Question

I looked here and here for information on my issue, but with no luck.

I wrote some Python code intended to grab a webpage's source, as shown in Safari's Web Inspector. However, my application and Safari's Web Inspector return different code. Here is my code so far:

#!/usr/bin/python

import urllib2

# headers

hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Cache-Control': 'max-age=0'}

# request data

req = urllib2.Request("https://www.google.com/#q=rainbow&safe=active", headers=hdr)

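# try to get data
try:
    page = urllib2.urlopen(req)
    print page.info()
    content = page.read()
except urllib2.HTTPError, e:
    # on an HTTP error, fall back to the body of the error response
    # so that content is always defined before it is printed
    content = e.fp.read()

print content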

The headers match what Web Inspector shows:

[Screenshot: Web Inspector]


The code returned is different, though, for a Google search for "rainbow".

My Python:

http://paste.ubuntu.com/6270549/

Web Inspector:

http://paste.ubuntu.com/6270606/

As far as I can tell, my code is missing a large number of the ubiquitous }catch(e){gbar_._DumpException(e)} lines that are present in the Web Inspector code. Also, my code has only 78 lines, while the Web Inspector code has 235 lines. Does this mean that my code is not getting all of the JavaScript or some other portion of the webpage? How can I get my code to retrieve the same data as the Web Inspector?


Solution

You are using the wrong link for a Google search. The correct link is:

https://www.google.com/search?q=rainbow&safe=active

instead of:

https://www.google.com/#q=rainbow&safe=active

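The second link gives you Google's homepage when used in Python. Everything after the # in a URL is a fragment, which is never sent to the server, so urllib2 effectively requests only https://www.google.com/. In Safari the link works because the page's JavaScript reads the fragment and loads the search results afterwards. This is why the code is different.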
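The difference between the two links can be checked without making any request. The sketch below is written for Python 3, which is an assumption, since the question uses Python 2 (there, urlsplit and urlencode live in the urlparse and urllib modules instead of urllib.parse). The helper names sent_to_server and build_search_url are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlencode, urlsplit


def sent_to_server(url):
    """Return the path-and-query part of a URL, which is what an HTTP
    client puts in the request line. The fragment (after '#') is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def build_search_url(term, safe="active"):
    """Hypothetical helper: build the /search form of the link from the answer."""
    query = urlencode({"q": term, "safe": safe})
    return "https://www.google.com/search?" + query


# The '#' link carries the search only in its fragment, so the server sees "/".
print(sent_to_server("https://www.google.com/#q=rainbow&safe=active"))
# prints: /

# The '/search?' link carries the search in the query string, which is sent.
print(sent_to_server("https://www.google.com/search?q=rainbow&safe=active"))
# prints: /search?q=rainbow&safe=active

print(build_search_url("rainbow"))
# prints: https://www.google.com/search?q=rainbow&safe=active
```

Passing the result of build_search_url to the Request call in the question, in place of the # link, should make the script ask for the search results page rather than the homepage. The returned HTML may still differ from what Web Inspector shows, since the inspector displays the page after JavaScript has run.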

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow