Question

I am writing a Python program that feeds a search term to Google using the Google search API and downloads the first 10 results. I was able to do this in Python 2.6 as follows:

query = urllib.parse.urlencode({'q' : 'searchterm','start' : k},doseq=false)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
             % (query)
results = urllib.urlopen(url)
resultsjson = json.loads(results.read())
betterResults += resultsjson["responseData"]["results"]

Google's search API returns the results as JSON, so I used the above code to load the results into a JSON object of my own and collect them in a list (betterResults).

When I switched over to Python 3, my program began throwing exceptions. Apparently, in Python 2.6 the object returned by urlopen() is a file-like object that can be loaded as JSON. In Python 3.1, the object returned is an HTTPResponse object, which does have a read() method, as the json module requires, but read() returns a bytes object. I was therefore unable to access the information as I had in 2.6.

Is there any way to access the JSON returned by Google? How can I get the results in Python 3 and still select which fields I want, as I was able to do before?

Thank you very much, bsg

Solution

The object returned by urlopen() is still file-like in Python 3, so you are wrong there. The issue is that you call json.loads(), which expects a string; json.load() is the one that expects a file-like object.

However, in Python 3.1 json.load() expects the result of the read() method to be a string, while the read() you get returns bytes, so you need to decode the bytes to a string first.

So, something like this:

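import json
import urllib.parse
import urllib.request

# k and betterResults come from the surrounding program in the question
query = urllib.parse.urlencode({'q': 'searchterm', 'start': k}, doseq=False)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
results = urllib.request.urlopen(url)
# Take the charset from a header such as "text/javascript; charset=utf-8"
encoding = results.getheader('content-type').split('=')[-1]
resultsjson = json.loads(results.read().decode(encoding))
betterResults += resultsjson["responseData"]["results"]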

Might work. (I didn't test it).
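To make the load()/loads() distinction concrete without hitting the network, here is a minimal offline sketch. The payload is made up and only mimics the shape used in the question, and io.BytesIO stands in for the response object; wrapping a real response in io.TextIOWrapper should behave the same way, since it is a binary stream, but that part is an assumption I have not run against a live server.

```python
import io
import json

# Stand-in for what results.read() returns in Python 3: bytes, not str.
# The payload shape follows the question; the values are made up.
raw = '{"responseData": {"results": [{"title": "café"}]}}'.encode("utf-8")

# Option 1: decode the bytes, then hand the resulting str to json.loads().
from_loads = json.loads(raw.decode("utf-8"))

# Option 2: wrap the byte stream in a text layer so json.load() reads str.
# io.BytesIO plays the role of the response object here.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
from_load = json.load(text_stream)

# Both routes give the same dict, so field access works as it did in 2.6.
better_results = []
better_results += from_loads["responseData"]["results"]
print(better_results)
```

Either option gives you an ordinary dict, so you can pick out fields exactly as before.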

OTHER TIPS

You'll need to decode the bytes object if you want to use it with json.loads (decode() defaults to UTF-8 when called without arguments):

resultjson = json.loads(results.read().decode())

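Older docs also suggest passing an encoding parameter to the loads function. In Python 3 that argument is ignored (it was deprecated in 3.1 and removed in 3.9), so pass the encoding to decode() instead:

json.loads(results.read().decode(<encoding-type>))

Since Python 3.6, json.loads() also accepts bytes directly, provided they are encoded as UTF-8, UTF-16, or UTF-32.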

Lennart's answer above explains how to get the encoding type.
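As a possible alternative to splitting the Content-Type header by hand, the response's headers object is an email.message.Message subclass, which has a get_content_charset() method. The sketch below is one way to use it; decode_json_body and the UTF-8 fallback are my own hypothetical choices, and the headers and body are built by hand here so the example runs offline.

```python
import json
from email.message import Message


def decode_json_body(body, headers, default="utf-8"):
    """Decode a bytes body using the charset named in the headers, then parse it.

    decode_json_body is a hypothetical helper, not part of any library.
    """
    # get_content_charset() returns None when no charset is declared,
    # so fall back to an assumed default in that case.
    charset = headers.get_content_charset() or default
    return json.loads(body.decode(charset))


# Offline stand-ins for results.headers and results.read().
headers = Message()
headers["Content-Type"] = "text/javascript; charset=ISO-8859-1"
body = '{"responseData": {"results": ["café"]}}'.encode("iso-8859-1")

data = decode_json_body(body, headers)
print(data["responseData"]["results"])
```

With a live response this would be called as decode_json_body(results.read(), results.headers). The fallback matters because a server may omit the charset, in which case splitting the header on '=' would hand decode() something that is not an encoding name.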

Licensed under: CC-BY-SA with attribution