Question

Every month I run a script to download Google Trends CSV files for my research, covering about 1000 search terms. I am aware of the famous "You've reached your quota limit, please try again later" error message, so I used the Python mechanize module to log in and supply cookies. Here is my call:

 import cookielib  # Python 2 standard library
 import csv
 from StringIO import StringIO  # Python 2 standard library

 import mechanize

 br = mechanize.Browser()

 # Keep cookies across requests so the login session persists
 cj = cookielib.LWPCookieJar()
 br.set_cookiejar(cj)

 br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

 # Log in through the Google account form (username and password are defined elsewhere)
 response = br.open('https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/')
 forms = mechanize.ParseResponse(response)
 form = forms[0]
 form['Email'] = username
 form['Passwd'] = password
 response = br.open(form.click())

 # Request the CSV export for one search term and parse it
 Result = br.open("http://www.google.com/trends/trendsReport?q=SearchTerm&export=1") 
 CSVcontent = csv.reader(StringIO(Result.read()))

In my actual script, I also sleep between requests for a time that varies from 1 to 5 seconds.

However, with these methods, I can only download around 350 CSV files at a time. After the run, even if I re-run my script on a different AWS instance (different region, with the same Google account credentials), I can only download 2 terms before my quota limit runs out.

I've also tried re-running the script on the same instance with different Google credentials that I had used before, but the same thing happened (only 2 CSV downloads).

The worst thing is that, even after a month, I can still only download 2 CSV files each day with Google credentials that I used in my script a (couple of) month(s) ago.

Hence I guess Google is monitoring downloads at both the IP address level and the user account level, and flags accounts that download too much.

My Questions:

Can anyone tell me whether there is a paid service or API that has no quota limit?

Or is there a way to bypass Google's monitoring (by using Tor or other methods)? But even with Tor I would still need Google account credentials, right?

Is there a way to download Google Trends data without a username/password (so I can auto-scale with AWS as each IP address can at least download 2 CSV files)?

Any help and out-of-the-box thoughts would be greatly appreciated.


Solution

"Is there a way to download Google Trends data without a username/password"

Yes, there is a way to get your data without having to log in (I am testing it now and it is looking good, but it is also a secret I cannot share, as I do not want to lose it).

I can share that I download between 3,000 and 5,000 files over a weekend in almost the same way you do, and I do it every weekend (2 IPs, 2 IDs).

The only secret I can give is: do not hammer Google. If you hammer them, their system will cut you off.

I found that a 1.5 to 2 minute break between downloads helps.

Look at it this way: 24 hours in a day, that's 1440 mins. If you pull a file every 2 mins that's 720 files a day.
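The same arithmetic works for any pause length. Here is a minimal sketch; the function name `files_per_day` is made up for illustration and is not part of any library:

```python
def files_per_day(pause_minutes):
    """Upper bound on downloads per day with a fixed pause between requests."""
    minutes_per_day = 24 * 60  # 1440 minutes
    return int(minutes_per_day // pause_minutes)


print(files_per_day(2))    # 720
print(files_per_day(1.5))  # 960
```

At one file every 2 minutes, the roughly 1000 terms from the question would take about 2000 minutes, or a little over 33 hours.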

Also, if you put a random time function in, they won't cut you off, since they cannot tell whether it is you or a script.
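A random pause in the 1.5 to 2 minute range could be sketched roughly like this. The names `random_pause` and `download_all` are illustrative, and `fetch` is a hypothetical callable you supply yourself (for example, a wrapper around the `br.open(...)` call from the question). Whether this keeps you under the quota is only my experience, not something the code guarantees.

```python
import random
import time


def random_pause(min_seconds=90, max_seconds=120):
    """Pick a random pause length in seconds (1.5 to 2 minutes by default)."""
    return random.uniform(min_seconds, max_seconds)


def download_all(terms, fetch, sleep=time.sleep):
    """Call fetch(term) for each term, pausing a random interval in between.

    `fetch` is a hypothetical callable supplied by the caller; `sleep` can be
    swapped out so the loop can be tried without actually waiting.
    """
    results = {}
    for i, term in enumerate(terms):
        if i > 0:
            sleep(random_pause())  # no pause needed before the first request
        results[term] = fetch(term)
    return results
```

Passing `sleep` as a parameter is just a convenience: you can hand in a dummy function to check the loop quickly, then use the default `time.sleep` for the real run.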

"only download 2 CSV files each day with Google credentials that I used in my script a (couple of) month(s) ago"

It will reset if you do not use the account for a week or so; also, if you can renew your IP, that will help.

Licensed under: CC-BY-SA with attribution