Question

I am trying to build a simple dataset using Google, mainly because it seems like the best option for what I want.

I want to measure fame for a large group of scientists. The quick method is to count the number of Google results returned when searching for their name. I do not care about the results themselves, only how many there are. That method has its flaws, I know, so I am not opposed to an alternative.

My scientist data is composed of thousands of entries, which is causing issues. I tried to search Google programmatically, but fewer than 1000 searches later they blocked the program. I also looked into their Search API, but that is limited to 100 searches a day unless I pay for more, and since I am a poor college student that isn't an option.

I was hoping someone here might be able to offer suggestions on building a dataset with some way of measuring fame.

Solution

With any search engine you will be limited by the number of requests, and any way of overcoming those limits falls into a gray zone that violates the end-user agreement (and, eventually, you will get banned for some time, of course). You should look into the search APIs of the well-known search engines instead. For example, Bing gives you 5000 searches per month for free, which might be enough for proof-of-concept research. Also, 5k/month will give you some 20-30k by summer, so your data set will grow while you polish your idea.

Also, Google's free tier is limited to 100 requests per day, which gives you a completely legal 3k per month as well. Combined (provided you treat Google and Bing results as equal), you get 8k per month.
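
To make the arithmetic concrete, here is a minimal sketch of how you might plan the collection around those quotas. It does not call any search API. The quota numbers are the ones quoted above and may have changed, the 30-day month is an assumption, and the function names are made up for illustration.

```python
import math

# Free-tier quotas as quoted in this answer; check the current terms before relying on them.
GOOGLE_PER_DAY = 100
BING_PER_MONTH = 5000
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def monthly_budget(google_per_day=GOOGLE_PER_DAY,
                   bing_per_month=BING_PER_MONTH,
                   days=DAYS_PER_MONTH):
    """Total free queries per month across both engines."""
    return google_per_day * days + bing_per_month


def months_needed(n_names, budget_per_month):
    """Whole months required to query every name once."""
    return math.ceil(n_names / budget_per_month)


def daily_batches(names, per_day):
    """Split the name list into chunks that each fit one day's quota."""
    return [names[i:i + per_day] for i in range(0, len(names), per_day)]


if __name__ == "__main__":
    budget = monthly_budget()
    print("queries per month:", budget)
    print("months for 20000 names:", months_needed(20000, budget))
```

Under these assumptions, a list of 20,000 names fits in three months of free queries. Saving each day's counts to disk as you go means a block or an outage costs you at most one batch.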

OTHER TIPS

The Bing Search API mentions that it will not be available after Dec 31st, 2016.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange