Retrieving all "Yahoo Answers" questions that contain a certain word: issue with rate limiting

StackOverflow https://stackoverflow.com/questions/15278428

  •  18-03-2022

Question

I'm trying to compile a database of all the questions on Yahoo Answers that contain a certain word. I'm currently doing this with the following script, which uses the Pynswers wrapper class to call the Yahoo API:

import xlwt
from Answers import Answers

app = Answers()
wbk = xlwt.Workbook()


sheet = wbk.add_sheet('sheet 1')

app.appid = '...'
questions = app.questionSearch({'query':'tornado',})

#Write all column headings
sheet.write(0,0, 'Question')
sheet.write(0,1,'Answer')
sheet.write(0,2, 'Date')
sheet.write(0,3,'Number of Answers')


for i, value in enumerate(questions):
        content = value['Content'].strip()
        chosenAnswer = value['ChosenAnswer'].strip()
        date = value['Date'].strip()
        numAnswers = value['NumAnswers'].strip()

        #Write values into respect columns, (row, column)
        sheet.write(i+1,0,content)
        sheet.write(i+1,1,chosenAnswer)
        sheet.write(i+1,2,date)
        sheet.write(i+1,3,numAnswers)


wbk.save('C://test.xls')

The issue is that I only get about 10 responses from this query, and I can't figure out a way to expand the range of questions that I get. Any ideas?


Solution

Pynswers appears to be a very loose wrapper around the Yahoo API itself. The API documentation describes "start" and "results" fields that can be included in the request.

So, perhaps you can do the following:

first_50 = app.questionSearch({'query':'tornado', 'start' : 0, 'results' : 50})
next_50 = app.questionSearch({'query':'tornado', 'start' : 50, 'results' : 50})
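The two calls above generalize to a loop. The following is only a sketch: `fetch_all` is a hypothetical helper, not part of Pynswers, and it assumes that `app.questionSearch` forwards the "start" and "results" fields to the API and returns a list, which I have not verified.

```python
def fetch_all(search, query, page_size=50, max_results=1000):
    """Collect results page by page using 'start' and 'results'.

    `search` is any callable that takes a parameter dict and returns a
    list of results, e.g. app.questionSearch (assumed to pass these
    fields through to the API unchanged).
    """
    collected = []
    start = 0
    while len(collected) < max_results:
        page = search({'query': query, 'start': start, 'results': page_size})
        if not page:
            break  # no more results
        collected.extend(page)
        if len(page) < page_size:
            break  # a short page means this was the last one
        start += page_size
    return collected[:max_results]
```

You would then call it as `questions = fetch_all(app.questionSearch, 'tornado')` and write the rows out as before. The `max_results` cap is there so the loop cannot run indefinitely on a very common word.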

Edit

Also, regarding "rate limiting", Yahoo states the following about their API (this section was retrieved on March 7, 2013):

How many times can I call YQL in a minute/hour/day?

Rate limits in YQL are based on your authentication. If you use IP-based authentication, then you are limited to 2,000 calls/hour/IP to the public YQL Web service URL (/v1/public/) or 20,000 calls/hour/IP to the private YQL Web service URL (/v1/yql/) that requires OAuth authorization. See the YQL Web Service URLs for the public and private URLs. Applications (identified by an Access Key) are limited to 100,000 calls/day/key*. However, in order to make sure the service is available for everyone we ask that you don't call YQL more than 0.2 times/second or 1,000 times/hour for IP authenticated users and 2.7 times/second or 10,000 times/hour.

*Please don't create multiple keys to 'avoid' rate limits. If you would like us to increase your limit please contact us with details of your project and we'll do our best to accommodate you.

Obviously, you'll need to be careful with your code to ensure that you're getting the information you need without exceeding the rate limit. So, getting "all" the answers might not be practical.
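One simple way to stay under a limit is to space the calls out. This is a minimal sketch, not anything provided by Pynswers or Yahoo: `Throttle` is a hypothetical helper, and the 1,000 calls/hour figure in the usage note is taken from the guidance quoted above for IP-authenticated users.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        # Spread the hourly budget evenly, e.g. 1000/hour -> 3.6 s apart.
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create it once with `throttle = Throttle(1000)` and call `throttle.wait()` immediately before each `app.questionSearch(...)` request. Even spacing is slower than bursting, but it keeps both the per-second and per-hour figures in check without extra bookkeeping.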

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow