Question

I'm very new to the Twitter API. If I use the search API and call it every minute to retrieve about 1000 tweets, will I get duplicate tweets if fewer than 1000 tweets matching my criteria were created in that time, or if I call it more often than once a minute?

I hope my question is clear. In case it matters, I use the python-twitter library, and this is how I get tweets:

self.api = twitter.Api(consumer_key, consumer_secret ,access_key, access_secret)
self.api.VerifyCredentials()
self.api.GetSearch(self.hashtag, per_page=100)

Solution

Your search results will overlap because the API has no idea what you searched before. One way to prevent the overlap is to use the ID of the last tweet you retrieved. Here is a Python 2.7 snippet from my code:

import json
import time

maxid = None  # no upper bound on the first request
with open('output.json', 'a') as outfile:
    for i in range(10):
        time.sleep(5)  # pause between requests to stay under the rate limit
        print('maxid=%s, twitter loop %d' % (maxid, i))
        results = api.GetSearch('search_term', count=100, max_id=maxid)
        if not results:
            break  # no older tweets left
        for status in results:
            tweet = status.AsDict()  # python-twitter model as a plain dict
            json.dump(tweet, outfile)
            outfile.write('\n')  # one tweet per line
        # max_id is inclusive, so step just below the oldest tweet seen
        maxid = min(s.id for s in results) - 1

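This code makes 10 requests of up to 100 tweets each, with every request bounded by the ID of the oldest tweet retrieved so far, so it walks backwards in time. It then writes a JSON file with one tweet per line. I use this code to search into the recent past, but you can adapt it to your case: pass the highest ID you have seen as 'since_id' instead of 'max_id', and each call will return only tweets newer than that one, so repeated polling does not overlap.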
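For the once-a-minute polling case from the question, a minimal sketch of the 'since_id' variant might look like the following. The helper name poll_new_tweets and its search parameter are hypothetical, not part of python-twitter; search is assumed to be any callable that accepts term, count and since_id keyword arguments (api.GetSearch should fit, but I have not checked it against every library version) and returns objects with an integer id attribute.

```python
def poll_new_tweets(search, term, since_id=None, count=100):
    """Return (fresh_tweets, new_since_id) for one polling round.

    `search` is a hypothetical stand-in for something like api.GetSearch:
    a callable taking term, count and since_id keyword arguments and
    returning objects that expose an integer `id` attribute.
    """
    results = search(term=term, count=count, since_id=since_id)
    # Defensive filter: drop anything at or below the last ID we saw.
    fresh = [t for t in results if since_id is None or t.id > since_id]
    if fresh:
        # Remember the newest ID so the next call only asks for newer tweets.
        since_id = max(t.id for t in fresh)
    return fresh, since_id
```

You would call it once per minute, feeding the returned since_id back into the next call, for example tweets, last_id = poll_new_tweets(api.GetSearch, 'search_term', since_id=last_id). If more than count tweets arrive between two polls, a single call will miss some of them, so you would still need to page with 'max_id' inside that window.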

Licensed under: CC-BY-SA with attribution