Question

Please forgive me, as I have limited knowledge of scraperwiki and twitter mining.

I have the following code to scrape Twitter data. However, I want to edit the code so that it only returns results that are geotagged in New York on a particular date (say, April 1, 2013). How should I do this?

###############################################################################
# Twitter scraper for the term 'hello'.
###############################################################################

import scraperwiki
import simplejson

# retrieve a page
base_url = 'http://search.twitter.com/search.json?q='
q = 'hello'
options = '&rpp=10&page='
page = 1

while 1:
    try:
        url = base_url + q + options + str(page)
        html = scraperwiki.scrape(url)
        #print html
        soup = simplejson.loads(html)
        for result in soup['results']:
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            # save records to the datastore
            scraperwiki.datastore.save(["id"], data)
        page = page + 1
    except:
        print str(page) + ' pages scraped'
        break

Solution

In addition to q, use the query parameters geocode and until. See this page of the Twitter API documentation. Please note that you cannot use the Search API to find Tweets older than about a week.

Also, it's easier to use urllib.urlencode() to construct the query string, for example:

import urllib

base_url = 'http://search.twitter.com/search.json?'
query_dict = {'q': 'search term(s)', 'geocode': '37.781157,-122.398720,25mi', 'until': '2013-05-10'}
query = urllib.urlencode(query_dict)
response = urllib.urlopen(base_url + query).read()
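
Applied to the question above (tweets geotagged near New York on April 1, 2013), a minimal sketch of the scraper might look like the following. The New York coordinates, the 25-mile radius and the until date are assumptions to adjust, and, as noted, the Search API will not return tweets older than about a week, so a date that far in the past will simply come back empty.

import scraperwiki
import simplejson
import urllib

base_url = 'http://search.twitter.com/search.json?'

# Approximate coordinates for New York City and a 25-mile radius (assumptions to tune).
query_dict = {
    'q': 'hello',
    'geocode': '40.7128,-74.0060,25mi',  # latitude,longitude,radius
    'until': '2013-04-02',               # only tweets created before this date
    'rpp': 10,
}

page = 1
while True:
    try:
        query_dict['page'] = page
        url = base_url + urllib.urlencode(query_dict)
        html = scraperwiki.scrape(url)
        results = simplejson.loads(html)['results']
        if not results:
            break
        for result in results:
            data = {
                'id': result['id'],
                'text': result['text'],
                'from_user': result['from_user'],
                'created_at': result['created_at'],  # filter this field if you need exactly one day
            }
            scraperwiki.datastore.save(["id"], data)
        page = page + 1
    except Exception:
        print str(page) + ' pages scraped'
        break

Because until only bounds the upper end of the date range, you would still filter created_at (or, if the old since: search operator is honoured, add it to q) to keep only tweets from the exact day you want.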

Update: Please see this example scraper that you can copy and adapt to your needs.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow