Please forgive me, as I have limited knowledge of ScraperWiki and Twitter mining.

I have the following code to scrape Twitter data. However, I want to change it so that it only returns results that are geotagged for New York on a particular date (say, April 1, 2013). How should I do this?

###############################################################################
# Twitter scraper for the term 'hello'.
###############################################################################

import scraperwiki
import simplejson

# retrieve a page
base_url = 'http://search.twitter.com/search.json?q='
q = 'hello'
options = '&rpp=10&page='
page = 1

while 1:
    try:
        url = base_url + q + options + str(page)
        html = scraperwiki.scrape(url)
        #print html
        soup = simplejson.loads(html)
        for result in soup['results']:
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            # save records to the datastore
            scraperwiki.datastore.save(["id"], data)
        page = page + 1
    except:
        print str(page) + ' pages scraped'
        break

Solution

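In addition to q, use the query parameters geocode (given as latitude,longitude,radius) and until (a YYYY-MM-DD date; only Tweets created before that date are returned). See this page of the Twitter API documentation. Please note that you cannot use the Search API to find Tweets older than about a week.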

Besides, it is easier to construct your query with urllib.urlencode(), for example:

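import urllib

basic_url = 'http://search.twitter.com/search.json?'  # ends with '?', no 'q=' (q is in query_dict)
query_dict = {'q':'search term(s)', 'geocode':'37.781157,-122.398720,25mi', 'until':'2013-05-10'}
query = urllib.urlencode(query_dict)  # percent-encodes spaces, commas and parentheses
response = urllib.urlopen(basic_url + query).read()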
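
Applied to the question, the same urlencode() approach might look like the sketch below. The New York coordinates, the 25mi radius and the helper name build_search_url are assumptions made for illustration, not values from the Twitter documentation. Because until only matches Tweets created before the given date, April 2 is passed in order to include April 1. The import falls back to urllib.parse so the sketch also runs on Python 3.

```python
try:
    from urllib import urlencode          # Python 2, as in the snippets above
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Same endpoint as in the question, but without the trailing 'q='.
BASIC_URL = 'http://search.twitter.com/search.json?'


def build_search_url(term, lat, lon, radius, until, page=1, rpp=10):
    """Hypothetical helper: build one page of a geocoded, date-bounded search."""
    params = [
        ('q', term),
        # geocode is "latitude,longitude,radius"; the radius carries its unit (mi or km)
        ('geocode', '%s,%s,%s' % (lat, lon, radius)),
        # until is an upper bound: only Tweets created before this date match
        ('until', until),
        ('rpp', rpp),
        ('page', page),
    ]
    return BASIC_URL + urlencode(params)


# Assumed centre point for New York City; adjust the radius to taste.
url = build_search_url('hello', '40.7128', '-74.0060', '25mi', '2013-04-02')
print(url)
```

In the question's loop, this call would replace the line that concatenates base_url, q, options and str(page), with the loop's page counter passed as the page argument.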

Update: Please see this example scraper that you can copy and adapt to your needs.
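
Since until only sets an upper bound, the results can still contain Tweets from earlier days, so a scraper that wants a single day has to filter on created_at itself. The following is a minimal sketch of that filter. It assumes created_at looks like 'Mon, 01 Apr 2013 14:05:00 +0000' and is expressed in UTC; check this against an actual response before relying on it. The name tweeted_on is hypothetical.

```python
from datetime import datetime


def tweeted_on(result, day):
    """Return True if the result's created_at falls on `day` ('YYYY-MM-DD', UTC).

    Assumes a timestamp such as 'Mon, 01 Apr 2013 14:05:00 +0000'.
    """
    # Drop the trailing '+0000' offset; Python 2's strptime cannot parse %z.
    stamp = result['created_at'].rsplit(' ', 1)[0]
    created = datetime.strptime(stamp, '%a, %d %b %Y %H:%M:%S')
    return created.strftime('%Y-%m-%d') == day


# Made-up records shaped like the entries of soup['results'] in the question.
sample = [
    {'id': 1, 'created_at': 'Mon, 01 Apr 2013 14:05:00 +0000'},
    {'id': 2, 'created_at': 'Sun, 31 Mar 2013 23:59:59 +0000'},
]
kept = [r['id'] for r in sample if tweeted_on(r, '2013-04-01')]
print(kept)
```

In the question's loop, the check would go just before scraperwiki.datastore.save(), skipping any result for which it returns False. The comparison is done in UTC, so Tweets sent late in the evening New York time count towards the following day.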

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow