Question

I'm building a website using Pyramid, and I want to fetch some data from other websites. Because there may be 50+ calls to urlopen, I wanted to use gevent to speed things up.

Here's what I've got so far using gevent:

import urllib2
from gevent import monkey; monkey.patch_all()
from gevent import pool

gpool = pool.Pool()

def load_page(url):
    response = urllib2.urlopen(url)
    html = response.read()
    response.close()
    return html

def load_pages(urls):
    return gpool.map(load_page, urls)

Running pserve development.ini --reload gives:

NotImplementedError: gevent is only usable from a single thread.

I've read that I need to monkey patch before anything else, but I'm not sure where the right place is for that. Also, is this a pserve-specific issue? Will I need to re-solve this problem when I move to mod_wsgi? Or is there a way to handle this use-case (just urlopen) without gevent? I've seen suggestions for requests but I couldn't find an example of fetching multiple pages in the docs.

Update 1:

I also tried eventlet from this SO question (almost directly copied from this eventlet example):

import eventlet
from eventlet.green import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

def fetch_multiple(urls):
    pool = eventlet.GreenPool()
    return pool.imap(fetch, urls)

However, when I call fetch_multiple, I get: TypeError: request() got an unexpected keyword argument 'return_response'

Update 2:

The TypeError from the previous update was likely from earlier attempts to monkeypatch with gevent and not properly restarting pserve. Once I restarted everything, it works properly. Lesson learned.


Solution

There are multiple ways to do what you want:

  • Create a dedicated gevent thread, and explicitly dispatch all of your URL-opening jobs to that thread, which will then do the gevented urlopen requests.
  • Use threads instead of greenlets. Running 50 threads isn't going to tax any modern OS.
  • Use a thread pool and a queue. There's usually not much advantage to doing 50 downloads at the same time instead of, say, 8 at a time (as your browser probably does).
  • Use a different async framework instead of gevent, one that doesn't work by magically greenletifying your code.
  • Use a library that has its own non-magic async support, like pycurl.
  • Instead of mixing and matching incompatible frameworks, build the server around gevent too, or find some other framework that works for both your web-serving and your web-client needs.

You could simulate the last one without changing frameworks by loading gevent first, and have it monkeypatch your threads, forcing your existing threaded server framework to become a gevent server. But this may not work, or mostly work but occasionally fail, or work but be much slower… Really, using a framework designed to be gevent-friendly (or at least greenlet-friendly) is a much better idea, if that's the way you want to go.
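The thread-pool option from the list above can be sketched with the standard library alone, with no gevent and no monkeypatching involved. This is a minimal sketch, not a definitive implementation: it assumes Python 3 (on Python 2 you would call urllib2.urlopen and use the futures backport), and the fetch and max_workers parameters are hypothetical names added for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def load_page(url, timeout=10):
    """Fetch one URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def load_pages(urls, fetch=load_page, max_workers=8):
    """Fetch all URLs with at most max_workers running at once.

    Results come back in the same order as urls. The fetch parameter
    is an illustrative hook so the fetcher can be swapped out.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order and re-raises the first
        # exception from a worker when its result is consumed.
        return list(executor.map(fetch, urls))
```

Because this uses ordinary OS threads, it should behave the same under pserve and mod_wsgi. A view would call load_pages(urls) and get back a list of page bodies.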

You mentioned that others had recommended requests. The reason you can't find the documentation is that the built-in async code in requests was removed. See an older version of the documentation for how it was used. It's now available as a separate library, grequests. However, it works by implicitly wrapping requests with gevent, so it will have exactly the same issues as doing so yourself.

(There are other reasons to use requests instead of urllib2, and if you want to gevent it, it's easier to use grequests than to do it yourself.)

OTHER TIPS

I've had similar problems with gevent when trying to deploy a web application. The least-hassle option is to use a WSGI server that runs on gevent; examples include Gunicorn, uWSGI, or one of gevent's built-in WSGI servers. Pyramid should have a way of using an alternate deployment. If large portions of your code rely on gevent, it's easier to just use a server that runs on gevent as well.

So, basically the last bullet in the answer above.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow