Question

We have a question about the gevent.pool.Pool class and its wait_available() method. Consider these two code snippets:

1.

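import gevent
from gevent import monkey
monkey.patch_all()  # make urllib2's sockets cooperative
import urllib2
from gevent.pool import Pool

def fetch(url):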
    print 'start fetching...', url
    data = urllib2.urlopen(url)
    print url,':',data.code

urls = ['http://www.google.ru', 'http://www.s-str.ru', 'http://www.vk.com', 'http://www.yandex.ru', 'http://www.xxx.com']

pool = Pool(2)

def producer():
    for url in urls:
        pool.spawn(fetch, url)
    pool.join()

p = gevent.spawn(producer)
p.join()

2.

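import gevent
from gevent import monkey
monkey.patch_all()  # make urllib2's sockets cooperative
import urllib2
from gevent.pool import Pool

def fetch(url):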
    print 'start fetching...', url
    data = urllib2.urlopen(url)
    print url,':',data.code

urls = ['http://www.google.ru', 'http://www.s-str.ru', 'http://www.vk.com', 'http://www.yandex.ru', 'http://www.xxx.com']

pool = Pool(2)

def producer():
    for url in urls:
        pool.wait_available()  # block until the pool has a free slot
        pool.spawn(fetch, url)
    pool.join()

p = gevent.spawn(producer)
p.join()

Both give us similar results:

start fetching... http://www.google.ru
start fetching... http://www.s-str.ru
http://www.google.ru : 200
start fetching... http://www.vk.com
http://www.s-str.ru : 200
start fetching... http://www.yandex.ru
http://www.yandex.ru : 200
start fetching... http://www.xxx.com
http://www.vk.com : 200
http://www.xxx.com : 200

Can anyone explain what the wait_available() method does, and in which cases it should be used?

======= Update ======= I have already monkey-patched it and it works correctly. All I want to know is the difference between these two code snippets.


Solution 2

When working with gevent, you need to patch the standard library modules first.

>>> import gevent.monkey
>>> gevent.monkey.patch_all()
>>> ...
>>> p = gevent.spawn(producer)
>>> p.join()
start fetching... http://www.google.ru
start fetching... http://www.s-str.ru
http://www.google.ru : 200
start fetching... http://www.vk.com
http://www.vk.com : 200
start fetching... http://www.yandex.ru
http://www.yandex.ru : 200
start fetching... http://www.xxx.com
http://www.xxx.com : 200
http://www.s-str.ru : 200

You can see that pool.wait_available() works predictably.
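Why the patching matters can be illustrated without gevent. The sketch below uses Python 3's asyncio as a stand-in for gevent's event loop (an analogy, not gevent code): tasks that call the blocking time.sleep stall each other and run one after another, while tasks that yield cooperatively overlap their waits. Inside gevent, an un-monkey-patched urllib2 or time.sleep behaves like the blocking version.

```python
import asyncio
import time

# Analogy only: asyncio stands in for gevent's hub here.

async def blocking_task():
    time.sleep(0.2)           # blocks the whole event loop, like an unpatched call

async def cooperative_task():
    await asyncio.sleep(0.2)  # yields to the loop while waiting, like a patched call

async def run_three(task_factory):
    start = time.monotonic()
    await asyncio.gather(*(task_factory() for _ in range(3)))
    return time.monotonic() - start

def main():
    blocking = asyncio.run(run_three(blocking_task))
    cooperative = asyncio.run(run_three(cooperative_task))
    print(f"blocking:    {blocking:.2f}s")     # about 3 x 0.2s: tasks run in sequence
    print(f"cooperative: {cooperative:.2f}s")  # about 0.2s: the waits overlap
    return blocking, cooperative

if __name__ == "__main__":
    main()
```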

Update

With spawn, the two snippets behave the same way, because spawn itself waits for an available "slot" in the pool. If you need other functionality based on the Pool's state (logging, tracing, monitoring), you will want functions such as wait_available, free_count, etc. If you only need to spawn new greenlets, you can rely on the Pool implementation.
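A minimal model of the monitoring case, assuming Python 3 asyncio in place of gevent. ModelPool is a hypothetical class, not gevent's implementation: its spawn() waits for a free slot on its own, and free_count() is only read for logging. The producer records the number of free slots just before each spawn, the kind of Pool-state check described above.

```python
import asyncio

class ModelPool:
    """Hypothetical asyncio stand-in for gevent.pool.Pool (not gevent's code)."""

    def __init__(self, size):
        self.size = size
        self._slots = asyncio.Semaphore(size)
        self._running = set()

    def free_count(self):
        return self.size - len(self._running)

    async def spawn(self, func, *args):
        await self._slots.acquire()               # blocks while the pool is full
        task = asyncio.create_task(func(*args))
        self._running.add(task)
        task.add_done_callback(self._finished)
        return task

    def _finished(self, task):
        self._running.discard(task)
        self._slots.release()                     # hand the slot to the next spawn()

    async def join(self):
        while self._running:
            await asyncio.gather(*list(self._running))


async def fetch(url):
    await asyncio.sleep(0.05)                     # stands in for a network request

async def producer(urls):
    pool = ModelPool(2)
    free_before_spawn = []
    for url in urls:
        # Monitoring hook: log pool state; spawn() below does the waiting itself.
        free_before_spawn.append(pool.free_count())
        await pool.spawn(fetch, url)
    await pool.join()
    return free_before_spawn

if __name__ == "__main__":
    print(asyncio.run(producer(["a", "b", "c", "d", "e"])))
```

The first three logged values are always 2, 1, 0; later values depend on how quickly the earlier tasks finish.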

OTHER TIPS

TL;DR: wait_available isn't necessary if you're using spawn, because both methods run the same check. However, if you're using apply_async and don't want to submit more greenlets than the pool's cap, you should call wait_available first.


For a slightly clearer explanation: there are a few ways to achieve the same thing with gevent's Pool class. Calling spawn on the pool will block until there's space available in the Pool to run a new greenlet. Here's a quick example:

import gevent.monkey
gevent.monkey.patch_all()
import gevent.pool
import time

def my_slow_function():
    time.sleep(5)

def log(text):
    print '%d : %s' % (int(time.time()), text)

if __name__ == '__main__':
    thread_pool = gevent.pool.Pool(5)
    for i in xrange(20):
        log('Submitting slow func %d' % i)
        thread_pool.spawn(my_slow_function)
    thread_pool.join()
    log('Exiting')

The output of this shows that it will spawn these in groups of 5, since the pool contains space for 5 greenlets:

1403037287 : Submitting slow func 0
1403037287 : Submitting slow func 1
1403037287 : Submitting slow func 2
1403037287 : Submitting slow func 3
1403037287 : Submitting slow func 4
1403037292 : Submitting slow func 5
1403037292 : Submitting slow func 6
1403037292 : Submitting slow func 7
1403037292 : Submitting slow func 8
1403037292 : Submitting slow func 9
1403037297 : Submitting slow func 10
1403037297 : Submitting slow func 11
1403037297 : Submitting slow func 12
1403037297 : Submitting slow func 13
1403037297 : Submitting slow func 14
1403037302 : Submitting slow func 15
1403037302 : Submitting slow func 16
1403037302 : Submitting slow func 17
1403037302 : Submitting slow func 18
1403037302 : Submitting slow func 19
1403037307 : Exiting

As you can see, they are spawned in groups of 5, roughly 5 seconds apart. If you dig into the gevent source and look at the Pool class, you can see that calling spawn acquires the Pool's internal semaphore, which is used to track running greenlets.
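The semaphore mechanism can be sketched with asyncio.Semaphore, assuming it is a fair stand-in for the Pool's internal semaphore (Python 3 asyncio, not gevent; the pool size, task count, and durations are scaled-down illustrative values). Acquiring a slot before creating each task reproduces the "groups of N" submission pattern in the output above.

```python
import asyncio
import time

POOL_SIZE = 3
TASK_SECONDS = 0.2
N_TASKS = 9

async def slow_function():
    await asyncio.sleep(TASK_SECONDS)

async def main():
    # Model of Pool.spawn: take a slot *before* starting the task, so the
    # submitting loop blocks once POOL_SIZE tasks are running.
    slots = asyncio.Semaphore(POOL_SIZE)
    start = time.monotonic()
    submitted_at = []
    tasks = []

    async def run_and_release():
        try:
            await slow_function()
        finally:
            slots.release()                        # free the slot when the task ends

    for _ in range(N_TASKS):
        await slots.acquire()                      # blocks like spawn() on a full pool
        submitted_at.append(time.monotonic() - start)
        tasks.append(asyncio.create_task(run_and_release()))
    await asyncio.gather(*tasks)                   # like pool.join()
    return submitted_at

if __name__ == "__main__":
    for i, t in enumerate(asyncio.run(main())):
        print(f"{t:4.1f}s : Submitting slow func {i}")
```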

Conversely, if you run the same code with apply_async instead of spawn, the loop doesn't block and all the calls are submitted at the same time:

1403037313 : Submitting slow func 0
1403037313 : Submitting slow func 1
1403037313 : Submitting slow func 2
1403037313 : Submitting slow func 3
1403037313 : Submitting slow func 4
1403037313 : Submitting slow func 5
1403037313 : Submitting slow func 6
1403037313 : Submitting slow func 7
1403037313 : Submitting slow func 8
1403037313 : Submitting slow func 9
1403037313 : Submitting slow func 10
1403037313 : Submitting slow func 11
1403037313 : Submitting slow func 12
1403037313 : Submitting slow func 13
1403037313 : Submitting slow func 14
1403037313 : Submitting slow func 15
1403037313 : Submitting slow func 16
1403037313 : Submitting slow func 17
1403037313 : Submitting slow func 18
1403037313 : Submitting slow func 19
1403037318 : Exiting

There's no blocking or waiting here; they're all pushed in at once. However, if you add a wait_available() call at the beginning of the for loop, you get behavior similar to spawn again:

1403038292 : Submitting slow func 0
1403038292 : Submitting slow func 1
1403038292 : Submitting slow func 2
1403038292 : Submitting slow func 3
1403038292 : Submitting slow func 4
1403038297 : Submitting slow func 5
1403038297 : Submitting slow func 6
1403038297 : Submitting slow func 7
1403038297 : Submitting slow func 8
1403038297 : Submitting slow func 9
1403038302 : Submitting slow func 10
1403038302 : Submitting slow func 11
1403038302 : Submitting slow func 12
1403038302 : Submitting slow func 13
1403038302 : Submitting slow func 14
1403038307 : Submitting slow func 15
1403038307 : Submitting slow func 16
1403038307 : Submitting slow func 17
1403038307 : Submitting slow func 18
1403038307 : Submitting slow func 19
1403038312 : Exiting

Looking at the gevent source again, wait_available performs the same check that spawn does: it waits on the semaphore until there's actually room in the pool.
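The apply_async/wait_available contrast can be modeled the same way. This sketch assumes the behavior described in this answer: apply_async schedules work immediately without checking the pool, and wait_available blocks until fewer than size jobs are active without reserving a slot itself. ModelPool here is hypothetical, built on Python 3's asyncio.Condition, and is not gevent's code.

```python
import asyncio

class ModelPool:
    """Hypothetical asyncio model of the two behaviours described above."""

    def __init__(self, size):
        self.size = size
        self.active = 0                      # jobs submitted and not yet finished
        self.peak = 0                        # highest value `active` reached
        self._changed = asyncio.Condition()
        self._tasks = []

    def apply_async(self, func):
        # Non-blocking submit: schedules the job even if the pool is "full".
        self.active += 1
        self.peak = max(self.peak, self.active)
        task = asyncio.create_task(self._run(func))
        self._tasks.append(task)
        return task

    async def _run(self, func):
        try:
            await func()
        finally:
            self.active -= 1
            async with self._changed:
                self._changed.notify_all()   # wake anyone in wait_available()

    async def wait_available(self):
        # Block until a slot is free, without reserving it.
        async with self._changed:
            await self._changed.wait_for(lambda: self.active < self.size)

    async def join(self):
        await asyncio.gather(*self._tasks)


async def slow_function():
    await asyncio.sleep(0.1)


async def submit_all(use_wait_available, n=6, size=3):
    pool = ModelPool(size)
    for _ in range(n):
        if use_wait_available:
            await pool.wait_available()      # the line that restores spawn-like pacing
        pool.apply_async(slow_function)
    await pool.join()
    return pool.peak


if __name__ == "__main__":
    print("apply_async only:    ", asyncio.run(submit_all(False)))
    print("wait_available first:", asyncio.run(submit_all(True)))
```

Without wait_available every job is active at once (peak 6); with it, the peak never exceeds the pool size of 3.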

Licensed under: CC-BY-SA with attribution