How do map() and imap() work in gevent.pool.Pool?

https://stackoverflow.com/questions/20426942

29-08-2022

Question

I wrote a script to learn about gevent.pool.Pool, but I have seen a strange phenomenon.

In my code, I have three different code segments, named version 1, version 2 and version 3.

When I comment out version 2 and version 3, so that only the imap() calls in version 1 run, nothing happens.
When I comment out version 1 and version 3, so that only the map() calls in version 2 run, the first map() call creates two greenlets and runs them. After those two greenlets are done, the second map() call does the same thing.
When I comment out version 1 and version 2, so that version 3 calls imap() first and then map(), nothing runs until map() is called; at that point all five greenlets are created and executed.

So I have two questions:

Why does the map() method trigger execution while imap() does not?
Why does the Pool instance have a non-zero length after map() triggers the execution?

I have read the source code of pool.py in gevent-1.0, but I don't understand how it adds greenlets to the variable self.greenlets, or what the difference between map() and imap() is. In my opinion, imap() just returns an iterable object and map() returns a list of greenlets generated by imap().

Here is the source code of map() and imap() in pool.py of gevent:

def map(self, func, iterable):
    return list(self.imap(func, iterable))

def imap(self, func, iterable):
    """An equivalent of itertools.imap()"""
    return IMap.spawn(func, iterable, spawn=self.spawn)

Here is my test code:

#!/usr/bin/env python2.7
#coding: utf-8

import gevent
from gevent.pool import Pool
from gevent.coros import BoundedSemaphore


class TestSemaphore(object):

    def __init__(self):
        self.sem = BoundedSemaphore(1)
        self.pool = Pool()

    def run(self):
        # version 1 
        self.pool.imap(self._worker, xrange(0, 2))
        self.pool.imap(self._worker, xrange(3, 6))
        # end of version 1

        # version 2
        # self.pool.map(self._worker, xrange(0, 2))
        # self.pool.map(self._worker, xrange(3, 6))
        # end of version 2

        # version 3
        # self.pool.imap(self._worker, xrange(0, 2))
        # self.pool.map(self._worker, xrange(3, 6))
        # end of version 3

    def _worker(self, pid):
        with self.sem:
            print('worker %d acquired semaphore, length of pool is %d' % (pid, len(self.pool)))
            gevent.sleep(0)
        print('worker %d released semaphore, length of pool is %d' % (pid, len(self.pool))) 

if __name__ == '__main__':
    test = TestSemaphore()
    test.run()

Solution

The key thing to note is that imap is lazy - it does not do any work until you actually consume the resulting iterator:

>>> map(lambda x: x, xrange(0, 2))
[0, 1]

>>> from itertools import imap
>>> imap(lambda x: x, xrange(0, 2))
<itertools.imap object at 0xsome-address>

# Consume the resulting iterator
>>> list(imap(lambda x: x, xrange(0, 2)))
[0, 1]
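
To make the laziness visible, here is a minimal sketch in which the worker records a side effect. It is written for Python 3, where the built-in map is lazy in the same way itertools.imap is in Python 2. The names worker and calls are made up for illustration and are not part of gevent.

```python
calls = []  # records every argument the worker was actually called with


def worker(n):
    # Hypothetical worker: remember that we ran, then return a result.
    calls.append(n)
    return n * 2


# Python 3's built-in map is lazy, like itertools.imap in Python 2:
# creating the iterator does not call worker at all.
lazy = map(worker, range(3))
calls_before = list(calls)

# Consuming the iterator is what actually runs the worker.
results = list(lazy)
calls_after = list(calls)

print(calls_before, results, calls_after)
```

Until list() consumes the iterator, calls stays empty, so the worker has not run at all.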

imap in multiprocessing and gevent adheres to the same rules.
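
As a rough analogy only, the relationship between map() and imap() quoted in the question (map() is just list(imap(...))) can be modelled with a toy class. ToyPool and its started attribute are hypothetical names invented for this sketch. It does not use gevent and does not model greenlet scheduling, so it will not reproduce version 3 of the question, where the earlier imap() work also ran once map() was called.

```python
class ToyPool(object):
    """Hypothetical stand-in for a pool; NOT gevent's implementation."""

    def __init__(self):
        self.started = []  # items whose work has actually begun

    def imap(self, func, iterable):
        # A generator function: none of this body runs until iteration.
        for item in iterable:
            self.started.append(item)
            yield func(item)

    def map(self, func, iterable):
        # Same shape as the quoted pool.py: consume imap() into a list.
        return list(self.imap(func, iterable))


pool = ToyPool()

pending = pool.imap(str, range(0, 2))    # returns an iterator, does no work
started_after_imap = list(pool.started)

mapped = pool.map(str, range(3, 6))      # list() drives the iteration
started_after_map = list(pool.started)

print(started_after_imap, mapped, started_after_map)
```

In this toy, the unconsumed imap() result never runs; only the map() call, which wraps imap() in list(), does any work.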

Licensed under: CC-BY-SA with attribution