Question

Function bigop(init, report) operates on a large dynamic internal data structure, data, derived from init, and accepts a callable report(data). Function status(data) returns a summary of the current state of data.

Function bigop calls report on the current state of data at each major step, which in turn calls status. It would be expensive to copy data for each step (or make it persistent), so report must finish at each step before bigop continues.

Function view(gen) accepts a generator gen yielding successive values of the status summary, and displays a visualization of each value as it is generated. The function view maintains internal state dependent on values generated thus far. (In my particular case, this internal state could be copied, but it would be nice to avoid.)

Assume that functions bigop and view cannot be changed.

Question: how can one define gen, report, and a program main such that bigop will run on init, and visualizations of the values of the status reports will be displayed as bigop reaches each major step?

The difficulty is that report and gen are called inside other functions, so the usual Python coroutine patterns are inapplicable. (In my particular case, bigop is actually a generator.)

A previous question about using callbacks to produce generators from ordinary functions was answered using threads, but I am wondering if there is a simpler way.

Note: only answers compatible with Python 2.7 will be useful to me; but I would be interested to see Python 3 answers if the differences are relevant.

def bigop(init, report):
    data = init
    while data < 10:           # complicated condition
        print 'working ...'
        data += 1              # complicated operation
        report(data)

def view(gen):
    for value in gen:
        print value            # complicated display routine

def main(init):
    """
    example:

    >> main(7)
    'working ...'
    8
    'working ...'
    9
    'working ...'
    10
    """
    pass

Question: how to define main?


Solution

Given your sample code:

def main(init):
    def report(x):
        print x
    bigop(init, report)

However, I don't think that's what you're looking for here. Presumably you want report to feed data into view in some way.

You can do that by turning things around: instead of view being a function that pulls values from a generator, make view itself a generator that an outside caller drives by calling send on it. For example:

def view():
    while True:
        value = yield          # receive the next value passed in via send
        print value

def main(init):
    v = view()
    v.next()                   # prime the generator: advance to the first yield
    def report(x):
        v.send(x)
    bigop(init, report)

But you said that view can't be changed. Of course you can write a viewdriver that yields a new object whenever you send it one. Or, more simply, just repeatedly call view([data]) and let it iterate over a single object.
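As a minimal sketch of that "just repeatedly call view([data])" idea (my own illustration, with stand-ins for the unchangeable bigop and view; the stand-in view records values instead of running a real display routine):

```python
# Stand-in view: records values rather than displaying them. Note that each
# fresh call to view restarts any internal display state, which is exactly
# the limitation discussed below for a stateful view.
seen = []

def view(gen):                 # stand-in for the unchangeable view
    for value in gen:
        seen.append(value)     # the real view would display value here

def bigop(init, report):       # stand-in for the unchangeable bigop
    data = init
    while data < 10:
        data += 1
        report(data)

def main(init):
    def report(data):
        view([data])           # a one-element list is a perfectly good iterable
    bigop(init, report)

main(7)
```

Each report call hands view a one-element iterable, so the display runs in step with bigop, but view's internal state is rebuilt from scratch every time.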

Anyway, I don't see how you expect this to help anything. bigop is not a coroutine, and you cannot turn it into one. Given that, there's no way to force it to cooperatively share with other coroutines.

If you want to interleave processing and reporting concurrently, you have to use threads (or processes). And since your requirements already state that report must finish at each step before bigop continues, you can't safely do anything concurrent here anyway, so I'm not sure what you're looking for.

If you just want to interleave processing and reporting without concurrency—or periodically hook into bigop, or other similar things—you can do that with a coroutine, but it will have exactly the same effect as using a subroutine—the two examples above are pretty much equivalent. So, you're just adding complexity for no reason.

(If bigop is I/O bound, you could use greenlets, and monkeypatch the I/O operations to asyncify them, as gevent and eventlet do. But if it's CPU-bound, there would be no benefit to doing so.)


Elaborating on the viewdriver idea: What I was describing above was equivalent to calling view([data]) each time, so it won't help you. If you want to make it an iterator, you can, but it's just going to lead to either blocking bigop or spinning view, because you're trying to feed a consumer with a consumer.

It may be hard to understand as a generator, so let's build it as a class:

class Reporter(object):
    def __init__(self):
        self.data_queue = []
        self.viewer = view(self)
    def __call__(self, data):
        self.data_queue.append(data)
    def __iter__(self):
        return self
    def next(self):                     # Python 2.7 iterator protocol
        return self.data_queue.pop(0)   # FIFO: hand out the oldest value first
    __next__ = next                     # same method under the Python 3 protocol

bigop(init, Reporter())

Every time bigop calls report(data), that calls our __call__, adding a new element to our queue. Every time view goes through the loop, it calls our __next__, popping an element off the queue. If bigop is guaranteed to go faster than view, everything will work, but the first time view gets ahead, it will get an IndexError.
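A hypothetical demo of that IndexError (my own simplified stand-in for view, not the original code): view starts consuming inside Reporter.__init__, before bigop has produced anything, so the very first next finds an empty queue.

```python
def view(gen):                      # stand-in for the unchangeable view
    for value in gen:
        pass                        # the real view would display value here

class Reporter(object):
    def __init__(self):
        self.data_queue = []
        self.viewer = view(self)    # view begins iterating immediately
    def __call__(self, data):
        self.data_queue.append(data)
    def __iter__(self):
        return self
    def __next__(self):
        return self.data_queue.pop(0)
    next = __next__                 # Python 2.7 spelling of the protocol

failed = False
try:
    Reporter()                      # view outruns bigop before bigop even starts
except IndexError:
    failed = True
```

Under Python 2.7 the for loop calls next rather than __next__, which is why the class defines both names.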

The only way to fix that is to make __next__ loop until data_queue is non-empty. But just doing that will spin forever, never letting bigop do the work to produce a new element. And you can't make __next__ into a generator, because view is expecting an iterator over values, not an iterator over iterators.

Fortunately, __call__ can be a generator, because bigop doesn't care what value it gets back. So you could try turning things around that way, but it won't work: then there's nothing to drive that generator.

So, you have to add another level of coroutines, underneath the iteration. Then, __next__ can wait on a next_coro (by calling next on it), which yields to a call_coro and then yields the value it got. Meanwhile, __call__ has to send to the same call_coro, wait on it, and yield.

So far, that doesn't change anything, because you've got two routines both trying to drive next_coro, and one of them (__next__) isn't blocking anywhere else, so it's just going to spin: its next call will look like a send(None) from __call__.

The only way to fix that is to build a trampoline (PEP 342 includes source for a general-purpose trampoline, although in this case you could build a simpler special-purpose one), schedule next_coro and call_coro to explicitly alternate, make sure next_coro properly handles alternating between two different entry points, then drive the scheduler's run from __next__ (and __init__).

Confused? You won't be, after this week's episode of… Nah, who am I kidding. You're going to be confused. Writing all of this is one thing; debugging it is another. (Especially since every important stack trace just terminates immediately at the trampoline.) And what does all that work get you? The exact same benefit as using greenlets or threads, with the exact same downsides.

Since your original question is whether there's a simpler way than using threads, the answer is: No, there isn't.
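For completeness, here is a sketch of that thread-based approach (the details are my reconstruction, not code from the linked answer): bigop runs in a worker thread, report pushes each value into a bounded Queue, and gen yields values off the Queue so view can consume them in the main thread. Stand-ins replace the unchangeable bigop and view, with view recording values instead of displaying them.

```python
import threading
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2.7

def bigop(init, report):      # stand-in for the unchangeable bigop
    data = init
    while data < 10:
        data += 1
        report(data)

displayed = []
def view(gen):                # stand-in: records instead of displaying
    for value in gen:
        displayed.append(value)

def main(init):
    q = queue.Queue(maxsize=1)    # bounded: bigop stays at most one step ahead
    DONE = object()               # sentinel marking the end of the stream

    def report(data):
        q.put(data)               # blocks while the previous value is pending

    def gen():
        while True:
            value = q.get()
            if value is DONE:
                return
            yield value

    def worker():
        bigop(init, report)
        q.put(DONE)

    t = threading.Thread(target=worker)
    t.start()
    view(gen())
    t.join()

main(7)
```

Note that maxsize=1 only keeps bigop at most one step ahead of view; if you need a strict handshake where report fully finishes before bigop resumes, you'd add a second queue (or an Event) for the acknowledgement.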

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow