Question

Function bigop(init, report) operates on a large dynamic internal data structure, data, derived from init, and accepts a callable report(data). Function status(data) returns a summary of the current state of data.

Function bigop calls report on the current state of data at each major step, which in turn calls status. It would be expensive to copy data for each step (or make it persistent), so report must finish at each step before bigop continues.

Function view(gen) accepts a generator gen yielding successive values of the status summary, and displays a visualization of each value as it is generated. The function view maintains internal state dependent on values generated thus far. (In my particular case, this internal state could be copied, but it would be nice to avoid.)

Assume that functions bigop and view cannot be changed.

Question: how can one define gen, report, and a program main such that bigop will run on init, and visualizations of the values of the status reports will be displayed as bigop reaches each major step?

The difficulty is that report and gen are called inside other functions, so the usual Python coroutine patterns are inapplicable. (In my particular case, bigop is actually a generator.)

A previous question about using callbacks to produce generators from ordinary functions was answered using threads, but I am wondering if there is a simpler way.

Note: only answers compatible with Python 2.7 will be useful to me; but I would be interested to see Python 3 answers if the differences are relevant.

def bigop(init, report):
    data = init
    while data < 10:           # complicated condition
        print 'working ...'
        data += 1              # complicated operation
        report(data)

def view(gen):
    for value in gen:
        print value            # complicated display routine

def main(init):
    """
    example:

    >> main(7)
    'working ...'
    8
    'working ...'
    9
    'working ...'
    10
    """
    pass

Question: how to define main?


Solution

Given your sample code:

def main(init):
    def report(x):
        print x
    bigop(init, report)

However, I don't think that's what you're looking for here. Presumably you want report to feed data into view in some way.

You can do that by turning things around: instead of view being a function that pulls values from a generator, make view itself a generator that an outside caller drives by calling send on it. For example:

def view():
    while True:
        value = yield          # receive the next value passed in via send
        print value

def main(init):
    v = view()
    v.next()                   # prime the generator: advance to the first yield
    def report(x):
        v.send(x)
    bigop(init, report)

But you said that view can't be changed. Of course you can write a viewdriver that yields a new object whenever you send it one. Or, more simply, just repeatedly call view([data]) and let it iterate over a single object.
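As a minimal sketch of that "just repeatedly call view([data])" idea (my own illustration, with stand-ins for the unchangeable bigop and view; the stand-in view records values instead of running a real display routine):

```python
# Stand-in view: records values rather than displaying them. Note that each
# fresh call to view restarts any internal display state, which is exactly
# the limitation discussed below for a stateful view.
seen = []

def view(gen):                 # stand-in for the unchangeable view
    for value in gen:
        seen.append(value)     # the real view would display value here

def bigop(init, report):       # stand-in for the unchangeable bigop
    data = init
    while data < 10:
        data += 1
        report(data)

def main(init):
    def report(data):
        view([data])           # a one-element list is a perfectly good iterable
    bigop(init, report)

main(7)
```

Each report call hands view a one-element iterable, so the display runs in step with bigop, but view's internal state is rebuilt from scratch every time.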

Anyway, I don't see how you expect this to help anything. bigop is not a coroutine, and you cannot turn it into one. Given that, there's no way to force it to cooperatively share with other coroutines.

If you want to interleave processing and reporting concurrently, you have to use threads (or processes). And since your requirements already state that report must finish at each step before bigop continues, you can't safely do anything concurrent here anyway, so I'm not sure what you're looking for.

If you just want to interleave processing and reporting without concurrency—or periodically hook into bigop, or other similar things—you can do that with a coroutine, but it will have exactly the same effect as using a subroutine—the two examples above are pretty much equivalent. So, you're just adding complexity for no reason.

(If bigop is I/O bound, you could use greenlets, and monkeypatch the I/O operations to asyncify them, as gevent and eventlet do. But if it's CPU-bound, there would be no benefit to doing so.)


Elaborating on the viewdriver idea: What I was describing above was equivalent to calling view([data]) each time, so it won't help you. If you want to make it an iterator, you can, but it's just going to lead to either blocking bigop or spinning view, because you're trying to feed a consumer with a consumer.

It may be hard to understand as a generator, so let's build it as a class:

class Reporter(object):
    def __init__(self):
        self.data_queue = []
        self.viewer = view(self)
    def __call__(self, data):
        self.data_queue.append(data)
    def __iter__(self):
        return self
    def next(self):                     # Python 2.7 iterator protocol
        return self.data_queue.pop(0)   # FIFO: hand out the oldest value first
    __next__ = next                     # same method under the Python 3 protocol

bigop(init, Reporter())

Every time bigop calls report(data), that calls our __call__, adding a new element to our queue. Every time view goes through the loop, it calls our __next__, popping an element off the queue. If bigop is guaranteed to go faster than view, everything will work, but the first time view gets ahead, it will get an IndexError.
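A hypothetical demo of that IndexError (my own simplified stand-in for view, not the original code): view starts consuming inside Reporter.__init__, before bigop has produced anything, so the very first next finds an empty queue.

```python
def view(gen):                      # stand-in for the unchangeable view
    for value in gen:
        pass                        # the real view would display value here

class Reporter(object):
    def __init__(self):
        self.data_queue = []
        self.viewer = view(self)    # view begins iterating immediately
    def __call__(self, data):
        self.data_queue.append(data)
    def __iter__(self):
        return self
    def __next__(self):
        return self.data_queue.pop(0)
    next = __next__                 # Python 2.7 spelling of the protocol

failed = False
try:
    Reporter()                      # view outruns bigop before bigop even starts
except IndexError:
    failed = True
```

Under Python 2.7 the for loop calls next rather than __next__, which is why the class defines both names.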

The only way to fix that is to make __next__ loop until data_queue is non-empty. But just doing that will spin forever, never letting bigop do the work to produce a new element. And you can't make __next__ into a generator, because view is expecting an iterator over values, not an iterator over iterators.

Fortunately, __call__ can be a generator, because bigop doesn't care what value it gets back. So you could try turning things around that way, but it won't work: then there's nothing to drive that generator.

So, you have to add another level of coroutines, underneath the iteration. Then, __next__ can wait on a next_coro (by calling next on it), which yields to a call_coro and then yields the value it got. Meanwhile, __call__ has to send to the same call_coro, wait on it, and yield.

So far, that doesn't change anything, because you've got two routines both trying to drive next_coro, and one of them (__next__) isn't blocking anywhere else, so it's just going to spin: its next call will look like a send(None) from __call__.

The only way to fix that is to build a trampoline (PEP 342 includes source for a general-purpose trampoline, although in this case you could build a simpler special-purpose one), schedule next_coro and call_coro to explicitly alternate, make sure next_coro properly handles alternating between two different entry points, then drive the scheduler's run from __next__ (and __init__).

Confused? You won't be, after this week's episode of… Nah, who am I kidding. You're going to be confused. Writing all of this is one thing; debugging it is another. (Especially since every important stack trace just terminates immediately at the trampoline.) And what does all that work get you? The exact same benefit as using greenlets or threads, with the exact same downsides.

Since your original question is whether there's a simpler way than using threads, the answer is: No, there isn't.
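For completeness, here is a sketch of that thread-based approach (the details are my reconstruction, not code from the linked answer): bigop runs in a worker thread, report pushes each value into a bounded Queue, and gen yields values off the Queue so view can consume them in the main thread. Stand-ins replace the unchangeable bigop and view, with view recording values instead of displaying them.

```python
import threading
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2.7

def bigop(init, report):      # stand-in for the unchangeable bigop
    data = init
    while data < 10:
        data += 1
        report(data)

displayed = []
def view(gen):                # stand-in: records instead of displaying
    for value in gen:
        displayed.append(value)

def main(init):
    q = queue.Queue(maxsize=1)    # bounded: bigop stays at most one step ahead
    DONE = object()               # sentinel marking the end of the stream

    def report(data):
        q.put(data)               # blocks while the previous value is pending

    def gen():
        while True:
            value = q.get()
            if value is DONE:
                return
            yield value

    def worker():
        bigop(init, report)
        q.put(DONE)

    t = threading.Thread(target=worker)
    t.start()
    view(gen())
    t.join()

main(7)
```

Note that maxsize=1 only keeps bigop at most one step ahead of view; if you need a strict handshake where report fully finishes before bigop resumes, you'd add a second queue (or an Event) for the acknowledgement.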

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow