Question

UPDATE: example now lists desired results (boldfaced below)

I find myself writing lots of functions that search through some data, where I want to let the caller specify behaviours when matches are found: they might print something out or add it to one of their data structures, but it's also highly desirable to be able to optionally return found data for further transmission, storage or processing.

Example

def find_stuff(visitor):    # library search function
    for x in (1, 2, 3, 4, 5, 6):
        visitor(x)

First client usage:

def my_visitor(x):   # client visitor functions (also often use lambdas)
    if x > 3:
        yield x / 2   #>>> WANT TO DO SOMETHING LIKE THIS <<<#

results = find_stuff(my_visitor)   # client usage

results should yield 4/2, 5/2, then 6/2... i.e. 2, 2, 3.

Second client usage:

def print_repr_visitor(x):
    print repr(x)

find_stuff(print_repr_visitor)     # alternative usage

should print 1 2 3 4 5 6 (separate lines) but yield nothing

But, the yield doesn't create a generator in "results" (at least with python 2.6.6 which I'm stuck with).


What I've tried

I've been hacking this up, often like this...

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        val = visitor(x)
        if val is not None:
             yield val

...or sometimes, when the list of visitor parameters is a pain to type out too many times...

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        val = visitor(x)
        if val == 'yield':
            yield x
        elif val is not None:
             yield val

The Issues / Question

These "solutions" are not only clumsy - needing explicit built-in support from the "find" routine - they remove sentinel values from the set of results the visitor can yield back to the top-level caller...

    Are there better alternatives in terms of concision, intuitiveness, flexibility, elegance etc?

Thanks!

Was it helpful?

Solution

In Python 3, you can use yield from to yield items from a subgenerator:

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        yield from visitor(x)

In Python 2, you have to loop over the subgenerator. This takes more code and doesn't handle a few edge cases, but it's usually good enough:

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        for item in visitor(x):
            yield item

The edge cases are things like trying to send values or throw exceptions into the subgenerator. If you're not using coroutine functionality, you probably don't need to worry about them.

OTHER TIPS

If understand right, perhaps you want something like this:

def find_stuff(visitor):
    for x in [1, 2, 3, 4, 5]:
        match, val = visitor(x)
        if match:
            yield val

def my_visitor(x):
    if x > 4:
        return True, x/2
    else:
        return False, None

That is, have the visitor return two things: the value to be yielded, if any, and a boolean indicating whether to yield the value. This way any value can be yielded.

The title of your question seems to suggest that you want my_visitor to somehow decide whether or not find_stuff yields a value on each iteration, but you don't actually describe this in the question. In any case, it isn't possible. A generator can call another function to decide what to yield, but there's no way for the called function to magically make its caller yield or not yield; that decision has to be made within the caller (find_stuff in this case).

From your question, though, I don't understand why this is a problem. You say that your proposed solutions are "clumsy - needing explicit built-in support from the "find" routine" but I don't see how that's clumsy. It's just an API. find_stuff obviously will have to have "built-in support" for doing what it's supposed to do, and the visitors will have to know what to return to communicate with the caller. You can't expect to be able to write a my_visitor function that works with any find routine anyone might come up with; the system as a whole will have to define an API that describes how to write a visitor that find_stuff can use. So you just need to come up with an API that visitors have to follow. My example above is one simple API, but it's hard to tell from your question what you're looking for.

I did find a solution for this with some investigation, and in python 2.6. It's a little weird, but it does appear to work.

from itertools import chain

def my_visitor(x):
    if x > 3:
        yield x / 2

def find_stuff(visitor):
    search_list = (1,2,3,4,5,6)
    return (x for x in chain.from_iterable(visitor(x) for x in search_list))

find_stuff(my_visitor)
<generator object <genexpr> at 0x0000000047825558>

list(find_stuff(my_visitor))
[0x2, 0x2, 0x3]

as expected. The generator is nice, as you can do things like this:

def my_visitor2(x):
    if x > 3:
        yield x / 2
    elif x > 1:
        yield x
        yield x*2
        yield x-3

In [83]: list(find_stuff(my_visitor2))
[0x2, 0x4, -0x1, 0x3, 0x6, 0x0, 0x2, 0x2, 0x3]

and have each visit return no values, a single values, or a bunch of values, and they'll all get into the result.

You could adapt this to scalar values though as well. Best way would be with a nested generator:

sentinel = object()

def my_scalar_visitor(x):
    if x > 3: 
        return x / 2
    else:
        return sentinel

def find_stuff_scalar(scalar_visitor):
    search_list=(1,2,3,4,5,6)
    return (x for x in (scalar_visitor(y) for y in search_list) if x != sentinel)

list(find_stuff_scalar(my_scalar_visitor))
[0x2, 0x2, 0x3]

user2357112's answer solves the problem given by the question, but it seems to me that the generator-within-a-generator approach is overcomplicated for this specific situation, and limits the client's options for using your code.

You want to traverse some structure, apply some function, and yield the results. Your code allows for this, but you are conflating two ideas that Python already has excellent, separate support for (traversing and mapping) with no extra benefits.

Your traversal function could simply traverse:

def traverse_stuff():
    for x in (1, 2, 3, 4, 5, 6):
        yield x

And when we want to consume, you or your client can use list comprehensions, combinators such as map and filter, or just simple for loops:

[x / 2 for x in traverse_stuff() if x > 3]

map(lambda x: x / 2, filter(lambda x: x > 3, traverse_stuff())

for value in traverse_stuff():
    print(value)

Splitting the code in this way makes it more composable (your client is not limited to the visitor pattern/generators), more intuitive for other Python developers, and more performant for cases where you only need to consume part of the structure (e.g., when you only need to find some n number of nodes from a tree, when you only want to find the first value in your structure that satisfies a condition, &c.).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top