Pregunta

I have a list comprehension which approximates to:

[f(x) for x in l if f(x)]

Where l is a list and f(x) is an expensive function which returns a list.

I want to avoid evaluating f(x) twice for every non-empty occurance of f(x). Is there some way to save its output within the list comprehension?

I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.

Edit:

Two basic approaches have been suggested:

An inner generator comprehension:

[y for y in (f(x) for x in l) if y]

or memoization.

I think the inner generator comprehension is elegant for the problem as stated. In actual fact I simplified the question to make it clear, I really want:

[g(x, f(x)) for x in l if f(x)]

For this more complicated situation, I think memoization produces a cleaner end result.

¿Fue útil?

Solución 2

A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

a really simple implementation is the following:

storage = {}
def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]

and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

EDIT:

as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]

You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.

Otros consejos

[y for y in (f(x) for x in l) if y]

Will do.

You should use a memoize decorator. Here is an interesting link.


Using memoization from the link and your 'code':

def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code

[f(x) for x in l if f(x)]

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to use a local variable within a list comprehension in order to avoid calling twice the same function:

In our case, we can name the evaluation of f(x) as a variable y while using the result of the expression to filter the list but also as the mapped value:

[y for x in l if (y := f(x))]
[y for y in [f(x) for x in l] if y]

For your updated problem, this might be useful:

[g(x,y) for x in l for y in [f(x)] if y]

Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:

output = []
for x in l:
    result = f(x)
    if result: 
        output.append(result)

If you find that hard to read, you can always wrap it in a function.

As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.

To come to your actual problem:

[g(x, f(x)) for x in series if f(x)]

To calculate the final value you need both x and f(x). No problem, pass them both like this:

[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]

Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:

[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS

There have been a lot of answers regarding memoizing. The Python 3 standard library now has a lru_cache, which is a Last Recently Used Cache. So you can:

from functools import lru_cache

@lru_cache()
def f(x):
    # function body here

This way your function will only be called once. You can also specify the size of the lru_cache, by default this is 128. The problem with the memoize decorators shown above is that the size of the lists can grow well out of hand.

You can use memoization. It is a technique which is used in order to avoid doing the same computation twice by saving somewhere the result for each calculated value. I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using python decorators:

def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
...

Now f is a memoized version of itself. With this implementation you can memoize any function using the @memoize decorator.

Use map() !!

comp = [x for x in map(f, l) if x]

f is the function f(X), l is the list

map() will return the result of f(x) for each x in the list.

Here is my solution:

filter(None, [f(x) for x in l])

How about defining:

def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]

So that, for example

> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]

> truths(wife.children for wife in henry8.wives) 
[[Mary1], [Elizabeth1], [Edward6]]
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top