Question

I have a generator object returned by a function with multiple yield statements. Preparing to call this generator is a rather time-consuming operation, which is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I'm keeping in mind that I could copy the content into a simple list.

Solution

Another option is to use the itertools.tee() function to create a second version of your generator:

from itertools import tee

y = FunctionWithYield()
y, y_backup = tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This can save memory compared with building a full list if the first iteration might not process all the items, because tee() only buffers the items that one iterator has consumed and the other has not yet seen.
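As a rough illustration of that buffering, here is a minimal sketch. The `numbers` generator and the `calls` list are hypothetical stand-ins for FunctionWithYield, used only to show that each source value is produced once even though two iterators read it.

```python
from itertools import tee

calls = []  # records every value the source generator actually produces


def numbers():
    # Hypothetical stand-in for FunctionWithYield
    for n in range(5):
        calls.append(n)
        yield n


first, backup = tee(numbers())

# Stop early: only three items are pulled from the source and buffered.
head = [next(first) for _ in range(3)]
produced_so_far = list(calls)

# The backup replays the buffered items, then pulls the rest from the source.
replayed = list(backup)

print(head)             # [0, 1, 2]
print(produced_so_far)  # [0, 1, 2]
print(replayed)         # [0, 1, 2, 3, 4]
```

Note that once tee() has been called, the original generator should no longer be advanced directly, or the tee iterators may miss items.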

OTHER TIPS

Generators can't be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
    
  2. Store the generator results in a data structure in memory or on disk which you can iterate over again:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)
    

The downside of option 1 is that it computes the values again. If that's CPU-intensive, you end up calculating everything twice. On the other hand, the downside of option 2 is the storage: the entire list of values is held in memory, which can be impractical if there are too many values.

So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.
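To make the tradeoff concrete, here is a small sketch that counts how many values get computed under each option. The `squares` generator and the `stats` counter are illustrative assumptions, not part of the original question.

```python
stats = {"produced": 0}  # counts how many values the generator has computed


def squares(n):
    # Hypothetical generator; each yielded value counts as one computation.
    for i in range(n):
        stats["produced"] += 1
        yield i * i


# Option 1: run the generator function twice; every value is computed twice.
total_rerun = sum(squares(4)) + sum(squares(4))
computed_by_rerun = stats["produced"]

# Option 2: materialize once; values are computed once but all held in memory.
stats["produced"] = 0
cached = list(squares(4))
total_cached = sum(cached) + sum(cached)
computed_by_list = stats["produced"]

print(computed_by_rerun, computed_by_list)  # 8 4
```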

>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val=='restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> g.send('restart')
0
>>> next(g)
1
>>> next(g)
2

Probably the simplest solution is to wrap the expensive part in an object and pass that to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive calculations.
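A minimal runnable sketch of this idea follows. The names `expensive_setup` and `function_with_yield`, and the `setup_runs` list, are hypothetical; the point is only that the costly step runs once while the cheap generator is created twice.

```python
setup_runs = []  # one entry per (hypothetical) expensive setup


def expensive_setup():
    # Stand-in for the costly preparation; assumed to return reusable data.
    setup_runs.append(1)
    return [3, 1, 2]


def function_with_yield(data):
    # Cheap part: a fresh generator over the already-prepared data.
    for item in sorted(data):
        yield item


data = expensive_setup()
first_pass = list(function_with_yield(data))
second_pass = list(function_with_yield(data))

print(first_pass, second_pass, len(setup_runs))  # [1, 2, 3] [1, 2, 3] 1
```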

If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.

I want to offer a different solution to an old problem:

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this compared to something like list(iterator) is that it needs O(1) extra space, whereas list(iterator) needs O(n). The disadvantage is that it only works if you have access to the function that produces the iterator, not just the iterator itself. For example, it might seem reasonable to do the following, but it will not work: the lambda returns the same, already exhausted generator every time, so the second loop prints nothing.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)
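The difference between the two cases can be checked side by side. This sketch repeats the adapter class so that it runs on its own; the names `shared` and `fresh` are only for illustration.

```python
class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()


g = (x * x for x in range(5))

# Always hands back the same generator object.
shared = IterableAdapter(lambda: g)
# Builds a new generator on every iteration.
fresh = IterableAdapter(lambda: (x * x for x in range(5)))

shared_passes = [list(shared), list(shared)]
fresh_passes = [list(fresh), list(fresh)]

print(shared_passes)  # [[0, 1, 4, 9, 16], []]
print(fresh_passes)   # [[0, 1, 4, 9, 16], [0, 1, 4, 9, 16]]
```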

If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP 342 for more details on enhanced generators and yield expressions.

UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.

If your generator is pure, in the sense that its output depends only on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

From official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it's best to use list(iterable) instead in your case.

You can define a function that returns your generator

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

Now you can just do as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

Using a wrapper function to handle StopIteration

You could write a simple wrapper function around your generator-generating function that tracks when the generator is exhausted. It does so using the StopIteration exception that a generator raises when it reaches the end of iteration.

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        # Loop so that every item is passed through, not just the first one.
        # Note: the wrapped generator restarts whenever it is exhausted,
        # so it never ends on its own.
        while True:
            try:
                yield next(generator)
            except StopIteration:
                generator = function(**kwargs)
                yield next(generator)
    return inner_func

As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).

And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item
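The StopIteration-based restart described in this answer can also be sketched without a decorator. The helper `next_or_restart` and the `letters` generator below are hypothetical names chosen for illustration; this is one possible reading of the idea, not the answer's exact code.

```python
def letters():
    # Hypothetical generator function
    yield "a"
    yield "b"


def next_or_restart(gen, factory):
    """Return (value, generator), re-creating the generator if it is exhausted."""
    try:
        return next(gen), gen
    except StopIteration:
        gen = factory()  # exhausted: start over with a fresh generator
        return next(gen), gen


gen = letters()
seen = []
for _ in range(5):
    value, gen = next_or_restart(gen, letters)
    seen.append(value)

print(seen)  # ['a', 'b', 'a', 'b', 'a']
```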

I'm not sure what you meant by expensive preparation, but I guess you actually have

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that's the case, why not reuse data?

There is no way to reset an iterator. An iterator is consumed as you step through it with next(). The only way is to take a backup before iterating over the iterator object. See below.

Creating iterator object with items 0 to 9

i=iter(range(10))

Advancing with next(), which consumes the first item

print(next(i))

Converting the iterator object to list

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

So item 0 has already been consumed, and converting the iterator to a list consumed all the remaining items as well.

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

So you need to convert the iterator to a list as a backup before you start iterating. A list can be turned back into an iterator with iter(<list-object>).
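Put together, the backup approach might look like the following sketch. The variable names are illustrative only.

```python
source = iter(range(10))
backup = list(source)  # consumes `source`, keeping every item

# Each iter(backup) call gives an independent pass over the saved items.
first_pass = iter(backup)
first_item = next(first_pass)
rest = list(first_pass)

second_pass = list(iter(backup))

# The original iterator itself stays exhausted.
leftover = list(source)

print(first_item)   # 0
print(rest)         # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(second_pass)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(leftover)     # []
```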

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.
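To see why memory grows, here is a simplified, standard-library-only sketch of the caching idea. `CachingIterator` is a hypothetical class written for this illustration; it is not how more_itertools implements seekable, and it only supports seeking back into items that have already been seen.

```python
class CachingIterator:
    """Hypothetical, simplified stand-in; not more_itertools' implementation."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []   # every item seen so far is kept here
        self._index = 0    # position of the next item to return

    def __iter__(self):
        return self

    def __next__(self):
        if self._index < len(self._cache):
            value = self._cache[self._index]
        else:
            value = next(self._source)  # StopIteration propagates when done
            self._cache.append(value)
        self._index += 1
        return value

    def seek(self, index):
        # Only positions inside the cache are supported in this sketch.
        self._index = min(index, len(self._cache))


y = CachingIterator(x * x for x in range(4))
first = list(y)
y.seek(0)  # rewind to the start of the cache
second = list(y)
cache_size = len(y._cache)

print(first, second, cache_size)  # [0, 1, 4, 9] [0, 1, 4, 9] 4
```

The cache holds one entry per item consumed, which is the memory growth the note above warns about.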

Ok, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in range(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print(x)

for x in y():
    print(x)

Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of 'reset' function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print(x)

print('resetting...')
my_iterator.reset()

for x in my_iterator:
    print(x)

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html

It can be done with a code object. Here is an example.

code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print(i)

1 2 3 4

for i in y: print(i)


exec(code1)
for i in y: print(i)

1 2 3 4

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow