Question

I have an iterable -- that is, something which responds to __iter__ and which can be iterated over lazily, multiple times, using a fresh iterator each time.

I want to map this to another iterable, which can also be iterated over multiple times, but without bringing the whole collection into memory.

map doesn't work -- it returns a list, so brings the whole dataset into memory.

itertools.imap also doesn't work -- it takes an iterable but returns a one-shot iterator.

What I'm looking for is a set of itertools-like combinators which operate at the level of iterables. Only at the final stage, when I'm consuming the end result, do I want a single-shot iterator object, so I don't really understand why itertools returns them rather than returning e.g. some kind of MappedIterable.

Pointers anyone? Or is this somehow heretically non-Pythonic?


Solution

itertools is reasonably simple: it mostly (entirely?) doesn't do different things depending on whether its input is an iterator, a multiply-iterable object, or a sequence. imap doesn't know or care that you've passed it an iterable that happens not to be an iterator. So you can get what you want by wrapping it in a class whose __iter__ builds a new imap on every call:

import itertools

class MyMap(object):
    def __init__(self, func, *iterables):
        self.func = func
        self.iterables = iterables
    def __iter__(self):
        # Build a fresh imap on every call, so each pass starts from the beginning.
        return itertools.imap(self.func, *self.iterables)

Or something along those lines. I haven't tested it.
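For anyone on Python 3, here is a rough sketch of the same idea. It assumes Python 3, where itertools.imap is gone and the built-in map is already lazy. The name MappedIterable is borrowed from the question; it is not an existing library class.

```python
class MappedIterable:
    """Re-iterable view that applies func lazily on each pass."""

    def __init__(self, func, *iterables):
        self.func = func
        self.iterables = iterables

    def __iter__(self):
        # Python 3's built-in map is lazy, like itertools.imap in Python 2.
        # A new map object is created on every call.
        return map(self.func, *self.iterables)


squares = MappedIterable(lambda x: x * x, range(5))
first_pass = list(squares)
second_pass = list(squares)   # fresh iterator, so the same values again

# For contrast: a bare map object is a one-shot iterator.
one_shot = map(lambda x: x * x, range(5))
drained = list(one_shot)
leftover = list(one_shot)     # empty, the iterator is already exhausted
```

Note this is only re-iterable if the inputs are. Wrap a generator and the second pass comes back empty.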

It's difficult (impossible?) to do this automagically, since the Python iterator protocol doesn't tell you whether or not an iterable can be iterated more than once. You can assume that if iter(i) is i then it can't, but I don't think you can safely assume that if iter(i) is not i then it can.
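To make that concrete, here is a small sketch of the identity check and of a case where it misleads. The names is_iterator and OneShot are made up for illustration.

```python
def is_iterator(obj):
    """True if obj is its own iterator, so it can only be consumed once."""
    return iter(obj) is obj


class OneShot:
    """Not its own iterator, yet all passes share one underlying iterator."""

    def __init__(self, source):
        self._it = iter(source)

    def __iter__(self):
        # A new generator object each time, but it drains the shared self._it.
        return (x for x in self._it)


numbers = [1, 2, 3]
gen = (n for n in numbers)
tricky = OneShot(numbers)

list_is_iterator = is_iterator(numbers)    # False: lists hand out fresh iterators
gen_is_iterator = is_iterator(gen)         # True: a generator is its own iterator
tricky_is_iterator = is_iterator(tricky)   # False, but see the second pass below

tricky_first = list(tricky)
tricky_second = list(tricky)               # empty despite passing the check
```

So the check can rule an object out, but it can't confirm that a second pass will work.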

Basically an iterable that can be iterated multiple times (analogous to what C++ calls a ForwardIterator as opposed to a mere InputIterator) is not a concept commonly demanded by Python programmers AFAIK. So I think you might have to write your own wrapper for itertools.
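One possible shape for such a wrapper, as an untested-in-anger sketch: store the function and its arguments, and call it again on every __iter__. Reiterable, lift and the lifted names below are hypothetical helpers, not part of itertools, and the code assumes Python 3 (lazy built-in map and filter).

```python
import itertools


class Reiterable:
    """Calls factory(*args, **kwargs) afresh on every iteration."""

    def __init__(self, factory, *args, **kwargs):
        self.factory = factory
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        return iter(self.factory(*self.args, **self.kwargs))


def lift(factory):
    """Turn an iterator-returning function into one returning a Reiterable."""
    def wrapper(*args, **kwargs):
        return Reiterable(factory, *args, **kwargs)
    return wrapper


# Lifted versions of a few standard combinators.
rmap = lift(map)
rfilter = lift(filter)
rslice = lift(itertools.islice)

evens = rfilter(lambda n: n % 2 == 0, range(10))
doubled = rmap(lambda n: n * 2, evens)
head = rslice(doubled, 3)

first = list(head)
again = list(head)   # the whole pipeline is rebuilt, so the values repeat
```

The lifted combinators compose because each one calls iter() on its inputs again when asked for a new iterator. It still relies on the original inputs being re-iterable.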

Licensed under: CC-BY-SA with attribution