Вопрос

Can anyone please explain why iterating over a list that is produced from iterator X is producing a different result compared to iterating over iterator X?

In other words [x for x in list(IteratorObject)] != [x for x in IteratorObject]

>>> randoms = [random.randrange(10) for i in range(100)]

>>> [ (x[0],list(x[1])) for x in itertools.groupby(sorted(randoms))]
[(0, [0, 0, 0, 0, 0, 0, 0, 0]), (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), (2, [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]), (3, [3, 3, 3, 3, 3, 3]), (4, [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]), (5, [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]), (6, [6, 6, 6, 6, 6, 6, 6, 6, 6]), (7, [7, 7, 7, 7, 7]), (8, [8, 8, 8, 8, 8, 8, 8]), (9, [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])]

>>> [ (x[0],list(x[1])) for x in list(itertools.groupby(sorted(randoms)))]
[(0, []), (1, []), (2, []), (3, []), (4, []), (5, []), (6, []), (7, []), (8, []), (9, [9])]

>>> sys.version
'3.3.3 (default, Dec  2 2013, 01:40:21) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]'
Это было полезно?

Решение 2

The iterators that are yielded for each group from itertools.groupby are not independent of the top-level iteration. You need to consume each one of them before you go on to the next group, or the iterator becomes invalid (it will yield nothing further).

This behavior is referenced in the docs:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list

Your two list comprehensions show this. In the first one, you call list on x[1], which is the iterator. In the second version, all of the iterators are produced first in the list call around the groupby call, and only when you iterate over that list do the inner iterators get consumed. Note that the iterator over last group ([9]) does work!

Here's a simpler example:

groupby_iter = itertools.groupby([1,1,2,2])
first_val, first_group = next(groupby_iter)

# right now, we can iterate on `first_group`:
print(next(first_group)) # prints 1

# but if we advance groupby_iter to the next group...
second_val, second_group = next(groupby_iter)

# first_group is now invalid (it won't yield the second 1)
print(next(first_group)) # raises StopIteration

Другие советы

I think this point in the documentation explains the problem:

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list"

In your second example, when you convert to a list, you immediately iterate through all of the groups. But within each group, you don't iterate through the underlying elements. When you finally try to do that with list(x[1]), it's too late - you've already exhausted the iterator.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top