The iterators that are yielded for each group from itertools.groupby are not independent of the top-level iteration. You need to consume each one of them before you go on to the next group, or the iterator becomes invalid (it will yield nothing further).
This behavior is referenced in the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
Your two list comprehensions show this. In the first one, you call list on x[1], which is the group iterator, while groupby is still positioned on that group. In the second version, all of the iterators are produced first in the list call around the groupby call, and only when you iterate over that list do the inner iterators get consumed. Note that the iterator over the last group ([9]) does work!
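A rough reconstruction of the two versions may make the difference concrete. The input `data` and the names `eager` and `lazy` are assumptions, since the original comprehensions are not repeated here:

```python
import itertools

data = [1, 1, 2, 2, 9]  # assumed sample input

# Version 1: each group is consumed while groupby is still positioned on it.
eager = [(x[0], list(x[1])) for x in itertools.groupby(data)]

# Version 2: list() runs groupby to the end first, so the earlier
# group iterators are already stale when list(x[1]) is finally called.
lazy = [(x[0], list(x[1])) for x in list(itertools.groupby(data))]

print(eager)     # [(1, [1, 1]), (2, [2, 2]), (9, [9])]
print(lazy[:2])  # [(1, []), (2, [])]
```

Only the first two entries of `lazy` are printed here; the last group is the special case noted above.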
Here's a simpler example:
import itertools

groupby_iter = itertools.groupby([1, 1, 2, 2])
first_val, first_group = next(groupby_iter)
# right now, we can iterate on `first_group`:
print(next(first_group)) # prints 1
# but if we advance groupby_iter to the next group...
second_val, second_group = next(groupby_iter)
# first_group is now invalid (it won't yield the second 1)
print(next(first_group)) # raises StopIteration
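If a group's items are needed after the iteration has moved on, one option is to follow the docs' advice and turn each group into a list before advancing. A minimal sketch on the same input (the name `groups` is only illustrative):

```python
import itertools

# Materialize each group while it is still the current one.
groups = [(key, list(group)) for key, group in itertools.groupby([1, 1, 2, 2])]

print(groups)  # [(1, [1, 1]), (2, [2, 2])]
```

The stored lists are independent of the groupby object, so they stay valid however far the outer iteration has advanced.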