Can a Python function take a generator and return generators to subsets of its generated output?
Question
Let's say I have a generator function like this:
import random

def big_gen():
    i = 0
    group = 'a'
    while group != 'd':
        i += 1
        yield (group, i)
        if random.random() < 0.20:
            group = chr(ord(group) + 1)
Example output might be: ('a', 1), ('a', 2), ('a', 3), ('a', 4), ('a', 5), ('a', 6), ('a', 7), ('a', 8), ('b', 9), ('c', 10), ('c', 11), ('c', 12), ('c', 13)
I would like to break this into three groups: Group A, Group B, and Group C. And I would like a generator for each group. Then I'd pass the generator and the group letter into a subfunction. An example of the subfunction:
def printer(group_letter, generator):
    print "These numbers are in group %s:" % group_letter
    for num in generator:
        print "\t%s" % num
The desired output would be:
These numbers are in group a:
	1
	2
	3
	4
	5
	6
	7
	8
These numbers are in group b:
	9
These numbers are in group c:
	10
	11
	12
	13
How can I do this without changing big_gen() or printer(), and without storing an entire group in memory at once? (In real life, the groups are huge.)
Solution
Sure, this does what you want:
import itertools
import operator

def main():
    for let, gen in itertools.groupby(big_gen(), key=operator.itemgetter(0)):
        secgen = itertools.imap(operator.itemgetter(1), gen)
        printer(let, secgen)
groupby does the bulk of the work here; the key= argument just tells it which field to group by.
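To see what key= does in isolation, here is a small sketch on a fixed list instead of the random big_gen() output. The sample data is made up for illustration, and the code is written for Python 3. Note that groupby only groups consecutive items, which suits big_gen() because a letter never comes back once it has changed.

```python
import itertools
import operator

# Hypothetical sample in the same (letter, number) shape that big_gen() yields.
pairs = [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]

# itemgetter(0) picks out the letter, so consecutive tuples that share a
# letter land in the same group. list() is used here only to display the
# groups; real code would consume each group lazily.
grouped = [(letter, list(items))
           for letter, items in itertools.groupby(pairs, key=operator.itemgetter(0))]

for letter, items in grouped:
    print(letter, items)
```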
The resulting generator needs to be wrapped in an imap only because you've specified that printer takes an iterator over numbers, whereas groupby, by nature, returns iterators over the same items it receives as input (here, 2-item tuples of a letter followed by a number). This is not really all that germane to your question's title, though.
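A sketch of the same wrapping step in Python 3, where itertools.imap no longer exists and the built-in map is already lazy. The fixed_gen() source and the collect() stand-in for printer() are hypothetical, chosen only so that the result is deterministic.

```python
import itertools
import operator

def fixed_gen():
    # Hypothetical deterministic stand-in for big_gen().
    for pair in [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]:
        yield pair

def collect(group_letter, generator):
    # Hypothetical stand-in for printer(): it only ever sees bare numbers.
    return "%s: %s" % (group_letter, ", ".join(str(num) for num in generator))

lines = []
for let, gen in itertools.groupby(fixed_gen(), key=operator.itemgetter(0)):
    # groupby hands back the original (letter, number) tuples;
    # map() lazily strips each one down to its number.
    secgen = map(operator.itemgetter(1), gen)
    lines.append(collect(let, secgen))

print("\n".join(lines))
```

A generator expression such as (num for _, num in gen) would be an equivalent way to drop the letter.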
The answer to that title is that yes, a Python function can perfectly well do the job you want; itertools.groupby in fact does exactly that. I recommend studying the itertools module carefully: it's a very useful tool, and it delivers splendid performance as well.
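To make the answer to the title concrete, here is a minimal hand-written sketch of a function that takes one iterable and hands back a generator per consecutive run of equal keys. It is not how itertools.groupby is implemented; split_by_key is a hypothetical name, and the code assumes Python 3 because it relies on nonlocal.

```python
def split_by_key(iterable, key):
    """Yield (key_value, sub_generator) pairs, one per consecutive run."""
    it = iter(iterable)
    sentinel = object()
    pending = next(it, sentinel)  # one-item lookahead, the only buffering

    def run(current_key):
        nonlocal pending
        # Hand out items until the key changes or the source runs dry.
        while pending is not sentinel and key(pending) == current_key:
            item = pending
            pending = next(it, sentinel)
            yield item

    while pending is not sentinel:
        current_key = key(pending)
        group = run(current_key)
        yield current_key, group
        # Skip whatever the caller left unread so the next run starts cleanly.
        for _ in group:
            pass

pairs = [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]
result = [(letter, [num for _, num in group])
          for letter, group in split_by_key(pairs, key=lambda pair: pair[0])]
print(result)
```

As with groupby, every sub-generator draws from the same underlying source, so each group should be consumed before moving on to the next one; in this sketch, anything left unread is simply skipped.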
OTHER TIPS
You have a slight problem here. You'd like printer() to take a separate generator for each group, but in reality you have a single generator yielding all of the groups. You have two options, as I see it:
1) Change big_gen() to yield generators:
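import random

def big_gen():
    # Mutable state shared with the inner generator (Python 2 has no nonlocal).
    state = {'i': 0, 'group': 'a'}
    def gen(letter):
        # Yield numbers until the group moves on to the next letter.
        while state['group'] == letter:
            state['i'] += 1
            yield state['i']
            if random.random() < 0.20:
                state['group'] = chr(ord(letter) + 1)
    while state['group'] != 'd':
        letter = state['group']
        sub = gen(letter)
        yield letter, sub
        # Drain anything the caller left unread so the next group can start.
        for _ in sub:
            pass

# imap is lazy and would never call printer unless consumed; a plain loop does.
for args in big_gen():
    printer(*args)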
2) Change printer() to keep state and notice when the group changes (keeping your original big_gen() function):
def printer(generator):
    group = None
    for grp, num in generator:
        if grp != group:
            print "These numbers are in group %s:" % grp
            group = grp
        print "\t%s" % num