groupby() giving an empty list [duplicate]

Question 1

d = [(key, list(group)) for key, group in groupby(dt, lambda x: x[0])]
prnt(d)

groupby will return key and a generator for the group, for each group it finds.

Output

[('23271800', [('23271800', 0.0006679078063627531)]),
 ('23271812', [('23271812', 0.0010018617095441298)]),
 ('26112103', [('26112103', 0.0006679078063627531)]),
 ('27111616', [('27111616', 0.005677216354083401)]),
 ('30101600',
  [('30101600', 1.3909064158636346e-05), ('30101600', 0.002002905238843634)]),
 ('30102200', [('30102200', 0.00013358156127255062)]),
 ('31100000', [('31100000', 2.1849453575689805e-05)]),
 ('31161500', [('31161500', 0.0005180729752775727)]),
 ('31161501', [('31161501', 0.00012902764441098641)]),
 ('31161505', [('31161505', 0.013866049271881438)]),
 ('31161513', [('31161513', 0.021559049445886335)]),
 ('31161518', [('31161518', 0.0011596016382808651)]),
 ('31161520', [('31161520', 0.022263593545425106)]),
 ('31161600', [('31161600', 0.003930380552826971)]),
 ('31161618', [('31161618', 0.0016029787352706075)]),
 ('31161620', [('31161620', 0.0008462931211056002)]),
 ('31161700', [('31161700', 0.0008833842874611101)]),
 ('31161716', [('31161716', 7.067074299688881e-05)]),
 ('31161717', [('31161717', 0.0014193040885208503)]),
 ('31161727', [('31161727', 0.01364664212812536)]),
 ('31161801', [('31161801', 0.000179280516444739)]),
 ('31161900',
  [('31161900', 1.6624352427769844e-05), ('31161900', 0.0001496191718499286)]),
 ('31161904', [('31161904', 6.666007460763289e-05)]),
 ('31162409', [('31162409', 0.007129527514430318)]),
 ('31162800',
  [('31162800', 0.0002625302360269781),
   ('31162800', 0.359403893120933),
   ('31162800', 0.2207879284986886),
   ('31162800', 0.0002625302360269781)]),
 ('31163200',
  [('31163200', 0.00037295581888139136),
   ('31163200', 4.1439535431265705e-05)]),
 ('31163201', [('31163201', 0.011292216638533014)]),
 ('31163202',
  [('31163202', 4.5417730832667214e-05),
   ('31163202', 4.5417730832667214e-05)]),
 ('31163203', [('31163203', 0.003471418917146539)]),
 ('31163204', [('31163204', 0.0002962025923869601)]),
 ('31163214', [('31163214', 0.0014119501813264418)]),
 ('31163215', [('31163215', 0.017772155543217604)]),
 ('31171504', [('31171504', 0.05423235622453355)]),
 ('31181600', [('31181600', 5.262772981769086e-05)]),
 ('31181602', [('31181602', 0.00019920057382748777)]),
 ('31191518', [('31191518', 0.0014972878296483697)]),
 ('39121719', [('39121719', 0.0022708865416333607)]),
 ('40141600', [('40141600', 5.0614113112184855e-05)]),
 ('40141607', [('40141607', 0.0958030259751574)]),
 ('40141616',
  [('40141616', 0.005499007768977646), ('40141616', 0.00015275021580493458)]),
 ('40141636', [('40141636', 0.0007247510239255406)]),
 ('40141680', [('40141680', 0.12267031972561518)]),
 ('40142000', [('40142000', 0.0002962025923869601)]),
 ('40142100', [('40142100', 8.188292818389522e-05)]),
 ('40142315', [('40142315', 0.00034758467473980007)]),
 ('40142323', [('40142323', 0.0006308018171203779)]),
 ('40161500', [('40161500', 0.0004007446838176519)])]

Question 2

The iterator returned by itertools.groupby() is slightly tricky to use. As it says in the documentation:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.

What this means is that you have to process each group as it is generated by the groupby() object. If you look at the group later, you'll find that it's contents have been skipped. For example:

>>> from itertools import groupby
>>> groups = list(groupby('AAABBBCCC'))
>>> groups
[('A', <itertools._grouper object at 0x107155490>),
 ('B', <itertools._grouper object at 0x1071553d0>),
 ('C', <itertools._grouper object at 0x107155d50>)]
>>> list(groups[0][1])
[]

The documentation says:

So, if that data is needed later, it should be stored as a list.

For example:

>>> groups = [(key, list(group)) for key, group in groupby('AAABBBCCC')]
>>> groups[0][1]
['A', 'A', 'A']

But it is usually better to try to re-organize your code so that you can process each group in turn, without storing it in a list. For example, like this:

for key, group in groupby('AAABBBCCC'):
    for item in group:
        # do something with item

Question 3

It sounds like you want a non-streaming groupby operation. The implementation in itertools is probably overkill for you application. You could try the implementation in toolz.