Question

Let's say I have this data frame:

df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})

and this is what I'm trying to get:

col1    col2
l   n   l   n
12  0   32  1
16  1   47  1
92  0   22  0
77  0   14  1

I've been playing around with set_index and stack/unstack methods but with no success...

Was it helpful?

Solution

import pandas as pd

df = pd.DataFrame(
    {'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14],
     'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})

df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')
print(result)
#           l           n      
# cols   col1  col2  col1  col2
# index                        
# 0        12    32     0     1
# 1        16    47     1     1
# 2        92    22     0     0
# 3        77    14     0     1

If you care about the order of the labels in the MultiIndex column, you could use stack and unstack to exactly reproduce result you posted:

result = result.stack(level=0).unstack(level=1)
print(result)

# cols   col1     col2   
#           l  n     l  n
# index                  
# 0        12  0    32  1
# 1        16  1    47  1
# 2        92  0    22  0
# 3        77  0    14  1

When looking for a solution it is often useful to think backwards.

Start with the desired DataFrame and ask yourself what operation might result in the desired DataFrame. In this case, the operation that came to mind was pd.pivot. Then the question becomes, what DataFrame, something, is needed so that

desired = something.pivot(index='index', columns='cols') 

By looking at other examples of pivot in action, it became clear than something had to equal

   cols   l  n  index
0  col1  12  0      0
1  col1  16  1      1
2  col1  92  0      2
3  col1  77  0      3
4  col2  32  1      0
5  col2  47  1      1
6  col2  22  0      2
7  col2  14  1      3

Then you see if you can find a way to massage df into something, or again working backwards, massage something into df... From this point of view, in this case, the missing link became apparent: something has an index column that df lacked.

OTHER TIPS

You can use a combination of DataFrame.groupby, DataFrame.reset_index and DataFrame.T (transpose)

import pandas as pd

df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0, 1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
print df.groupby('cols').apply(lambda x: x.reset_index(drop=True).drop('cols',axis=1).T).T

Output:

cols  col1     col2   
         l  n     l  n
0       12  0    32  1
1       16  1    47  1
2       92  0    22  0
3       77  0    14  1

Or you can use concat:

print pd.concat([g.drop('cols',axis=1).reset_index(drop=True) for _,g in df.groupby('cols')],axis=1,keys=df['cols'].unique())

Output:

   col1     col2   
      l  n     l  n
0    12  0    32  1
1    16  1    47  1
2    92  0    22  0
3    77  0    14  1

Hope it helps, :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top