How to assign one row of a hierarchically indexed Pandas DataFrame to another row?

StackOverflow https://stackoverflow.com/questions/23686522

  •  23-07-2023

Question

I'm trying to assign one row of a hierarchically indexed Pandas DataFrame to another row of the DataFrame. What follows is a minimal example.

import numpy as np    
import pandas as pd

columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])
data = pd.DataFrame(np.random.randn(3, 4), columns=columns)

print(data)
data.loc[0, 'a'] = data.loc[1, 'b']
print(data)

This fills the 'a' columns of row 0 with NaNs instead of the values from the 'b' columns of row 1. I noticed I can get around it by converting to an ndarray before assignment:

data.loc[0, 'a'] = np.array(data.loc[1, 'b'])

Presumably there's a reason for this behavior, and an idiomatic way to make the assignment?

Edit: modified the question after Jeff's answer made me realize I oversimplified the problem.


Solution

In [38]: data = pd.DataFrame(np.random.randn(3, 2), columns=pd.MultiIndex.from_tuples([('a', 0), ('a', 1)]))

In [39]: data
Out[39]: 
          a          
          0         1
0  1.657540 -1.086500
1  0.700830  1.688279
2 -0.912225 -0.199431

In [40]: data.loc[0,'a']
Out[40]: 
0    1.65754
1   -1.08650
Name: 0, dtype: float64

In [41]: data.loc[1,'a']
Out[41]: 
0    0.700830
1    1.688279
Name: 1, dtype: float64

In your example, notice that the index of the assigned element is [0, 1]. These labels don't match the target columns, which are ('a', 0) and ('a', 1). So you effectively end up reindexing to labels that don't exist, and hence you get NaN.
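The label mismatch can be made visible without doing any assignment. The following is a minimal sketch, assuming the four-column frame from the question filled with integers; the names rhs_labels, lhs_labels and shared are illustrative only and not part of pandas.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], [0, 1]])
data = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# Right-hand side: selecting 'b' from row 1 drops the outer column level,
# so the resulting Series is labelled by the inner level only.
rhs = data.loc[1, "b"]
rhs_labels = list(rhs.index)

# Left-hand side: the full (outer, inner) column labels that sit under 'a'.
lhs_labels = [col for col in data.columns if col[0] == "a"]

print(rhs_labels)
print(lhs_labels)

# The two sets of labels have nothing in common, so label-based alignment
# finds no value for either target column.
shared = set(rhs_labels) & set(lhs_labels)
print(shared)
```

Because the two sets of labels share no element, aligning the right-hand side to the target columns leaves every target position without a value.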

In general it's better to let pandas figure out the rhs alignment (and, as you are doing here, mask the lhs).

In [42]: data.loc[0,'a'] = data.loc[1,:]

In [43]: data
Out[43]: 
          a          
          0         1
0  0.700830  1.688279
1  0.700830  1.688279
2 -0.912225 -0.199431

You also could do

data.loc[0] = data.loc[1]

Here's another way:

In [96]: data = pd.DataFrame(np.arange(12).reshape(3,4), columns=pd.MultiIndex.from_product([['a','b'],[0,1]]))

In [97]: data
Out[97]: 
   a      b    
   0  1   0   1
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

In [98]: data.loc[0,'a'] = data.loc[1,'b'].values

In [99]: data
Out[99]: 
   a      b    
   0  1   0   1
0  6  7   2   3
1  4  5   6   7
2  8  9  10  11

Pandas always aligns the data on assignment, which is why this doesn't work naturally. By passing .values you are deliberately NOT aligning.
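As a self-contained version of the positional assignment above, here is a small sketch that uses .to_numpy(), which newer pandas versions offer as an alternative to .values. It assumes the same integer frame as in In [96]; the variable row0 is only there for inspection.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], [0, 1]])
data = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# Strip the labels from the right-hand side so that the values are
# assigned by position instead of being aligned by label.
data.loc[0, "a"] = data.loc[1, "b"].to_numpy()

row0 = data.loc[0].tolist()
print(row0)
```

The trade-off is that nothing checks the order of the values any more: the array must already have the same length and ordering as the selected target columns.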

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow