Cannot assign value for column in pandas DataFrame using another column's value as Series' key

StackOverflow https://stackoverflow.com/questions/21395677

Question

Consider a trivial example with a DataFrame df and a Series s:

import pandas as pd

matching_vals = range(20,30)

df = pd.DataFrame(columns=['a'], index=range(0,10))
df['a'] = matching_vals
s  = pd.Series(list("ABCDEFGHIJ"), index=matching_vals)

df['b'] = s[df['a']]

At this point I would expect df['b'] to contain the letters A through J, but instead it's all NaN. However, if I replace the last line with

n = df['a'][2]
df['c'] = s[n]

then df['c'] is filled with Cs, as I'd expect, so I'm pretty sure it's not some strange type error.

I'm new to pandas, and this is driving me crazy.


Solution

s[df['a']] has an index that differs from df's index:

In [104]: s[df['a']]
Out[104]: 
a
20    A
21    B
22    C
23    D
24    E
25    F
26    G
27    H
28    I
29    J

When you assign a Series to a column of a DataFrame, pandas aligns the values on index labels. The index of s[df['a']] runs from 20 to 29, while df's index runs from 0 to 9, so no label matches and every row receives NaN. The assignment does not add new rows to df.
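As a minimal sketch of that alignment, the snippet below rebuilds the question's df and s and checks how many index labels the two objects share. The names looked_up, shared and all_nan are introduced here only for illustration.

```python
import pandas as pd

# Same data as in the question: df is labelled 0..9, s is labelled 20..29.
df = pd.DataFrame({'a': range(20, 30)}, index=range(10))
s = pd.Series(list("ABCDEFGHIJ"), index=range(20, 30))

# The lookup returns the letters A..J, but they carry s's labels (20..29).
looked_up = s[df['a']]

# Labels present in both indexes; assignment can only fill these rows.
shared = df.index.intersection(looked_up.index)

# Column assignment aligns on labels, so rows without a match get NaN.
df['b'] = looked_up
all_nan = bool(df['b'].isna().all())

print(len(shared), all_nan)
```

Since the two indexes have no label in common, the new column has nothing to align with.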

If you don't want the index to enter into the assignment, you could use

df['b'] = s[df['a']].values
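A short sketch of two ways to get the letters into df, assuming the same df and s as in the question. The first strips the index, as above, using to_numpy(), which does the same job as .values here. The second, Series.map, is an alternative that is not part of the original answer: it looks up each value of df['a'] in s and keeps df's own index, so no alignment problem arises.

```python
import pandas as pd

df = pd.DataFrame({'a': range(20, 30)}, index=range(10))
s = pd.Series(list("ABCDEFGHIJ"), index=range(20, 30))

# Option 1: drop the index so values are assigned purely by position.
df['b'] = s[df['a']].to_numpy()

# Option 2 (alternative, not from the original answer): map each value
# of column 'a' through s; the result is indexed like df.
df['c'] = df['a'].map(s)

print(df)
```

Positional assignment relies on the looked-up values being in the same order as df's rows, which holds here because the lookup follows the order of df['a'].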

To see the index matching at work, notice how

import pandas as pd

df = pd.DataFrame(columns=['a'], index=range(0,10))
df['a'] = range(0,10)[::-1]
s  = pd.Series(list("ABCDEFGHIJ"), index=range(0,10)[::-1])
df['b'] = s[df['a']]

yields

In [123]: s[df['a']]
Out[123]: 
a
9    A
8    B
7    C
6    D
5    E
4    F
3    G
2    H
1    I
0    J
dtype: object

In [124]: df
Out[124]: 
   a  b
0  9  J
1  8  I
2  7  H
3  6  G
4  5  F
5  4  E
6  3  D
7  2  C
8  1  B
9  0  A

[10 rows x 2 columns]

The values of df['b'] appear "flipped" relative to s[df['a']] because each value is placed in the row whose index label matches its own.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow