Pregunta

pandas.merge acts differently for the left and right sides!!! For the left side if we use left_on and left_index together it shows an error, but the same for the right side works!!!

Code:

import pandas as pd
import numpy as np
right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])
left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})
pd.merge(left,right,right_index=True,left_index=True,right_on='event1')#it works and returns an empty table which is expected
pd.merge(left,right,left_index=True,right_index=True,left_on='key1')# it makes error !!!
¿Fue útil?

Solución

You have a few issues going on. First your merge statements are not constructed correctly. You shouldn't be using both a left_on and left_index or right_on and right_index at the same time. You should use only one left option and one right option.

The reason you get an error in your second statement is because the index levels do not match. In your left merge, the left index is a single level, and you while you specify both right_index=True and right_on='event1', the right_on attribute is taking precedence. Since both are single level integers, there is no problem. I should point out that the merge, if constructed correctly, (pd.merge(left, right, left_index=True, right_on='event1', how='left')) does not produce an empty DataFrame... See code below.

In your right merge, you specify using the right index with right_index=True and left_on takes precedence over left_index=True. The issue here is that the right index is 2 levels, where as your 'key1` field is a single level string.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])

In [4]: left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})

In [5]: left
Out[5]:
   data    key1  key2
0     0    Ohio  2000
1     1    Ohio  2001
2     2    Ohio  2002
3     3  Nevada  2001
4     4  Nevada  2002

In [6]: right
Out[6]:
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11

In [5]: left_merge = left.merge(right, left_index=True, right_on='event1', how='left')

In [7]: left_merge
Out[7]:
             data    key1  key2  event1  event2
Nevada 2001     0    Ohio  2000       0       1
Ohio   2002     1    Ohio  2001       1     NaN
Nevada 2000     2    Ohio  2002       2       3
Ohio   2002     3  Nevada  2001       3     NaN
       2000     4  Nevada  2002       4       5
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top