Combining data with overlap

https://stackoverflow.com/questions/21836580

12-10-2022

Question

I have two DataFrames:

import pandas as pd

data = {'First': ['Tom', 'Peter', 'Phil'], 'Last': ['Dwan', 'Laak', 'Ivey'],
        'Score': [101.5, 99, 105]}
df = pd.DataFrame(data, index=list('abc'))
print(df)

   First  Last  Score
a    Tom  Dwan  101.5
b  Peter  Laak   99.0
c   Phil  Ivey  105.0


data2 = {'First': ['Tom', 'Phil'], 'Last': ['Dwan', 'Ivey'], 'Score': [103.5, 101]}
df2 = pd.DataFrame(data2, index=list('fg'))
print(df2)

  First  Last  Score
f   Tom  Dwan  103.5
g  Phil  Ivey  101.0

I want to combine them where they overlap, so that the result is:

   First  Last  Score  Score_new
a    Tom  Dwan  101.5      103.5
b  Peter  Laak   99.0        NaN
c   Phil  Ivey  105.0      101.0

Since the indexes don't match, the join has to be on the First and Last columns. Any suggestions?
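A quick illustration of why the index can't be used directly: pandas aligns on index labels when you add a column from another frame, and since df is labelled a, b, c while df2 is labelled f, g, nothing lines up. This is a minimal sketch that recreates the two frames from the question; the name aligned is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"First": ["Tom", "Peter", "Phil"], "Last": ["Dwan", "Laak", "Ivey"],
     "Score": [101.5, 99, 105]},
    index=list("abc"),
)
df2 = pd.DataFrame(
    {"First": ["Tom", "Phil"], "Last": ["Dwan", "Ivey"], "Score": [103.5, 101]},
    index=list("fg"),
)

# Assigning a Series aligns it on index labels. 'f' and 'g' never match
# 'a', 'b' or 'c', so every Score_new value comes out as NaN.
aligned = df.assign(Score_new=df2["Score"])
print(aligned)
```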

Solution

If you don't care about preserving the index labels, you can merge on the two name columns:

>>> df.merge(df2, on=["First", "Last"], how='outer', suffixes=('', '_new'))
   First  Last  Score  Score_new
0    Tom  Dwan  101.5      103.5
1  Peter  Laak   99.0        NaN
2   Phil  Ivey  105.0      101.0

[3 rows x 4 columns]
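Because the desired result keeps exactly the rows of df, how='left' is arguably a closer fit than how='outer': it keeps every row of df in its original order and only pulls in matching scores from df2. This can matter in practice, since (if I recall the 2.2 changelog correctly) recent pandas releases sort the join keys lexicographically for how='outer', so the outer merge above may come back ordered Peter, Phil, Tom instead. A minimal sketch, recreating the question's frames:

```python
import pandas as pd

df = pd.DataFrame(
    {"First": ["Tom", "Peter", "Phil"], "Last": ["Dwan", "Laak", "Ivey"],
     "Score": [101.5, 99, 105]},
    index=list("abc"),
)
df2 = pd.DataFrame(
    {"First": ["Tom", "Phil"], "Last": ["Dwan", "Ivey"], "Score": [103.5, 101]},
    index=list("fg"),
)

# A left merge keeps every row of df, in df's order, and adds df2's Score as
# Score_new wherever First and Last match; unmatched rows get NaN.
result = df.merge(df2, on=["First", "Last"], how="left", suffixes=("", "_new"))
print(result)
```

Like the outer merge, this returns a fresh 0, 1, 2 index rather than df's letters.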

If you do care, you could experiment with the left_index/right_index arguments, something like:

>>> df.merge(df2, on=["First", "Last"], how='outer', suffixes=('', '_new'), right_index=True)
   First  Last  Score  Score_new
a    Tom  Dwan  101.5      103.5
b  Peter  Laak   99.0        NaN
c   Phil  Ivey  105.0      101.0

[3 rows x 4 columns]

though I'm not sure why those index letters would matter.
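On current pandas, passing on= together with right_index=True is rejected with a MergeError, so the snippet above only works on the older release it was written against. One workaround that should hold across versions is to move df's index into an ordinary column, merge on the name columns, and then restore it. This is a sketch of that reset_index/set_index approach, not the answer's original method; it assumes df's index is unnamed, so reset_index() calls the new column "index".

```python
import pandas as pd

df = pd.DataFrame(
    {"First": ["Tom", "Peter", "Phil"], "Last": ["Dwan", "Laak", "Ivey"],
     "Score": [101.5, 99, 105]},
    index=list("abc"),
)
df2 = pd.DataFrame(
    {"First": ["Tom", "Phil"], "Last": ["Dwan", "Ivey"], "Score": [103.5, 101]},
    index=list("fg"),
)

# reset_index() turns df's unnamed index into a column called "index".
# After merging on the name columns, set_index("index") puts the labels back
# and rename_axis(None) drops the leftover "index" axis name.
result = (
    df.reset_index()
    .merge(df2, on=["First", "Last"], how="left", suffixes=("", "_new"))
    .set_index("index")
    .rename_axis(None)
)
print(result)
```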

Licensed under: CC-BY-SA with attribution
