Concatenating pandas dataframes along time series index without duplicating columns

https://stackoverflow.com/questions/23630380

21-07-2023

Question

I would like to concatenate 2 pandas DataFrames, each with time series indexes that may overlap, but also with column keys that may overlap.

For example:

    old_close                                   new_close
             1TM    ABL  ...                    ABL    ANG    ...
Date                                Date
2009-06-05  100     564             1990-06-08  120    2533   
2009-06-04  102     585             1990-06-05  121    2531
2009-06-03  101     532             1990-06-04  123    2520
2009-06-02  99      540             1990-06-03  122    2519
2009-06-01  99      542             1990-06-02  121    2521
...

I want to merge old_close and new_close into a new DataFrame that includes all the data from both DataFrames but excludes all duplicate values on both indices.

So far I do this:

merged_close = pd.concat([old_close, new_close], axis=1)

but this results in duplicate columns (rows when along axis 0) and a MultiIndex.
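To make the problem concrete, here is a minimal sketch with made-up data. The values and dates are hypothetical stand-ins for old_close and new_close, chosen so that both the dates and the ABL column overlap.

```python
import pandas as pd

# Hypothetical toy data standing in for old_close and new_close.
old_close = pd.DataFrame(
    {"1TM": [99, 99, 101], "ABL": [542, 540, 532]},
    index=pd.to_datetime(["2009-06-01", "2009-06-02", "2009-06-03"]),
)
new_close = pd.DataFrame(
    {"ABL": [540, 532, 585], "ANG": [2521, 2519, 2520]},
    index=pd.to_datetime(["2009-06-02", "2009-06-03", "2009-06-04"]),
)

# Side-by-side concatenation aligns rows on the index,
# but both "ABL" columns are kept.
merged_close = pd.concat([old_close, new_close], axis=1)
print(merged_close.columns.tolist())  # ['1TM', 'ABL', 'ABL', 'ANG']
print(merged_close.shape)             # (4, 4)
```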

Solution

Assuming you want to 'exclude all duplicate values on both indices', this should work:

import numpy as np
import pandas as pd
# Keep only the index labels that appear in exactly one of the two frames
unique_indices = np.setdiff1d(np.union1d(old_close.index.to_list(), new_close.index.to_list()),
                              np.intersect1d(old_close.index.to_list(), new_close.index.to_list()))
merged_close = pd.concat([old_close, new_close]).loc[unique_indices]

EDIT: Updated unique indices calculation. All duplicate indices are dropped now
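As a rough illustration of what this approach does, here is a self-contained sketch on hypothetical toy frames (the data is invented). It passes the indexes to NumPy directly and uses `.loc` for the label lookup. Note that any date present in both frames is dropped entirely, not just one of its two copies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 2009-06-02 and 2009-06-03 appear in both frames.
old_close = pd.DataFrame(
    {"1TM": [99, 99, 101], "ABL": [542, 540, 532]},
    index=pd.to_datetime(["2009-06-01", "2009-06-02", "2009-06-03"]),
)
new_close = pd.DataFrame(
    {"ABL": [540, 532, 585], "ANG": [2521, 2519, 2520]},
    index=pd.to_datetime(["2009-06-02", "2009-06-03", "2009-06-04"]),
)

# Labels found in exactly one frame: union minus intersection.
unique_indices = np.setdiff1d(
    np.union1d(old_close.index, new_close.index),
    np.intersect1d(old_close.index, new_close.index),
)

# Stack the frames row-wise, then keep only the non-overlapping dates.
merged_close = pd.concat([old_close, new_close]).loc[unique_indices]
print(merged_close)
```

With this data only 2009-06-01 and 2009-06-04 survive, and columns missing from one frame are filled with NaN.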

Other answers

From the pandas documentation:

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False)

verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation

Have you tried setting that parameter to True?

EDIT:

I'm sorry, verify_integrity just raises an error if there are duplicates. Anyway, you can try the drop_duplicates() function.
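A small sketch of both points, using hypothetical toy frames. One caveat: DataFrame.drop_duplicates() compares row values rather than index labels, so the second half swaps in Index.duplicated() to keep the first row for each date. That substitution is my assumption about what is wanted here, not something the answer spelled out.

```python
import pandas as pd

# Hypothetical toy data with one overlapping date (2009-06-02).
old_close = pd.DataFrame(
    {"ABL": [542, 540]},
    index=pd.to_datetime(["2009-06-01", "2009-06-02"]),
)
new_close = pd.DataFrame(
    {"ABL": [540, 532]},
    index=pd.to_datetime(["2009-06-02", "2009-06-03"]),
)

# verify_integrity=True does not remove anything; it raises on overlap.
try:
    pd.concat([old_close, new_close], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Stack the frames, then keep the first row seen for each index label.
stacked = pd.concat([old_close, new_close])
deduped = stacked[~stacked.index.duplicated(keep="first")]
print(deduped)
```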

PS: also take a look at this question:

python pandas remove duplicate columns
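For the column side, one common idiom is to mask on columns.duplicated(). This is a sketch on hypothetical data, not necessarily what that question recommends. It keeps the first column for each name and silently discards the later ones, so it is only safe when the duplicated columns hold the same values.

```python
import pandas as pd

# Hypothetical toy data: same dates, with "ABL" present in both frames.
dates = pd.to_datetime(["2009-06-01", "2009-06-02"])
old_close = pd.DataFrame({"1TM": [99, 99], "ABL": [542, 540]}, index=dates)
new_close = pd.DataFrame({"ABL": [542, 540], "ANG": [2521, 2519]}, index=dates)

# Side-by-side concat leaves two "ABL" columns.
wide = pd.concat([old_close, new_close], axis=1)

# Keep only the first occurrence of each column name.
deduped_cols = wide.loc[:, ~wide.columns.duplicated()]
print(deduped_cols.columns.tolist())  # ['1TM', 'ABL', 'ANG']
```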

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow