You can use groupby. Start from a CSV with duplicate dates:
>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18
Read it in:
>>> import pandas as pd
>>> df = pd.read_csv("tomerge.csv")
>>> df
   date  cola  colb  colc
0     1    10   NaN   NaN
1     2    11   NaN   NaN
2     1   NaN    14   NaN
3     2   NaN    15   NaN
4     1   NaN    24   NaN
5     2   NaN    40   NaN
6     1   NaN   NaN    17
7     2   NaN   NaN    18
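Then group by date and aggregate. Both mean() and max() skip NaN, so each date collapses to a single row built from its non-missing values: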
>>> df.groupby("date").mean()
      cola  colb  colc
date
1       10  19.0    17
2       11  27.5    18
>>> df.groupby("date").max()
      cola  colb  colc
date
1       10    24    17
2       11    40    18
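
If neither the mean nor the max is what you want for the dates that have more than one value in a column (colb here), two other reductions may fit. The sketch below is only an illustration: it inlines the same data through io.StringIO so it runs without the file, passes skipinitialspace=True on the assumption that the spaces after the commas in the header are not meant to be part of the column names, and the names merged_first and merged_mixed are made up for this example.

```python
import io

import pandas as pd

# Same rows as tomerge.csv, inlined so the snippet is self-contained.
csv_text = """date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18
"""

# skipinitialspace=True drops the blank after each comma in the header
# (assumption: the columns should be named "cola", not " cola").
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# first() keeps the first non-null value of each column within a group.
merged_first = df.groupby("date").first()

# agg() with a dict applies a different reduction to each column.
merged_mixed = df.groupby("date").agg(
    {"cola": "max", "colb": "mean", "colc": "max"}
)

print(merged_first)
print(merged_mixed)
```

With first(), colb keeps 14 and 15 (the earliest rows for each date) instead of an average or a maximum; the dict form of agg() lets you pick a rule per column.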