質問

my data looks like this:

 date, cola, colb, colc
 1,10,,
 2,11,,
 3,12,,
 4,13,,
 1,,14,
 2,,15,
 3,,16,
 4,,17,
 1,,,17
 2,,,18
 3,,,19
 4,13,,20

I'd like to merge the rows based on the first column and have the output look like this:

 date, cola, colb, colc
 1,10,14,17
 2,11,15,18
 3,12,16,19
 4,13,17,20

I can't guarantee there won't be any conflicts, so I'd like to be able to choose the max or mean.

正しい解決策はありません

他のヒント

You can use groupby. Start from a csv with duplicates:

>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18

Read it in:

>>> df = pd.read_csv("tomerge.csv")
>>> df
   date   cola   colb   colc
0     1     10    NaN    NaN
1     2     11    NaN    NaN
2     1    NaN     14    NaN
3     2    NaN     15    NaN
4     1    NaN     24    NaN
5     2    NaN     40    NaN
6     1    NaN    NaN     17
7     2    NaN    NaN     18

And then the magic happens:

>>> df.groupby("date").mean()
       cola   colb   colc
date                     
1        10   19.0     17
2        11   27.5     18
>>> df.groupby("date").max()
       cola   colb   colc
date                     
1        10     24     17
2        11     40     18
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top