Domanda

my data looks like this:

 date, cola, colb, colc
 1,10,,
 2,11,,
 3,12,,
 4,13,,
 1,,14,
 2,,15,
 3,,16,
 4,,17,
 1,,,17
 2,,,18
 3,,,19
 4,13,,20

I'd like to merge the rows based on the first column and have the output look like this:

 date, cola, colb, colc
 1,10,14,17
 2,11,15,18
 3,12,16,19
 4,13,17,20

I can't guarantee there won't be any conflicts, so I'd like to be able to choose the max or mean.

Nessuna soluzione corretta

Altri suggerimenti

You can use groupby. Start from a csv with duplicates:

>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18

Read it in:

>>> df = pd.read_csv("tomerge.csv")
>>> df
   date   cola   colb   colc
0     1     10    NaN    NaN
1     2     11    NaN    NaN
2     1    NaN     14    NaN
3     2    NaN     15    NaN
4     1    NaN     24    NaN
5     2    NaN     40    NaN
6     1    NaN    NaN     17
7     2    NaN    NaN     18

And then the magic happens:

>>> df.groupby("date").mean()
       cola   colb   colc
date                     
1        10   19.0     17
2        11   27.5     18
>>> df.groupby("date").max()
       cola   colb   colc
date                     
1        10     24     17
2        11     40     18
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top