Pregunta

my data looks like this:

 date, cola, colb, colc
 1,10,,
 2,11,,
 3,12,,
 4,13,,
 1,,14,
 2,,15,
 3,,16,
 4,,17,
 1,,,17
 2,,,18
 3,,,19
 4,13,,20

I'd like to merge the rows based on the first column and have the output look like this:

 date, cola, colb, colc
 1,10,14,17
 2,11,15,18
 3,12,16,19
 4,13,17,20

I can't guarantee there won't be any conflicts, so I'd like to be able to choose the max or mean.

No hay solución correcta

Otros consejos

You can use groupby. Start from a csv with duplicates:

>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18

Read it in:

>>> df = pd.read_csv("tomerge.csv")
>>> df
   date   cola   colb   colc
0     1     10    NaN    NaN
1     2     11    NaN    NaN
2     1    NaN     14    NaN
3     2    NaN     15    NaN
4     1    NaN     24    NaN
5     2    NaN     40    NaN
6     1    NaN    NaN     17
7     2    NaN    NaN     18

And then the magic happens:

>>> df.groupby("date").mean()
       cola   colb   colc
date                     
1        10   19.0     17
2        11   27.5     18
>>> df.groupby("date").max()
       cola   colb   colc
date                     
1        10     24     17
2        11     40     18
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top