Question

I have a DataFrame like this:

df:

    fruit  val1  val2
0  orange    15     3
1   apple    10    13
2   mango     5     5

How do I get Pandas to give me a cumulative sum and percentage column on only val1?

Desired output:

df_with_cumsum:

    fruit  val1  val2  cum_sum  cum_perc
0  orange    15     3       15     50.00
1   apple    10    13       25     83.33
2   mango     5     5       30    100.00

I tried df.cumsum(), but it's giving me this error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''


Solution

df['cum_sum'] = df['val1'].cumsum()
df['cum_perc'] = 100*df['cum_sum']/df['val1'].sum()

This will add the columns to df. If you want a copy, copy df first and then do these operations on the copy.
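As a minimal sketch of the copy-first approach, the snippet below rebuilds the DataFrame from the question and writes the new columns to a copy. The name df_with_cumsum is taken from the question's desired output. Selecting val1 explicitly means cumsum is never applied to the non-numeric fruit column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fruit": ["orange", "apple", "mango"],
    "val1": [15, 10, 5],
    "val2": [3, 13, 5],
})

# Work on a copy so the original df keeps only its three columns.
df_with_cumsum = df.copy()

# Running total of val1 only; the other columns are left untouched.
df_with_cumsum["cum_sum"] = df_with_cumsum["val1"].cumsum()

# Running total as a percentage of the overall val1 total.
df_with_cumsum["cum_perc"] = (
    100 * df_with_cumsum["cum_sum"] / df_with_cumsum["val1"].sum()
)

print(df_with_cumsum)
```

The percentages here are unrounded floats; rounding is only needed if the output should show two decimals as in the question.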

OTHER TIPS

This is a good answer, but it was written in 2014. I modified it slightly so that it runs for me and the results are rounded to two decimal places, like the example in the question.

df['cum_sum'] = df['val1'].cumsum()
df['cum_perc'] = (100 * df['cum_sum'] / df['val1'].sum()).round(2)

The above answer is good, but out of date. I have updated it so that it works.

df['cum_sum'] = df['val1'].cumsum()
df['cum_perc'] = round((df.cum_sum / df['val1'].sum()) * 100, 2)
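If you would rather not modify df at all, one possible variant (a sketch, not taken from the answers above) is DataFrame.assign, which returns a new DataFrame. The names total and result are illustrative. The lambda is used so that cum_perc can refer to the cum_sum column created earlier in the same call, which I believe requires pandas 0.23 or later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fruit": ["orange", "apple", "mango"],
    "val1": [15, 10, 5],
    "val2": [3, 13, 5],
})

total = df["val1"].sum()  # overall total of val1

# assign() returns a new DataFrame and leaves df unchanged.
result = df.assign(
    cum_sum=df["val1"].cumsum(),
    # d is the intermediate frame that already contains cum_sum.
    cum_perc=lambda d: (100 * d["cum_sum"] / total).round(2),
)

print(result)
```

Series.round(2) is used here instead of the built-in round; both give the same values for this data.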

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow