Question

I have a dataframe that has months for columns, and various departments for rows.

                2013April  2013May  2013June
        Dep1        0         10        15
        Dep2        10        15        20

I'm looking to add a column that counts, for each department, the number of months that have a value greater than 0. For example:

                2013April  2013May  2013June  Count>0 
        Dep1        0         10        15       2
        Dep2        10        15        20       3

The number of columns this function needs to span is variable. I think defining a function then using .apply is the solution, but I can't seem to figure it out.


Solution

First, pick the columns you want to count over and store them in a list, cols:

df[cols].apply(lambda s: (s > 0).sum(), axis=1)

This works because True and False count as 1 and 0 when summed in Python, so summing the boolean result of (s > 0) gives the number of positive values in each row.
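As a minimal sketch on the question's sample data (the frame construction and the names mask and row_counts are assumptions for illustration, not part of the original answer):

```python
import pandas as pd

# Sample data from the question; this construction is illustrative.
df = pd.DataFrame(
    {"2013April": [0, 10], "2013May": [10, 15], "2013June": [15, 20]},
    index=["Dep1", "Dep2"],
)

# Pick the month columns; here, every column in the frame.
cols = list(df.columns)

# Boolean frame: True wherever the value is greater than 0.
mask = df[cols] > 0

# Row-wise count of positive values using apply, as in the answer.
row_counts = df[cols].apply(lambda s: (s > 0).sum(), axis=1)

print(mask)
print(row_counts)
```

With axis=1, the lambda receives one row at a time as a Series, so the sum is taken across the months for each department.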

Actually, there's a better way:

(df[cols] > 0).sum(axis=1)

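This is faster because the comparison and the sum run as vectorized NumPy operations over the whole frame, instead of calling a Python function once per row: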
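To get the column the question asks for, the vectorized result can be assigned back to the frame. The sketch below assumes the month columns can be recognized by a "2013" prefix; that filter and the name month_cols are hypothetical, so substitute whatever rule identifies your columns:

```python
import pandas as pd

# Sample data from the question; this construction is illustrative.
df = pd.DataFrame(
    {"2013April": [0, 10], "2013May": [10, 15], "2013June": [15, 20]},
    index=["Dep1", "Dep2"],
)

# Select the month columns before adding the count column, so the new
# column is never included in its own count. The "2013" prefix is an
# assumption based on the sample column names.
month_cols = [c for c in df.columns if c.startswith("2013")]

# Vectorized count of values greater than 0 across each row.
df["Count>0"] = (df[month_cols] > 0).sum(axis=1)

print(df)
```

Because month_cols is built from the frame itself, the same code handles any number of month columns.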

%timeit df.apply(lambda s: (s > 0).sum(), axis=1)
10 loops, best of 3: 141 ms per loop

%timeit (df > 0).sum(1)
1000 loops, best of 3: 319 µs per loop
Licensed under: CC-BY-SA with attribution