Question

My data looks like this (ch = channel, det = detector):

ch det time counts 
1   1    0    123
    2    0    121
    3    0    125 
2   1    0    212
    2    0    210
    3    0    210 
1   1    1    124
    2    1    125
    3    1    123 
2   1    1    210
    2    1    209
    3    1    213

Note: in reality, the time column is a float with about 12 significant digits. It is constant across all detectors within one measurement, but its values are neither predictable nor in a sequence.

What I need to create is a data frame that looks like this:

ch  time  mean_counts_over_detectors
1   0       xxx
2   0       yyy
1   1       zzz
2   1       www

That is, I would like to apply np.mean over the counts of all detectors of one channel, at each time separately. I could write kludgy loops, but I feel that pandas must have something built in for this. I am still a beginner at pandas, and with MultiIndex in particular there are so many concepts that I am not sure what I should be looking for in the docs.

The title contains 'condition' because I thought that maybe the fact that I want the mean over all detectors of one channel for the counts where the time is the same can be expressed as a slicing condition.


Solution

Same as @meteore but with a MultiIndex.

In [55]: df
Out[55]:
             counts
ch det time
1  1   0        123
   2   0        121
   3   0        125
2  1   0        212
   2   0        210
   3   0        210
1  1   1        124
   2   1        125
   3   1        123
2  1   1        210
   2   1        209
   3   1        213

In [56]: df.index
Out[56]:
MultiIndex
[(1L, 1L, 0L) (1L, 2L, 0L) (1L, 3L, 0L) (2L, 1L, 0L) (2L, 2L, 0L)
 (2L, 3L, 0L) (1L, 1L, 1L) (1L, 2L, 1L) (1L, 3L, 1L) (2L, 1L, 1L)
 (2L, 2L, 1L) (2L, 3L, 1L)]

In [57]: df.index.names
Out[57]: ['ch', 'det', 'time']

In [58]: df.groupby(level=['ch', 'time']).mean()
Out[58]:
             counts
ch time
1  0     123.000000
   1     124.000000
2  0     210.666667
   1     210.666667
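
To get from this result to the flat layout sketched in the question, one option is to rename the aggregated column and move the index levels back into columns. The following is a minimal sketch, assuming the same data as above; the way the frame is built here is illustrative, and the column name mean_counts_over_detectors is taken from the question.

```python
import pandas as pd

# Rebuild the example frame with a (ch, det, time) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [(ch, det, t) for t in (0, 1) for ch in (1, 2) for det in (1, 2, 3)],
    names=["ch", "det", "time"],
)
df = pd.DataFrame(
    {"counts": [123, 121, 125, 212, 210, 210, 124, 125, 123, 210, 209, 213]},
    index=index,
)

result = (
    df.groupby(level=["ch", "time"], sort=False)["counts"]  # sort=False: do not sort the group keys
      .mean()                                               # average over the detectors
      .rename("mean_counts_over_detectors")                 # column name from the question
      .reset_index()                                        # turn ch and time back into columns
)
print(result)
```

Passing sort=False is optional; it asks pandas not to sort the group keys, which is closer to the row order shown in the question.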

Be careful with floats and groupby (this applies whether or not you use a MultiIndex): values that are meant to be equal can end up in different groups because of floating-point representation and accuracy limits.
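
One possible way to guard against this is to round the float key to a fixed precision before grouping. The sketch below uses made-up time values, and the choice of 12 decimals is an assumption that would need to match the real precision of the data.

```python
import pandas as pd

# Hypothetical data: the time stamps are meant to be equal but differ by float noise.
t = 0.1 + 0.2  # 0.30000000000000004, not exactly 0.3
df = pd.DataFrame({
    "ch": [1, 1, 1],
    "det": [1, 2, 3],
    "time": [t, t, 0.3],
    "counts": [123, 121, 125],
})

# Grouping on the raw floats splits one logical measurement into two groups.
raw = df.groupby(["ch", "time"])["counts"].mean()

# Rounding the key first (12 decimals is an assumed precision) merges them again.
rounded = (
    df.assign(time=df["time"].round(12))
      .groupby(["ch", "time"])["counts"]
      .mean()
)
print(len(raw), len(rounded))
```

Rounding only helps when the noise is well below the chosen precision and distinct measurements are further apart than that.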

OTHER TIPS

Not using MultiIndexes (if you have them, you can get rid of them through df.reset_index()):

import numpy as np
import pandas as pd
chans = [1,1,1,2,2,2,1,1,1,2,2,2]
df = pd.DataFrame(dict(ch=chans, det=[1,2,3,1,2,3,1,2,3,1,2,3], time=6*[0]+6*[1], counts=np.random.randint(0,500,12)))

Use groupby and mean as an aggregation function:

>>> df.groupby(['time', 'ch'])['counts'].mean()
time  ch
0     1     315.000000
      2     296.666667
1     1     178.333333
      2     221.666667
Name: counts

Other aggregation functions can be passed via agg:

>>> df.groupby(['time', 'ch'])['counts'].agg(np.ptp)
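
Several aggregations can also be computed in one call. The sketch below uses named aggregation on the fixed counts from the question instead of random data; the output column names mean and spread are made up for illustration, and the lambda is a stand-in for a peak-to-peak calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "ch": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    "det": [1, 2, 3] * 4,
    "time": 6 * [0] + 6 * [1],
    "counts": [123, 121, 125, 212, 210, 210, 124, 125, 123, 210, 209, 213],
})

# Each keyword becomes an output column; the value is the function applied per group.
summary = df.groupby(["time", "ch"])["counts"].agg(
    mean="mean",
    spread=lambda s: s.max() - s.min(),  # peak-to-peak within each group
)
print(summary)
```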
Licensed under: CC-BY-SA with attribution