Calculating the duration of an event in a time series data frame (Python 2.7)

Question 1

One way would be to use groupby and transform. max - min is also called peak-to-peak, or ptp for short, so "ptp" here is shorthand for lambda x: x.max() - x.min().

>>> df = pd.read_csv("eye.csv", sep=r"\s+")
>>> df["duration"] = df.dropna().groupby("event")["time"].transform("ptp")
>>> df
     time  event  duration
49  44295    NaN       NaN
50  44311    NaN       NaN
51  44328    NaN       NaN
52  44345      2        66
53  44361      2        66
54  44378      2        66
55  44395      2        66
56  44411      2        66
57  44428      3        50
58  44445      3        50
59  44461      3        50
60  44478      3        50
61  44495    NaN       NaN
62  44511    NaN       NaN
63  44528    NaN       NaN
64  44544    NaN       NaN
65  44561    NaN       NaN
66  44578    NaN       NaN
67  44594    NaN       NaN
68  44611      4        33
69  44628      4        33
70  44644      4        33
71  44661    NaN       NaN
72  44678    NaN       NaN

The dropna() call keeps the rows with a NaN event out of the grouping, so that the NaN values in the event column are not treated as events of their own. (There's also something odd about how ptp behaves when the key is NaN, but that's a separate issue.)
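A self-contained sketch of the same approach, rebuilt from the time and event values in the table above. It spells out ptp as a lambda rather than passing the string "ptp", on the assumption that newer pandas releases may not accept "ptp" as a transform name; the result should be the same max - min per event. It also uses dropna(subset=["event"]) so that only rows with a missing event are dropped, in case the real frame has other columns with gaps.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the table above (index 49-72)
df = pd.DataFrame(
    {
        "time": [44295, 44311, 44328, 44345, 44361, 44378, 44395, 44411,
                 44428, 44445, 44461, 44478, 44495, 44511, 44528, 44544,
                 44561, 44578, 44594, 44611, 44628, 44644, 44661, 44678],
        "event": [nan] * 3 + [2] * 5 + [3] * 4 + [nan] * 7 + [4] * 3 + [nan] * 2,
    },
    index=range(49, 73),
)

# Only rows that belong to an event take part in the grouping
events = df.dropna(subset=["event"])

# Peak-to-peak (max - min) of "time" within each event, broadcast to every row.
# Assigning back aligns on the index, so rows without an event get NaN.
df["duration"] = events.groupby("event")["time"].transform(lambda x: x.max() - x.min())
print(df)
```

Because the assignment aligns on the index, no extra step is needed to blank out the durations of rows that are not part of an event.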

Question 2

Iterate over the records using groupby from itertools, with the event number as the grouping key. Because your data is already properly ordered (the records of one event are not interrupted by records of another), there is no need to sort on the event code.

groupby yields (key, group) pairs, where key is the event code and group is an iterator over all the consecutive records with that code.
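To illustrate why the ordering matters, here is a small sketch with hypothetical (time, event) records, where None stands for "no event". itertools.groupby only groups consecutive records that share a key, so an event code that reappears later starts a new group.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (time, event) records; None marks "no event"
records = [(100, None), (110, 2), (120, 2), (130, None), (140, 3), (150, 2)]

# Collect each run as (key, times); groupby splits on every change of key
runs = [(key, [t for t, _ in group])
        for key, group in groupby(records, key=itemgetter(1))]
print(runs)
```

Here event 2 shows up as two separate runs, which is why unordered data would need sorting first, while data like yours, where each event is contiguous, does not.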

From each group's records, pick the minimum and maximum time and compute the duration as their difference.

Then attach the durations to your records as a new field.

There may be more efficient methods in pandas that I am not aware of; the solution described here does not require pandas.
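The steps above can be sketched in plain Python roughly as follows. This assumes the records are (time, event) tuples with None for "no event"; the sample values are the first rows of the table in the first answer, and the helper name add_durations is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (time, event) records; None stands for "no event" (NaN in the data frame).
records = [
    (44295, None), (44311, None), (44328, None),
    (44345, 2), (44361, 2), (44378, 2), (44395, 2), (44411, 2),
    (44428, 3), (44445, 3), (44461, 3), (44478, 3),
    (44495, None),
]


def add_durations(records):
    """Return (time, event, duration) triples; duration is None outside events."""
    result = []
    for event, group in groupby(records, key=itemgetter(1)):
        group = list(group)  # groupby's group is a one-shot iterator
        if event is None:
            duration = None
        else:
            times = [t for t, _ in group]
            duration = max(times) - min(times)
        # Attach the same duration to every record of this run
        result.extend((t, e, duration) for t, e in group)
    return result


for row in add_durations(records):
    print(row)
```

If the same event code ever appeared in two separate runs, each run would get its own duration, which matches treating every contiguous run as one event.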

Question 3

I ended up using the following workaround, based on the answer posted by @DSM:

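import numpy as np

df = datalist[i][j]  # the data frame currently being processed
# Peak-to-peak (max - min) of "time" within each event
df["dur"] = df.groupby("event")["time"].transform(lambda x: x.max() - x.min())
dur = []
for idx in df.index:  # separate name so the i used in datalist[i][j] is not overwritten
    if np.isnan(df.loc[idx, "event"]):
        dur.append(np.nan)  # rows outside any event get no duration
    else:
        dur.append(df.loc[idx, "dur"])
df["Duration"] = dur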

This at least works for me.
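The row-by-row loop can likely be replaced by a single vectorized step with Series.where, which keeps "dur" where the condition holds and puts NaN elsewhere. A minimal sketch on hypothetical data, assuming your pandas version returns a full-length result from the groupby transform as the loop version already relies on:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two events separated by rows with no event
df = pd.DataFrame({"time": [10, 20, 35, 40, 50, 80],
                   "event": [np.nan, 1, 1, np.nan, 2, 2]})

# Peak-to-peak (max - min) of "time" within each event
df["dur"] = df.groupby("event")["time"].transform(lambda x: x.max() - x.min())

# Keep "dur" only where an event is present; every other row becomes NaN
df["Duration"] = df["dur"].where(df["event"].notnull())
print(df)
```

Since where overwrites the non-event rows regardless of what the transform produced for them, the result does not depend on how a given pandas version handles NaN group keys.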