Question

I've got some data which has the login and logout times for a series of users.

Input:

        Login        Logout
User_1  10:25AM      6:01PM
User_2  8:58AM       5:12PM
User_3  9:23AM       1:35PM
User_3  3:10PM       4:49PM

I'd like to be able to find out the number of users that were logged in during a time period (for example, each Hour).

I'd like to be able to correlate this to other data I have in Pandas for the same periods, e.g. the number of "Foo" Events during that time.

Desired Output:

          Num Logged In   Foo Event Count
9:00AM                1                11
10:00AM               2                17
11:00AM               3                28
12:00PM               3                26
1:00PM                3                22
2:00PM                2                15
3:00PM                2                15
4:00PM                3                22
5:00PM                2                13

In the simplest case I could get the number of users logged in at exactly 10:00AM, and that would be a useful start. If I were looking at re-sampling the data to Day periods, then I'd need to be cleverer and look at something like the maximum simultaneous logins, or the average number of simultaneous logins between 9:00AM to 5:00PM.

Obviously I could write plain Python that, given the re-sampling period used in Pandas, produces the Series I need, but I'd like to know whether there is a trick within Pandas (or something I could do in Numpy) that helps with this, as I want to apply it to largish datasets (hundreds of users, thousands of days, multiple logins/logouts per user per day).


Solution

I found an approach that seems to work well:

Assuming we can transform our Login/Logout data into two DataFrames indexed by time:

Login    UserLogin
-------- ---------
8:58AM   User_2    
9:23AM   User_3    
10:25AM  User_1    
3:10PM   User_3    

Logout   UserLogout
-------- ----------
1:35PM   User_3
4:49PM   User_3
5:12PM   User_2
6:01PM   User_1
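One way to build those two frames from the raw session table (a sketch; the `raw` DataFrame and its column names are illustrative, and the real data could equally come from `read_csv`):

```python
import pandas as pd

# Raw session records from the question, one row per login/logout pair
raw = pd.DataFrame({
    'User':   ['User_1', 'User_2', 'User_3', 'User_3'],
    'Login':  ['10:25AM', '8:58AM', '9:23AM', '3:10PM'],
    'Logout': ['6:01PM', '5:12PM', '1:35PM', '4:49PM'],
})

# One frame indexed by login time, one by logout time
login = pd.DataFrame(
    {'UserLogin': raw['User'].values},
    index=pd.to_datetime(raw['Login'], format='%I:%M%p'),
).sort_index()
logout = pd.DataFrame(
    {'UserLogout': raw['User'].values},
    index=pd.to_datetime(raw['Logout'], format='%I:%M%p'),
).sort_index()
```

The clock strings are parsed onto a dummy date; with real data you would parse full timestamps instead.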

Then we can add an additional column to each table: 1 for the logins, and -1 for the logouts. The two columns need distinct names ("Login" and "Logout") so that the join below keeps both:

login['Login'] = 1
logout['Logout'] = -1

Then we can perform an outer join on the two tables, and fill the NA values the join created with 0s:

events = login.join(logout, how='outer')
events.fillna(value=0, inplace=True)

On the newly joined "events" DataFrame, we then create an "AvailabilityDelta" column that is the sum of the "Login" and "Logout" columns (the +1s and -1s we added above):

events['AvailabilityDelta'] = events.Login + events.Logout

Finally we can create an "Availability" column by performing a cumulative sum on the "AvailabilityDelta" column. This gives us the "Num Logged In" data that we were after in the original question:

events['Availability'] = events.AvailabilityDelta.cumsum()
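Putting the steps together on the sample data (a sketch; the +1/-1 columns are named "Login" and "Logout" so the outer join keeps them distinct, and the clock strings land on a dummy date):

```python
import pandas as pd

def t(strings):
    """Parse clock strings like '8:58AM' into timestamps (dummy date)."""
    return pd.to_datetime(strings, format='%I:%M%p')

# +1 at each login time, -1 at each logout time (sample data from the question)
login = pd.DataFrame({'Login': 1},
                     index=t(['8:58AM', '9:23AM', '10:25AM', '3:10PM']))
logout = pd.DataFrame({'Logout': -1},
                      index=t(['1:35PM', '4:49PM', '5:12PM', '6:01PM']))

events = login.join(logout, how='outer')   # union of both time indexes, sorted
events.fillna(value=0, inplace=True)       # no event at that time -> delta of 0
events['AvailabilityDelta'] = events.Login + events.Logout
events['Availability'] = events.AvailabilityDelta.cumsum()
```

Running this, the "Availability" column walks 1, 2, 3, 2, 3, 2, 1 and back to 0 as users come and go, matching the simultaneous-login counts expected from the sample sessions.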

At this point it is simple to add in additional information or create TimeSeries data, e.g.:

ts = events.resample('1h').mean().ffill()
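To line the hourly availability up with other hourly data, such as the Foo event counts from the question, the two series can be combined on a shared hourly index (a sketch with made-up numbers; `foo_events` stands in for whatever hourly series you already have):

```python
import pandas as pd

# A shared hourly index ('60min' keeps the frequency alias portable)
hours = pd.date_range('2013-05-07 09:00', periods=4, freq='60min')

num_logged_in = pd.Series([1, 2, 3, 3], index=hours, name='Num Logged In')
# Hypothetical Foo event counts for the same hours
foo_events = pd.Series([11, 17, 28, 26], index=hours, name='Foo Event Count')

# Align the two series column-wise on the common index
combined = pd.concat([num_logged_in, foo_events], axis=1)
```

If the two series had slightly different indexes, the same `concat` would produce NaNs where one side is missing, which makes alignment problems easy to spot.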

OTHER TIPS

Take a look at the Arrow module: it provides versatile DateTime objects with high-level methods.

Ranges & spans

Get the timespan of any unit:

>>> arrow.utcnow().span('hour')
(<Arrow [2013-05-07T05:00:00+00:00]>, <Arrow [2013-05-07T05:59:59.999999+00:00]>)

Or just get the floor and ceiling:

>>> arrow.utcnow().floor('hour')
<Arrow [2013-05-07T05:00:00+00:00]>

>>> arrow.utcnow().ceil('hour')
<Arrow [2013-05-07T05:59:59.999999+00:00]>

Your best bet would be to convert the times using something like strptime:

>>> import time
>>> t = time.strptime("5:24pm", "%I:%M%p")  # %I (12-hour clock) so %p takes effect
>>> t.tm_hour
17
>>> t.tm_min
24

(Note that `%H` is the 24-hour format code; with it, the `%p` marker is parsed but ignored, so "5:24pm" would come back as hour 5.)

That way you can get everything in the same hour, for example, like you wanted.
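For example, bucketing a list of timestamps by hour with nothing but the standard library (the input strings here are hypothetical):

```python
import time
from collections import Counter

# Hypothetical login times as strings
stamps = ["9:23AM", "10:25AM", "10:59AM", "5:24PM"]

# tm_hour is in 24-hour form, so each value identifies one hour-long bucket
per_hour = Counter(time.strptime(s, "%I:%M%p").tm_hour for s in stamps)
```

Here both 10:25AM and 10:59AM land in the hour-10 bucket, which is the "everything in the same hour" grouping the tip describes.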

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow