Question

I have data which includes Date as well as Time enter and Time exit. These latter two contain data like this: 08:02, 12:02, 23:45 etc.

I would like to manipulate the Time enter and Time exit data - for example, subtract Time enter from Time exit to work out the duration, or plot the distributions of Time enter and Time exit, e.g. to see if most entries are before 10:00, or if most exits are after 17:00.

All the packages I've looked at require a date to precede the time, e.g. 01/02/2012 12:33.

Is this possible, or should I simply append an identical date to every time for the sake of calculations? This seems a bit messy!


Solution

Use the "times" class found in the chron package:

library(chron)

Enter <- c("09:12", "17:01")
Enter <- times(paste0(Enter, ":00"))

Exit <-  c("10:15", "18:11")
Exit <- times(paste0(Exit, ":00"))

Exit - Enter # durations

sum(Enter < "10:00:00") # number entering before 10am
mean(Enter < "10:00:00") # fraction entering before 10am

sum(Exit >  "17:00:00") # number exiting after 5pm
mean(Exit >  "17:00:00") # fraction exiting after 5pm

table(cut(hours(Enter), breaks = c(0, 10, 17, 24))) # Counts for indicated hours   
 ## (0,10] (10,17] (17,24] 
 ##      1       1       0 

table(hours(Enter))  # Counts of entries each hour
## 9 17 
## 1  1

stem(hours(Enter), scale = 2)
## The decimal point is at the |

##   9 | 0
##  10 | 
##  11 | 
##  12 | 
##  13 | 
##  14 | 
##  15 | 
##  16 | 
##  17 | 0

Graphics:

tab <- c(table(Enter), -table(Exit))  # Freq at each time.  Enter is pos; Exit is neg.
plot(times(names(tab)), tab, type = "h", xlab = "Time", ylab = "Freq")
abline(v = c(10, 17)/24, col = "red", lty = 2) # vertical red lines
abline(h = 0)  # X axis

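The dashed lines at c(10, 17)/24 work because chron stores a times value as a fraction of a day. As a minimal sketch that relies on the same fact, the durations can be converted to minutes (the name DurationMin is made up for illustration):

```r
library(chron)

Enter <- times(paste0(c("09:12", "17:01"), ":00"))
Exit  <- times(paste0(c("10:15", "18:11"), ":00"))

# A times value is a fraction of a day, so multiply by 24 * 60 to get minutes.
# round() removes floating-point noise left over from the subtraction.
DurationMin <- round(as.numeric(Exit - Enter) * 24 * 60)
DurationMin
## [1] 63 70
```

The same scaling (multiply by 24 for hours) should apply to any other difference of times values.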

Other tips

Thanks for the feedback, and sorry for the confusion. I have edited the answer a bit to clarify.

New Edit:

First, both the chron package and strptime with a fixed format work well, as demonstrated in the other answers. I just want to introduce lubridate, since it is easier to use and more flexible about time formats.

Example data

df <- data.frame(TimeEnterChar = c(rep("07:58", 10), "08:02", "08:03", "08:05", "08:10", "09:00"),
                 TimeExitChar  = c("16:30", "16:50", "17:00", rep("17:02", 10), "17:30", "18:59"),
                 stringsAsFactors = FALSE)

If all you want is to count how many entry times were later than 8:00, then you can compare the character strings directly. The line below shows that 5 entry times were later.

sum(df$TimeEnterChar > "08:00")
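One caveat worth spelling out: comparing the strings directly only matches clock order when every time is zero-padded to "HH:MM", as in the example data. The sketch below illustrates the pitfall and one possible fix; pad_hm is a hypothetical helper written for this illustration, and it assumes well-formed "H:MM" or "HH:MM" input.

```r
# Strings are compared character by character, so "1" sorts before "9":
"10:15" > "9:00"
## [1] FALSE

# Hypothetical helper: normalise "H:MM" or "HH:MM" to zero-padded "HH:MM"
pad_hm <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) sprintf("%02d:%02d", as.integer(p[1]), as.integer(p[2])),
         character(1))
}

padded <- pad_hm(c("7:58", "9:00", "10:15"))
padded
## [1] "07:58" "09:00" "10:15"

# After padding, string comparison agrees with clock order
padded > "08:00"
## [1] FALSE  TRUE  TRUE
```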

If you want more, I personally like the lubridate package for time data, especially timestamps with dates, although those are not the focus of this post.

library(lubridate)
# Convert character to lubridate's "Period" class, which prints in the form H M S
df$TimeEnterTime <- hm(df$TimeEnterChar)
df$TimeExitTime  <- hm(df$TimeExitChar)
head(df)

sum(df$TimeEnterTime > hm("08:00"))

You can still compare the times after the conversion.

A little more about using them as numbers: I assume only minute-level precision is wanted, so I divided the number of seconds by 60 to get the number of minutes.

df$DurationMinute <- as.numeric( df$TimeExitTime - df$TimeEnterTime )/60
hist(df$DurationMinute, breaks = seq(500, 600, 5))

head(df)
  TimeEnterChar TimeExitChar TimeEnterTime TimeExitTime DurationMinute
1         07:58        16:30     7H 58M 0S   16H 30M 0S            512
2         07:58        16:50     7H 58M 0S   16H 50M 0S            532
3         07:58        17:00     7H 58M 0S    17H 0M 0S            542
4         07:58        17:02     7H 58M 0S    17H 2M 0S            544
5         07:58        17:02     7H 58M 0S    17H 2M 0S            544
6         07:58        17:02     7H 58M 0S    17H 2M 0S            544

You can simply plot a histogram to see the distribution of time duration between entry and exit.
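For readers without lubridate, the same minute arithmetic can be reproduced in base R. This is only a rough sketch: hm_to_minutes is a hypothetical helper (not a lubridate function), and it assumes well-formed "HH:MM" strings.

```r
# Hypothetical helper: "HH:MM" -> minutes since midnight
hm_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

enter <- c("07:58", "08:02")
exit  <- c("16:30", "17:02")

# Duration in minutes is the difference of the two minute counts
duration_min <- hm_to_minutes(exit) - hm_to_minutes(enter)
duration_min
## [1] 512 540
```

The first value agrees with the 512 minutes shown for row 1 of the table above.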

You can also look at the distribution of entry and exit times, but some effort is needed to relabel the axis.

df$TimeEnterNumMin <- as.numeric(df$TimeEnterTime) / 60
df$TimeExitNumMin  <- as.numeric(df$TimeExitTime) / 60

hist(df$TimeEnterNumMin, breaks = seq(0, 1440, 60), xaxt = 'n', main = "Whole by 1hr")
axis(side = 1, at = seq(0, 1440, 60), labels = paste0(seq(0, 24, 1), ":00"))

hist(df$TimeEnterNumMin, breaks = seq(420, 600, 15), xaxt = 'n', main = "Morning by 15min")
axis(side = 1, at = seq(420, 600, 60), labels = paste0(seq(7, 10, 1), ":00"))


I did not polish the plots or make the axes flexible; please adjust them to your needs. Hopefully this helps.
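One possible way to make the axis labels less hard-coded is to generate them from the tick positions. The helper below, minutes_to_label, is a hypothetical name introduced for this sketch; it assumes the ticks are minutes since midnight.

```r
# Hypothetical helper: minutes since midnight -> "HH:MM" label
minutes_to_label <- function(m) {
  m <- as.integer(round(m))
  sprintf("%02d:%02d", m %/% 60L, m %% 60L)
}

ticks <- seq(420, 600, 60)   # same tick positions as the morning plot
tick_labels <- minutes_to_label(ticks)
tick_labels
## [1] "07:00" "08:00" "09:00" "10:00"

# Possible use with an existing histogram:
# axis(side = 1, at = ticks, labels = tick_labels)
```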


Below is the old post (no need to read it; kept so that the comments still make sense):

Came across a similar issue and was inspired by this post. @G. Grothendieck and @David Arenburg provided great answers for transforming the time.

For comparisons, I feel that forcing the time into numeric helps. Instead of comparing "11:22:33" with "9:00:00", comparing as.numeric(hms("11:22:33")) (which is 40953 seconds) with as.numeric(hms("9:00:00")) (32400 seconds) is much easier.

as.numeric(hms("11:22:33")) > as.numeric(hms("9:00:00"))  &  as.numeric(hms("11:22:33")) < as.numeric(hms("17:00:00"))
[1] TRUE

The above example shows 11:22:33 is between 9AM and 5PM.

To extract just time from the date or POSIXct object, substr("2013-10-01 11:22:33 UTC", 12, 19) should work, although it looks stupid to change a time object to string/character and back to time again.
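As an alternative to substr, base R's format() can pull the time part out of a POSIXct object directly. A minimal sketch, assuming the timestamp is in UTC as in the string above:

```r
x <- as.POSIXct("2013-10-01 11:22:33", tz = "UTC")

# Time of day as text
time_chr <- format(x, "%H:%M:%S")
time_chr
## [1] "11:22:33"

# Seconds since midnight, built from the individual fields
secs <- as.numeric(format(x, "%H")) * 3600 +
        as.numeric(format(x, "%M")) * 60 +
        as.numeric(format(x, "%S"))
secs
## [1] 40953

# Same "between 9AM and 5PM" check as above, on plain numbers
secs > 9 * 3600 & secs < 17 * 3600
## [1] TRUE
```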

Converting the time to numeric should work for plotting, as @G. Grothendieck described. You can convert the numbers back to time as needed for x axis labels.

Would something like that work?

# Duration in decimal hours between two "HH:MM" strings
SubstracTimes <-  function(TimeEnter, TimeExit){
  (as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%H")) + 
  as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%M"))/60) -
  (as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%H")) + 
   as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%M"))/60)
}

Testing:

TimeEnter <- "08:02"
TimeExit <- "12:02"
SubstracTimes(TimeEnter, TimeExit)
## [1] 4
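The same calculation can probably be written more compactly with difftime. This is a sketch rather than a drop-in replacement: DurationHours is a hypothetical name, and tz = "UTC" is an assumption made so that daylight-saving changes cannot affect the result (strptime fills in the current date when only a time is given).

```r
# Hypothetical alternative: duration in decimal hours via difftime
DurationHours <- function(TimeEnter, TimeExit) {
  enter <- strptime(TimeEnter, format = "%H:%M", tz = "UTC")
  exit  <- strptime(TimeExit,  format = "%H:%M", tz = "UTC")
  as.numeric(difftime(exit, enter, units = "hours"))
}

DurationHours("08:02", "12:02")
## [1] 4

# It is vectorised, so whole columns can be passed in
DurationHours(c("08:02", "09:12"), c("12:02", "10:15"))
## [1] 4.00 1.05
```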
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow