Question

I'm trying to fit a regression with time of day as a continuous predictor, and a binary TRUE/FALSE outcome.

My time of day variable looks like this:

> class(sched_SMS_time)
[1] "POSIXct" "POSIXt" 
> head(sched_SMS_time)
[1] NA    "2014-01-01 11:15:00 EST" "2014-01-01 11:30:00 EST" 

My issue is that R keeps treating it in a categorical sense (i.e. as a factor), and throwing my regression models waaaaay out.

The only approach I can think of (and have found elsewhere on the stack exchange site) appears to be converting the POSIXct object to a decimal numeric counterpart, i.e.

sched_SMS_time_conv <- as.numeric(str_sub(gsub(":", ".", bob_os_ten$sched_SMS),1,-4))
head(sched_SMS_time_conv)
[1]    NA 11.15 11.30 11.45 12.15 13.00

Plugging this back into the models I hope to run, this seems to give sensible results...

However, I realise this loses finer-grained information (i.e., there's no way to distinguish between 9.00 on a Monday and 9.00 on a Tuesday).

My questions are therefore:

1) Is there an approach that allows POSIXct objects to be used directly in regressions in a continuous sense (both basic stuff, and in lme4 for multilevel data)?

2) If the answer is "no", is the workaround described above the best alternative?


Solution 2

It probably makes sense to convert your time to a continuous variable of time since some particular base time (e.g. seconds since the beginning of January 1, 1970, also known as seconds since epoch).

This is very easy to do with POSIXct via the unclass function:

str(Sys.time())
#   POSIXct[1:1], format: "2013-12-31 22:59:18"

unclass(Sys.time())
# [1] 1388548783

So in your example, you would replace sched_SMS_time with unclass(sched_SMS_time) in the regression model.
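As a rough illustration of the conversion, the sketch below uses two made-up timestamps (not the asker's data) in a fixed time zone. It uses as.numeric() rather than unclass(), since as.numeric() also drops attributes such as the time zone. The rescaling to days is an optional extra that is not part of the answer above; it only makes the coefficient easier to read.

```r
# Hypothetical example times; tz is fixed so the numbers are reproducible
x <- as.POSIXct(c("2014-01-01 11:15:00", "2014-01-01 11:30:00"), tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC, with attributes dropped
secs <- as.numeric(x)
secs
# [1] 1388574900 1388575800

# Optional: days since the earliest observation, which gives a
# "per day" coefficient instead of a tiny "per second" one
days <- as.numeric(difftime(x, min(x, na.rm = TRUE), units = "days"))
```

Either secs or days can then be used as an ordinary numeric predictor; the fitted probabilities should be the same, only the scale of the slope changes.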

OTHER TIPS

Actually, a vector of POSIXct times (suppose it's called tt) can be used directly, and it will be handled as if it were as.numeric(tt), i.e. as if it were the number of seconds since 1970-01-01 00:00:00 GMT. Here is an example:

# set up inputs
set.seed(123)
n <- 100 # must be even as n/2 is used below
y <- rbinom(n, 1, .5) == 1
tt <- seq(as.POSIXct("2004-01-01"), length = n, by = "day")

# run a glm regression
glm(y ~ tt, family = binomial)

# and an lme4 example
library(lme4)
g <- gl(2, n/2)
glmer(y ~ tt + (1 | g), family = binomial)

ADDED: New answer. Changed the linear regression to logistic regression as pointed out by @jlhoward. Added lme4 example.
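One way to check the claim that a POSIXct predictor is treated like as.numeric(tt) is to fit the model both ways and compare the coefficients. This is only a sketch on the same simulated data as above; the object names fit_posix and fit_num are made up for the comparison, and the time zone is fixed so the result does not depend on the local setting.

```r
# Same simulated inputs as in the example above
set.seed(123)
n <- 100
y <- rbinom(n, 1, 0.5) == 1
tt <- seq(as.POSIXct("2004-01-01", tz = "UTC"), length.out = n, by = "day")

# Fit once with the POSIXct vector and once with its numeric version
fit_posix <- glm(y ~ tt, family = binomial)
fit_num   <- glm(y ~ as.numeric(tt), family = binomial)

# Compare the estimates, ignoring the differing coefficient names
all.equal(unname(coef(fit_posix)), unname(coef(fit_num)))
```

Note that the slope is expressed per second, so it will look very small; that is a matter of units, not a sign that the predictor is being ignored.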

This just builds on @G.Grothendieck's response, noting that your response variable is binary (T/F).

If your response is y (vector of T/F), and your predictor, sched_SMS_time is POSIXct, create a dataframe df as:

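# not tested...
# measure time from the earliest non-missing value
# (sched_SMS_time[1] is NA in the question's data)
t0 <- min(sched_SMS_time, na.rm=TRUE)
df <- data.frame(y=y, time=sched_SMS_time,
                 t=as.numeric(difftime(sched_SMS_time, t0, units="secs")))
fit <- glm(y~t, data=df, family=binomial())
# newdata=df returns one prediction per row (NA where t is NA)
df$pred <- predict(fit, newdata=df, type="response")

library(ggplot2)
# as.numeric(y) puts the TRUE/FALSE points on the same 0-1 scale as pred
ggplot(df, aes(x=time)) + geom_point(aes(y=as.numeric(y))) + geom_line(aes(y=pred))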

Note that this fits using t, but plots using time.
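If the quantity of interest is really the time of day, as in the question, note that the question's workaround turns 11:15 into 11.15, which is not a decimal hour (11:15 is 11.25 hours). A minimal sketch of one alternative, using made-up timestamps and hypothetical variable names (hour_dec, wday), extracts the decimal hour and the day of week as separate predictors, so that 9:00 on a Monday and 9:00 on a Tuesday remain distinguishable:

```r
# Hypothetical times (with a leading NA, as in the question); tz fixed for reproducibility
x <- as.POSIXct(c(NA, "2014-01-01 11:15:00", "2014-01-01 11:30:00"), tz = "UTC")

# POSIXlt exposes the clock components directly
lt <- as.POSIXlt(x)

# Decimal hour of the day: 11:15 becomes 11.25, not 11.15
hour_dec <- lt$hour + lt$min / 60 + lt$sec / 3600

# Day of week as an integer, 0 = Sunday ... 6 = Saturday
wday <- lt$wday

# These could then enter a model as, for example:
# glm(y ~ hour_dec + factor(wday), family = binomial)
```

Whether a linear term in hour_dec is adequate depends on the data; this only shows how to build the predictors.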

Licensed under: CC-BY-SA with attribution