Keeping the tsp attributes of the response variable when using a multivariate time series as data in lm

https://stackoverflow.com/questions/18512684

26-06-2022

Question

I'm wondering how to keep the tsp attributes of a response variable used in a formula, for example in lm.

The help page for model.frame says that na.action = NULL should keep the tsp attributes:

Unless na.action = NULL, time-series attributes will be removed from the variables found

But this does not seem to be the case when a multivariate time series object is used as data. Here's an example using lm:

    Seatbelts[,"drivers"]
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1969 1687 1508 1507 1385 1632 1511 1559 1630 1579 1653 2152 2148
1970 1752 1765 1717 1558 1575 1520 1805 1800 1719 2008 2242 2478
1971 2030 1655 1693 1623 1805 1746 1795 1926 1619 1992 2233 2192
...

    out <- lm(drivers ~ log(PetrolPrice) + law, data = Seatbelts, na.action = NULL, y = TRUE)
    out$y
       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16  
    1687 1508 1507 1385 1632 1511 1559 1630 1579 1653 2152 2148 1752 1765 1717 1558
    ...

But this works:

    out <- lm(Seatbelts[, "drivers"] ~ log(PetrolPrice) + law, data = Seatbelts, na.action = NULL, y = TRUE)
    out$y
          Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    1969 1687 1508 1507 1385 1632 1511 1559 1630 1579 1653 2152 2148
    1970 1752 1765 1717 1558 1575 1520 1805 1800 1719 2008 2242 2478
    1971 2030 1655 1693 1623 1805 1746 1795 1926 1619 1992 2233 2192
    ...

So, any ideas how to get the first case working? In my real use case I have my own function that takes a formula and extracts the predictors and the response variable almost exactly as lm does, using the model.frame and model.response functions.

After looking into the issue a bit more, it seems that the latter case works because model.frame.default finds the variables with eval(vars, data, env): since Seatbelts[,"drivers"] is not part of data, it is evaluated in the formula's environment (here the global environment) and keeps its tsp attributes. In the former case, drivers is part of data, which model.frame.default has already converted to a data.frame earlier in the function, and that conversion removes the tsp attributes.
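The same behaviour can be checked directly at the model.frame level. A minimal sketch, assuming only the built-in Seatbelts data from the datasets package; the variable names y and law_ind are made up for illustration:

```r
# Case 1: variables looked up in an mts passed as `data`.
# model.frame.default() converts `data` with as.data.frame(), so the tsp
# attribute is already gone even though na.action = NULL.
mf_data <- model.frame(drivers ~ law, data = Seatbelts, na.action = NULL)
is.ts(mf_data$drivers)  # FALSE

# Case 2: the same series found in the formula's environment instead.
y <- Seatbelts[, "drivers"]
law_ind <- Seatbelts[, "law"]
mf_env <- model.frame(y ~ law_ind, na.action = NULL)
is.ts(mf_env$y)         # TRUE

# Case 3: as case 2, but with the default na.action (usually na.omit);
# here the time-series attributes are removed, as ?model.frame documents.
mf_default <- model.frame(y ~ law_ind)
is.ts(mf_default$y)     # FALSE
```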

Here's a simple example of the problem; this is essentially what happens inside model.frame:

    data <- as.data.frame(Seatbelts)  # this strips the tsp attribute from the data
    formula <- terms(as.formula(drivers ~ 1), data = data)
    env <- environment(formula)
    vars <- attr(formula, "variables")
    variables <- eval(vars, data, env)
    variables[[1]]  # no tsp attributes, as drivers is taken from data

    data <- as.data.frame(Seatbelts)
    formula <- terms(as.formula(Seatbelts[, "drivers"] ~ 1), data = data)
    env <- environment(formula)
    vars <- attr(formula, "variables")
    variables <- eval(vars, data, env)
    variables[[1]]
    # tsp attributes are still here: Seatbelts[, "drivers"] is not in data,
    # so it is taken from the global environment

Solution 2

What I ended up doing was this: if the call contains a data argument, I store its tsp attribute (which may be NULL) before processing the data any further:

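    if (missing(data)) {
      # no data argument: variables are looked up in the formula's environment
      data <- environment(formula)
      tsp_data <- NULL
    } else {
      # save tsp before as.data.frame() drops it (NULL if data is not a ts)
      tsp_data <- tsp(data)
      data <- as.data.frame(data)
    }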

Then, later, I convert my response variable back to a ts object:

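    # p (number of response columns) and n (number of observations) are
    # assumed to be defined earlier in the function
    class(y) <- if (p > 1) {
      c("mts", "ts", "matrix")
    } else "ts"
    if (is.null(tsp(y))) {
      if (!is.null(tsp_data)) {
        tsp(y) <- tsp_data           # restore the time base saved from data
      } else tsp(y) <- c(1, n, 1)    # otherwise use times 1, 2, ..., n
    }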
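Putting the two pieces together, a self-contained sketch of such a function might look like the following. response_as_ts() is a hypothetical name, not the poster's actual function, and the sketch assumes a single (univariate) response; a matrix response would also need the "mts" class handling shown above.

```r
# Hypothetical helper: returns the response of `formula` as a ts, restoring
# tsp(data) when model.frame() has dropped it. Assumes a univariate response.
response_as_ts <- function(formula, data) {
  if (missing(data)) {
    data <- environment(formula)
    tsp_data <- NULL
  } else {
    tsp_data <- tsp(data)        # NULL when `data` is not a time series
    data <- as.data.frame(data)  # drops tsp, which is why it was saved first
  }
  mf <- model.frame(formula, data = data, na.action = NULL)
  y <- model.response(mf, "numeric")
  if (is.null(tsp(y))) {
    y <- ts(unname(y))           # default time base c(1, n, 1)
    if (!is.null(tsp_data)) {
      tsp(y) <- tsp_data         # errors if the lengths do not match
    }
  }
  y
}
```

With the built-in data, response_as_ts(drivers ~ log(PetrolPrice) + law, data = Seatbelts) should return the drivers series as a monthly ts starting in January 1969, while a plain data.frame falls back to the default time base c(1, n, 1).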

OTHER TIPS

Try this:

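    library(dyn)  # install.packages("dyn") if needed
    out <- dyn$lm(drivers ~ log(PetrolPrice) + law, data = Seatbelts, na.action = NULL)
    # with dyn, fitted() and resid() keep the time-series attributes,
    # so their sum recovers the response as a ts
    y <- fitted(out) + resid(out)
    y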

Or, alternatively, just fix it up manually:

    out <- lm(drivers ~ log(PetrolPrice) + law, data = Seatbelts, y = TRUE)
    out$y <- ts(out$y)            # turn the named vector back into a ts
    tsp(out$y) <- tsp(Seatbelts)  # copy the time attributes from the data
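Note that the manual fix only works if out$y has exactly one value per row of Seatbelts, i.e. if na.action did not drop any rows; otherwise tsp<- fails because the lengths no longer match. A slightly more defensive sketch, using a hypothetical helper named restore_tsp(), checks this before copying the attributes:

```r
# Hypothetical helper generalizing the manual fix.
# `fit` must be an lm fit made with y = TRUE; `data` is the ts/mts used as data.
restore_tsp <- function(fit, data) {
  y <- fit$y
  if (is.null(y)) stop("refit with y = TRUE so that fit$y is stored")
  if (length(y) != NROW(data)) {
    stop("length(fit$y) != NROW(data): na.action probably dropped rows")
  }
  y <- ts(unname(y))   # plain ts with time base c(1, n, 1)
  tsp(y) <- tsp(data)  # replace it with the time base of the data
  fit$y <- y
  fit
}

out <- lm(drivers ~ log(PetrolPrice) + law, data = Seatbelts, y = TRUE)
out <- restore_tsp(out, Seatbelts)
```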

Licensed under: CC-BY-SA with attribution
