Question

I am fitting several regression models for one problem, e.g. one for each day of the week. I want to store them in a data structure in R so I can call them when needed. As a minimal example, say I want to fit one model for each value of z:

df = data.frame(z = rep(c('a', 'b'), each = 10),
            x = rep.int(1:10, times = 2), 
            y = rep.int(10:1, times = 2))

Naively, I tried the following:

library(data.table)
dt = data.table(df)

v.type = unique(dt$z)
v.loess = numeric(0)

for(i in v.type)
  v.loess = c(v.loess, loess(y ~ x, data = dt[z==i]))

dt.loess = data.table(type = v.type, model = v.loess)

Given e.g. type = a and x = 2.5, the idea was then to run something like this:

predict(dt.loess[z == 'a', 'model', with = FALSE], newdata = 2.5)

This method fails utterly. Any help appreciated.


Solution

loess() returns an S3 object of class loess. Internally such an object is a list, which you can see by calling str() on it.

There are two main reasons why your solution fails:

  1. In the for loop you concatenate these objects (that is, lists) with c(). That drops the class attribute and appends the individual list elements (the object's fields) to one long flat list, so the result is no longer a valid object of class loess. You then try to store this flat list as a column in your data.table.

  2. A data.table column must be an atomic vector or a list with one element per row. The flat list from point 1 holds one entry per model field rather than one entry per model, so it cannot serve as a model column. data.table does support list columns, so each model could be stored as a single list element, but I don't see what that would gain you here. I'd rather store the models in a named list and leave them there.
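
To see the first point in action, here is a small sketch on the question's example data. The names fit and flattened are mine, purely for illustration; the code compares a proper loess object with what c() makes of it:

```r
# Example data from the question
df <- data.frame(z = rep(c("a", "b"), each = 10),
                 x = rep.int(1:10, times = 2),
                 y = rep.int(10:1, times = 2))

# One valid model for group "a"
fit <- loess(y ~ x, data = df[df$z == "a", ])

# What the loop in the question builds: c() drops the class attribute
# and splices the object's fields into a plain list
flattened <- c(numeric(0), fit)

inherits(fit, "loess")            # TRUE
inherits(flattened, "loess")      # FALSE
is.list(flattened)                # TRUE
length(flattened) == length(fit)  # TRUE: one entry per field, not one per model
```

Because flattened is just a bag of fields without a class, predict() has no loess method to dispatch to.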


Solution using base R

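model <- list()
for (i in as.character(unique(df$z))) {
  # subset the plain data.frame with base R indexing;
  # store each fit under its group name
  model[[i]] <- loess(y ~ x, data = df[df$z == i, ])
}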
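
If you prefer to avoid the explicit loop, the same named list can be built with split() and lapply() from base R. This is only a sketch on the question's example data; the name models is my own choice.

```r
# Example data from the question
df <- data.frame(z = rep(c("a", "b"), each = 10),
                 x = rep.int(1:10, times = 2),
                 y = rep.int(10:1, times = 2))

# split() cuts the data.frame into one piece per value of z;
# lapply() fits one model per piece and keeps the group names
models <- lapply(split(df, df$z), function(d) loess(y ~ x, data = d))

names(models)
predict(models[["a"]], newdata = 2.5)
```

The list names come from the values of z, so models[["a"]] and models[["b"]] can be used exactly like the entries filled in by the loop.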

Solution using plyr

library("plyr")
model <- dlply( df, "z", function(data){
  loess( y ~ x, data = data )
} )

Or even shorter:

library("plyr")
model <- dlply( df, "z", loess, formula = quote(y ~ x) )

The two plyr solutions work on both a data.table and a data.frame, but they take no advantage of data.table's features. You have to decide which approach is faster and less error-prone for your case.


In all three cases you can call

predict( model[["a"]], newdata = 2.5)
[1] 8.5
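
Since the goal was to pick a model by type at prediction time, the lookup can be wrapped in a small function. The sketch below assumes the models live in a named list as above; predict_by_type is a hypothetical helper of my own, not part of any package.

```r
# Example data and named list of models, as in the base R solution
df <- data.frame(z = rep(c("a", "b"), each = 10),
                 x = rep.int(1:10, times = 2),
                 y = rep.int(10:1, times = 2))
model <- list()
for (i in as.character(unique(df$z))) {
  model[[i]] <- loess(y ~ x, data = df[df$z == i, ])
}

# Hypothetical helper: look up the model stored for `type`
# and predict at the given x values
predict_by_type <- function(models, type, x) {
  if (!type %in% names(models)) {
    stop("no model stored for type '", type, "'")
  }
  predict(models[[type]], newdata = data.frame(x = x))
}

predict_by_type(model, "a", 2.5)
predict_by_type(model, "b", c(2.5, 7))
```

Failing early on an unknown type gives a clearer message than the error you would otherwise get from calling predict() on NULL.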

OTHER TIPS

For one, combining the results with c() flattens each model into its individual components; it is better to fill a list directly. For another, getting the models back out of a data.table is tricky; I'm not a data.table expert, so I would just use the results directly from the list.

v.type = unique(df$z)
v.loess = list()
for(i in v.type) {
  v.loess[[i]] <- loess(y ~ x, data = df[df$z==i,])
}
predict(v.loess$a, newdata=2.5)
predict(v.loess$b, newdata=2.5)
Licensed under: CC-BY-SA with attribution