Why does predict, on a subset of the original data, fail using gls when lm does the trick?

StackOverflow https://stackoverflow.com/questions/22229788

  •  06-06-2023
  •  | 
  •  

The problem is illustrated using the code below. If you run it you will see that lm handles the predict gracefully while gls fails to do so. This is most likely a problem in predict.gls but I don't understand why. This is only an issue when using the factor call. Without it everything works out fine. I am fairly confident that predict.gls fails because all levels are not present in the new dataset. However, lm works it out. To me it feels like a bug but I'm not proficient enough with the gls code to determine it.

library(nlme)

# lm example
myfit<-lm(mpg~factor(cyl):disp+hp, data=mtcars)
mypred<-predict(myfit, mtcars[1:3, 1:7])

# gls example
myfit2<-gls(mpg~factor(cyl):disp+hp, data=mtcars)
mypred2<-predict(myfit2, mtcars[1:3, 1:7])

It fails with the error:

# Error in X[, names(cf), drop = FALSE] : subscript out of bounds

Any ideas?

My R.version output:

platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 0.2
year 2013
month 09
day 25
svn rev 63987
language R
version.string R version 3.0.2 (2013-09-25) nickname Frisbee Sailing

nlme package version: "package ‘nlme’ version 3.1-113"

有帮助吗?

解决方案

Since I'm not the author of predict.gls I can't answer precisely why it was written this way, but I'm hesitant to suggest that this is a bug in a function/package that have been around this long.

Anyway...the reason it works with lm is that when predict.lm calls model.frame:

m <- model.frame(Terms, newdata, na.action = na.action, 
            xlev = object$xlevels)

it is using the xlev argument and the information on the levels from the fitted model object itself. I don't believe that gls object stores this information at all.

In predict.gls there are two things going on. First, the initial call to model.frame:

mfArgs <- list(formula = form, data = newdata, na.action = na.action)
mfArgs$drop.unused.levels <- TRUE
dataMod <- do.call("model.frame", mfArgs)

Note that here we're not using the xlev argument, and in fact we're explicitly saying to drop unused levels. If you create your own version of predict.gls and comment our the drop.unused.levels line, it should work, as long as you don't call factor in your formula!

Why?

Because later on we see this:

X <- model.matrix(form, dataMod)

where your formula is being re-applied. Which means that factor is being called on columns with levels that don't exist, forcing them to be dropped.

So when I use a version of predict.gls that comments out the drop.unused.levels line, and I set cyl to be a factor up front in the data frame, it generates predictions for me just fine.

But I would suggest asking the package authors as to whether this is intended behavior or not. It seems odd to me, but like I said, for a package this old and well used it seems strange that something like this would be un-intended.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top