Why does predict, on a subset of the original data, fail using gls when lm does the trick?

Question

Since I'm not the author of predict.gls I can't answer precisely why it was written this way, but I'm hesitant to suggest that this is a bug in a function/package that have been around this long.

Anyway...the reason it works with lm is that when predict.lm calls model.frame:

m <- model.frame(Terms, newdata, na.action = na.action, 
            xlev = object$xlevels)

it is using the xlev argument and the information on the levels from the fitted model object itself. I don't believe that gls object stores this information at all.

In predict.gls there are two things going on. First, the initial call to model.frame:

mfArgs <- list(formula = form, data = newdata, na.action = na.action)
mfArgs$drop.unused.levels <- TRUE
dataMod <- do.call("model.frame", mfArgs)

Note that here we're not using the xlev argument, and in fact we're explicitly saying to drop unused levels. If you create your own version of predict.gls and comment our the drop.unused.levels line, it should work, as long as you don't call factor in your formula!

Why?

Because later on we see this:

X <- model.matrix(form, dataMod)

where your formula is being re-applied. Which means that factor is being called on columns with levels that don't exist, forcing them to be dropped.

So when I use a version of predict.gls that comments out the drop.unused.levels line, and I set cyl to be a factor up front in the data frame, it generates predictions for me just fine.

But I would suggest asking the package authors as to whether this is intended behavior or not. It seems odd to me, but like I said, for a package this old and well used it seems strange that something like this would be un-intended.