Question

On running a gam model using the mgcv package, I encountered a strange error message which I am unable to understand:

“Error in model.frame.default(formula = death ~ pm10 + Lag(resid1, 1) + : variable lengths differ (found for 'Lag(resid1, 1)')”.

The number of observations used in model1 is exactly the same as the length of the deviance residual, thus I think this error is not related to difference in data size or length.

I found a fairly related error message on the web here, but that post did not receive an adequate answer, so it is not helpful to my problem.

Reproducible example and data follows:

library(quantmod)
library(mgcv) 
require(dlnm)

df <- chicagoNMMAPS
df1 <- df[,c("date","dow","death","temp","pm10")] 
df1$trend<-seq(dim(df1)[1]) ### Create a time trend

Run the model

model1<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5),
data=df1, na.action=na.omit, family=poisson)

Obtain deviance residuals

resid1 <- residuals(model1,type="deviance")

Add a one day lagged deviance to model 1

model1_1 <- update(model1,.~.+ Lag(resid1,1),  na.action=na.omit)

model1_2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5) + Lag(resid1,1), data=df1, 
na.action=na.omit, family=poisson)

Both of these models produced the same error message.

Was it helpful?

Solution

Joran suggested to first remove the NAs before running the model. Thus, I removed the NAs, run the model and obtained the residuals. When I updated model2 by inclusion of the lagged residuals, the error message did not appear again.

Remove NAs

df2<-df1[complete.cases(df1),]

Run the main model

model2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df2, family=poisson)

Obtain residuals

resid2 <- residuals(model2,type="deviance")

Update model2 by including the lag 1 residuals

model2_1 <- update(model2,.~.+ Lag(resid2,1),  na.action=na.omit)

OTHER TIPS

Another thing that can cause this error is creating a model with the centering/scaling standardize function from the arm package -- m <- standardize(lm(y ~ x, data = train))

If you then try predict(m), you get the same error as in this question.

Its simple, just make sure the data type in your columns are the same. For e.g. I faced the same error, that and an another error:

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

So, I went back to my excel file or csv file, set a filter on the variable throwing me an error and checked if the distinct datatypes are the same. And... Oh! it had numbers and strings, so I converted numbers to string and it worked just fine for me.

Another reason could be a variable with the same name as the column. The formula won't know which one (variable or column) to take. Check the list of variables via ls() (or in RStudio) and use remove(<varname>) to remove if such a conflicting variable exists.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top