Question

Apologies if this is better suited in CrossValidated.

I am fitting GAM models to binomial data using the mgcv package in R. One of the covariates is periodic, so I am specifying the bs = "cc" cyclic cubic spline. I am doing this in a cross validation framework, but when I go to fit my holdout data using the predict function I get the following error:

Error in pred.mat(x, object$xp, object$BD) : 
  can't predict outside range of knots with periodic smoother

Here is some code that should replicate the error:

# generate data:
x <- runif(100,min=-pi,max=pi)
linPred <- 2*cos(x) # value of the linear predictor
theta <- 1 / (1 + exp(-linPred)) # 
y <- rbinom(100,1,theta)
plot(x,theta)
df <- data.frame(x=x,y=y)

# fit gam with periodic smoother:
gamFit <- gam(y ~ s(x,bs="cc",k=5),data=df,family=binomial())
summary(gamFit)

plot(gamFit)

# predict y values for new data:
x.2 <- runif(100,min=-pi,max=pi)
df.2 <- data.frame(x=x.2)
predict(gamFit,newdata=df.2)

Any suggestions on where I'm going wrong would be greatly appreciated. Maybe manually specifying knots to fall on -pi and pi?

Was it helpful?

Solution

I did not get an error on the first run but I did replicate the error on the second try. Perhaps you need to use set.seed(123) #{no error} and set.seed(223) #{produces error}. to see if that creates partial success. I think you are just seeing some variation with a relatively small number of points in your derivation and validation datasets. 100 points for GAM fit is not particularly "generous".

Looking at the gamFit object it appears that the range of the knots is encoded in gamFit$smooth[[1]]['xp'], so this should restrict your inputs to the proper range:

 x.2 <- runif(100,min=-pi,max=pi); 
 x.2 <- x.2[findInterval(x.2, range(gamFit$smooth[[1]]['xp']) )== 1]

 # Removes the errors in all the situations I tested
 # There were three points outside the range in the set.seed(223) case

OTHER TIPS

The problem is that your test set contains values that were not in the range of your training set. Since you used a spline, knots were created at the minimum and maximum value of x, and your fitted function is not defined outside of that range. So, when you test the model, you should exclude those points that are outside the range. Here is how you would exclude the points in the test set:

set.seed(2)
... <Your code>
predict(gamFit,newdata=df.2[df.2$x>=min(df$x) & df.2$x<=max(df$x),,drop=F])

Or, you could specify the "outer" knot points in the model to the min and max of your whole data. I don't know how to do that offhand.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top