R VGAM package: fit is poorer after adding the second explanatory variable

https://stackoverflow.com/questions/21566731

07-10-2022

Question

Why does the fit get worse after adding the second explanatory variable?

require("VGAM")
df = data.frame(x = c(1,2,3,4,5,6,7,8,9,10), y = c(1,4,8,15,25,36,48,65,80,105), z = c(0,0,0,1,100,400,900,1600,1800,200))
vgt1 = vgam(y~s(x, df=2), data=df,family=gaussianff, trace=TRUE)
vgt2 = vgam(y~cbind(s(x, df=2),s(z, df=2)), data=df,family=gaussianff, trace=TRUE)

plot(df$x, df$y, col="black")
lines(df$x, vgt1@predictors, col="red")
lines(df$x, vgt2@predictors, col="blue")

Solution

To add a second explanatory variable to a formula, use +, not cbind.

vgam parses the formula with terms.formula, using specials = "s" to find the terms wrapped in s(), which mark spline terms.

Therefore

vgt2 = vgam(y~s(x, df=2)+s(z, df=2), data=df,family=gaussianff, trace=TRUE)

will give you what you want (and this has a lower deviance than vgt1).

When you fit

vgt2 = vgam(y~cbind(s(x, df=2),s(z, df=2)), data=df,family=gaussianff, trace=TRUE)

terms.formula doesn't find any specials that start with s, because the term is identified by its outermost function, which is cbind. Therefore

vgam(y~cbind(s(x, df=2),s(z, df=2)), data=df,family=gaussianff, trace=TRUE)

is the equivalent of

vgam(y~cbind(x,z), data=df,family=gaussianff, trace=TRUE)

which in turn is the equivalent of

vgam(y~x+z, data=df,family=gaussianff, trace=TRUE)

i.e. no spline terms are fitted.
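The difference in parsing can be checked without fitting anything. The following is a minimal sketch that uses only base R's terms() with specials = "s", which is assumed here to mirror what vgam does internally. The names f_plus, f_cbind, t_plus and t_cbind are made up for the illustration.

```r
# Two versions of the formula: terms joined with +, and terms wrapped in cbind()
f_plus  <- y ~ s(x, df = 2) + s(z, df = 2)
f_cbind <- y ~ cbind(s(x, df = 2), s(z, df = 2))

# terms() on a formula only parses it; s() is never evaluated,
# so VGAM does not need to be loaded for this check
t_plus  <- terms(f_plus,  specials = "s")
t_cbind <- terms(f_cbind, specials = "s")

# With +, both s() terms are flagged as specials
print(attr(t_plus, "specials")$s)

# With cbind(), nothing is flagged: the result is NULL
print(attr(t_cbind, "specials")$s)

# The cbind() formula has one term, labelled by the whole cbind(...) call
print(attr(t_cbind, "term.labels"))
```

If the specials attribute is NULL for a formula that was meant to contain smooths, the s() calls are nested inside another function and will not be treated as spline terms.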

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow