Question

I first fitted a Poisson glm in R as follows:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)

Next I tried to get predictions of the response (on the log scale) for a new dataset using predict function. Note that I set the offset term as zero.

> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 

Next I tried package segmented (version 0.3-0.0) in R and fitted a segmented glm as follows. (The latest version of segmented package (i.e. 0.3-1.0) does not seem to support the offset term when using the predict function.)

> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),
+ offs=off.set.term,data=newdat)

I then used the predict function with fit.2 to get the predicted values:

> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),offs=off.set.term,data=newdat)
> 
> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-26.62968 -26.08611 -25.95997 -20.76125 -23.32456 

These predicted values are significantly different from the one I obtained using fit.1.

The problem seems to be in the offset term because when we fit the models without an offset term, then the results are reasonable and close to each other as follow:

> fit.3=glm(Y~age,data=dat,family=poisson)
> newdat.2=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453))
> predict(fit.3,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.406016 2.395531 2.393098 2.292816 2.342261 
> fit.4=segmented(fit.3,seg.Z=~age,psi=list(age=mean(age)),data=newdat.2)
> predict(fit.4,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.577669 2.524503 2.512165 2.003679 2.254396 
Was it helpful?

Solution

Since I got the answer from the segmented package maintainer, I decided to share it here. First, up-date the package to version 0.3-1.0 by

install.packages("segmented",type="source")

After updating, running the same commands leads to:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)
> 
> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 
> 
> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),offs=off.set.term,data=newdat)
> predict(fit.2,newdata =newdat,type="link")
Error in offset(off.set.term) : object 'off.set.term' not found

So the offset term cannot be found. Now the trick (for now) is to first attach newdat and then predict as follows:

> attach(newdat)
The following object is masked _by_ .GlobalEnv:

    age
> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-1.825831 -1.853842 -1.860342 -2.128237 -1.996147 

The results do make sense now. Cheers!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top