Question

Consider the short R script below. It seems that boost.hitters$train.error matches neither the mean of the raw residuals nor the mean squared error of the training set.

I could not find any documentation on train.error, so I am wondering what it actually represents and how it is computed.

library(ISLR)
library(gbm)

set.seed(1)

# Drop rows with missing values and log-transform the response
Hitters <- na.omit(Hitters)
Hitters$Salary <- log(Hitters$Salary)

# Fit a boosted regression model on the full data set
boost.hitters <- gbm(Salary ~ ., data = Hitters, n.trees = 1000,
                     interaction.depth = 4, shrinkage = 0.01)
yhat.boost <- predict(boost.hitters, newdata = Hitters, n.trees = 1000)

# Neither of these matches the training MSE computed below
mean(boost.hitters$train.error^2)
mean(boost.hitters$train.error)

# Training MSE computed directly from the predictions
mean((yhat.boost - Hitters$Salary)^2)

Output:

[1] 0.03704581
[1] 0.1519719
[1] 0.07148612

Solution

I asked a professor at my University.

Apparently train.error is a vector containing the training error after each tree is added — that is, train.error[i] is the training MSE (gbm uses squared-error loss for a Gaussian response) of the model with i trees. The error I computed by hand is therefore the training error after the final tree, so in my example:

mean((yhat.boost - Hitters$Salary)^2) == boost.hitters$train.error[1000]

(In practice, all.equal() is safer than == for comparing floating-point values.)
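A quick way to check this interpretation is to compare the hand-computed MSE against train.error at both the final and an intermediate iteration. This is a sketch assuming the same seed and fit as above; the variable names mse.final and yhat.500 are mine:

```r
library(ISLR)
library(gbm)

set.seed(1)
Hitters <- na.omit(Hitters)
Hitters$Salary <- log(Hitters$Salary)

boost.hitters <- gbm(Salary ~ ., data = Hitters, n.trees = 1000,
                     interaction.depth = 4, shrinkage = 0.01)

# train.error holds one entry per boosting iteration
length(boost.hitters$train.error)

# The hand-computed training MSE matches the final entry
yhat.boost <- predict(boost.hitters, newdata = Hitters, n.trees = 1000)
mse.final  <- mean((yhat.boost - Hitters$Salary)^2)
all.equal(mse.final, boost.hitters$train.error[1000])

# Likewise for an intermediate iteration
yhat.500 <- predict(boost.hitters, newdata = Hitters, n.trees = 500)
all.equal(mean((yhat.500 - Hitters$Salary)^2),
          boost.hitters$train.error[500])

# The training error curve decreases as trees are added
plot(boost.hitters$train.error, type = "l",
     xlab = "Number of trees", ylab = "Training MSE")
```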
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow