The issue is in the control parameter. You are using method = "cv" and number = 10, but you are also specifying the exact resamples that will be used to fit the model (via the index argument). When index is supplied, those resamples are the ones that get used. I assume that this is the grant data from the book. In Chapter 12 we describe the data splitting scheme where the pre2008 vector indicates that 6,633 of the 8,190 samples will be used for training. That leaves 1,557 held out during model tuning:
> dim(training)
[1] 8190 1785
> length(pre2008)
[1] 6633
> 8190-6633
[1] 1557
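To make the hold-out explicit: when only index is given, the rows not listed in a resample are the ones held out for it. Below is a minimal base R sketch of that arithmetic. The names n_total, train_rows and holdout_rows are made up for illustration, and 1:6633 is only a stand-in for the real pre2008 vector, so treat this as an illustration rather than the book's code.

```r
# Stand-in values: the real pre2008 vector comes with the grant data.
n_total <- 8190
train_rows <- 1:6633                   # hypothetical stand-in for pre2008
index <- list(TrainSet = train_rows)   # one list element = one resample

# Rows not listed in a resample's index are the ones held out for it.
holdout_rows <- setdiff(seq_len(n_total), index$TrainSet)

length(index)          # 1 resample
length(holdout_rows)   # 1557 held-out rows
```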
The predictions on the non-pre2008 samples are what you are seeing in the table. If you are trying to reproduce what we have, page 312 has the correct syntax:
ctrl <- trainControl(method = "LGOCV",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     index = list(TrainSet = pre2008))
If you just want to do 10-fold CV, get rid of the index argument.
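For the 10-fold case, a control object along these lines should work. This is a sketch that assumes the caret package is loaded and that you still want the ROC-based summary; cv_ctrl is just an illustrative name.

```r
library(caret)

# No index argument, so train() creates the 10 folds itself.
cv_ctrl <- trainControl(method = "cv",
                        number = 10,
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE)
```

Pass it to train() through the trControl argument as before.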
tl;dr: The control function says 10-fold CV but the index argument says one hold-out of 1,557 samples should be used.
Max