Question

For some reason, the train function in the caret package renames the response variable in my data.table. Here is a toy example:

library(caret)
library(data.table)
DT <- data.table(x = rnorm(10), y = rnorm(10))
DT
 #            x          y
 #1: -1.7844589  0.4834738
 #2: -0.3519577 -0.4644998
 #3:  1.0697762 -0.9183105
 #4: -0.2624022 -1.0952624
 #5: -1.0875959 -1.0267012
 #6:  0.1442927 -0.8669099
 #7:  0.3886957  0.2272433
 #8: -0.1625200  0.8286582
 #9: -0.5419324 -0.0526076
 #10:  0.4669790  0.2916581
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)
fit <- train(y ~ x, data = DT, 'lm', trControl = cv.ctrl)
DT
 #            x   .outcome
 #1: -1.7844589  0.4834738
 #2: -0.3519577 -0.4644998
 #3:  1.0697762 -0.9183105
 #4: -0.2624022 -1.0952624
 #5: -1.0875959 -1.0267012
 #6:  0.1442927 -0.8669099
 #7:  0.3886957  0.2272433
 #8: -0.1625200  0.8286582
 #9: -0.5419324 -0.0526076
 #10:  0.4669790  0.2916581

I know I can rename it after the training, but it gets repetitive if I have many models to train. Is this the correct behavior?

EDIT: added sessionInfo()

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-24     ggplot2_0.9.3.1  lattice_0.20-29  data.table_1.9.2

loaded via a namespace (and not attached):
 [1] car_2.0-19       codetools_0.2-8  colorspace_1.2-4 digest_0.6.4     foreach_1.4.2    grid_3.1.0       gtable_0.1.2    
 [8] iterators_1.0.7  MASS_7.3-31      munsell_0.4.2    nnet_7.3-8       plyr_1.8.1       proto_0.3-10     Rcpp_0.11.1     
[15] reshape2_1.2.2   scales_0.2.4     stringr_0.6.2    tools_3.1.0     

Solution

Update: This is now fixed in the development version of data.table, 1.9.5. From NEWS:

names<-.data.table works as intended on data.table unaware packages with Rv3.1.0+. Closes #476 and #825. Thanks to ezbentley for reporting here on SO and to @narrenfrei.
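
To see whether an installed data.table already carries this fix, the version can be compared against 1.9.5. This is only a small sketch using base R's packageVersion(); the 1.9.5 threshold comes from the NEWS entry above, and has_fix is just an illustrative variable name:

```r
# Compare the installed data.table version with the release that carries the fix
has_fix <- utils::packageVersion("data.table") >= "1.9.5"

if (!has_fix) {
  # Older versions: hand train() a data.frame copy instead of the data.table
  message("data.table is older than 1.9.5; pass as.data.frame(DT) to train() as a workaround")
}
```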


Similar to @hrbrmstr's suggestion, you can pass train() a data.frame copy of the data.table:

library(caret)
library(data.table)
DT <- data.table(x = rnorm(10), y = rnorm(10))
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)
fit <- train(y ~ x, data = as.data.frame(DT), 'lm', trControl = cv.ctrl)
DT
#              x           y
# 1: -0.06027817  1.32641243
# 2:  0.28842856  0.60240700
# 3:  1.14196056  0.97159637
# 4: -0.82907332  0.82955574
# 5:  0.73742357 -0.63901239
# 6:  0.12551649  1.33047527
# 7: -1.12110293 -0.03315772
# 8:  0.29933697 -1.52464998
# 9:  1.66046182  0.21068356
# 10: -0.09126467  2.02206078

This way train() only sees a data.frame copy, so DT keeps its column names and you won't lose the data.table class.
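
For the many-models case raised in the question, the conversion could be wrapped once in a small helper so it isn't repeated for every fit. A minimal sketch, assuming a hypothetical helper named train_df (it is not part of caret or data.table) and using "lm" and "glm" purely as example methods:

```r
library(caret)
library(data.table)

# Hypothetical helper (not part of caret): give train() a data.frame copy
# so the original data.table is never passed to it directly.
train_df <- function(form, data, ...) {
  train(form, data = as.data.frame(data), ...)
}

set.seed(1)
DT <- data.table(x = rnorm(10), y = rnorm(10))
cv.ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 1)

# Fit several models without renaming columns afterwards
methods <- c("lm", "glm")
fits <- lapply(methods, function(m) {
  train_df(y ~ x, DT, method = m, trControl = cv.ctrl)
})
names(fits) <- methods

names(DT)  # DT itself should still have columns x and y
```

The copy made by as.data.frame() costs some memory for large tables, so this is a convenience for affected versions rather than something to keep once data.table is updated.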

OTHER TIPS

Hmm… with a plain data.frame I can't reproduce this; the column names stay unchanged:

dat <- data.frame(x = rnorm(10), y = rnorm(10))
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)
fit.1 <- train(y ~ x, data = dat, 'lm', trControl = cv.ctrl)
dat
##             x           y
## 1  -0.3458644 -0.96606867
## 2   1.2248085  0.02072409
## 3  -1.7541273 -0.26734265
## 4   0.7887834 -0.51012773
## 5  -2.2282504  0.91898424
## 6  -1.4195865  1.24238977
## 7  -1.3931804 -0.15301954
## 8  -0.6822076  0.32615825
## 9  -1.2455969  1.00837799
## 10 -0.4506061 -0.09332418
Licensed under: CC-BY-SA with attribution