Question

I am running the gbm() function for multiple additive multinomial models with 6 response categories each on a large dataset (roughly 0.5 to 1 million rows per model). The model looks like this (mostly the defaults):

gbm <-
gbm(Y ~ A + B + C + D + E + F, 
  data=data,                   
  var.monotone=c(0,0,0,0,0,0), 
  distribution="multinomial", 
  n.trees=500,                
  shrinkage=0.1,               
  interaction.depth=1,        
  bag.fraction = 0.5,          
  train.fraction = 0.5,        
  n.minobsinnode = 5,         
  cv.folds = 0,               
  keep.data=TRUE,              
  verbose=FALSE,                
  weights=sampleWeight)     

Y is a factor with 6 categories; the explanatory variables are a mix of numeric variables and factors. data is a data.table. This code runs fine and the predictions are good. When it is done, I save the predictions and clean the workspace with rm(list=ls(all=TRUE)), then additionally run gc(), but the memory is not released. I would expect that after clearing the whole workspace, memory usage should be about the same as at the start of the R session.

In my specific case, RAM usage is about 1.5GB after loading the data. After fitting the model it is at the limit of my PC, about 14GB. After cleaning the workspace it is still at about 12GB. The only solution for me at the moment is to restart the whole R session, reload the data, and run the next model.

Is there a solution to this, so that I don't have to restart the session every time?

Thanks a lot!

Solution

Yes, there is a memory leak with gbm. Ironically the fix is on the gbm website, but the maintainers have failed to incorporate it into the CRAN release.

http://r-forge.r-project.org/tracker/?atid=1813&group_id=443&func=browse
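
A quick way to check whether the missing memory is held by R objects or by compiled code is to compare what gc() reports with what the operating system shows for the R process. gc() only accounts for memory managed by R itself, so if its "used" figure drops back after rm() while the process still sits at 12GB, the rest is presumably held outside R's heap, which would be consistent with a leak in the package's compiled code. A minimal sketch, where r_heap_mb is a hypothetical helper and the 40 MB vector is only a stand-in for the model objects:

```r
# Report the memory currently managed by R's garbage collector, in MB.
# gc() returns a matrix; its second column is the "used" amount in Mb
# for Ncells and Vcells.
r_heap_mb <- function() {
  g <- gc(verbose = FALSE)
  sum(g[, 2])
}

before <- r_heap_mb()
x <- numeric(5e6)        # roughly 40 MB of doubles, stand-in for a fitted model
during <- r_heap_mb()
rm(x)
after <- r_heap_mb()     # gc() runs inside the helper

cat(sprintf("before: %.1f MB, during: %.1f MB, after rm + gc: %.1f MB\n",
            before, during, after))
```

If the numbers printed here return to the starting level but the process size in the task manager or top does not, rm() and gc() cannot help, because R does not know about that memory.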

OTHER TIPS

The maintainers of gbm() have fixed the memory leak for the Laplace and multinomial distributions, as well as other bugs. In addition, they have added Cox regression and other features. As of July 2016, these fixes have NOT yet been incorporated into the releases of the gbm package that can be found on CRAN mirrors or installed with install.packages("gbm"). However, this is in the works and should appear in a future version, gbm-2.1.2 or gbm-3.0.0, on the CRAN mirrors.

Fortunately, you can already get the latest working version with the bug fixes from https://github.com/gbm-developers/gbm . See also https://github.com/gbm-developers/gbm/issues/16#issuecomment-234054158 .

The statements I used to install the latest working versions were:

In Linux console:

sudo apt-get -y build-dep libcurl4-gnutls-dev
sudo apt-get -y install libcurl4-gnutls-dev
sudo apt-get -y build-dep libxml2-dev
sudo apt-get -y install libxml2-dev

In R:

remove.packages("gbm")
install.packages("devtools", dependencies=TRUE)
library(devtools)
install_github("gbm-developers/gbm")
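
After installing, it can be worth confirming which gbm version R actually picks up, since an old copy in another library path would still leak. A minimal sketch, assuming only that versions 2.1.1 and earlier are the affected ones, as reported in this thread; the variable name last_leaky is made up for illustration, and a version number alone does not prove the fix is present:

```r
# Versions reported in this thread as affected: 2.1.1 and earlier.
last_leaky <- package_version("2.1.1")

if (requireNamespace("gbm", quietly = TRUE)) {
  installed <- utils::packageVersion("gbm")
  if (installed <= last_leaky) {
    message("gbm ", as.character(installed),
            " is in the range reported as leaking")
  } else {
    message("gbm ", as.character(installed), " is newer than 2.1.1")
  }
} else {
  message("gbm is not installed")
}
```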

I just tested the above development version on Windows, Linux Debian (Gnome) and Linux Ubuntu (Mint) by running gbm fits with the options distribution="laplace" and distribution="multinomial". There were no memory leaks like those that plagued gbm versions 2.1.1 and earlier.
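
If installing the development version is not an option, a possible workaround (not something from the gbm maintainers, just a general technique) is to run each fit in a separate worker process and let that process exit afterwards, so that whatever it leaked is returned to the operating system. A minimal sketch using the parallel package that ships with R; run_isolated and toy_task are hypothetical names, and the toy task only stands in for loading the data, calling gbm() and returning the predictions:

```r
library(parallel)  # part of the standard R distribution

# Run one task in a fresh R worker process and shut the worker down afterwards.
# Memory allocated inside the worker is released when the worker exits.
run_isolated <- function(fun, ...) {
  cl <- makeCluster(1)
  on.exit(stopCluster(cl))
  clusterCall(cl, fun, ...)[[1]]
}

# Stand-in task: in practice this would load the data, fit the model
# and return only the predictions.
toy_task <- function(n) {
  big <- numeric(n)   # allocated in the worker, not in the main session
  length(big)
}

result <- run_isolated(toy_task, 1e6)
print(result)
```

Only the return value travels back to the main session, so returning the predictions rather than the whole fitted model keeps the main session small.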

Licensed under: CC-BY-SA with attribution