Problem

GBMs, like random forests, build each tree on a different sample of the dataset and hence, in the spirit of ensemble models, achieve higher accuracy. However, I have not seen GBMs used with feature subsampling at every split of the tree, as is common practice with random forests.

Are there any tests, either in the literature or from practical experience, showing that feature subsampling would decrease a GBM's accuracy, which would explain why it is avoided?


Solution

sklearn's GradientBoostingClassifier / GradientBoostingRegressor have a max_features parameter, and XGBoost has colsample_bytree and colsample_bylevel parameters. These control how many features are sampled per split, per tree, or per tree level.
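As a minimal sketch of the sklearn side of this, max_features restricts how many features are considered at each split, much like mtry in random forests (the dataset here is a synthetic toy example, not from the original post):

```python
# Sketch: per-split feature subsampling in sklearn's gradient boosting.
# With max_features="sqrt", each split considers only a random subset of
# about sqrt(n_features) candidate features, as random forests do.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=50,
    max_features="sqrt",  # ~4 of the 20 features considered per split
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

The XGBoost equivalents (colsample_bytree, colsample_bylevel) are passed the same way as keyword arguments to its estimators, but operate at the tree and level granularity rather than per split.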

Other tips

I have never read about (or tried) subsampling techniques at each node of the tree. I am not saying they do not exist, but the claim that this "is a common practice with random forests" seems odd to me.

Other than that, subsampling or bootstrapping the rows used for each tree, borrowed from random forests, gives stochastic gradient boosting (Friedman, 1999). Friedman reports better results with subsampling than without; more details are given in The Elements of Statistical Learning, Section 10.12.2 ("Subsampling"), page 358.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange