Question

I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes XGBOOST so fast?

Solution

Quote from the author of xgboost:

Both xgboost and gbm follow the principle of gradient boosting. There are, however, differences in the modeling details. Specifically, xgboost uses a more regularized model formalization to control over-fitting, which gives it better performance.

We have a comprehensive tutorial introducing the model, which you might want to take a look at: Introduction to Boosted Trees.

The name xgboost, though, actually refers to the engineering goal of pushing the limit of computation resources for boosted tree algorithms, which is the reason why many people use xgboost. As for the model, it might be more suitably called regularized gradient boosting.

Edit: There is a detailed guide to xgboost that shows more of the differences.
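To make the "more regularized model formalization" concrete, here is a minimal sketch in Python (assuming the scikit-learn and xgboost packages; the synthetic dataset and all parameter values are illustrative only, and scikit-learn's GradientBoostingClassifier stands in for a classic GBM):

    # Illustrative comparison: classic GBM vs. XGBoost with explicit regularization.
    # Assumes scikit-learn and xgboost are installed; numbers are arbitrary examples.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    import xgboost as xgb

    X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Classic gradient boosting: over-fitting is controlled mainly through tree
    # depth, learning rate and subsampling.
    gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1,
                                     max_depth=3, subsample=0.8)
    gbm.fit(X_train, y_train)

    # XGBoost adds penalty terms to the objective itself:
    #   reg_lambda (L2 on leaf weights), reg_alpha (L1 on leaf weights),
    #   gamma (minimum loss reduction to split), min_child_weight.
    bst = xgb.XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=3,
                            subsample=0.8, reg_lambda=1.0, reg_alpha=0.0,
                            gamma=0.1, min_child_weight=1)
    bst.fit(X_train, y_train)

    print("GBM accuracy:    ", gbm.score(X_test, y_test))
    print("XGBoost accuracy:", bst.score(X_test, y_test))

The reg_lambda, reg_alpha and gamma knobs have no direct counterpart in the classic gbm formulation; they are the "regularized" part the quote refers to.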

References

https://www.quora.com/What-is-the-difference-between-the-R-gbm-gradient-boosting-machine-and-xgboost-extreme-gradient-boosting

https://xgboost.readthedocs.io/en/latest/tutorials/model.html

OTHER TIPS

In addition to the answer given by Icyblade, the developers of xgboost have made a number of important performance enhancements to different parts of the implementation which make a big difference in speed and memory utilization:

  1. Use of sparse matrices with sparsity-aware algorithms.
  2. Improved data structures for better processor cache utilization, which makes it faster.
  3. Better support for multicore processing, which reduces overall training time.

In my experience, when training on large datasets (5 million+ records) with GBM and xgboost in R, I've seen significantly lower memory utilization with xgboost for the same dataset and found it easier to use multiple cores to reduce training time.
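As a rough illustration of points 1 and 3 above, XGBoost's native API accepts SciPy sparse matrices directly and parallelizes split finding across cores. The sketch below assumes the xgboost and scipy packages are installed; the data, sizes and thread count are made up for the example:

    # Sketch: sparse input and multicore training with the native xgboost API.
    # Assumes xgboost, numpy and scipy are installed; all values are illustrative.
    import numpy as np
    import scipy.sparse as sp
    import xgboost as xgb

    # A mostly-zero feature matrix stored in CSR form; xgboost handles the
    # zero/missing entries with its sparsity-aware split finding.
    X = sp.random(100_000, 200, density=0.01, format="csr", random_state=0)
    y = np.random.randint(0, 2, size=X.shape[0])

    dtrain = xgb.DMatrix(X, label=y)   # DMatrix accepts SciPy CSR/CSC directly

    params = {
        "objective": "binary:logistic",
        "max_depth": 6,
        "eta": 0.1,
        "nthread": 8,   # spread tree construction across multiple cores
    }
    bst = xgb.train(params, dtrain, num_boost_round=100)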

One very important difference is that xgboost has implemented DART, dropout regularization for regression trees.
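For reference, DART is enabled in xgboost through the booster parameter; a minimal sketch follows (the data is synthetic and the parameter values are illustrative, not recommendations):

    # Sketch: enabling the DART booster (dropout for additive regression trees).
    # Assumes xgboost and numpy are installed; values below are just examples.
    import numpy as np
    import xgboost as xgb

    X = np.random.rand(5_000, 20)
    y = np.random.randint(0, 2, size=5_000)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "booster": "dart",            # use DART instead of the default gbtree
        "objective": "binary:logistic",
        "rate_drop": 0.1,             # fraction of trees dropped each round
        "skip_drop": 0.5,             # probability of skipping dropout in a round
        "sample_type": "uniform",
        "normalize_type": "tree",
        "max_depth": 4,
        "eta": 0.1,
    }
    bst = xgb.train(params, dtrain, num_boost_round=50)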

References

Rashmi, K. V., & Gilad-Bachrach, R. (2015). DART: Dropouts meet Multiple Additive Regression Trees. arXiv preprint arXiv:1505.01866.

I think the difference between gradient boosting and XGBoost is that XGBoost focuses on computational power by parallelizing tree formation, which one can see in this blog.

Gradient boosting focuses only on reducing variance and does not directly address the bias-variance trade-off, whereas XGBoost can also lean on its regularization factor.
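For context, the regularization referred to here appears directly in XGBoost's training objective, as described in the Introduction to Boosted Trees tutorial cited above: the usual training loss is augmented with a per-tree complexity penalty,

$$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2,$$

where T is the number of leaves and w the vector of leaf weights. A classic GBM fits the same kind of additive tree model but has no such explicit penalty term in its objective.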

The XGBoost implementation is buggy: it crashed silently when training on GPU on v0.82. It happened to me as well on v0.90, so the issue has not been addressed so far, and the "fix" provided on GitHub didn't work for me.

LGBM 2.3.1 works like a charm out of the box, though installing it requires a little more effort. So far no issues training on GPU.

As for XGBoost being "so fast", you should take a look at these benchmarks.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange