Question

I am trying to understand when someone would choose Random Forest over XGBoost and vice versa. All the articles out there highlight the differences between the two, and I understand them. But when actually given a real-world data set, how should we approach the problem of choosing between them?

For example: is there a set of statistical tests to check variance, and then choose? Or is it simply that you have a large number of features and cannot really do parameter tuning, so you apply Random Forest to get results?


Solution

Let's say that the best way to choose is empirical: run both algorithms on the dataset and check which one performs better.
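Below is a minimal sketch of that empirical comparison, assuming a binary classification task and that scikit-learn and xgboost are installed; the synthetic data and hyperparameter values are placeholders, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Stand-in data; swap in your real X and y.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, learning_rate=0.1, random_state=0),
}

for name, model in models.items():
    # 5-fold cross-validated accuracy; use whichever metric matters for your problem.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```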

It's true that you can do a lot of theoretical analysis, but in the end you have to try them no matter what. Both are ensembles of decision trees, so the results should not be too different. In my experience, gradient boosting tends to achieve better results, though it is also more mathematically complicated to understand.

Normally decision-tree ensembles don't require too much parameter tuning, or at least less than other models.
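As a rough illustration, assuming the same kind of data as above, a small grid over a handful of hyperparameters is often all a random forest needs; the grid values here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# A deliberately small grid: tree ensembles are fairly robust to their defaults.
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 10],
    "max_features": ["sqrt", 0.5],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```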

There is no classical statistical test that will tell you which one will perform better. There are some heuristics, but I find them overly complicated.

Licensed under: CC-BY-SA with attribution