Question

I know that feature scaling is an important pre-processing step for creating artificial neural network models.

But what about Gradient Boosting Machines, such as LightGBM, XGBoost or CatBoost? Does their performance profit from feature scaling? If so, why and how?


Solution

Scaling does not affect the performance of any tree-based method, whether LightGBM, XGBoost, CatBoost, or a plain decision tree.

This topic has been elaborated elsewhere, but the main point is that decision trees split the feature space with binary decisions of the form "is this feature bigger than this value?". If you scale your data, the thresholds look different, since they are chosen in the scaled space, but the resulting partitions, and therefore the predictions, are the same.
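
To see this concretely, here is a minimal sketch (assuming scikit-learn is available, with made-up synthetic data) comparing a gradient boosting model fit on raw versus standardized features; the predictions should match.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

raw = GradientBoostingClassifier(random_state=0).fit(X, y)
scaled = GradientBoostingClassifier(random_state=0).fit(X_scaled, y)

# The thresholds live in different spaces, but they induce the same
# partitions of the data, so the predictions agree.
print((raw.predict(X) == scaled.predict(X_scaled)).mean())  # expected: 1.0
```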

As an example, a decision tree should split the data in the same way if you change units (a particular case of scaling). Say you want to use a person's weight to predict whether they are under 18 (a binary classifier). Given the weight in grams, the decision tree might learn: if weight < 5000 grams, then the person is under 18. If you change the units to kilograms, the decision tree will learn: if weight < 5 kg, then the person is under 18.
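
A small sketch of this units example, with hypothetical weights, using scikit-learn's DecisionTreeClassifier: the learned threshold moves by the conversion factor, while the predictions stay the same.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up weights in grams and whether the person is under 18.
grams = np.array([[3000.], [4500.], [5500.], [7000.]])
under_18 = np.array([1, 1, 0, 0])

tree_g = DecisionTreeClassifier(max_depth=1, random_state=0).fit(grams, under_18)
tree_kg = DecisionTreeClassifier(max_depth=1, random_state=0).fit(grams / 1000, under_18)

print(tree_g.tree_.threshold[0])   # ~5000.0 (grams)
print(tree_kg.tree_.threshold[0])  # ~5.0 (kilograms)

# Same partition of the samples, hence identical predictions.
print((tree_g.predict(grams) == tree_kg.predict(grams / 1000)).all())  # True
```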

To sum up, the splits are equivalent under any scaling or, more generally, under any strictly increasing (monotonic) transformation of a feature.
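
And a quick check of that stronger claim, again with made-up data: applying a strictly increasing transformation (here, a logarithm) leaves the tree's partitions, and therefore its predictions, unchanged.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(200, 1))   # strictly positive feature
y = (X[:, 0] > 40).astype(int)

raw = DecisionTreeClassifier(random_state=0).fit(X, y)
logged = DecisionTreeClassifier(random_state=0).fit(np.log(X), y)

# log preserves the ordering of the feature values, so the same
# partitions are found and the predictions agree.
print((raw.predict(X) == logged.predict(np.log(X))).all())  # True
```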

Licensed under: CC-BY-SA with attribution