Question

When trying to interpret the results of a gradient boosting model (or any decision tree ensemble), one can plot the feature importances.

There are some importance types in the XGBoost API, such as weight, gain, cover, total_gain, and total_cover. I am not quite getting cover.

"cover" is the average coverage of the splits which use the feature, where coverage is defined as the number of samples affected by the split.

I am looking for a better definition of cover and perhaps some pseudocode to understand it better.


Solution

A more detailed explanation of cover can be found in the XGBoost source code:

cover: the sum of the second-order gradients of the training data classified to the leaf. If it is square loss, this simply corresponds to the number of instances in that branch. The deeper in the tree a node is, the lower this metric will be.

You can find this here: cover definition in the code
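
Before digging into the computation, it can help to just query these scores directly. Below is a minimal sketch (the toy data, feature names, and hyperparameters are illustrative) that trains a small regressor and prints every importance type via Booster.get_score:

```python
import numpy as np
import xgboost as xgb

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2", "f3"])
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=20)

# weight, gain, cover, total_gain, and total_cover are the supported types.
for imp_type in ("weight", "gain", "cover", "total_gain", "total_cover"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```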

This basically means that for each split, the second-order gradient (Hessian) of the specified loss is computed per sample and summed over all samples reaching the split. total_cover is the sum of these values over every split that uses the feature, and cover is that total divided by the number of such splits. For squared loss the Hessian is 1 for every sample, so cover reduces to the average number of samples covered by the feature's splits.
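
Since the question asks for pseudocode, here is a self-contained sketch of the bookkeeping described above. The Node class and the example splits are hypothetical stand-ins for XGBoost's internal tree structures, not its actual API:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Node:
    feature: str    # feature this split tests (hypothetical stand-in)
    hessians: list  # second-order gradients of the samples reaching the node

def cover_importance(split_nodes):
    """Average of per-split Hessian sums, grouped by feature."""
    total_cover = defaultdict(float)  # feature -> sum of Hessians over splits
    n_splits = defaultdict(int)       # feature -> number of splits using it
    for node in split_nodes:
        # For squared loss the Hessian is 1 per sample, so this sum is
        # simply the number of samples routed through the split.
        total_cover[node.feature] += sum(node.hessians)
        n_splits[node.feature] += 1
    # "cover" is the average over splits; "total_cover" is the raw sum.
    return {f: total_cover[f] / n_splits[f] for f in total_cover}

# Two splits on "f0" covering 100 and 60 samples, one split on "f1" covering 40.
splits = [Node("f0", [1.0] * 100), Node("f0", [1.0] * 60), Node("f1", [1.0] * 40)]
print(cover_importance(splits))  # {'f0': 80.0, 'f1': 40.0}
```

With squared loss every Hessian is 1, so the two splits on f0 cover 100 and 60 samples and average to a cover of 80, which matches the printed output.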

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange