How is the “base value” of SHAP values calculated?

https://datascience.stackexchange.com/questions/73553

11-12-2020

Question

I'm trying to understand how the base value is calculated, so I used an example from SHAP's GitHub notebooks, "Census income classification with LightGBM".

Right after training the LightGBM model, I called explainer.shap_values() on each row of the test set individually. Passing the result to force_plot() displays the base value, the model output value, and the contribution of each feature.

My understanding is that the base value is derived when the model has no features. But how is it actually calculated in SHAP?

Solution

As you say, it is the value of a model with no features, which is generally the average of the outcome variable in the training set (often in log-odds for classification). In SHAP this value is exposed as explainer.expected_value. With force_plot, you pass the desired base value yourself as the first parameter; in that notebook it is explainer.expected_value[1], the expected value for the second class (index 1).
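To make the idea concrete, here is a minimal sketch in plain Python. It is not SHAP's own implementation: it brute-forces Shapley values for a toy model and assumes the base value is the average model output over a small background set, which is one way to read "the value of a model with no features". The model, the background rows, and the helper names (model, value, shapley_values) are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model standing in for a trained model (illustration only).
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] * x[2] - 0.5

# Hypothetical background ("training") rows.
background = [
    (0.0, 1.0, 2.0),
    (1.0, 0.0, 1.0),
    (2.0, 2.0, 0.0),
    (1.0, 1.0, 1.0),
]

def value(subset, x):
    """Average model output when the features in `subset` are fixed to
    their values in x and the others are taken from each background row."""
    total = 0.0
    for row in background:
        mixed = tuple(x[i] if i in subset else row[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                contrib += weight * (value(set(s) | {i}, x) - value(set(s), x))
        phi.append(contrib)
    return phi

x = (3.0, 1.0, 2.0)

# With no features known, the value is the mean model output over the background.
base_value = value(set(), x)
phi = shapley_values(x)

print("base value:", base_value)
print("contributions:", phi)
print("base + contributions:", base_value + sum(phi))
print("model output:", model(x))
```

The last two printed numbers should agree: the base value plus the per-feature contributions adds up to the model output for that row, which is the decomposition that force_plot draws.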

https://github.com/slundberg/shap/blob/06c9d18f3dd014e9ed037a084f48bfaf1bc8f75a/shap/plots/force.py#L31

https://github.com/slundberg/shap/issues/352#issuecomment-447485624

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange