Question

I'm trying to understand how the base value is calculated. So I used an example from SHAP's github notebook, Census income classification with LightGBM.

Right after I trained the lightgbm model, I applied explainer.shap_values() on each row of the test set individually. By using force_plot(), it yields the base value, model output value, and the contributions of features, as shown below: enter image description here

My understanding is that the base value is derived when the model has no features. But how is it actually calculated in SHAP?

Was it helpful?

Solution

As you say, it's the value of a feature-less model, which generally is the average of the outcome variable in the training set (often in log-odds, if classification). With force_plot, you actually pass your desired base value as the first parameter; in that notebook's case it is explainer.expected_value[1], the average of the second class.

https://github.com/slundberg/shap/blob/06c9d18f3dd014e9ed037a084f48bfaf1bc8f75a/shap/plots/force.py#L31

https://github.com/slundberg/shap/issues/352#issuecomment-447485624

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange
scroll top