How is the “base value” of SHAP values calculated?
11-12-2020
Question
I'm trying to understand how the base value is calculated, so I used an example from SHAP's GitHub notebooks, Census income classification with LightGBM.
Right after training the LightGBM model, I applied explainer.shap_values() to each row of the test set individually. Calling force_plot() then shows the base value, the model output value, and the contribution of each feature, as shown below:
My understanding is that the base value is derived when the model has no features. But how is it actually calculated in SHAP?
Solution
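As you say, it's the output of a feature-less model. In SHAP this is the expected model output, estimated as the average of the model's predictions over the training (or background) data, which is generally close to, but not the same thing as, the average of the outcome variable. For classification it is usually in log-odds, because that is the model's raw output. With force_plot, you pass the base value you want as the first parameter; in that notebook's case it is explainer.expected_value[1], the expected model output for the second (positive) class.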
https://github.com/slundberg/shap/issues/352#issuecomment-447485624
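To make the idea concrete, here is a minimal pure-Python sketch. It does not use the shap library and is not how TreeExplainer computes things internally. The toy model, the background rows, and the helper names (model, value, shapley_values) are all made up for illustration. It computes exact Shapley values by brute force, takes the base value as the average model output over the background data, and shows that the base value plus the feature contributions adds up to the model output for the explained row.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model (an assumption for illustration, not from the notebook).
def model(x):
    return 2.0 * x[0] + x[1] * x[2] - 1.0

# Hypothetical background ("training") rows.
background = [
    (0.0, 1.0, 2.0),
    (1.0, 0.0, 1.0),
    (2.0, 2.0, 0.0),
    (1.0, 1.0, 1.0),
]

def value(x, subset):
    """Average model output when the features in `subset` are fixed to x
    and the remaining features are taken from each background row."""
    total = 0.0
    for b in background:
        z = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(z)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                contrib += weight * (value(x, set(s) | {i}) - value(x, set(s)))
        phi.append(contrib)
    return phi

x = (2.0, 1.0, 3.0)

# With no features fixed, the value is just the mean prediction: the base value.
base_value = value(x, set())
phi = shapley_values(x)

print("base value:", base_value)
print("contributions:", phi)
print("base + contributions:", base_value + sum(phi))
print("model output:", model(x))
```

Here the base value is 1.75, the mean of the model's four background predictions (1, 1, 3, 2), and the base value plus the contributions recovers the model output of 6.0 for x. In shap, explainer.expected_value plays the role of base_value, in whatever output space the explainer works in.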