Getting the positive impacting features using SHAP

https://datascience.stackexchange.com/questions/73211

10-12-2020

Question

I'm attempting to use SHAP to automatically extract feature names that have a positive impact on my regression models. On inspection of the code I see that the bar plot, for example, determines these by taking the mean absolute SHAP values for a feature. Being an absolute value, it obviously takes the absolute impact but I want to only consider positive impacting values.

Is my intuition correct that I can just take the mean instead of the mean of the absolute values? Features with (highly) negative SHAP values should then get a negative mean.

Is this a good approach or am I missing some better way to do this?

EDIT: I am specifically interested in features that raise the predicted value, i.e. if feature_1 lifts the predicted value by 100 and feature_2 by 1000, I want this information to be extracted, as feature_2 has a higher impact on the output value.

Solution

Depending on your model, there may be better model-specific approaches than SHAP. It is also important to note that SHAP only approximates Shapley values, and the main assumption behind that approximation is that your features are not too strongly correlated.

That being said, taking the mean instead of the mean of the absolute values seems to be the most efficient approach, and it is a natural continuation of what you are already doing. Just keep in mind that:

SHAP values don't have a "physical" interpretation in terms of direct impact on the output, so they may lack meaning for real-life users.
Taking the mean can "hide" a skewed SHAP profile: a variable with a high impact on a small subgroup of instances and no impact on the rest may get the same average as a variable with a small impact on every instance.
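
To make the second caveat concrete, here is a minimal sketch in plain Python. The SHAP matrix is made up for illustration (one row per instance, one column per feature); with the shap library it would typically come from something like explainer(X).values. The helper names summarize and positive_features are hypothetical, not part of any library. The sketch ranks features by their signed mean and also reports the share of instances with a positive SHAP value, which is one possible way to spot a skewed profile.

```python
from statistics import mean

# Hypothetical SHAP matrix: rows are instances, columns are features.
# These numbers are invented for illustration, not real model output.
feature_names = ["feature_1", "feature_2", "feature_3", "feature_4"]
shap_matrix = [
    [10.0, 0.0, -5.0, 4.0],
    [10.0, 0.0, -5.0, -4.0],
    [10.0, 0.0, -5.0, 4.0],
    [10.0, 40.0, -5.0, -4.0],
]


def summarize(shap_matrix, feature_names):
    """Per feature: signed mean, mean absolute value, share of positive rows."""
    summary = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in shap_matrix]
        summary[name] = {
            "mean": mean(column),
            "mean_abs": mean([abs(v) for v in column]),
            "share_positive": sum(v > 0 for v in column) / len(column),
        }
    return summary


def positive_features(summary):
    """Names of features whose signed mean is positive, strongest first."""
    positive = [(name, s["mean"]) for name, s in summary.items() if s["mean"] > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in positive]


summary = summarize(shap_matrix, feature_names)
ranking = positive_features(summary)

print("Positive-impact features:", ranking)
for name in feature_names:
    print(name, summary[name])
```

In this toy data feature_1 and feature_2 end up with the same signed mean, although feature_1 pushes every prediction up a little and feature_2 pushes only one prediction up a lot; the share of positive rows tells them apart. feature_4 has a non-zero mean absolute value but a signed mean of zero, because its positive and negative contributions cancel, so it is dropped from the ranking.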

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange