How to calculate the effect (percentage) of the input variables on the output label with BernoulliNB
-
12-12-2020
Question
A description of the problem: I have 10 words as features, X1, X2, X3, ..., X10,
and three labels: short, long, hold.
My problem is how to calculate the effect (percentage) of each input variable
on the label with the BernoulliNB algorithm.
from sklearn.naive_bayes import BernoulliNB

NB = BernoulliNB()
NB.fit(X_train, y_train)
How can I calculate the effect (percentage) of each input variable on the label?
Answer
A Naive Bayes model consists of the probabilities $P(X_i|Class)$ for every feature $X_i$ and every label $Class$. So by looking at the parameters of the model, one can see how important a particular feature is for a particular class. The opposite can be calculated as well: $P(Class|X_i)$ represents the distribution of the classes given a feature.
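As a sketch of how to read these parameters off a fitted model (the toy data and variable names below are assumptions for illustration, not from the question): in scikit-learn's `BernoulliNB`, the fitted attribute `feature_log_prob_` holds $\log P(X_i=1|Class)$, one row per class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: 6 documents, 4 binary word features, three labels
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0]])
y = np.array(["short", "long", "hold", "hold", "short", "long"])

NB = BernoulliNB()
NB.fit(X, y)

# feature_log_prob_ stores log P(X_i = 1 | Class);
# exponentiate to recover the probabilities themselves
per_class_probs = np.exp(NB.feature_log_prob_)  # shape (n_classes, n_features)
for cls, probs in zip(NB.classes_, per_class_probs):
    print(cls, probs.round(3))
```

A large $P(X_i=1|Class)$ for one class and a small one for the others marks a feature that is particularly indicative of that class.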
Now, at the level of individual instances, it's less clear what the "effect" of a particular feature would be. For every class the posterior probability is:
$$P(Class| X_1,..,X_n) = \frac{P(Class)\prod_i P(X_i|Class)}{P(X_1,..,X_n)}$$
You can easily order the features by how much they contribute to the prediction, i.e. the class which obtains the maximum posterior probability (for instance, take the top 3 features). However, you cannot precisely quantify the effect of each feature, because the prediction is not a linear combination of the features.
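The per-instance ranking described above can be sketched as follows (toy random data and variable names are assumptions): for the predicted class, each feature contributes one log-likelihood term $\log P(X_i = x_i | Class)$, and sorting those terms gives the top contributors.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: 10 binary word features, three labels
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(40, 10))
y = rng.choice(["short", "long", "hold"], size=40)

NB = BernoulliNB()
NB.fit(X, y)

x = X[0]
pred = NB.predict(x.reshape(1, -1))[0]
cls_idx = list(NB.classes_).index(pred)

# Each feature's log-likelihood term log P(X_i = x_i | predicted class)
log_p1 = NB.feature_log_prob_[cls_idx]   # log P(X_i = 1 | class)
log_p0 = np.log1p(-np.exp(log_p1))       # log P(X_i = 0 | class)
contrib = np.where(x == 1, log_p1, log_p0)

# Top 3 features: the largest (least negative) log-likelihood terms
top3 = np.argsort(contrib)[::-1][:3]
print("predicted:", pred, "top features:", top3)
```

Note this ranks features within one class's likelihood; it is an ordering, not a percentage, for the reason given above.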
[Details added following comments]
Due to the NB assumption that the features are independent given the class, we have:
$P(X_1,..,X_n|Class) = \prod_i P(X_i|Class)$
$P(X_1,..,X_n|Class) = P(X_1|Class) * P(X_2|Class) * .. * P(X_n|Class)$
From the conditional definition:
$P(Class|X_1,..,X_n) = P(Class,X_1,..,X_n) / P(X_1,..,X_n)$
which gives:
$P(Class,X_1,..,X_n) = P(Class) * P(Class|X_1,..,X_n)$ $P(Class,X_1,..,X_n) = P(Class) * P(X_1|Class) * P(X_2|Class) * .. * P(X_n|Class)$
Now we use the marginal to calculate $P(X_1,..,X_n)$:
$P(X_1,..,X_n) = \sum_j P(Class_j,X_1,..,X_n)$
$P(X_1,..,X_n) = P(Class_1,X_1,..,X_n) + .. + P(Class_k,X_1,..,X_n)$
where $k$ is the number of classes.
So at the end we have $P(Class,X_1,..,X_n)$ and $P(X_1,..,X_n)$, so we can calculate:
$P(Class|X_1,..,X_n) = P(Class,X_1,..,X_n) / P(X_1,..,X_n)$
Note that if you follow all these steps you should obtain the same probability for $P(Class|X_1,..,X_n)$ as the one returned by predict_proba.
Caution: the attributes feature_log_prob_ and class_log_prior_ don't give you the probabilities directly, they give you their logarithms. So you need to apply the exponential in order to get the probabilities back.
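The whole derivation can be checked numerically. A minimal sketch (toy random data and variable names are assumptions, not from the question) that rebuilds the posterior from class_log_prior_ and feature_log_prob_ and compares it against predict_proba, working in log space for numerical stability:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: 10 binary word features, three labels
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(30, 10))
y = rng.choice(["short", "long", "hold"], size=30)

NB = BernoulliNB()
NB.fit(X, y)

x = X[0]
# Per-class joint: log P(Class) + sum_i log P(X_i = x_i | Class)
log_p1 = NB.feature_log_prob_              # log P(X_i = 1 | Class)
log_p0 = np.log1p(-np.exp(log_p1))         # log P(X_i = 0 | Class)
joint_log = NB.class_log_prior_ + (x * log_p1 + (1 - x) * log_p0).sum(axis=1)

# Divide by the marginal P(X_1,..,X_n) = sum_j P(Class_j, X_1,..,X_n)
# (done as log-sum-exp to avoid underflow)
posterior = np.exp(joint_log - np.logaddexp.reduce(joint_log))

# Should match sklearn's own computation
assert np.allclose(posterior, NB.predict_proba(x.reshape(1, -1))[0])
```

The log-sum-exp normalisation is the log-space version of the marginalisation step above; with many features the raw products underflow, which is exactly why scikit-learn exposes log probabilities in the first place.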