Question

A description of my problem: I have 10 features X1, X2, X3, ..., X10
and three labels: short, long, hold.
My problem is how to calculate the effect (as a percentage) of each input variable on the predicted label
with the BernoulliNB algorithm.

from sklearn.naive_bayes import BernoulliNB

NB = BernoulliNB()
NB.fit(X_train, y_train)



Solution

A Naive Bayes model consists of the probabilities $P(X_i|Class)$ for every feature $X_i$ and every label $Class$. So by looking at the parameters of the model one can see how important a particular feature is for a particular class. The opposite can be calculated as well: $P(Class|X_i)$ represents the distribution of the classes given a feature.
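As a sketch of inspecting those parameters, here is a minimal example on randomly generated toy data (the data, feature count, and label names are assumptions matching the question, not your actual dataset). In scikit-learn's BernoulliNB, the fitted attribute feature_log_prob_ holds $\log P(X_i=1|Class)$, one row per class:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
# Toy data: 10 binary features and 3 labels (short / long / hold), as in the question.
X = rng.integers(0, 2, size=(200, 10))
y = rng.choice(["short", "long", "hold"], size=200)

nb = BernoulliNB()
nb.fit(X, y)

# feature_log_prob_ is log P(X_i = 1 | Class); exponentiate to get probabilities.
per_class_probs = np.exp(nb.feature_log_prob_)  # shape: (n_classes, n_features)
for cls, probs in zip(nb.classes_, per_class_probs):
    print(cls, np.round(probs, 3))
```

Each row shows how likely each feature is to be "on" under that class, which is one way to read off how characteristic a feature is for a label.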

Now at the level of individual instances it's not so clear what would be the "effect" of a particular feature: For every class the posterior probability is:

$$P(Class| X_1,..,X_n) = \frac{P(Class)\prod_i P(X_i|Class)}{P(X_1,..,X_n)}$$

You can easily order the features by how much they contribute to the prediction, i.e. to the class which obtains the maximum posterior probability (for instance, take the top 3 features). However, you cannot quantify precisely the effect of each feature, because the prediction is not a linear combination of the features.
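One possible way to sketch that ranking for a single instance (again on assumed toy data): each feature's contribution to the predicted class's score is its per-feature log-likelihood $\log P(X_i = x_i|Class)$, which for a Bernoulli model is $\log p_i$ when $x_i=1$ and $\log(1-p_i)$ when $x_i=0$:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))  # assumed toy data, as before
y = rng.choice(["short", "long", "hold"], size=200)

nb = BernoulliNB().fit(X, y)

x = X[0]
pred = nb.predict([x])[0]
k = list(nb.classes_).index(pred)

# Per-feature log-likelihood under the predicted class:
# log P(X_i = x_i | Class) = x_i * log p_i + (1 - x_i) * log(1 - p_i)
log_p = nb.feature_log_prob_[k]       # log P(X_i = 1 | Class)
log_q = np.log1p(-np.exp(log_p))      # log P(X_i = 0 | Class)
contrib = x * log_p + (1 - x) * log_q

top3 = np.argsort(contrib)[::-1][:3]  # indices of the 3 largest contributions
print("predicted:", pred, "top features:", top3)
```

The resulting order tells you which features weigh most in this particular prediction, but the raw values are log-probabilities, not percentages of effect.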


[Details added following comments]

Due to the NB assumption that features are independent given the class, we have:

$P(X_1,..,X_n|Class) = \prod_i P(X_i|Class)$

$P(X_1,..,X_n|Class) = P(X_1|Class) * P(X_2|Class) * .. * P(X_n|Class)$

From the conditional definition:

$P(Class|X_1,..,X_n) = P(Class,X_1,..,X_n) / P(X_1,..,X_n)$

which gives:

$P(Class,X_1,..,X_n) = P(Class) * P(X_1,..,X_n|Class)$

$P(Class,X_1,..,X_n) = P(Class) * P(X_1|Class) * P(X_2|Class) * .. * P(X_n|Class)$

Now we use the marginal to calculate $P(X_1,..,X_n)$:

$P(X_1,..,X_n) = \sum_j P(Class_j,X_1,..,X_n)$

$P(X_1,..,X_n) = P(Class_1,X_1,..,X_n) + .. + P(Class_K,X_1,..,X_n)$

where $K$ is the number of classes.

So at the end we have $P(Class,X_1,..,X_n)$ and $P(X_1,..,X_n)$, so we can calculate:

$P(Class|X_1,..,X_n) = P(Class,X_1,..,X_n) / P(X_1,..,X_n)$

Note that if you do all these steps you should obtain the same probability for $P(Class|X_1,..,X_n)$ as the one returned by the function predict_proba.
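The steps above can be verified with a short sketch (on the same assumed toy data): build the joint from class_log_prior_ and feature_log_prob_, normalize by the marginal, and compare against predict_proba:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))  # assumed toy data, as before
y = rng.choice(["short", "long", "hold"], size=200)

nb = BernoulliNB().fit(X, y)
x = X[:1]  # a single instance, kept 2-D

# Joint: log P(Class, x) = log P(Class) + sum_i log P(X_i = x_i | Class)
log_p = nb.feature_log_prob_                # log P(X_i = 1 | Class)
log_q = np.log1p(-np.exp(log_p))            # log P(X_i = 0 | Class)
joint = nb.class_log_prior_ + (x @ log_p.T + (1 - x) @ log_q.T).ravel()

# Marginal: P(x) = sum_j P(Class_j, x); then the posterior is joint / marginal.
posterior = np.exp(joint) / np.exp(joint).sum()

print(posterior)
print(nb.predict_proba(x)[0])  # should agree with the manual computation
```

Note that the exponential is applied here precisely because the attributes store log-probabilities, as cautioned below.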

Caution: the attributes feature_log_prob_ and class_log_prior_ don't give you the probabilities directly; they give you the logarithm of the probabilities. So you need to apply the exponential in order to get back the probabilities.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange