Question

I just started learning the Naive Bayes algorithm, and then I learned to use Laplace smoothing to avoid getting probabilities of zero. I understand the purpose of using it, but in the expression for Laplace smoothing below, I do not really understand why we need to add $\alpha d$ to $N$ in the denominator. One website I found says it is done so that the result can never be greater than 1, but I still do not understand why. Could someone explain why we have to add $\alpha d$ to $N$ in the denominator?

$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d}$$


Solution

You're right: the idea is that, if $\alpha d$ is not used, then $\hat{\theta}_i$ might be bigger than 1 (and the estimates would no longer sum to 1 across the $d$ categories).
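A quick numerical sketch of that point, using made-up counts (the counts and $\alpha$ here are illustrative, not from the question):

```python
import numpy as np

# Hypothetical counts for d = 3 categories, N = 10 total observations.
x = np.array([7, 3, 0])
N, d, alpha = x.sum(), len(x), 1.0

# If we add alpha to the numerator but leave N alone in the
# denominator, the estimates no longer sum to 1:
unnormalized = (x + alpha) / N
print(unnormalized.sum())  # (N + alpha*d) / N = 13/10 = 1.3

# Adding alpha*d to the denominator renormalizes them into a
# valid probability distribution:
smoothed = (x + alpha) / (N + alpha * d)
print(smoothed.sum())  # 1.0
```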

That being said, I don't think the main purpose of Laplace smoothing is merely to keep probabilities greater than 0. The important thing is that it acts as a regularization technique (a smoother, indeed).

The smoother works the following way: we are mixing the empirical observations ($\frac{x_i}{N}$) with the theoretical distribution we would use without any data ($\frac{1}{d}$, the uniform distribution), and $\alpha$ is a parameter that controls how much we care about the theoretical distribution. The idea is: the more we trust the theoretical distribution, the more regularization we are adding. It even has a Bayesian interpretation in terms of priors and posteriors.
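This mixture can be checked directly: the smoothed estimate is algebraically a convex combination of $\frac{x_i}{N}$ and $\frac{1}{d}$, with weights $\frac{N}{N + \alpha d}$ and $\frac{\alpha d}{N + \alpha d}$. A small sketch with hypothetical counts:

```python
import numpy as np

# Hypothetical counts; alpha chosen just for illustration.
x = np.array([7, 3, 0])
N, d, alpha = x.sum(), len(x), 2.0

smoothed = (x + alpha) / (N + alpha * d)

# Same estimate written as a convex combination of the empirical
# distribution x/N and the uniform distribution 1/d:
w = N / (N + alpha * d)                # weight on the data
mixture = w * (x / N) + (1 - w) * (1.0 / d)

print(np.allclose(smoothed, mixture))  # True
```

Larger $\alpha$ shrinks $w$, pulling the estimate toward the uniform distribution; $\alpha = 0$ recovers the raw empirical frequencies.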

So I don't think the $\alpha d$ term is fundamentally about keeping probabilities between 0 and 1; it is about performing the mixture of distributions (or the update of the posterior, if you prefer) properly.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange