Question

Suppose that through $\ell_1$ minimization I obtained two sparse probability distributions $P, Q$ which may contain many zero entries. I would then like to compute their KL-divergence $D(P \| Q) = \sum_i p_i \log\frac{p_i}{q_i}$. However, since the distributions are sparse, it can happen that $p_i \neq 0$ while $q_i = 0$, in which case the KL-divergence is not well defined. One solution is to incorporate a Dirichlet prior (i.e., smooth the distributions), but I am afraid that doing so would destroy their sparsity. Is there another way to compute the KL-divergence of the two distributions?

Solution

Have you looked into the Jensen–Shannon divergence? It avoids the situation you describe by comparing each distribution against their midpoint mixture rather than against each other. Formally, $$\mathrm{JSD}(P \| Q) = \frac{1}{2}\,\mathrm{KL}(P \| M) + \frac{1}{2}\,\mathrm{KL}(Q \| M), \qquad M = \frac{1}{2}(P + Q).$$ Since $m_i \ge \frac{1}{2} p_i > 0$ whenever $p_i > 0$ (and likewise for $q_i$), both KL terms are always finite, so no smoothing of the sparse distributions is needed. Informally, the JSD is nice when you want to compare distributions that have roughly the same shape but don't completely overlap; it is also symmetric and bounded by $\log 2$.
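As a concrete illustration, here is a minimal NumPy sketch (the helper names `kl_divergence` and `jsd` are just illustrative, not from any particular library) that computes the JSD directly on sparse vectors, using the convention $0 \log 0 = 0$:

```python
import numpy as np

def kl_divergence(p, m):
    # Sum only over entries where p > 0 (convention: 0 * log(0 / x) = 0).
    # m > 0 wherever p > 0, because m is the mixture (p + q) / 2.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / m[mask]))

def jsd(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Sparse example with disjoint supports: q is zero wherever p is nonzero.
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])
print(jsd(p, q))  # ln(2) ≈ 0.6931, the maximum value in natural log
```

For a cross-check, `scipy.spatial.distance.jensenshannon` computes the square root of this quantity (the Jensen–Shannon *distance*), so `jensenshannon(p, q)**2` should agree with the sketch above.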

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange