Question

The collision entropy is defined as the Rényi entropy of order $\alpha = 2$. It is given by

$$\mathrm{H}_{2}(X)=-\log \sum_{i=1}^{n} p_{i}^{2} \tag{1}$$

Take two independent random variables $X$ and $X'$ that follow the same probability distribution. The probability that they collide, i.e. that $X = X'$, is then simply $P_{\rm coll} = \sum_{i=1}^{n} p_{i}^{2}$. I would then expect the collision entropy to be defined as the binary entropy of this collision probability, $H(P_{\rm coll})$, i.e.

$$-\left(\sum_{i=1}^{n} p_{i}^{2}\right)\log\left(\sum_{i=1}^{n} p_{i}^{2}\right) - \left(1 -\sum_{i=1}^{n} p_{i}^{2}\right)\log\left(1-\sum_{i=1}^{n} p_{i}^{2}\right)$$

This is analogous to the binary entropy, but with the probability replaced by the probability of a collision.
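For concreteness, here is a minimal Python sketch (base-2 logarithms assumed; the distribution `p` is made up purely for illustration) that computes both quantities side by side:

```python
import numpy as np

def collision_entropy(p):
    """Renyi entropy of order 2: -log2 of the collision probability."""
    return -np.log2(np.sum(p ** 2))

def binary_entropy(q):
    """Binary (Shannon) entropy of a Bernoulli(q) variable."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

# Hypothetical example distribution, for illustration only
p = np.array([0.5, 0.25, 0.125, 0.125])
p_coll = np.sum(p ** 2)

print(f"P_coll      = {p_coll:.4f}")                     # ~0.3438
print(f"H_2(X)      = {collision_entropy(p):.4f}")       # ~1.5406
print(f"H(P_coll)   = {binary_entropy(p_coll):.4f}")     # ~0.9284
```

The two quantities clearly differ; the question is why the first one is the standard definition.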

What is the motivation behind choosing $(1)$ to be the definition of collision entropy?


Solution

When $X$ is uniformly distributed over a domain of size $n$, the Shannon entropy, collision entropy, and min-entropy are all equal to $\log n$. In this sense, all of these parameters measure the amount of uncertainty in $X$.
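As a quick check, with $p_i = 1/n$ for all $i$:

$$
\begin{aligned}
\mathrm{H}(X) &= -\sum_{i=1}^{n} \tfrac{1}{n}\log\tfrac{1}{n} = \log n,\\
\mathrm{H}_{2}(X) &= -\log \sum_{i=1}^{n} \tfrac{1}{n^{2}} = -\log\tfrac{1}{n} = \log n,\\
\mathrm{H}_{\infty}(X) &= -\log \max_{i} p_{i} = -\log\tfrac{1}{n} = \log n.
\end{aligned}
$$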

In contrast, your proposed definition always lies between $0$ and $1$, and it tends to zero as $X$ becomes more unpredictable, since the collision probability itself tends to zero. This is quite different from the other notions of entropy, in which zero stands for "perfectly predictable".
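The contrast is easy to see numerically. A standalone sketch (again assuming base-2 logarithms, and taking $X$ uniform on $n$ outcomes so that $P_{\rm coll} = 1/n$):

```python
import numpy as np

def h(q):
    """Binary entropy of a Bernoulli(q) variable."""
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

# As n grows, H_2(X) = log2(n) increases without bound,
# while the binary entropy of P_coll = 1/n shrinks toward 0.
for n in [2, 16, 256, 4096]:
    p_coll = 1.0 / n
    print(f"n={n:5d}  H_2(X) = {-np.log2(p_coll):6.2f}  h(P_coll) = {h(p_coll):.4f}")
```

So the standard definition keeps growing with the uncertainty in $X$, whereas the proposed one collapses toward zero, which is exactly the opposite of what an entropy measure should do.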
