Question

I have been using the matplotlib.mlab.entropy function and have noticed a possible error in the mlab code.

The documentation states that the calculation is:

    S = -\sum_i p_i \log_2(p_i) + \log_2(\delta)

(source: matplotlib.org)

Note the log to base two. This is correct according to the usual definition of entropy.

However, in the mlab.py source, the calculation is using the natural logarithm:

S = -1.0 * np.sum(p * np.log(p)) + np.log(delta)

Surely that should be np.log2()?
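The discrepancy is easy to demonstrate. Here is a minimal check (the distribution p is just an illustrative example, not mlab code): the entropy in nats (base e, what the code computes) and in bits (base 2, what the documentation describes) differ by a constant factor of ln(2).

    import numpy as np

    p = np.array([0.5, 0.25, 0.25])    # example distribution
    S_nats = -np.sum(p * np.log(p))    # base e, as in the mlab source: ~1.0397
    S_bits = -np.sum(p * np.log2(p))   # base 2, as in the documentation: 1.5
    print(S_bits, S_nats / np.log(2))  # both print 1.5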

I have checked the calculation myself using a couple of other methods. I have also copied and modified the mlab function, making it consistent with the others by changing np.log to np.log2 (sketched below).
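Roughly, the modified function looks like this. This is only a sketch: the histogram bookkeeping is paraphrased from the mlab source, and only the final line (np.log2 instead of np.log) differs from the original.

    import numpy as np

    def entropy_base2(y, bins):
        # Histogram the data and drop empty bins (log(0) is undefined).
        n, bin_edges = np.histogram(y, bins)
        n = n[n > 0].astype(float)
        p = n / len(y)                       # bin probabilities
        delta = bin_edges[1] - bin_edges[0]  # bin width
        return -1.0 * np.sum(p * np.log2(p)) + np.log2(delta)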

So it seems to me that matplotlib.mlab.entropy is incorrect. Or am I missing something?


Solution 2

The documentation is incorrect, as confirmed by @user333700.

Following advice on the matplotlib-users mailing list, I have submitted a pull request to fix the documentation.

OTHER TIPS

There is no "correct" base.

Quoting Wikipedia: "Entropy is typically measured in bits, nats, or bans." There is more about the choice of base at http://en.wikipedia.org/wiki/Entropy_(information_theory)#Definition

In my experience, entropy in statistics and econometrics is almost always defined with the natural logarithm, while entropy in signal processing is usually defined with base 2.

The corresponding functions in scipy.stats have recently gained a base keyword argument to switch from the default base e to base 2.
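For example (a minimal sketch assuming a scipy version recent enough to have the base keyword; note that scipy.stats.entropy normalizes its input if it does not sum to 1):

    from scipy import stats

    p = [0.5, 0.25, 0.25]            # example distribution
    print(stats.entropy(p))          # default base e: ~1.0397 nats
    print(stats.entropy(p, base=2))  # base 2: 1.5 bits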
