Question

Given normally distributed integers with a mean of 0 and a standard deviation $\sigma$ of around 1000, how do I compress those numbers (almost) perfectly? Given the entropy of the Gaussian distribution, it should be possible to store any value $x$ using $$\frac{1}{2}\log_2(2\pi\sigma^2)+\frac{x^2}{2\sigma^2}\log_2 e \;\text{ bits.}$$
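
For a feel for the numbers, here is a small check of this formula (a sketch, assuming Python and $\sigma = 1000$; the function name is only for illustration):

```python
import math

def code_length_bits(x, sigma):
    """Ideal code length -log2 P(x), in bits, for an integer x."""
    return 0.5 * math.log2(2 * math.pi * sigma**2) \
        + x**2 / (2 * sigma**2) * math.log2(math.e)

sigma = 1000.0
print(code_length_bits(0, sigma))     # ~11.29 bits at the mean
print(code_length_bits(2000, sigma))  # ~14.18 bits two standard deviations out
# Averaged over the distribution this gives roughly
# 0.5 * log2(2 * pi * e * sigma**2), i.e. about 12.0 bits per value.
```
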
Arithmetic coding should provide near-perfect compression. In principle it is not too hard: I can calculate the interval boundaries from the cumulative distribution function of the Gaussian. In practice I hit considerable difficulties, because with floating-point operations I cannot achieve bit-exact reproduction of the results, and I have no idea how to do this without floating point. Exact reproduction is necessary because the decompressing code must arrive at exactly the same interval boundaries as the compressing code.
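
To make the difficulty concrete, this is roughly what the straightforward floating-point computation of the boundaries looks like (a minimal sketch, assuming Python's `math.erf` and that each integer $x$ is assigned the probability mass of $[x-\tfrac12, x+\tfrac12)$, which for large $\sigma$ is essentially the $P(x)$ given below):

```python
import math

def gaussian_cdf(t, sigma):
    """CDF of the zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

def interval(x, sigma):
    """Probability interval [low, high) assigned to the integer x,
    taking x to cover the continuous range [x - 0.5, x + 0.5)."""
    low = gaussian_cdf(x - 0.5, sigma)
    high = gaussian_cdf(x + 0.5, sigma)
    return low, high

# The encoder narrows its current coding interval using (low, high);
# the decoder has to reconstruct exactly the same floating-point values,
# which is the reproducibility problem described above.
print(interval(3, 1000.0))
```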

So my question is: How do I compute the interval boundaries? Or is there any other way to achieve (near) perfect compression of such data?

Strictly speaking, the normal distribution is defined only for continuous variables. So what I mean here are integers $x$ with the probability mass function $$P(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}}.$$
These probabilities do not sum to exactly 1. However, for $\sigma>100$ the difference from 1 is less than $10^{-1000}$, far smaller than the precision of any practical calculation.
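
For reference, the bit count quoted above is just the ideal code length $-\log_2 P(x)$ of this mass function: $$-\log_2 P(x) = -\log_2\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}}\right) = \frac{1}{2}\log_2(2\pi\sigma^2)+\frac{x^2}{2\sigma^2}\log_2 e.$$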

No correct solution
