Question

My data are not actually totally random. I am looking to compress telemetry measurements which will tend to be in the same range (e.g. temperatures won't vary much). However, I seek a solution for multiple applications, so I might be sending temperatures one day, voltages the next, etc.

I want to send measurements over a low data rate satellite link. SatCom is reasonably expensive, so I would like to shave off every cent that I can. I don't mind expending computing resources to pack & unpack the data as nothing is too time critical (it can take up to 30 seconds to transmit 192 bytes).

Can anyone recommend a FOSS data compression method that will give me the most compression on telemetry data?

Is it ever worth trying? What sort of percentage gains can I expect?

I do apologize that I cannot be more precise about the nature of the data - just general telemetry measurements like temperatures, lat/long GPS position, flow rate of liquids, etc.


Solution

Truly random data is not compressible.

Since you can't disclose the details of your data, the best thing for you to do is to test a few different compression algorithms on some sample data.

A good place to start is the DEFLATE algorithm, the industry-standard combination of LZ77 sliding-window compression and Huffman coding. It is implemented by many compression libraries and formats, gzip being a prominent example.

OTHER TIPS

Perhaps the best solution would be to use a DEFLATE library, run it on large blocks of data, and use high compression settings.
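For instance, a minimal sketch in Rust using the flate2 crate (one common DEFLATE binding; the crate choice and the batch size are illustrative assumptions, not part of the answer):

```rust
// Cargo.toml: flate2 = "1"  (assumed crate; any zlib/DEFLATE binding would do)
use flate2::write::ZlibEncoder;
use flate2::Compression;
use std::io::Write;

fn compress_block(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // Highest compression setting; CPU time is cheap compared to SatCom bytes.
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::best());
    encoder.write_all(payload)?;
    encoder.finish()
}

fn main() -> std::io::Result<()> {
    // Batch many readings into one large block before compressing, so the
    // LZ77 window and Huffman tables have something to work with.
    let readings: Vec<u8> = (0..1024).map(|i| (20 + (i % 3)) as u8).collect();
    let packed = compress_block(&readings)?;
    println!("{} bytes -> {} bytes", readings.len(), packed.len());
    Ok(())
}
```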

If you want to roll your own stream compression algorithm, you can apply the same approach that works for sound files: send the first measurement directly, then encode the difference between each sample and the previous one (delta encoding).
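A minimal sketch of that delta step (plain Rust, no crates; the sample values are hypothetical millikelvin readings):

```rust
// Delta encoding: the first entry carries the first sample itself (prev = 0),
// every later entry carries only the change from the previous sample.
fn delta_encode(samples: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(samples.len());
    let mut prev = 0i32;
    for &s in samples {
        out.push(s - prev);
        prev = s;
    }
    out
}

// Decoding is a running sum of the deltas.
fn delta_decode(deltas: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0i32;
    for &d in deltas {
        acc += d;
        out.push(acc);
    }
    out
}

fn main() {
    let temps_mk = [296_150, 296_152, 296_151, 296_151, 296_149];
    let deltas = delta_encode(&temps_mk);
    assert_eq!(delta_decode(&deltas), temps_mk);
    println!("{:?}", deltas); // [296150, 2, -1, 0, -2]
}
```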

Now, the best encoding differs based on how fast the data is changing:

If the data is changing fast enough, use an adaptive Huffman tree. If the differences are uncorrelated (data + noise), this will get you to within about one bit per sample of the entropy.

If several consecutive data samples are likely to be equal to each other (the data is not changing very fast and there's no HF noise), then encode each non-zero difference using one Huffman tree, and the length of each run of zeroes using a second Huffman tree (a sketch of the run-splitting step follows this list). This will get you to within about two bits per run of the entropy.

You can even encode the differences as one bit only (up or down), but then you need to be able to encode zero-length runs.
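A sketch of the run-splitting step from the second option above; the Huffman coding of the two resulting streams is left out, and any entropy coder (or DEFLATE) could take over from here:

```rust
// Split a delta stream into (zero_run_length, nonzero_delta) pairs.
// Each of the two columns would then be fed to its own Huffman coder.
fn split_runs(deltas: &[i32]) -> Vec<(u32, i32)> {
    let mut pairs = Vec::new();
    let mut zeros = 0u32;
    for &d in deltas {
        if d == 0 {
            zeros += 1;
        } else {
            pairs.push((zeros, d)); // run of zeros followed by one non-zero delta
            zeros = 0;
        }
    }
    if zeros > 0 {
        pairs.push((zeros, 0)); // trailing zeros, flagged with a 0 "delta"
    }
    pairs
}

fn main() {
    let deltas = [0, 0, 0, 2, 0, -1, 0, 0, 0, 0];
    println!("{:?}", split_runs(&deltas)); // [(3, 2), (1, -1), (4, 0)]
}
```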


My suggestion: delta-encode once or twice to get uncorrelated entries, then DEFLATE using a library.

The perfect algorithm for this use case is Quantile Compression. It is implemented as FOSS in Rust under the name q-compress.

For instance, say you want to losslessly transfer some 32-bit floating-point data like temperatures, all in kelvin and uniformly distributed in the range 275-300.

LZ77/78 algorithms require roughly 24 bits per number, since they operate at the byte level and miss most of the compressible information in the mantissa bits. You can see an example in the linked blog post.

Quantile Compression would use about 20 bits per number, which is the actual entropy of the distribution: for 32-bit floats in the 256-512 exponent bucket the spacing between representable values is 2^-15 K, so a 25 K range contains about 25 x 2^15 ≈ 2^19.6 distinct values, i.e. roughly 20 of the 23 mantissa bits.

You can discretize to do even better. 32-bit floats only give you about 15 bits of precision past the decimal point in this example. If you only care about precision to the mK (millikelvin), you can instead transfer integer values of mK. These would be between 275000 and 300000, and Quantile Compression can shrink these to about 15 bits per number. If you only care about precision to the dK (1/10 of a kelvin), you can shrink down to about 8 bits per number.
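A small sketch of that discretization step (plain Rust; the readings and the mK resolution are just the example's assumptions):

```rust
// Quantize Kelvin floats to integer millikelvin before compressing.
fn to_millikelvin(temps_k: &[f32]) -> Vec<i64> {
    temps_k
        .iter()
        .map(|&t| (t as f64 * 1000.0).round() as i64)
        .collect()
}

fn main() {
    let temps_k = [283.214_f32, 283.217, 283.215];
    let mk = to_millikelvin(&temps_k);
    println!("{:?}", mk); // [283214, 283217, 283215]
    // These integers (275000..=300000 in the example) are what you would hand
    // to an integer compressor such as q-compress.
}
```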

And if your numbers vary slowly (i.e. adjacent values are similar), you can apply delta encoding between the discretization and Quantile Compression steps. Depending on how slowly your data changes, this could conceivably get you down to 2-8 bits per number at dK precision.
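A sketch of that pipeline, assuming the q_compress crate's auto_compress/auto_decompress helpers (present in recent crate versions; the crate version and exact API are assumptions, so check the crate docs):

```rust
// Cargo.toml: q_compress = "0.11"  (assumed version)
use q_compress::{auto_compress, auto_decompress, DEFAULT_COMPRESSION_LEVEL};

// Manual first-order delta; doing it by hand keeps the sketch explicit.
fn delta_encode(nums: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    nums.iter()
        .map(|&n| {
            let d = n - prev;
            prev = n;
            d
        })
        .collect()
}

fn main() {
    // Hypothetical slowly varying decikelvin readings (roughly 283.2-284.1 K).
    let dk: Vec<i64> = (0..500).map(|i| 2832 + (i / 50) as i64).collect();
    let deltas = delta_encode(&dk);

    let bytes = auto_compress(&deltas, DEFAULT_COMPRESSION_LEVEL);
    println!("{} readings -> {} bytes", dk.len(), bytes.len());

    let recovered: Vec<i64> = auto_decompress(&bytes).expect("decompress");
    assert_eq!(recovered, deltas);
}
```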

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow