Question

My data are not actually totally random. I am looking to compress telemetry measurements which will tend to be in the same range (e.g. temperatures won't vary much). However, I seek a solution for multiple applications, so I might be sending temperatures one day, voltages the next, etc.

I want to send measurements over a low data rate satellite link. SatCom is reasonably expensive, so I would like to shave off every cent that I can. I don't mind expending computing resources to pack & unpack the data as nothing is too time critical (it can take up to 30 seconds to transmit 192 bytes).

Can anyone recommend a FOSS data compression method that will give me the most compression on telemetry data?

Is it ever worth trying? What sort of percentage gains can I expect?

I do apologize that I cannot be more precise about the nature of the data - just general telemetry measurements like temperatures, lat/long GPS position, flow rate of liquids, etc.


Solution

Truly random data is not compressible.

Since you can't disclose the details of your data, the best thing for you to do is to test a few different compression algorithms on some sample data.

A good place to start is the DEFLATE algorithm, the industry-standard combination of LZ77 sliding-window compression and Huffman coding. It is implemented by many compression libraries and formats, gzip being a prominent example.

OTHER TIPS

Perhaps the best solution would be to use a DEFLATE library, run it on large blocks of data, and use high compression settings.
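For instance, a minimal sketch in Rust using the flate2 crate (one common DEFLATE binding; the crate choice and the batch size are illustrative assumptions, not part of the answer):

```rust
// Cargo.toml: flate2 = "1"  (assumed crate; any zlib/DEFLATE binding would do)
use flate2::write::ZlibEncoder;
use flate2::Compression;
use std::io::Write;

fn compress_block(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // Highest compression setting; CPU time is cheap compared to SatCom bytes.
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::best());
    encoder.write_all(payload)?;
    encoder.finish()
}

fn main() -> std::io::Result<()> {
    // Batch many readings into one large block before compressing, so the
    // LZ77 window and Huffman tables have something to work with.
    let readings: Vec<u8> = (0..1024).map(|i| (20 + (i % 3)) as u8).collect();
    let packed = compress_block(&readings)?;
    println!("{} bytes -> {} bytes", readings.len(), packed.len());
    Ok(())
}
```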

If you want to roll your own stream compression algorithm, you can apply the same approach that works for sound files: send the first measurement directly, then encode the difference between each sample and the previous one (delta encoding).
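A minimal sketch of that delta step (plain Rust, no crates; the sample values are hypothetical millikelvin readings):

```rust
// Delta encoding: the first entry carries the first sample itself (prev = 0),
// every later entry carries only the change from the previous sample.
fn delta_encode(samples: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(samples.len());
    let mut prev = 0i32;
    for &s in samples {
        out.push(s - prev);
        prev = s;
    }
    out
}

// Decoding is a running sum of the deltas.
fn delta_decode(deltas: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0i32;
    for &d in deltas {
        acc += d;
        out.push(acc);
    }
    out
}

fn main() {
    let temps_mk = [296_150, 296_152, 296_151, 296_151, 296_149];
    let deltas = delta_encode(&temps_mk);
    assert_eq!(delta_decode(&deltas), temps_mk);
    println!("{:?}", deltas); // [296150, 2, -1, 0, -2]
}
```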

Now, the best encoding differs based on how fast the data is changing:

If the data is changing fast enough, use an adaptive Huffman tree. If the differences are uncorrelated (data + noise), this will get you to within about one bit per sample of the entropy.

If several consecutive data samples are likely to be equal to each other (the data is not changing very fast and there's no HF noise), then encode each non-zero difference using one Huffman tree, and the length of each run of zeroes using a second Huffman tree (a sketch of the run-splitting step follows this list). This will get you to within about two bits per run of the entropy.

You can even encode the differences as one bit only (up or down), but then you need to be able to encode zero-length runs.
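A sketch of the run-splitting step from the second option above; the Huffman coding of the two resulting streams is left out, and any entropy coder (or DEFLATE) could take over from here:

```rust
// Split a delta stream into (zero_run_length, nonzero_delta) pairs.
// Each of the two columns would then be fed to its own Huffman coder.
fn split_runs(deltas: &[i32]) -> Vec<(u32, i32)> {
    let mut pairs = Vec::new();
    let mut zeros = 0u32;
    for &d in deltas {
        if d == 0 {
            zeros += 1;
        } else {
            pairs.push((zeros, d)); // run of zeros followed by one non-zero delta
            zeros = 0;
        }
    }
    if zeros > 0 {
        pairs.push((zeros, 0)); // trailing zeros, flagged with a 0 "delta"
    }
    pairs
}

fn main() {
    let deltas = [0, 0, 0, 2, 0, -1, 0, 0, 0, 0];
    println!("{:?}", split_runs(&deltas)); // [(3, 2), (1, -1), (4, 0)]
}
```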


My suggestion: delta-encode once or twice to get uncorrelated entries, then DEFLATE using a library.

The perfect algorithm for this use case is Quantile Compression. It is implemented as FOSS in Rust under the name q-compress.

For instance, say you want to losslessly transfer some 32-bit floating-point data like temperatures, all in kelvin and uniformly distributed in the range 275-300.

LZ77/78 algorithms require roughly 24 bits per number, since they operate at the byte level and miss most of the compressible information in the mantissa bits. You can see an example in the linked blog post.

Quantile Compression would use about 20 bits per number, which is the actual entropy of the distribution: for 32-bit floats in the 256-512 exponent bucket the spacing between representable values is 2^-15 K, so a 25 K range contains about 25 x 2^15 ≈ 2^19.6 distinct values, i.e. roughly 20 of the 23 mantissa bits.

You can discretize to do even better. 32-bit floats only give you about 15 bits of precision past the decimal point in this example. If you only care about precision to the mK (millikelvin), you can instead transfer integer values of mK. These would be between 275000 and 300000, and Quantile Compression can shrink these to about 15 bits per number. If you only care about precision to the dK (1/10 of a kelvin), you can shrink down to about 8 bits per number.
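A small sketch of that discretization step (plain Rust; the readings and the mK resolution are just the example's assumptions):

```rust
// Quantize Kelvin floats to integer millikelvin before compressing.
fn to_millikelvin(temps_k: &[f32]) -> Vec<i64> {
    temps_k
        .iter()
        .map(|&t| (t as f64 * 1000.0).round() as i64)
        .collect()
}

fn main() {
    let temps_k = [283.214_f32, 283.217, 283.215];
    let mk = to_millikelvin(&temps_k);
    println!("{:?}", mk); // [283214, 283217, 283215]
    // These integers (275000..=300000 in the example) are what you would hand
    // to an integer compressor such as q-compress.
}
```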

And if your numbers vary slowly (i.e. adjacent values are similar), you can apply delta encoding between the discretization and Quantile Compression steps. Depending on how slowly your data changes, this could conceivably get you down to 2-8 bits per number at dK precision.
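A sketch of that pipeline, assuming the q_compress crate's auto_compress/auto_decompress helpers (present in recent crate versions; the crate version and exact API are assumptions, so check the crate docs):

```rust
// Cargo.toml: q_compress = "0.11"  (assumed version)
use q_compress::{auto_compress, auto_decompress, DEFAULT_COMPRESSION_LEVEL};

// Manual first-order delta; doing it by hand keeps the sketch explicit.
fn delta_encode(nums: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    nums.iter()
        .map(|&n| {
            let d = n - prev;
            prev = n;
            d
        })
        .collect()
}

fn main() {
    // Hypothetical slowly varying decikelvin readings (roughly 283.2-284.1 K).
    let dk: Vec<i64> = (0..500).map(|i| 2832 + (i / 50) as i64).collect();
    let deltas = delta_encode(&dk);

    let bytes = auto_compress(&deltas, DEFAULT_COMPRESSION_LEVEL);
    println!("{} readings -> {} bytes", dk.len(), bytes.len());

    let recovered: Vec<i64> = auto_decompress(&bytes).expect("decompress");
    assert_eq!(recovered, deltas);
}
```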

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow