compact representation and delivery of point data

https://stackoverflow.com/questions/1260981

data-compression

12-09-2019
|

Question

I have an array of point data, the values of points are represented as x co-ordinate and y co-ordinate.

These points could be in the range of 500 upto 2000 points or more.

The data represents a motion path which could range from the simple to very complex and can also have cusps in it.

Can I represent this data as one spline or a collection of splines or some other format with very tight compression.

I have tried representing them as a collection of beziers but at best I am getting a saving of 40 %. For instance if I have an array of 500 points , that gives me 500 x and 500 y values so I have 1000 data pieces. I around 100 quadratic beziers from this. each bezier is represented as controlx, controly, anchorx, anchory. which gives me 100 x 4 = 400 pcs of data. So input = 1000pcs , output = 400pcs.

I would like to further tighen this, any suggestions?

Solution

By its nature, spline is an approximation. You can reduce the number of splines you use to reach a higher compression ratio.

You can also achieve lossless compression by using some kind of encoding scheme. I am just making this up as I am typing, using the range example in previous answer (1000 for x and 400 for y),

Each point only needs 19 bits (10 for x, 9 for y). You can use 3 bytes to represent a coordinate.
Use 2 byte to represent displacement up to +/- 63.
Use 1 byte to represent short displacement up to +/- 7 for x, +/- 3 for y.

To decode the sequence properly, you would need some prefix to identify the type of encoding. Let's say we use 110 for full point, 10 for displacement and 0 for short displacement.

The bit layout will look like this,

Coordinates:        110xxxxxxxxxxxyyyyyyyyyy
Dislacement:        10xxxxxxxyyyyyyy
Short Displacement: 0xxxxyyy

Unless your sequence is totally random, you can easily achieve high compression ratio with this scheme.

Let's see how it works using a short example.

3 points: A(500, 400), B(550, 380), C(545, 381)

Let's say you were using 2 byte for each coordinate. It will take 16 bytes to encode this without compression.

To encode the sequence using the compression scheme,

A is first point so full coordinate will be used. 3 bytes. B's displacement from A is (50, -20) and can be encoded as displacement. 2 bytes. C's displacement from B is (-5, 1) and it fits the range of short displacement 1 byte.

So you save 10 bytes out of 16 bytes. Real compression ratio is totally depending on the data pattern. It works best on points forming a moving path. If the points are random, only 25% saving can be achieved.

OTHER TIPS

If for example you use 32-bit integers for point coords and there is range limit,
like x: 0..1000, y:0..400, you can pack (x, y) into a single 32-bit variable.

That way you achieve another 50% compression.

You could do a frequency analysis of the numbers you are trying to encode and use varying bit lengths to represent them, of course here I am vaguely describing Huffman coding

Firstly, only keep enough decimal points in your data that you actually need. Removing these would reduce your accuracy, but its a calculated loss. To do that, try converting your number to a string, locating the dot's position, and cutting of those many characters from the end. That could process faster than math, IMO. Lastly you can convert it back to a number.

 150.234636746  ->  "150.234636746"  ->  "150.23"  ->  150.23

Secondly, try storing your data relative to the last number ("relative values"). Basically subtract the last number from this one. Then later to "decompress" it you can keep an accumulator variable and add them up.

 A    A    A             A    R   R
 150, 200, 250     ->    150, 50, 50

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow