Latitude/Longitude storage and compression in C

https://stackoverflow.com/questions/1220377

10-07-2019
|

Question

Does anyone know the most efficient representation for lat/long coordinates? Accuracy level should be enough for consumer GPS devices.

Most implementations seem to use double for each unit, but I'm suspicious that a float or fixed point format should be sufficient. I'd be curious to hear from anyone who has tried to compress and or store large arrays of these values.

EDIT:

In other words, whats the minimum accuracy required to represent lat/long for a consumer level device?

Solution

Personally I would use a 32 bit decimal fixed point representation, dividing by 1,000,000 as per Evan's answer and my comments.

However, if space is truly at a premium, here are some additional ideas:

You could use a 26 bit fixed point representation on the wire. This will require marshalling and unmarshalling the latitude and longitude into a large array of bytes, but will save you 12 bits for each location over the 32 bit value representation - almost a 19% saving, so it might well be worthwhile.
You could take advantage of the fact that longitude values need less precision as you get closer to the poles - they only need 26 bits worth at the equator. So you could write a scheme where the number of bits used to encode the longitude depends on the value of the latitude.
If your data has other compressible attributes - say, all the points are usually quite close together - you could take specific advantage of those, like using a delta coding scheme (where each point other than the first can be encoded as a delta from the last point).

OTHER TIPS

The circumference of the Earth is approx. 40.000 km or 24900 miles.

You need one-meter accuracy (3ft) to be able to out-resolve gps precision by an order of magnitude.

Therefore you need precisiton to store 40.000.000 different values. That's at minimum 26 bits of information. A 32 bit float or int will do well.

EDIT: added some points from comments, 32-bit values should be capable of offering enough precision.

I would use a 32-bit fixed point representation. If the values are:

42.915512,-99.521654 I would store the values * 100000 in int32_t's (they can be negative).

int32_t lat = 42915512;
int32_t lon = -99521654;

This is a good compromise between simple and accurate (5 decimal points is usually good enough, you could always bump it up to 1000000 to get 6 if needed).

To display to the user, do what caf suggests:

... to display to the user - use integer divide and modulo, eg printf("Lat = %d.%06d\n", lat / 1000000, abs(lat) % 1000000)

These will also be comparable/sortable in an efficient way since the relative ordering will be preserved.

EDIT: an additional benefit is that it can sent over a network or stored to disk in a binary format in a portable way.

Floats would be way more than sufficient for storing GPS coordinates, even if consumer-level GPS devices had anywhere near the accuracy claimed for them. If you don't believe this is true, try these two simple experiments:

Take two or more GPS devices out into one spot on a field somewhere, and jot down the coordinates measured by each device. Go back inside and plot the points from each device on a map (I think Google has something that does this for you). You'll be suprised at how far apart the points are (even though they're all supposed to be measuring the exact same spot).
Take your (allegedly) most accurate device, and place it somewhere where it can get a satellite fix but won't get rained on, and record a series of measurements taken over a couple of days. Plot all of the readings (as in #1). Again, you'll be surprised by how the points (which should all be the same or nearly the same) wander all over the map, sometimes by as much as a couple hundred feet.

I've been writing applications for GPS-enabled PDAs for years, and I've verified this for dubious customers over and over again (I've even won bets this way). There are higher-quality GPS devices out there that do achieve better accuracy than this, but the better accuracy is achieved with more expensive chipsets, and the devices are left in one spot for days or even weeks, with the readings averaged over time.

A four-byte float is far more accurate than the devices themselves. It would of course not hurt you at all to use a double instead, as long as the 2X factor isn't a problem for you.

23 bits of precision at 179 degrees of longitude gives under 10-meter accuracy, which is the best that ordinary GPS devices give. At the equator:

% gps distance "0.0, 179.0" "0.0, $((179 * (1 + 2**-23)))"
From 0.0, 179.0 to 0.0, 179.00002133846283 is 7.79 feet E
From 0.0, 179.0 to 0.0, 179.00002133846283 is 2.38 meters E

So an IEEE 754 single-precision floating-point number, known to your C compiler as float, wil be just adequate for representation. Beware of using floats for extended computations! Rounding error may eat your lunch. Consult a numerical analyst.

In Garmin's IMG map format they store coordinates inside bounding boxes using floats to set the edges of the boxes. Coords within the boxes are defined using a variable number of bits that are are just linear between min and max values depending on the precision needed.

For example: minlat=49.0, maxlat=50.0, minlon=122.0, maxlon=123.0, number of bits=16

So a value of:
32768,32768 would be converted to 49.5, 122.5
16384,0 would be 49.25, 122.0

If you need less precision, the same output could be generated with a number of bits=4
8,8 would be converted to 49.5, 122.5
4,0 would be 49.25, 122.0

If you are storing large arrays of these values, there are a few simple tricks if you do a delta compression, and store deltas, you can greatly reduce the size of a data stream. You can do deltas from a "key point"

K D D D D D D D D D D K D D D D ...

k + d get you to any d point

The deltas all reference the previous K, so to reconstruct any point, you need a K and a D

or you can do incrimental deltas

K I I I I I I I I I I K

This may take multiple sums to get to the desired position. but the data is smaller overall. SO to reconsturct

k+i+i+i to get to the 4th point

Finally you can combine both

K D I I I D I I I D I I I K

This is like mpeg-2 with IPB frames, but this way you are never more than 4 sums to any position, and you get the some of the benefit of Delta and Incrimental Compression.

You can pack both the latitude and longitude values in a single 32-bit integer with a resolution of at worst ~2.4 meters/pixel (at the equator) if you use a recursive tiling system. Using two bits per level, you can store 16 levels in 32 bits. You can get an idea of how that would work looking at this article about Virtual Earth's tiling system. This uses Mercator, so it would give you problems for the poles. You could instead use a different projection and still get very similar results.

This can also be used for a rough filter to find any points within a given parent tile since the first N bits will be the same (and so search becomes bit masking).

Assuming the earth is a perfect sphere (it is not, but close enough) with a radius ‘R’ of 3959 miles (or ×5280 ft/mi = 20903520 ft), the circumference is 131340690 feet (using 2×PI×R).

360 degrees of longitude covers 131340690 feet. 180 degrees of latitude covers 65670345 feet.

If you want to store lat/lng down to an accuracy of 3 feet, you need to be able to store 43780230 (131340690/3) longitude value and 21890115 (65670345/3) latitude values. 43780230 requires 25.38 bits (log(43780230)/log(2)) to store and 21890115 requires 24.38 bits (log(21890115)/log(2)) to store – or just under 50 bits (or 6.25 bytes).

So the obvious question becomes, if you want to store latitude and longitude in just 6 bytes, what will the accuracy be? Well, 6 bytes is 48 bits. That means 23.5 bits for latitude and 24.5 bits for longitude (longitude has twice as many values, which is just one bit and 24.5-23.5=1 bit). So 23.5 bits allows you to represent a number from 0 to 11863282 (11863283 values). And 65670345 feet divided by 11863283 values is 5.53 feet (and the same accuracy value for longitude).

THE BOTTOM LINE: So, if you can live with 5.5 feet of accuracy for both latitude and longitude, you can pack both values into just six bytes.

*A SIDE NOTE: Regarding comments that latitude and longitude are horrible for storing the positional information around a sphere (because there is less information to store at the poles) – well, those comments don’t hold up to the math! Let’s figure it out. Let’s say we want to design a new perfect system that can record and place a stake in the ground in the center of every square foot of earth. The surface area of earth (with a R of 3959 miles; formula for surface area of a sphere) is 5490965469267303 SQ FT – that many stakes requires 52.29 bits to represent. Now the existing latitude and longitude system uses a rectangular system. The width of the rectangle is the circumference of the earth and height of the rectangle is 1/2 the circumference) – which is 131340690 * 65670345 (see far above), or 8625188424838050 SQ FT – which requires 52.94 bits to represent (this system places ‘too many’ stakes in the ground around the poles). So, the shocking answer is that both the new perfect system, and the old lat/lng system, would BOTH require 53 actual bits to store a single location on earth, down to 1 foot accuracy!

I am suprised that no one has posted the fact that long/lat is a terrible way to store data on a sphere (someone did mention that longitude requires less precision near the poles).

Basically you can store data position as X and Y co-ords in meters. Imagine a cube around the earth that exactally fits (haha ok almost fits it). You only need store X and Y position, not all 3 co-ords, because the 3-rd co-ord can come from the redius of the earth, r = square root[x^2 + y^2 + z^2].

So convert your lat/long to x/y in meters. You will only need a total of 12756200m per co-ord (thats the diameters of the earth). So your total value will only have to span 0 to 25,512,400 (someone else claimed 40,000,000 because they were using long/lat) to be accurate to +/- 0.5m.

That will result in just 25 bits per position. If I were you I would just do accuracy to within 2m and use 24 bits per position, as that is a tidy 3 bytes.

Also if you are storing way-point information on a path, you can store each way-point as an offset from the last waypoint. Like start with a 24bit x/y co-ord. And then have a 16bit 'update' that adjusts the position by adding/subtracting x/y meters. 16bit would allow a waypoint update to be over 400m away. So if you know the device is not meant for planes and updates every so often, this might be acceptable too.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow