Postal code (ZIP) worldwide (not just US) optimized data structure (not SQL, CSV or Google API) for long and lat retrieval

StackOverflow https://stackoverflow.com/questions/8845839

Question

Does anyone know of a database structure, such as this one http://www.maxmind.com/app/geolitecity, that is optimized for super fast retrieval of longitude and latitude based on either ZIP or (City, State, Country) parameters?

MaxMind's database does not support any retrieval other than by IP address, at least not to my knowledge. So if you know how to do it, preferably in Java, I'm all ears.

This should not be an SQL-type database, a CSV file, or a Google API solution. Those are just too slow, especially if you want to offer search results sorted by distance.

Paid solutions are also an option. The data structure doesn't have to be free.


Solution

I don't believe there is such a thing as a "fast" way to do this. I've built a geocoding API for Canadian postal codes, and the way we search is to have two indexes of postal codes: one sorted by latitude and one sorted by longitude. You can do some spherical geometry and develop a bounding "box" that fits everything in a given radius, but you still have to go back and do a point-to-point distance measurement, using Vincenty or Haversine or your algorithm of choice, between your origin and each postal code you find.

With a worldwide database, your math gets complicated by the fact that a search area can cross the 180° meridian (where longitude wraps from 180 to -180) and the equator.

You'll want some kind of encoding scheme that lets you work in radians, since that is what most distance calculation heuristics require.
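A minimal Java sketch of the bounding-box-then-exact-distance approach described above, assuming a spherical Earth with a mean radius of 6371.0088 km and Haversine as the distance formula. The names `GeoMath`, `haversineKm` and `boundingBox` are made up for illustration, and the box math assumes the search circle touches neither a pole nor the 180° meridian.

```java
final class GeoMath {
    // Mean Earth radius in kilometres (spherical approximation, an assumption).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle (Haversine) distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        // Trigonometric functions in java.lang.Math expect radians.
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                 * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Returns {minLat, maxLat, minLon, maxLon} in degrees for a circle of
     * radiusKm around (lat, lon). Assumes the circle stays away from the
     * poles and does not cross the 180 degree meridian.
     */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM;            // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        // Longitude lines converge toward the poles, so the box widens with latitude.
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }
}
```

The box is only a cheap pre-filter: every candidate that falls inside it still needs a `haversineKm` check against the radius, because the corners of the box lie outside the circle.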

Other Tips

This can be done very quickly with any database engine that supports two-dimensional indexes, and as far as I know MySQL supports multi-column indexes as well. It's simple: you use a 2-D index to limit your result set to a reasonable size extremely quickly, then you examine that result set with a high-precision calculation algorithm if you need to. Not hard, except that you may need to OR two lists together if the search area crosses the 180/-180 longitude line.

Making a 2-D index is simple: index (latitude, longitude). That index only works for lookups on latitude or on (latitude, longitude) pairs; it won't work on longitude alone. If you want an additional index for longitude, add index (longitude). I select a rough square and round the corners if I care about them.
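As a rough illustration of the idea (not how any particular SQL engine implements it), here is a Java sketch that mimics an index on (latitude, longitude) with a `TreeMap` keyed by latitude and handles a box that crosses the 180/-180 line by OR-ing two longitude ranges. `LatLonIndex`, `Place`, `lonInRange` and `query` are hypothetical names, and the convention that a crossing box is passed with `minLon` below -180 or `maxLon` above 180 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class LatLonIndex {
    /** One postal code with its coordinates in degrees (hypothetical record type). */
    static final class Place {
        final String code;
        final double lat;
        final double lon;

        Place(String code, double lat, double lon) {
            this.code = code;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Sorted by latitude, like the leading column of index (latitude, longitude).
    private final TreeMap<Double, List<Place>> byLat = new TreeMap<>();

    void add(Place p) {
        byLat.computeIfAbsent(p.lat, k -> new ArrayList<>()).add(p);
    }

    /** True if lon is inside [minLon, maxLon]; the range may extend past +/-180. */
    static boolean lonInRange(double lon, double minLon, double maxLon) {
        if (maxLon - minLon >= 360.0) return true;             // box wraps the whole globe
        if (minLon < -180.0) {                                  // crosses on the western side
            return lon >= minLon + 360.0 || lon <= maxLon;      // OR of two plain ranges
        }
        if (maxLon > 180.0) {                                   // crosses on the eastern side
            return lon >= minLon || lon <= maxLon - 360.0;      // OR of two plain ranges
        }
        return lon >= minLon && lon <= maxLon;
    }

    /** Range scan on latitude, then a longitude filter on the narrowed set. */
    List<Place> query(double minLat, double maxLat, double minLon, double maxLon) {
        List<Place> out = new ArrayList<>();
        for (List<Place> bucket : byLat.subMap(minLat, true, maxLat, true).values()) {
            for (Place p : bucket) {
                if (lonInRange(p.lon, minLon, maxLon)) out.add(p);
            }
        }
        return out;
    }
}
```

For example, `query(-1, 1, 179, 181)` describes a box centered on the 180° line and matches places at longitude 179.5 as well as -179.5. The result is still only the rough square; a precise distance check can follow if the rounded corners matter.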

If you have a zip or city to start with, zip codes are just a 1-D index. No problem making that fast: just use an index, index (zip). And if your hard drive is too slow, get a solid-state drive to eliminate the seek times, or use a lot of RAM and cache the whole table. This is not a hard problem either way you want to go.
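If you go the cache-everything-in-RAM route, the 1-D zip lookup can be as small as a hash map. The following Java sketch assumes the postal codes have already been loaded from whatever data set you licensed; `PostalCodeIndex`, its methods, and the key format (country code, colon, normalized postal code) are illustrative choices, not part of any existing library.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PostalCodeIndex {
    // Key: "<COUNTRY>:<POSTALCODE>", value: {latitude, longitude} in degrees.
    private final Map<String, double[]> coords = new HashMap<>();

    /** Builds a case-insensitive key and strips spaces and dashes from the postal code. */
    private static String key(String countryCode, String postalCode) {
        String normalized = postalCode.replace(" ", "").replace("-", "")
                                      .toUpperCase(Locale.ROOT);
        return countryCode.toUpperCase(Locale.ROOT) + ":" + normalized;
    }

    void put(String countryCode, String postalCode, double lat, double lon) {
        coords.put(key(countryCode, postalCode), new double[] { lat, lon });
    }

    /** Returns {lat, lon} in degrees, or null if the postal code is unknown. */
    double[] lookup(String countryCode, String postalCode) {
        return coords.get(key(countryCode, postalCode));
    }
}
```

Including the country code in the key matters for a worldwide data set, since the same digits can be a valid postal code in more than one country. A (City, State, Country) lookup could use a second map built the same way.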

If that's not fast enough for you, using someone's service won't help, because you have network overhead. You will have to hold your data directly in RAM or on an SSD and build your own 2-D/1-D indexing system if you need it (not hard). That route could probably beat SQL by a factor of 10 or so, because the SQL engine has a lot of overhead. I suppose someone might offer a service that runs on your own machine, but realistically that wouldn't beat SQL by much, because you still have to jump through a bunch of hoops to make the request to their service.

SQL and 2-D indexes with a solid-state drive will be damned fast. You shouldn't need to process the data yourself unless you are the post office, sorting 10,000 pieces of mail per second with one machine serving the data. Then you'll have to write your own data management routines.
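For what "build your own indexing system in RAM" might look like, here is a small Java sketch: parallel arrays sorted by latitude, a binary search to find the start of the latitude range, and a linear scan with a longitude filter. `PackedGeoIndex` and `inBox` are hypothetical names, and the sketch assumes a box that does not cross the 180/-180 longitude line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PackedGeoIndex {
    private final String[] codes;
    private final double[] lats;   // sorted ascending
    private final double[] lons;   // lons[i] belongs to lats[i]

    PackedGeoIndex(String[] codes, double[] lats, double[] lons) {
        int n = codes.length;
        // Sort a permutation by latitude, then copy all three arrays in that order.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> lats[i]));
        this.codes = new String[n];
        this.lats = new double[n];
        this.lons = new double[n];
        for (int i = 0; i < n; i++) {
            this.codes[i] = codes[order[i]];
            this.lats[i] = lats[order[i]];
            this.lons[i] = lons[order[i]];
        }
    }

    /** Index of the first entry whose latitude is >= minLat (binary search). */
    private int lowerBound(double minLat) {
        int lo = 0, hi = lats.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lats[mid] < minLat) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Codes inside the box, in ascending latitude order. No antimeridian handling. */
    List<String> inBox(double minLat, double maxLat, double minLon, double maxLon) {
        List<String> out = new ArrayList<>();
        for (int i = lowerBound(minLat); i < lats.length && lats[i] <= maxLat; i++) {
            if (lons[i] >= minLon && lons[i] <= maxLon) out.add(codes[i]);
        }
        return out;
    }
}
```

Primitive arrays keep the whole table compact and cache-friendly, which is where most of the gain over a general-purpose engine would come from. Whether that is really worth a factor of 10 depends on your data and hardware, so measure before committing to it.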

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow