Question

I was wondering how Google geocodes an address. Does it work like a DNS lookup, where they have a big table of addresses that hashes to a geocode, or is there any fun geometry that goes into it? If it is a big hash table, how did they go about gathering all that data?


Solution

Busbina, I work for SmartyStreets where we verify and geocode street addresses -- so I'll tell you what I know, and link you to further sources for your own research.

To answer your question: It is both.

There are suppliers of massive databases (for example, TIGER Data) which contain relational, geo-political information including coordinates, streets, boundaries, and names. For US data, you can usually get at least ZIP-level accuracy from tables like these simply by doing a lookup. For more accuracy, though, append the +4 code and you may narrow it down to a city block or a floor of a tall building.
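As a rough sketch of the lookup side, assuming two hypothetical in-memory tables (real datasets are far larger and live in a database), a ZIP+4 match is tried first and the coarser ZIP centroid is the fallback. The table names, the keys, and the coordinates below are invented placeholders, not real data:

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Hypothetical tables with made-up placeholder values.
ZIP_CENTROIDS: Dict[str, Coord] = {"12345": (40.0, -75.0)}
ZIP4_CENTROIDS: Dict[str, Coord] = {"12345-6789": (40.01, -75.02)}


def lookup(zip_code: str, plus4: Optional[str] = None) -> Optional[Tuple[Coord, str]]:
    """Return (coordinate, precision label), or None if nothing matches."""
    if plus4 is not None:
        # Most specific match first: the ZIP+4 usually covers a much smaller area.
        hit = ZIP4_CENTROIDS.get(f"{zip_code}-{plus4}")
        if hit is not None:
            return hit, "zip+4"
    # Fall back to the centroid of the whole ZIP code.
    hit = ZIP_CENTROIDS.get(zip_code)
    if hit is not None:
        return hit, "zip"
    return None
```

Returning the precision label alongside the coordinate lets the caller know how much to trust the result.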

To attempt further accuracy (i.e. knowing where precisely on the street a building is located), Google and others perform what is called interpolation: they take the known boundaries from their datasets and the known range of primary numbers from the start of that block or street to the end of it, and they solve a ratio. If the correct primary number is known, and for straight streets in an ideal setting, a simple ratio like this works:

(primary number - starting primary number) / (ending primary number - starting primary number) =
        (x - starting boundary coordinate) / (ending boundary coordinate - starting boundary coordinate)

Where x is a close guess to the actual location on the street - but only a guess. Accurate building-level data can be very expensive and I think is only available for some urban areas.
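The interpolation can be sketched in a few lines. This is a minimal illustration, not how Google or any particular geocoder implements it; it assumes a straight street segment and evenly spaced primary numbers. The function and parameter names are made up for the example:

```python
from typing import Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def interpolate_address(primary: int,
                        start_number: int, end_number: int,
                        start_coord: Coord, end_coord: Coord) -> Coord:
    """Estimate where `primary` falls along a straight street segment."""
    if end_number == start_number:
        # Only one number on the segment; nothing to interpolate.
        return start_coord
    # Fraction of the way along the number range.
    t = (primary - start_number) / (end_number - start_number)
    # Clamp so an out-of-range number never lands off the segment.
    t = max(0.0, min(1.0, t))
    lat = start_coord[0] + t * (end_coord[0] - start_coord[0])
    lon = start_coord[1] + t * (end_coord[1] - start_coord[1])
    return (lat, lon)
```

For instance, number 150 on a block numbered 100 to 200 comes out at the midpoint of the segment, whether or not a building actually stands there.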

The key is to get the right primary number and accurate, up-to-date data. Maintaining this can be time-consuming and expensive because of all the overhead involved with so much information.

Note that Google and similar map services only perform address approximation, not address verification, and thus are liable to make mistakes (even if the geocoding algorithm is very precise) because the primary number may be wrong or may not even exist. So when that matters to you (or when you aren't showing a Google Map and must honor the Terms of Service), consider a service that verifies addresses. LiveAddress, as a starting point, is certified by the USPS and won't return bad addresses.

So there are some things to consider.

More information:

** I'll add a note, since I have had this question a lot: rooftop- or building-level accuracy is very expensive information. I know of very few providers who offer this, and they have mined and collected that data themselves. For example, Google has the Street View project, from which they've obtained accurate coordinates for approximate addresses, and they can provide such precision. But most geocoders use the same data from official sources; they just interpolate differently. If you want extremely precise coordinates like building-level, you can expect to pay mightily for it, or go collect the data yourself. (Yes, Google's is free to a point -- unless you intend to use the information for more than just showing a map, basically.)

OTHER TIPS

Another, very similar service is GeoNames, a freely available database of location names. It is not itself run by the US Government, though it draws on government gazetteers among its sources. This service is better tailored towards points of interest, like an airport or landmark. It is just a database of names, locations, and some metadata.

http://www.geonames.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow