Here's what I've considered:
1) Geocode the address on input, store the lat/long. When the user does a search, geocode the address and compare lat/longs to see if I have that exact lat/long in my database.
But there are problems with this.
- Storing the results of the Google Geocoder is a violation of their terms of use.
- There's a good reason for that; Google constantly updates their geocodes, so a given address's lat/long may change over time.
- I'd be doing an exact equality comparison on floating-point numbers, which is fragile: two geocodes of the same address can differ in the last few decimal places.
- What about multiple apartments within a building? They'll all have the same lat/long, but they're different addresses.
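To make the floating-point problem concrete, here's a minimal sketch (the coordinates are made up) showing an exact comparison failing while a tolerance-based one succeeds:

```python
import math

# Hypothetical coordinates: the lat/long stored at input time vs. a
# re-geocode of the same address later, drifted in the last digits.
stored = (40.748817, -73.985428)
fresh = (40.7488172, -73.9854275)

exact_match = stored == fresh  # fails on the tiniest drift
tolerant_match = all(
    math.isclose(a, b, abs_tol=1e-4)  # ~11 m of latitude
    for a, b in zip(stored, fresh)
)
print(exact_match, tolerant_match)  # False True
```

Even a tolerance doesn't help with the apartments-in-one-building case, since those share identical coordinates to begin with.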
2) Geocode the address on input, but don't store the lat/long; store the address components, and compare those.
This seems better, but there are still problems:
- It probably still violates the Geocoder terms of use, because Google might change its results. The address components may be less likely to change than the lat/long, but they can still change as people report data errors to Google. (At the very least, the zip code could change.)
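The component-comparison idea might look something like this sketch. The field names are placeholders standing in for whatever the geocoder returns (Google's response has typed components along the lines of street number, route, locality, and postal code):

```python
# Sketch: compare addresses by a normalized key built from components
# rather than by lat/long. Field names here are hypothetical.
def component_key(components):
    """Build a case-insensitive comparison key from selected fields."""
    fields = ("street_number", "route", "locality", "postal_code")
    return tuple(components.get(f, "").strip().lower() for f in fields)

a = {"street_number": "350", "route": "5th Ave",
     "locality": "New York", "postal_code": "10118"}
b = {"street_number": "350", "route": "5th ave",
     "locality": "new york", "postal_code": "10118"}
print(component_key(a) == component_key(b))  # True
```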
3) Geocode the address, store the lat/long, but don't search for the lat/long exactly. Search within a small radius around the resulting point, looking for possible matches. Compare those possible matches by address components.
This might be the best answer, except that it still violates Google's Geocoder terms of use.
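The radius search could be sketched like this, with a haversine distance filter over stored rows (the rows and the 100 m radius are made-up examples; a real database would use a spatial index rather than a linear scan):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def candidates_near(lat, lon, rows, radius_m=100):
    """rows: iterable of (lat, lon, address) tuples already stored."""
    return [row for row in rows
            if haversine_m(lat, lon, row[0], row[1]) <= radius_m]

rows = [
    (40.748817, -73.985428, "350 5th Ave"),   # hypothetical stored rows
    (40.758896, -73.985130, "1560 Broadway"),
]
near = candidates_near(40.748820, -73.985430, rows)
print([r[2] for r in near])  # ['350 5th Ave']
```

The candidates that survive the radius filter would then be compared by address components, as described above.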
4) Geocode the address on input, get the address components, but just use them to store a parsed normalized postal address in the database.
Add some hand-rolled code to split the normalized address into even smaller fields (street name, street type, prefix, postfix, ...). When the user runs a search, run the same normalization code, then search field by field.
I guess this would work, but rolling my own address parser seems like a recipe for pain. It seems like it just can't possibly be right. (I can't be the first person to need to solve this problem, can I?)
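For a sense of what that hand-rolled normalization looks like, and why it hurts, here's a toy sketch. The abbreviation table is deliberately tiny; a real one needs hundreds of entries (St/Saint/Street, directionals, unit designators, ...), which is exactly the pain:

```python
import re

# Deliberately tiny, hypothetical suffix table.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd",
            "boulevard": "blvd", "drive": "dr"}

def normalize(address):
    """Lowercase, strip punctuation, and collapse known suffixes."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(SUFFIXES.get(t, t) for t in tokens)

print(normalize("123 Main Street") == normalize("123 main st."))  # True
```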