Question

What is the best, most efficient way to test city detection? I have IP-based location detection implemented via www.maxmind.com, but now I'd like to test its accuracy.

I know there are various proxy services out there, such as https://www.geoedge.com/ and similar websites, but most of these services have a very limited number of proxy servers. It would be great to have an automated solution that could iterate through hundreds if not thousands of proxy servers, hit a test page, and tabulate the results. I'm sure there are others who have had to deal with the same challenge. What is the de facto way to test this? For example, is cURL'ing to spoof IP addresses a possibility?

Note: many people have suggested that you can never achieve perfect accuracy when it comes to city detection due to the lack of reliability of IP addresses, and I am aware of this (http://www.maxmind.com/en/city_accuracy). I'd still like a way of testing for sanity / maintenance purposes. Thanks!

Related: How do sites like Groupon segment geolocation based on the cities they have deals in?


Solution

I split this answer into two sections for the sake of clarity.

IP Geolocation

You may want to stick with MaxMind unless you have a very good reason to question its data. A few years ago I built a service very similar to the one you are describing and, like you, wanted a way to verify MaxMind's accuracy. I evaluated 10+ IP geolocation solutions running the entire gamut, from free JSON APIs to enterprise-centric database subscriptions. It became apparent rather quickly that most of the platforms were either using MaxMind directly or combining MaxMind data with metadata from other sources. The spelling, capitalization, and common abbreviations of ISP metadata

This paper, despite being a few years old, is also quite telling. The authors assess the accuracy of a handful of IP geolocation tools (including MaxMind) by comparing their results to a dataset they refer to as "ISP Groundtruth", a mashup of EU ISP router data and the actual GPS coordinates of the routers. The paper also offers a technical explanation for why geolocation data is inaccurate at the city level.
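If you do have coordinates you trust for a set of IPs (your own office or server addresses, for example), the same kind of comparison can be done in a few lines. The following is only a rough sketch, not the paper's method: the function names, the dictionary inputs, and the 40 km "city radius" threshold are all assumptions made for illustration, and the predicted coordinates would come from whatever geolocation lookup you already use.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize_errors(predicted, ground_truth, city_radius_km=40.0):
    """Compare predicted locations with known ones.

    Both arguments are dicts mapping an IP string to a (lat, lon) tuple.
    Only IPs present in both dicts are scored. city_radius_km is an
    arbitrary threshold chosen for this sketch; tune it to your needs.
    """
    errors = [
        haversine_km(*predicted[ip], *ground_truth[ip])
        for ip in predicted
        if ip in ground_truth
    ]
    if not errors:
        raise ValueError("no overlapping IPs to compare")
    within = sum(1 for e in errors if e <= city_radius_km)
    return {
        "count": len(errors),
        "median_km": median(errors),
        "within_radius": within / len(errors),
    }


if __name__ == "__main__":
    # Made-up coordinates on documentation-only IP addresses.
    predicted = {"192.0.2.1": (52.52, 13.40), "192.0.2.2": (48.14, 11.58)}
    truth = {"192.0.2.1": (52.50, 13.45), "192.0.2.2": (50.11, 8.68)}
    print(summarize_errors(predicted, truth))
```

Running a summary like this on a schedule gives you the sanity / maintenance check you asked about without needing a large proxy pool, though it only covers the IPs whose true location you know.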


Proxy Scanning

With respect to automated proxy scanning, I highly recommend checking out nmap and its Lua-based scripting engine (NSE). Here are a few scripts and libraries you may find useful:

Licensed under: CC-BY-SA with attribution