Question

I have a list of 1 million (slowly) moving points on the globe (stored as latitude and longitude). Every now and then, each point requests a list of the 100 nearest other points (with a configurable max range, if that helps).

Unfortunately, SELECT * SORT BY compute_geodetic_distance() LIMIT 100 is too slow to be done by each point over and over again. So my question: how should I handle this efficiently? Are there better algorithms/datastructures/... known for this? Or is this the only way and should I look into distributing server load?

(Note: this is for an Android app and the points are users, so in case I'm missing an android-specific solution, feel free to say so!)

Was it helpful?

Solution

For your task geo spatial databases have been invented.
There is Oracle Spatial (expensive) and PostGres (free).
These databases store your millions points in a geographical index, a quad tree (Oracle). Such a query needs nearly no time.

Some people, like me prefer to leave the database away and build up the quadtree themselfs.

The operations search and insert are easy to implement. Update/delete can be more complex.(Cheapest related to implementation effort, is to build up a new quadtree evry minute)

Using a quadtree you can perform hundreds or thousansds of such nearest 100 points within a second.

OTHER TIPS

Architecturally I would arrange for each "point" to phone home to a server with their location when it changes more than a certain amount. On the server you can do the heavy lifting of calculating the distance between the point that moved and each of the other points, and for each of the other points updating their list of the 100 closest points if required. You can then push changes to a point's closest 100 list as they happen (trivial if you are using App Engine, Android push is supported).

This reduces the amount of work involved to an absolute minimum:

  • Only report a location change when a point moves far enough
  • Only recalculate distances when a report is received
  • Don't rebuild the closest 100 list for a point every time, build the list once, then work out if a point that has moved is going to be added or removed from every other point's list.
  • Only notify a point of changes to its top 100 list to preserve bandwidth.

There are algorithms that you can use to make this super-efficient, and the problem has a fork/join feel to it as well, allowing you to throw horsepower at the problem.

You have to divide the earth into zones and then use an interior point algorithm to figure out what zones the phone is in. Each possible subset of zones will uniquely determine the 100 closest nodes to a fair approximation. You can get an exact set of 100 nodes by checking distance one by one against the candidate nodes, which (once again) are determined by the subset of zones.

Instead of r-tree or a quadtree, I.e spatial index you can also use a quadkey and a monster curve. This curve reduce the dimension and completetly fills the space. You can download my php class hilbert curve from phpclasses.org. You can use a simple varchar column for the quadkey and search the levels from left to right. A good explanation is from Microsoft Bing maps quadkey website.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top