Question

I have a database of geocoded entries. I need to determine which two entries are the furthest apart from a subset of the total entries. For example, I select a list of 10 entries then, from that list, determine which two places represent the greatest distance within that list.

I cannot wrap my head around how to approach this. I've considered using radians even, but nothing seems to meet the requirement.

FYI, LAMP stack going here...

Was it helpful?

Solution

The following query will calculate the distance between all your points and return the two with the biggest distance:

SELECT coor1.longitude as lon1,
       coor1.latitude as lat1,
       coor2.longitude as lon2,
       coor2.latitude as lat2,
       (ACOS(
         COS(RADIANS(coor1.latitude))  * 
         COS(RADIANS(coor1.longitude)) *
         COS(RADIANS(coor2.latitude))  *
         COS(RADIANS(coor2.longitude)) + 
         COS(RADIANS(coor1.latitude))  *
         SIN(RADIANS(coor1.longitude)) *
         COS(RADIANS(coor2.latitude))  *
         SIN(RADIANS(coor2.longitude)) +
         SIN(RADIANS(coor1.latitude))  * 
         SIN(RADIANS(coor2.latitude))
         ) * 6378                        --- Use 3963.1 for miles
       ) 
AS DistanceKM
FROM coordinates coor1,
     coordinates coor2
WHERE NOT (coor1.longitude = coor2.longitude AND coor1.latitude = coor2.latitude)
ORDER BY DistanceKM DESC
LIMIT 1;                                 --- Only the biggest

Now I recommending doing those calculations before hand and storing the result in a separate table.

OTHER TIPS

By the looks of it, this could be solved by first finding the convex hull of the points (using Graham's scan, for instance), and then doing rotating calipers for the diameter on that.

Brute force approach:

  1. Find the center of your list of ten by averaging the latitude and longitude values.

  2. For each (latitude,longitude) pair in your database, use the great circle formula to calculate distance from the center from step (1)

  3. Pick greatest two distances.

Obvious optimization: break the world in to N "squares" (e.g., 10 degrees longitude, 10 degrees latitude) and pre-compute the great circle distance between the centers of of each pairing. Store this in the database. Now you can quickly look for the furthest away "squares" and only check (latitude,longitude) pairs inside those tiles.

Here is the algorithm implemented in PHP for the distance between two points based on latitude and longitude.

Note that if the "subset of total entries" is large, you quickly have quite a few computations to do. If that's the case, you may want to consider pre-calculating distances between city pairs.

EDIT: Why 10 degree optimization does not work:

Take four squares as shown below

-------------------
|        |        |
|   A    |   B    |
|        |        |
|_______1|________|
|        |2       |
|   C    |   D    |
|        |        |
|_______3|________|

By only measuring the centers of the squares and comparing those distances, you get A and D are further apart than A and C. However, cities 1 and 3 are clearly further apart than 1 and 2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top