Question

I have a large Oracle database ( 720,000 records aprox) where each record has its own geographic coordinates (lat & lng) and i need to select just the records that are in a specific distance from a point ( inside a specific radius).

Currently i've implemented a distance function (based on haversine) that i've found in an oracle forum but because the database is a bit big it spends about 50 seconds per select.

Any recomendations on how to do thi efficiently?. I know there is an extension called oracle spatial & locator but i don´t know if i can buy it or even how does it work. Thanks a lot in advance. Best regards

Was it helpful?

Solution

Use a better algorithm. Instead of calculating actual Euclidian distance, which requires a square-root calculation, do your select on the linear distance that requires only subtraction and addition. I.e. if your point is at (10, 10) and your radius is 5, select all places with points inside the square formed by (10 +/- 5, 10 +/- 5).

This will catch a small number of false positives in the corners of the square. Eliminate these by double-checking the results in your application by calculating the proper Euclidian distance.

OTHER TIPS

Do provide more details about the specific format of the Lat and Long values, as well as the specific formula used for implementing haversine.

There are three approaches which can speed up things. Depending on the situation we can do at least two of these.

  1. Weed-out as many records as possible by a simple attribute value comparaison.
    For these records, we don't need to calculate anything at all.
    For example, convert the maximum radius requirement to a [generous but approximate] range of the Longitude (and possibly latitude) values which would qualify

  2. Use an alternative (possibly approximative) distance measurement.
    For example, it may be faster to calculate the square of the eucldidian distance, based on a rounded-up coordinates. (And of course to compare this with the square of desired radius)

  3. Improve the way the haversine formula is implemented.

A couple of suggestions, if you aren't already doing them...

  1. Since the Haversine calculation requires the angles in radians, if you are storing latitude and longitude in degrees, add a couple of columns and precompute the radian equivalents. More generally, pre-compute any of the values in the function that you can for the formula and store them.

  2. Consider using a simpler function to eliminate points that are well outside the radius, running the Haversine function only on those that are potential matches based on the simpler function. For degrees you could use SQRT( (69.1*dLat)2 + (53*dLong)2) ) and use some fudge factor (10%). Run your Haversine calculation only on the points that match the cruder approximation if you need better than what the simpler calculation provides.

If you have the license then Oracle Spatial might be of use

Oracle Docs - Oracle Spatial

I've not used it but a quick scan of the docs would point to the function SDO_WITHIN_DISTANCE

Is the "specific distance" somewhat constant? IE are you always searching for "all points within 1 mile" or does the radius change?

What percentage of the total records do you expect to get back in any given query? 10%? .10%?

If you will always have the same radius, build a grid of squares with the same length as the radius. Assign each a list of neighboring squares. Each point will know what square it is in, from which you can get a list of all the neighboring squares. Then run the calculation on only the points in those squares. This is similar to the other answers that have popped up, but will be quicker because the linear calculations are approximated in an indexed lookup rather than calculated between every point.

Even with a variable radius, you can still use the above, but you'll have to calculate how many 'neighbors' to include. These are only feasible if you're expecting to get a small subset of the total from any individual query.

If you don't need the distance to be too accurate, you can just treat the earth as flat. From this discussion:

Approximate distance in miles:

sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) and y = 53.0 * (lon2 - lon1)

I recently did some optimizing for mysql (outlined here: www.mooreds.com/wordpress/archives/000547 [sorry, I only get 1 hyperlink per post] ) but am not sure how many of the steps I went through are applicable to Oracle. Some definitely are (like using a bounding box if possible).

Approximate distance in miles:

sqrt(x * x + y * y) 
where x = 69.1 * (lat2 - lat1) and y = 53.0 * (lon2 - lon1)

You can get a much more accurate result... if you change the 53.0 magic number... to also take into account the change in latitude. (Get progressively smaller as you move toward the poles.)

Does anyone have that magic-magic formula?

First of all, Haversine is not perfect, because Earth is not a perfect sphere - read http://www.movable-type.co.uk/scripts/latlong-vincenty.html

Second - PL/SQL is not a perfect tool to program calculations with many lines of code which will be called many times. If you go with Java or C++ implementing your math, you will give huge performance improvement. C++ or Java code can be called from Oracle just like a function.

Third - people who commented that you need to cut out as many points as possible with simple rectangular boxing are very correct. Create an index by longitude and latitude columns, it will help to execute that boxing clause.

Lastly, I don't think Oracle Spatial has to be involved here - it is an overkill. If you already have it and created SDO_GEOMETRY column, this is one story, but if not - I would not consider it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top