Question

I have a query that I am using to find locations that are within 1km of a known point.

To do this, I am using the Spherical Law of Cosines formula with my latitude and longitudes.

Currently, the query runs over roughly 5000 records in about 5 minutes. This is acceptable, but I would like to make it as fast as possible.

The main query I am using:

select loc.*, base.key
from locations loc left outer join baseline base
on base.key in
(select key
from baseline
where [dbo].CIRCLEDISTANCE(loc.LAT, base.LATITUDE, loc.LONG, base.LONGITUDE) <= 1)

The Circle Distance scalar value function:

ALTER FUNCTION CIRCLEDISTANCE 
(
    -- Add the parameters for the function here
    @LAT1 varchar(250), 
    @LAT2 varchar(250),
    @LNG1 varchar(250), 
    @LNG2 varchar(250)
)
RETURNS float
AS
BEGIN

    DECLARE @Distance float
    DECLARE @LAT1_FLOAT float
    DECLARE @LNG1_FLOAT float
    DECLARE @LAT2_FLOAT float
    DECLARE @LNG2_FLOAT float

    select @LAT1_FLOAT = cast(@LAT1 as float)
    select @LNG1_FLOAT = cast(@LNG1 as float)
    select @LAT2_FLOAT = cast(@LAT2  as float)
    select @LNG2_FLOAT = cast(@LNG2 as float)

    SELECT @Distance = acos(sin(radians(@LAT1_FLOAT))*
            sin(radians(@LAT2_FLOAT))+
            cos(radians(@LAT1_FLOAT))*
            cos(radians(@LAT2_FLOAT))*
            cos(radians(@LNG2_FLOAT)-radians(@LNG1_FLOAT ))
            )*6371;

    RETURN @Distance

END
GO

The execution plan (with some anonymizing)

I have indexed all columns named in both query and function.

Some caveats:

  • I am converting to float in the function because the data I am getting arrives as a string and I am unable to change it prior to this point
  • The left outer join is required because I need to know locations that do not have a known point
  • I am using the Law of Cosines rather than Haversine because of it's simplicity and I don't require a large amount of accuracy, since I am only interested in things within 1 km.!

I am not a DBA myself, merely a .net developer, so I am not really knowledgeable about improving query performance beyond indexing, so any help is appreciated.

EDIT:

Rolling in Rob and Mark's answers allowed my to get my query execution down to 30 seconds. In case there's any further gains to be had, here is the query now:

select <columns>
from locations loc 
left outer join baseline base  
on base.key in
(select key
from baseline
where base.location_geo.STDistance(loc.location_geo) <= 0.00015678559)

The 0.00015678559 is 1000/6378137 (Earth radius) so I don't need to do any math when the query is run.

The location_geo is a computed column using the formula:

([geography]::STGeomFromText(((('POINT('+[Longitude])+' ')+[Latitude])+')',(104001)))

104001 is the Spatial ID for a spherical Earth model, since I'm not looking for a large distance this seems to make the query quicker to return.

I also have a Spatial index on the location_geo column as suggested here: http://social.technet.microsoft.com/wiki/contents/articles/9694.tuning-spatial-point-data-queries-in-sql-server-2012.aspx

Was it helpful?

Solution

Create a computed column which converts your lat and long columns into a geography type using the Point constructor. Then put a spatial index on this computed column.

Then your query can create a geography point from your circle centre, and compare distances. Should be very quick.

OTHER TIPS

Scalar valued functions takes time to invoke and by the looks of it you are doing quite a few calls to CIRCLEDISTANCE. You could rewrite your function to a inline table valued function instead.

create function CIRCLEDISTANCE 
(
  @LAT1 varchar(250), 
  @LAT2 varchar(250),
  @LNG1 varchar(250), 
  @LNG2 varchar(250)
)
returns table as return
(
  select acos(sin(radians(cast(@LAT1 as float)))*
         sin(radians(cast(@LAT2 as float)))+
         cos(radians(cast(@LAT1 as float)))*
         cos(radians(cast(@LAT2 as float)))*
         cos(radians(cast(@LNG2 as float))-radians(cast(@LNG1 as float)))
         )*6371 as Distance
)

That requires rewrite to your main query a bit because the function now returns a table instead of scalar value.. I'm not sure I grasp exactly what your query does but this could be a rewrite you can try.

select loc.*,
       B.[key]
from locations as loc
  outer apply (
              select base.[key]
              from baseline as base
                cross apply dbo.CIRCLEDISTANCE2(loc.LAT, base.LATITUDE, loc.LONG, base.LONGITUDE) as C
              where C.Distance <= 1
              group by base.[key]
              ) as B

To expand on @RobFarley's answer, by storing spatial data as Lat, Long AND GEOGRAPHY datatype, you can perform much faster distance calculations, without any of the nested cos/sin/radians functions.

There is a good example and tutorial on how to using the Spatial functions in SQL on MSSQLTips.

My code below is looking for all locations saved in my database within 5000Km of a given location (London). The location at 31.13463200, 29.97898900 is the Great Pyramids, some 4000Km away.

DECLARE @myLat DECIMAL(12,8), @myLng DECIMAL(12,8)
DECLARE @myLocation geography

-- Set Lat and Lng
SET @myLat = -0.12750 
SET @myLng = 51.50722
SET @myLocation = geography::STPointFromText('POINT(' + CAST(@myLng AS VARCHAR(20)) + ' ' + CAST(@myLat AS VARCHAR(20)) + ')', 4326);

SELECT 
  ID, 
  Lat, 
  Lng, 
  CAST(Location.STDistance(@myLocation)/1000 AS DECIMAL(12,2)) AS DistanceInKM
FROM dbo.GeoStats AS gs
WHERE Location.STDistance(@myLocation)/1000 < 5000; 

SQLFiddle

If you are looking for fast query performance, and you "don't require a large amount of accuracy" then perhaps don't try looking for things within a circle centered on your location, but look for them in a square centered on your location.

You didn't explain your use-case, but perhaps this might give a tolerable result. The calculation required is a simple subtraction/addition and so should execute very quickly.

I used this approach on a project a few years ago and query times became several orders of magnitude faster, but with slightly less accurate results.

The furthest away points you'd find using this approach would be in the corners of the square sqrt(2) miles away (about 1.4 miles). If this is too inaccurate, you could use this approach as a way to pre-filter your records, so you only do the slow calculation on matches in the one-mile box.

On the scale of a kilometer the difference between a geoid & a flat plane ought to be negligible. (ok, unless you are very close to the poles I suppose)

Couldn't you just use a planer geometry distance formula? i.e. x^2 + y^2 < a^2

That might be much cheaper than so many trig calls?

Another trick that comes to mind is using some sort of hash. e.g. If you imagine all points to be on a grid you could add two pre-computed fields to your table, (hash_lat & hash_long) that contain the integer parts & then you can pre-filter on (hash_lat +/- 1 & hash_long +/-1) before you do the more precise distance computation.

PS. You probably want a finer grid for the hash because you are looking at 1 kilometer circles.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top