Question

I'm working with billions of rows of data, and each row has an associated start latitude/longitude, and end latitude/longitude. I need to calculate the distance between each start/end point - but it is taking an extremely long time. I really need to make what I'm doing more efficient.

Currently I use a function (below) to calculate the hypotenuse between points. Is there some way to make this more efficient? I should say that I have already tried casting the lat/longs as spatial geographies and using SQL Server's built-in STDistance() function (not indexed), but this was even slower.

Any help would be much appreciated. I'm hoping there is some way to speed up the function, even if it degrades accuracy a little (nearest 100m is probably ok). Thanks in advance!

DECLARE @l_distance_m float
, @l_long_start FLOAT
, @l_long_end FLOAT
, @l_lat_start FLOAT
, @l_lat_end FLOAT
, @l_x_diff FLOAT
, @l_y_diff FLOAT

SET @l_lat_start = @lat_start 
SET @l_long_start = @long_start
SET @l_lat_end = @lat_end
SET @l_long_end = @long_end 
-- NOTE: 2 x PI() x (radius of earth in km) / 360 = approx. 111 km per degree
SET @l_y_diff = 111 * (@l_lat_end - @l_lat_start)
SET @l_x_diff = 111 * (@l_long_end - @l_long_start) * COS(RADIANS((@l_lat_end + @l_lat_start) / 2))
SET @l_distance_m = 1000 * SQRT(@l_x_diff * @l_x_diff + @l_y_diff * @l_y_diff)
RETURN @l_distance_m

Solution

I haven't done any SQL programming since around 1994, however I'd make the following observations:

  1. The formula you're using works as long as the distances between your coordinates don't get too large. It will have big errors for the distance between, say, New York and Singapore, but for the distance between New York and Boston it should be fine to within 100m.
  2. I don't think there's any approximation formula that would be faster, but I can see some minor implementation improvements that might speed it up:
     (1) Why bother assigning @l_lat_start from @lat_start? You can use @lat_start directly (and the same goes for @long_start, @lat_end and @long_end).
     (2) Instead of multiplying by 111 in the formulas for @l_y_diff and @l_x_diff, drop it there and replace the 1000 in the formula for @l_distance_m with 111000, which saves a multiplication.
     (3) Using COS(RADIANS(@l_lat_end)) or COS(RADIANS(@l_lat_start)) instead of the cosine of the average latitude won't degrade the accuracy as long as the points aren't too far apart. If the points are all within the same city, you could just work out the cosine once for any point in the city.
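Put together, the three suggestions above might look something like the following. This is only a rough sketch: the function name dbo.fn_distance_m_approx is made up for illustration, and it assumes the four inputs are FLOAT values in degrees, as in the question.

```sql
-- Hypothetical function name; parameter names are taken from the question.
CREATE FUNCTION dbo.fn_distance_m_approx
(
    @lat_start  FLOAT,
    @long_start FLOAT,
    @lat_end    FLOAT,
    @long_end   FLOAT
)
RETURNS FLOAT
AS
BEGIN
    -- (1) Use the parameters directly, with no local copies.
    -- (2) Fold 111 (km per degree) and 1000 (m per km) into one constant, 111000.
    -- (3) Take the cosine of the start latitude only, not the average latitude.
    RETURN 111000 * SQRT(
               SQUARE(@lat_end - @lat_start)
             + SQUARE((@long_end - @long_start) * COS(RADIANS(@lat_start)))
           );
END
```

It would be called the same way as the original, for example SELECT dbo.fn_distance_m_approx(lat_start, long_start, lat_end, long_end) against whatever table holds the points. Whether the saving is noticeable over billions of rows would need to be measured.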
Apart from that, I think you'd need to look at other ideas, such as creating a table with the results and updating that results table whenever points are added to or deleted from the source table.
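One possible way to get the "store the results" effect without maintaining a separate table by hand is a persisted computed column, which SQL Server recalculates whenever a row is inserted or its coordinates change. This is a variation on the idea rather than exactly what is described above, and the table name dbo.trips and its column names are assumptions made up for the example; the columns are assumed to be FLOAT degrees.

```sql
-- Hypothetical table and column names; adjust to the real schema.
-- The expression uses only deterministic built-ins, so it can be PERSISTED.
ALTER TABLE dbo.trips
ADD distance_m AS (
    111000 * SQRT(
        SQUARE(lat_end - lat_start)
      + SQUARE((long_end - long_start) * COS(RADIANS(lat_start)))
    )
) PERSISTED;
```

Adding the column to an existing table with billions of rows means computing it once for every row, so that step will itself take a while. After that, queries read the stored distance_m value instead of recalculating it.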

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow