Question

The question is whether the query described below can be done without recourse to procedural logic, that is, can it be handled by SQL and a CTE and a windowing function alone? I'm using SQL Server 2012 but the question is not limited to that engine.

Suppose we have a national database of music teachers with 250,000 rows:

  teacherName, address, city, state, zipcode, geolocation, primaryInstrument

where the geolocation column is a geography::point datatype with optimally tesselated index.

User wants the five closest guitar teachers to his location. A query using a windowing function performs well enough if we pick some arbitrary distance cutoff, say 50 miles, so that we are not selecting all 250,000 rows and then ranking them by distance and taking the closest 5.

But that arbitrary 50-mile radius cutoff might not always succeed in encompassing 5 teachers, if, for example, the user picks an instrument from a different culture, such as sitar or oud or balalaika; there might not be five teachers of such instruments within 50 miles of her location.

Also, now imagine we have a query where a conservatory of music has sent us a list of 250 singers, who are students who have been accepted to the school for the upcoming year, and they want us to send them the five closest voice coaches for each person on the list, so that those students can arrange to get some coaching before they arrive on campus. We have to scan the teachers database 250 times (i.e. scan the geolocation index) because those students all live at different places around the country.

So, I was wondering, is it possible, for that latter query involving a list of 250 student locations, to write a recursive query where the radius begins small, at 10 miles, say, and then increases by 10 miles with each iteration, until either a maximum radius of 100 miles has been reached or the required five (5) teachers have been found? And can it be done only for those students who have yet to be matched with the required 5 teachers?

I'm thinking it cannot be done with SQL alone, and must be done with looping and a temporary table--but maybe that's because I haven't figured out how to do it with SQL alone.

P.S. The primaryInstrument column could reduce the size of the set ranked by distance too but for the sake of this question forget about that.

EDIT: Here's an example query. The SINGER (submitted) dataset contains a column with the arbitrary radius to limit the geo-results to a smaller subset, but as stated above, that radius may define a circle (whose centerpoint is the student's geolocation) which might not encompass the required number of teachers. Sometimes the supplied datasets contain thousands of addresses, not merely a few hundred.

select TEACHERSRANKEDBYDISTANCE.* from
(
select STUDENTSANDTEACHERSINRADIUS.*,
rowpos = row_number() 
over(partition by 
        STUDENTSANDTEACHERSINRADIUS.zipcode+STUDENTSANDTEACHERSINRADIUS.streetaddress 
      order by DistanceInMiles)
from
(
select
SINGER.name, 
SINGER.streetaddress, 
SINGER.city, 
SINGER.state, 
SINGER.zipcode, 
TEACHERS.name as TEACHERname, 
TEACHERS.streetaddress as TEACHERaddress, 
TEACHERS.city as TEACHERcity, 
TEACHERS.state as TEACHERstate, 
TEACHERS.zipcode as TEACHERzip,
TEACHERS.teacherid,
geography::Point(SINGER.lat, SINGER.lon, 4326).STDistance(TEACHERS.geolocation)
            / (1.6 * 1000) as DistanceInMiles
from 
SINGER left join TEACHERS
on 
( TEACHERS.geolocation).STDistance( geography::Point(SINGER.lat, SINGER.lon, 4326))   
     < (SINGER.radius * (1.6 * 1000 ))
 and TEACHERS.primaryInstrument='voice'
) as STUDENTSANDTEACHERSINRADIUS
) as TEACHERSRANKEDBYDISTANCE
where rowpos < 6    -- closest 5 is an abitrary requirement given to us
Was it helpful?

Solution

I think may be if you need just to get closest 5 teachers regardless of radius, you could write something like this. The Student will duplicate 5 time in this query, I don't know what do you want to get.

select
    S.name, 
    S.streetaddress, 
    S.city, 
    S.state, 
    S.zipcode, 
    T.name as TEACHERname, 
    T.streetaddress as TEACHERaddress, 
    T.city as TEACHERcity, 
    T.state as TEACHERstate, 
    T.zipcode as TEACHERzip,
    T.teacherid,
    T.geolocation.STDistance(geography::Point(S.lat, S.lon, 4326))
    / (1.6 * 1000) as DistanceInMiles
from SINGER as S
    outer apply (
        select top 5 TT.*
        from TEACHERS as TT
        where TT.primaryInstrument='voice'
        order by TT.geolocation.STDistance(geography::Point(S.lat, S.lon, 4326)) asc
    ) as T
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top