How can I speed up my geolocation query?
05-10-2020
Question
I have a table containing 10,301,390 GPS records, along with cities, countries and IP address blocks. I have the user's current location (latitude and longitude), and I wrote this query:
SELECT
    *,
    -- <@> (earthdistance) takes point(longitude, latitude) and returns statute miles
    point(46.3021011, 45.1013021) <@> point(longitude, latitude) AS distance
FROM
    locs
WHERE
    (point(46.3021011, 45.1013021) <@> point(longitude, latitude)) < 10 -- radius in miles
ORDER BY
    distance
LIMIT 1;
This query returns what I want, but it is slow: it takes 2 to 3 seconds to get one record for a given latitude and longitude.
I tried a B-tree index on the latitude and longitude columns, and also a GiST index on point(latitude, longitude), but the query is still slow.
How can I speed up this query?
Update:
It seems the slowness is caused by the ORDER BY, but I need the row with the shortest distance, so the question remains.
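Without an index it can use, the query has to evaluate the distance for every row before it can sort and keep one. A rough Python sketch of that work follows. It assumes a haversine formula on a sphere of mean radius about 3958.8 statute miles (the `<@>` operator returns statute miles and reads each point as (longitude, latitude)); `miles_between`, `nearest_within` and the row tuples are hypothetical stand-ins, not PostgreSQL code.

```python
import math

# Assumed mean Earth radius in statute miles (an approximation, not taken from the source).
EARTH_RADIUS_MILES = 3958.8


def miles_between(p, q):
    """Haversine great-circle distance in miles between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def nearest_within(center, rows, radius_miles):
    """Full scan: compute every row's distance, filter, sort, keep the first.

    `rows` holds hypothetical (name, lon, lat) tuples standing in for the locs table;
    this mirrors WHERE distance < radius ORDER BY distance LIMIT 1 with no index.
    """
    hits = []
    for name, lon, lat in rows:  # one distance computation per row (about 10 million on the real table)
        d = miles_between(center, (lon, lat))
        if d < radius_miles:
            hits.append((d, name))
    hits.sort()
    return hits[0] if hits else None


center = (46.3021011, 45.1013021)  # (lon, lat) of the search point
rows = [("a", 46.31, 45.11), ("b", 46.5, 45.3), ("c", 50.0, 50.0)]
print(nearest_within(center, rows, 10))
```

The cost grows with the number of rows no matter how small the radius is, which is why the answers below focus on letting an index discard most rows first.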
Solution
Consider using a GiST index on the expression ll_to_earth(lat, lng). This index allows fast "nearby" searches.
CREATE INDEX
ON locs USING gist (ll_to_earth(lat, lng));
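The index works because ll_to_earth maps each (lat, lng) pair to a 3-D point on a sphere (the earth type is a cube value holding x, y and z), and GiST can organize such points spatially. A minimal Python sketch of that mapping, assuming the default radius returned by earth() (6378168 m); `ll_to_xyz` is a hypothetical name, not part of the extension:

```python
import math

# Assumed default of earthdistance's earth() function: Earth radius in meters.
EARTH_RADIUS_M = 6378168.0


def ll_to_xyz(lat, lng, radius=EARTH_RADIUS_M):
    """Hypothetical stand-in for ll_to_earth: (lat, lng) in degrees -> 3-D point on a sphere."""
    phi, lam = math.radians(lat), math.radians(lng)
    return (radius * math.cos(phi) * math.cos(lam),
            radius * math.cos(phi) * math.sin(lam),
            radius * math.sin(phi))


# Nearby (lat, lng) pairs become nearby 3-D points, which a spatial index can group together.
a = ll_to_xyz(45.1013021, 46.3021011)
b = ll_to_xyz(45.2, 46.3021011)
print(math.dist(a, b))  # straight-line distance in meters
```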
Once you have this index, the query must be written differently. (In this example the columns are called lat and lng; use latitude and longitude for your table.) Your (lat, lng) pairs need to be converted to the earth type and compared with the indexed values, which have the same type. The query needs two conditions: one for an "approximate" result and one for the "precise" one. The first condition can use the index:
SELECT
    *
FROM
    locs
WHERE
    /* The first condition searches for points at an approximate distance:
       a distance computed using a box instead of a circle.
       This first condition can use the index.
       (45.1013021, 46.3021011) = (lat, lng) of the search center.
       25000 = search radius (in m)
    */
    earth_box(ll_to_earth(45.1013021, 46.3021011), 25000) @> ll_to_earth(lat, lng)
    /* The second condition (which is slower) refines the previous
       search, keeping only the points within the circle.
    */
    AND earth_distance(ll_to_earth(45.1013021, 46.3021011),
                       ll_to_earth(lat, lng)) < 25000;
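The logic of the two conditions can be mimicked in plain Python. This is a sketch under stated assumptions, not what PostgreSQL runs: it assumes the default earth() radius of 6378168 m and models earth_box as a cube around the 3-D center whose half-side is the straight-line (chord) length for the search radius; all function names are hypothetical. Unlike the database, it tests the box for every point because there is no index here; in PostgreSQL the box test is the part the GiST index answers without visiting every row.

```python
import math

EARTH_RADIUS_M = 6378168.0  # assumed default earth() radius, in meters


def ll_to_xyz(lat, lng, r=EARTH_RADIUS_M):
    """(lat, lng) in degrees -> 3-D point on a sphere of radius r."""
    phi, lam = math.radians(lat), math.radians(lng)
    return (r * math.cos(phi) * math.cos(lam),
            r * math.cos(phi) * math.sin(lam),
            r * math.sin(phi))


def box_around(center, radius_m, r=EARTH_RADIUS_M):
    """Axis-aligned cube around `center` (approximate step). Every point within
    radius_m lies inside it, but so do some points in its corners."""
    half = 2 * r * math.sin(min(radius_m / (2 * r), math.pi / 2))  # chord length
    return (tuple(c - half for c in center), tuple(c + half for c in center))


def box_contains(box, p):
    lo, hi = box
    return all(l <= v <= h for l, v, h in zip(lo, p, hi))


def surface_distance(p, q, r=EARTH_RADIUS_M):
    """Great-circle distance recovered from the chord between two 3-D points (precise step)."""
    chord = math.dist(p, q)
    return 2 * r * math.asin(min(1.0, chord / (2 * r)))


def within_radius(center_ll, points_ll, radius_m):
    center = ll_to_xyz(*center_ll)
    box = box_around(center, radius_m)
    result = []
    for lat, lng in points_ll:
        p = ll_to_xyz(lat, lng)
        # Cheap box test first; exact distance only for points that pass it.
        if box_contains(box, p) and surface_distance(center, p) < radius_m:
            result.append((lat, lng))
    return result


center = (45.1013021, 46.3021011)  # (lat, lng), as in the SQL above
# ~11 km away; in a box corner but ~28 km away; ~111 km away
points = [(45.2, 46.3021011), (45.281, 46.5566), (46.1013021, 46.3021011)]
print(within_radius(center, points, 25000))
```

To answer the original question (only the nearest row), sort the survivors by surface distance and keep the first; in SQL that corresponds to appending ORDER BY earth_distance(ll_to_earth(45.1013021, 46.3021011), ll_to_earth(lat, lng)) LIMIT 1 to the query.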
To use this code you need two extensions (included in most PostgreSQL distributions):
CREATE EXTENSION IF NOT EXISTS cube ;
CREATE EXTENSION IF NOT EXISTS earthdistance;
Both are documented in the PostgreSQL manual:
- Cube. Look at the description of the @> operator. This module is required by the next one.
- EarthDistance. Here you will find information about earth_box and earth_distance. This module assumes that the earth is spherical, which is an approximation good enough for the majority of applications.
Running the previous query (which is not exactly the same as yours) against a table of 2.2 million rows from the Free World Cities Database (columns: country, city, accent city, region, population, latitude, longitude) gives me this result:
"ru","andra-ata","Andra-Ata","24",,44.9509,46.3327
"ru","andratinskiy","Andratinskiy","24",,44.9509,46.3327
"ru","chernozemelskaya","Chernozemelskaya","24",,44.9821,46.0622
"ru","gayduk","Gayduk","24",,44.9578,46.5244
"ru","imeni beriya","Imeni Beriya","24",,45.0208,46.3906
"ru","imeni kirova","Imeni Kirova","24",,45.2836,46.4847
"ru","kumskiy","Kumskiy","24",,44.9821,46.0622
"ru","kumskoy","Kumskoy","24",,44.9821,46.0622
"ru","lopas","Lopas","17",,44.937,46.1833
"ru","pyatogo dekabrya","Pyatogo Dekabrya","24",,45.1858,46.1656
"ru","svetlyy erek","Svetlyy Erek","24",,45.0079,46.4408
"ru","ulan tuk","Ulan Tuk","24",,45.1542,46.1097
To give an order-of-magnitude idea of the timing: pgAdmin III reports 22 ms for this answer (PostgreSQL 9.6.1 with out-of-the-box parameters, on a Mac with Mac OS 10.12, Core i7, SSD).
Other suggestions
Alternative answer with PostGIS
If you're working with 10 million rows, you should probably step up to PostGIS.
- Convert your points to the geography type. If they come from GPS, I assume they're in SRID 4326 anyway. You can cast a geometry(Point) value with ::geography, or, if you store latitude and longitude in separate columns, build the points with ST_MakePoint(longitude, latitude).
- Create a GiST index on the new geom column.
- Then use ST_DWithin. This function will use the index (if you created one).
- Finally, calculate ST_Distance only for the points that pass ST_DWithin.
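The steps above (index prefilter, exact distance on the candidates only, then the closest one) can be sketched in Python with a toy grid index standing in for the GiST index. Everything here is an assumption for illustration: the grid, its 0.1° cell size, the spherical haversine distance (PostGIS geography uses a spheroid by default) and all names are hypothetical, and this is not how PostGIS is implemented.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius for this spherical sketch
CELL_DEG = 0.1              # hypothetical grid cell size, in degrees


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


class GridIndex:
    """Toy spatial index: buckets (pid, lat, lng) points by grid cell."""

    def __init__(self, points):
        self.cells = defaultdict(list)
        for pid, lat, lng in points:
            self.cells[self.cell_of(lat, lng)].append((pid, lat, lng))

    @staticmethod
    def cell_of(lat, lng):
        return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

    def candidates(self, lat, lng, radius_m):
        """Points in cells overlapping the lat/lng bounding box of the search circle.

        Assumes the circle neither reaches a pole nor crosses the +/-180 meridian.
        """
        delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
        dlat = math.degrees(delta)
        dlng = math.degrees(math.asin(min(1.0, math.sin(delta) / math.cos(math.radians(lat)))))
        r0, c0 = self.cell_of(lat - dlat, lng - dlng)
        r1, c1 = self.cell_of(lat + dlat, lng + dlng)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                yield from self.cells.get((r, c), ())


def nearest_within(index, lat, lng, radius_m):
    """Exact distance only for index candidates, keeping the closest one within the radius."""
    best = None
    for pid, plat, plng in index.candidates(lat, lng, radius_m):
        d = haversine_m(lat, lng, plat, plng)
        if d <= radius_m and (best is None or d < best[0]):
            best = (d, pid)
    return best


index = GridIndex([(1, 45.1513021, 46.3021011), (2, 45.2, 46.3021011), (3, 46.5, 46.3)])
print(nearest_within(index, 45.1013021, 46.3021011, 25000))  # (distance in m, id)
```

Point 3 is never even measured: its cell lies outside the bounding box, which is the kind of work an index saves on a 10-million-row table.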
Here are the signatures of ST_DWithin:
boolean ST_DWithin(geometry g1, geometry g2, double precision distance_of_srid);
boolean ST_DWithin(geography gg1, geography gg2, double precision distance_meters);
boolean ST_DWithin(geography gg1, geography gg2, double precision distance_meters, boolean use_spheroid);
With geography arguments, ST_DWithin measures the distance along the spheroid by default, or along a sphere if use_spheroid is false.
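In the first signature the distance is in the units of the SRID, which for SRID 4326 means degrees, and a degree of longitude covers less ground the farther you are from the equator; the geography signatures take meters instead. A quick spherical calculation shows how much this varies (assumed mean radius 6371008.8 m; `meters_per_degree_lng` is a hypothetical helper):

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius (spherical approximation)


def meters_per_degree_lng(lat_deg):
    """Ground length of one degree of longitude at the given latitude, on a sphere."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


for lat in (0, 45.1013021, 60, 80):
    print(f"lat {lat:>10}: 1 degree of longitude ~ {meters_per_degree_lng(lat) / 1000:.1f} km")
```

A fixed threshold in degrees therefore describes an ellipse of varying size on the ground, which is one reason to prefer geography with a radius in meters.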
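-- geom is a geography column; ST_MakePoint takes (longitude, latitude)
SELECT *, ST_Distance(geom, ST_MakePoint(46.3021011, 45.1013021)::geography) AS distance
FROM locs
WHERE ST_DWithin(geom, ST_MakePoint(46.3021011, 45.1013021)::geography, 10000) -- radius in meters
ORDER BY geom <-> ST_MakePoint(46.3021011, 45.1013021)::geography
LIMIT 1;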
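Ordering by a KNN distance operator (`<->` in PostGIS) with LIMIT 1 lets an index hand back rows in increasing distance order, so the search can stop at the first row instead of sorting every match. The general idea, best-first nearest-neighbour search, can be sketched in Python. This assumes planar x/y coordinates and a flat grid standing in for the index's tree; all names are hypothetical and it is not PostGIS's implementation.

```python
import heapq
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket (pid, x, y) points into square cells of side `cell` (planar units)."""
    grid = defaultdict(list)
    for pid, x, y in points:
        grid[(math.floor(x / cell), math.floor(y / cell))].append((pid, x, y))
    return grid


def cell_min_dist(qx, qy, i, j, cell):
    """Lower bound on the distance from (qx, qy) to any point in cell (i, j)."""
    dx = max(i * cell - qx, 0.0, qx - (i + 1) * cell)
    dy = max(j * cell - qy, 0.0, qy - (j + 1) * cell)
    return math.hypot(dx, dy)


def nearest(grid, cell, qx, qy):
    """Pop cells and points from one heap ordered by (lower-bound) distance.

    The first real point popped is the nearest: every remaining cell's lower
    bound, and every remaining point's distance, is at least as large.
    """
    heap = [(cell_min_dist(qx, qy, i, j, cell), 0, (i, j)) for (i, j) in grid]
    heapq.heapify(heap)
    while heap:
        dist, kind, item = heapq.heappop(heap)
        if kind == 1:  # a point, not a cell
            return dist, item
        for pid, x, y in grid[item]:
            heapq.heappush(heap, (math.hypot(x - qx, y - qy), 1, pid))
    return None


grid = build_grid([("a", 0.0, 0.0), ("b", 5.0, 5.0), ("c", 12.0, 1.0), ("d", 30.0, 30.0)], 10.0)
print(nearest(grid, 10.0, 11.0, 2.0))
```

In this run the far cell holding "d" is never opened, which is the behaviour that makes ORDER BY distance LIMIT 1 cheap when the index supports ordered scans.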