Domanda

I have a table that contains 10,301,390 GPS records, cities, countries and IP address blocks. I have user's current location with latitude and longitude. I created this query:

SELECT
  *, point(45.1013021, 46.3021011) <@> point(latitude, longitude) :: point AS distance
FROM
  locs
WHERE
  (
    point(45.1013021, 46.3021011) <@> point(latitude, longitude)
  ) < 10 -- radius
ORDER BY
  distance LIMIT 1;

This query successfully gave me what I want, but it is slow. It took 2 to 3 seconds to get one record by given latitude and longitude.

I tried a B-Tree index on the latitude and longitude columns, also tried GIST( point(latitude, longitude)); but still querying is slow.

How can I speed up this query?

Update:

It seems slowness is caused by the ORDER BY but I want to get the shortest distance, so the question remains.

È stato utile?

Soluzione

You may consider using a GIST index based on using the function ll_to_earth. This index will allow for fast "nearby" searches.

CREATE INDEX 
   ON locs USING gist (ll_to_earth(lat, lng));

Once you have this index, your query should be done in a different way.

Your (lat, lng) pairs need to be converted to the earth type, and compared with the indexed values (which are of the same type). Your query will need to have two conditions, one for "approximate" result, and one for the "precise" one. The first one will be able to use the previous index:

SELECT
    *
FROM
    locs
WHERE
    /* First condition allows to search for points at an approximate distance:
       a distance computed using a 'box', instead of a 'circumference'.
       This first condition will use the index.
       (45.1013021, 46.3021011) = (lat, lng) of search center. 
       25000 = search radius (in m)
    */
    earth_box(ll_to_earth(45.1013021, 46.3021011), 25000) @> ll_to_earth(lat, lng) 

    /* This second condition (which is slower) will "refine" 
       the previous search, to include only the points within the
       circumference.
    */
    AND earth_distance(ll_to_earth(45.1013021, 46.3021011), 
             ll_to_earth(lat, lng)) < 25000 ;

For using this code, you need two extensions (included in most PostgreSQL distributions):

CREATE EXTENSION IF NOT EXISTS cube ;
CREATE EXTENSION IF NOT EXISTS earthdistance;

This is the documentation for them:

  • Cube. You should take a look at the description of the @> operator. This module is needed by the next one.
  • EarthDistance. You will find here information about earth_box and earth_distance. This module assumes that the earth is spherical, which is an approximation good enough for the majority of applications.

A test with a table consisting of 2.2 million rows taken from the Free World Cities Database gives me the following answer to the previous query (which is not exactly the same as yours):

"ru","andra-ata","Andra-Ata","24",,44.9509,46.3327
"ru","andratinskiy","Andratinskiy","24",,44.9509,46.3327
"ru","chernozemelskaya","Chernozemelskaya","24",,44.9821,46.0622
"ru","gayduk","Gayduk","24",,44.9578,46.5244
"ru","imeni beriya","Imeni Beriya","24",,45.0208,46.3906
"ru","imeni kirova","Imeni Kirova","24",,45.2836,46.4847
"ru","kumskiy","Kumskiy","24",,44.9821,46.0622
"ru","kumskoy","Kumskoy","24",,44.9821,46.0622
"ru","lopas","Lopas","17",,44.937,46.1833
"ru","pyatogo dekabrya","Pyatogo Dekabrya","24",,45.1858,46.1656
"ru","svetlyy erek","Svetlyy Erek","24",,45.0079,46.4408
"ru","ulan tuk","Ulan Tuk","24",,45.1542,46.1097

To have an "order of magnitude" idea about timings: pgAdmin III is telling me that the time to get this answer is 22 ms. (PostgreSQL 9.6.1 with "out-of-the-box" parameters, on a Mac with Mac OS 10.12, Core i7, SSD)

Altri suggerimenti

Alternative answer with PostGIS

If you're using 10 million rows. You probably need to step up and upgrade to PostGIS.

  1. Convert your points to geography types. I assume they're in SRID 4326 anyway if they come from GPS. For this you can use geometery(point)::geography, or if you store in lat/long you can use ST_MakePoint
  2. Create an index on the new geom column (of ST_Points)
  3. Then you want to use ST_DWithin. This function will use an index (if you create one).
  4. Then calculate just the ST_Distance on the points in the bounding box

Here is the sig for ST_DWithin,

boolean ST_DWithin(geometry g1, geometry g2, double precision distance_of_srid);
boolean ST_DWithin(geography gg1, geography gg2, double precision distance_meters);
boolean ST_DWithin(geography gg1, geography gg2, double precision distance_meters, boolean use_spheroid);

It can measure your distance along the spheroid or sphere.

SELECT geom, ST_Distance(geom, point)
WHERE ST_DWithin( geom, pointgiven, limit to check in km )
ORDER BY geom <=> point ASC
LIMIT 1;
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a dba.stackexchange
scroll top