Question

I've been trying to optimize the sort in the following query. I ran EXPLAIN ANALYZE, and the majority of the time is spent in the sort that orders the output by distance.

I've tried converting the lat and lng fields to text and removing the to_number function to see if it would make a difference, but the results didn't change.

I tried creating an index on ttb_members_store (lat, lng) to see if that would help, but the queries were about 125 ms slower.

What would be the next step?

SELECT tms.*,
       0 AS kyori,
       (power(
           (139.745069 - to_number(tms.lng,'000D00000000')) / 0.0111 * 1000000,
           2) +
        power(
           (35.662978 - to_number(tms.lat,'000D00000000')) / 0.0091 * 1000000,
           2)
       ) AS kyori2
FROM ttb_members_store tms,
     mtb_tensyu mt
WHERE (tms.tensyu_cd = mt.small_cd)
  AND (tms.view <> 0)
  AND (tms.type IN (2,3))
  AND (tms.delete_datetime IS NULL)
ORDER BY kyori2 ASC
OFFSET 20 LIMIT 20;

EXPLAIN ANALYZE output:

---------------------------------------------------------------------------
 Limit  (cost=7180.01..7181.66 rows=20 width=621) (actual time=615.232..615.333 rows=20 loops=1)
   ->  Unique  (cost=7178.36..8968.36 rows=21697 width=621) (actual time=615.129..615.288 rows=40 loops=1)
         ->  Sort  (cost=7178.36..7232.60 rows=21697 width=621) (actual time=615.125..615.162 rows=43 loops=1)
                Sort Key: ((power((((139.745069 - to_number(tms.lng, '000D00000000'::text)) / 0.0111) * 1000000::numeric), 2::numeric) + power((((35.662978 - to_number(tms.lat, '000D00000000'::text)) / 0.0091) * 1000000::numeric), 2::numeric))), tms.id, tms.store_cd, tms.store_nm_org, tms.store_nm_chg, tms.store_nm_kn_org, tms.store_nm_kn_chg, tms.address1_org, tms.address2_org, tms.address3_org, tms.address1_chg, tms.address2_chg, tms.address3_chg, tms.search_address, tms.tel, tms.kenku_cd, tms.tensyu_cd, tms.new_flg, tms.lat, tms.lng, tms.point, tms.level, tms.view, tms.search_word, tms.create_datetime, tms.update_datetime, tms.delete_datetime, tms.proc_flg, tms.type, tms.latlng_chg_flg, tms.view_chg_flg
               Sort Method: quicksort  Memory: 22864kB
               ->  Hash Join  (cost=334.39..5615.61 rows=21697 width=621) (actual time=3.891..284.492 rows=25391 loops=1)
                     Hash Cond: (tms.tensyu_cd = mt.small_cd)
                      ->  Bitmap Heap Scan on ttb_members_store tms  (cost=331.97..4657.03 rows=22804 width=621) (actual time=3.693..31.098 rows=23141 loops=1)
                            Recheck Cond: ((view <> 0) AND (type = ANY ('{2,3}'::integer[])) AND (delete_datetime IS NULL))
                            ->  Bitmap Index Scan on ttb_members_store_idx3  (cost=0.00..326.27 rows=22804 width=0) (actual time=3.206..3.206 rows=23141 loops=1)
                     ->  Hash  (cost=1.63..1.63 rows=63 width=3) (actual time=0.160..0.160 rows=63 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 2kB
                            ->  Seq Scan on mtb_tensyu mt  (cost=0.00..1.63 rows=63 width=3) (actual time=0.005..0.078 rows=63 loops=1)
 Total runtime: 616.526 ms

Here is the definition of the two tables:

Table ttb_members_store

     Column      |            Type             |     Modifiers
-----------------+-----------------------------+-----------------------------
 id              | integer                     | not null default nextval(...
 store_cd        | character(9)                | not null
 store_nm_org    | text                        |
 store_nm_chg    | text                        |
 store_nm_kn_org | text                        |
 store_nm_kn_chg | text                        |
 address1_org    | text                        |
 address2_org    | text                        |
 address3_org    | text                        |
 address1_chg    | text                        |
 address2_chg    | text                        |
 address3_chg    | text                        |
 search_address  | text                        |
 tel             | text                        |
 kenku_cd        | character(4)                |
 tensyu_cd       | character(2)                |
 new_flg         | smallint                    |
 lat             | text                        |
 lng             | text                        |
 point           | text                        |
 level           | smallint                    |
 view            | smallint                    |
 search_word     | text                        |
 create_datetime | timestamp without time zone | not null
 update_datetime | timestamp without time zone |
 delete_datetime | timestamp without time zone |
 proc_flg        | smallint                    |
 type            | smallint                    |
 latlng_chg_flg  | smallint                    |
 view_chg_flg    | smallint                    |
Indexes:
    "ttb_members_store_pkey" PRIMARY KEY, btree (id)
    "ttb_members_store_idx1" UNIQUE, btree (store_cd)

Table mtb_tensyu

     Column      |            Type             |     Modifiers
-----------------+-----------------------------+-----------------------------
 id              | integer                     | not null default nextval(...
 large_cd        | integer                     | not null
 large_nm        | text                        | not null
 small_cd        | character(2)                | not null
 small_nm        | text                        | not null
 create_datetime | timestamp without time zone | not null default now()
 update_datetime | timestamp without time zone |
 delete_datetime | timestamp without time zone |
Indexes:
    "mtb_tensyu_pkey" PRIMARY KEY, btree (id)
    "mtb_tensyu_idx1" btree (small_cd)

Solution

lat and lng are obviously numbers, so you should store them in an appropriate numeric data type, not as text. That makes for smaller storage, disallows invalid input, simplifies your query syntax, and is a bit faster overall. It won't do much for this particular query, though.
For instance:

lat             | numeric
lng             | numeric
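The conversion can be done in place. A minimal sketch, assuming every existing value in lat and lng is a valid number (rows violating that assumption would make the cast fail, which is exactly the validation you want):

```sql
-- Convert the text columns to numeric in place.
-- USING tells Postgres how to rewrite existing values.
ALTER TABLE ttb_members_store
    ALTER COLUMN lat TYPE numeric USING lat::numeric,
    ALTER COLUMN lng TYPE numeric USING lng::numeric;
```

Note that this rewrites the whole table and takes an exclusive lock for the duration, so run it in a maintenance window.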

So mtb_tensyu holds store types. But if we can assume referential integrity (every tms.tensyu_cd has a matching mt.small_cd), no row is excluded by the join, and the table is just noise in the presented query since none of its columns are used.

So we have this simplified query:

SELECT tm.*  -- return only columns that are needed!
     , 0 AS kyori
     , ((139.745069 - lng) / 0.0111 * 1000000)^2
     + (( 35.662978 - lat) / 0.0091 * 1000000)^2 AS kyori2
FROM   ttb_members_store tm
WHERE  view <> 0
AND    type IN (2,3)
AND    delete_datetime IS NULL
ORDER  BY kyori2
OFFSET 20
LIMIT  20;

And since you commented:

All of the "AND" statements are constant.

You can support this very efficiently with a partial, functional index:

CREATE INDEX ttb_members_foo_idx ON ttb_members_store
 ((((139.745069 - lng) / 0.0111 * 1000000)^2
 + (( 35.662978 - lat) / 0.0091 * 1000000)^2))
WHERE  view <> 0
AND    type IN (2,3)
AND    delete_datetime IS NULL;

The expression and the WHERE conditions must match those in the query for the index to be used.

Now Postgres can just skip the first 20 index entries and fetch the next 20, already sorted, in an index scan. Should be very fast.
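You can confirm the index is used by re-running the query under EXPLAIN ANALYZE. A quick sketch, assuming the index above has been created (ttb_members_foo_idx is the name from this answer, not an existing object):

```sql
EXPLAIN ANALYZE
SELECT *
FROM   ttb_members_store
WHERE  view <> 0
AND    type IN (2,3)
AND    delete_datetime IS NULL
ORDER  BY ((139.745069 - lng) / 0.0111 * 1000000)^2
        + (( 35.662978 - lat) / 0.0091 * 1000000)^2
OFFSET 20
LIMIT  20;
-- Expect an Index Scan using ttb_members_foo_idx
-- and no separate Sort node in the plan.
```

If you still see a Sort node, the planner did not match the ORDER BY expression or the WHERE clause to the index definition; both must be identical to what the index was created with.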

Licensed under: CC-BY-SA with attribution