Postgres: poor performance on ORDER BY "id" DESC LIMIT 1

https://dba.stackexchange.com/questions/110636

29-09-2020

Question

I have the table items with the following schema (in Postgres 9.3.5):

  Column   | Type   |                         Modifiers                  | Storage  
-----------+--------+----------------------------------------------------+----------
 id        | bigint | not null default nextval('items_id_seq'::regclass) | plain    
 data      | text   | not null                                           | extended 
 object_id | bigint | not null                                           | plain    
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
    "items_object_id_idx" btree (object_id)
Has OIDs: no
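For anyone who wants to reproduce the problem, the \d+ output above corresponds roughly to the DDL below. This is a reconstruction, not the asker's actual script. bigserial creates the sequence items_id_seq and the nextval('items_id_seq'::regclass) default shown above, and the primary key gets the default name items_pkey. The items_object_id_operation_idx index that appears in a later plan is not listed in this output, so it is left out.

```sql
-- Reconstructed from the \d+ output above; not the original DDL.
CREATE TABLE items (
    id        bigserial PRIMARY KEY,  -- default nextval('items_id_seq'::regclass); index items_pkey
    data      text   NOT NULL,
    object_id bigint NOT NULL
);

-- Single-column B-tree index on the filter column.
CREATE INDEX items_object_id_idx ON items (object_id);
```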

When I run this query, it hangs for a very long time:

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;

After VACUUM ANALYZE, query execution improved a lot, but it is still far from good.

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
                                                                            QUERY PLAN                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
   ->  Index Scan Backward using items_pkey on items  (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
         Filter: (object_id = 123::bigint)
         Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms
(5 rows)
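The planner's row estimate here (rows=918 for the backward scan, while 0 rows actually matched) comes from the per-column statistics that ANALYZE collects. Below is a minimal sketch of how to inspect them through the standard pg_stats view and pg_class. For an equality condition such as object_id = 123, Postgres relies mainly on most_common_vals/most_common_freqs and n_distinct: if 123 is not among the most common values, its frequency is estimated by spreading the remaining rows evenly over the other distinct values. That is one way a value with no rows at all can still be estimated at several hundred.

```sql
-- Statistics gathered by the last ANALYZE (VACUUM ANALYZE items runs one).
SELECT null_frac,
       n_distinct,         -- if negative: minus the ratio of distinct values to rows
       most_common_vals,   -- values that get an individual frequency estimate
       most_common_freqs   -- their estimated fractions of all rows
FROM pg_stats
WHERE tablename = 'items' AND attname = 'object_id';

-- The table's estimated row count, as maintained by VACUUM/ANALYZE.
SELECT reltuples FROM pg_class WHERE relname = 'items';
```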

The strange thing is that when I run

SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;

it returns 0 rows. I could do this check in my own code to optimize my web application's performance, but why can't Postgres do it by itself? I came to Postgres from MySQL and never saw anything this strange there.

=====

I found that it uses a different query plan, with a different index, but why?

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
                                                                          QUERY PLAN                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
   ->  Index Scan using items_object_id_operation_idx on items  (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
         Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms
(4 rows)

Solution 2

To optimize the query

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;

I did the following:

SELECT * FROM 
    (SELECT * FROM "items" 
     WHERE "object_id" = '123'
     ORDER BY "id" DESC) AS "items" 
ORDER BY "id" DESC LIMIT 1;

This helped, without adding the (object_id asc, id desc) index suggested by @mustaccio.

# EXPLAIN SELECT * FROM 
    (SELECT * FROM "items" 
     WHERE "object_id" = '123'
     ORDER BY "id" DESC) AS "items" 
ORDER BY "id" DESC LIMIT 1;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=16629.84..16629.86 rows=1 width=59)
   ->  Sort  (cost=16629.84..16640.44 rows=4239 width=59)
         Sort Key: items.id
         ->  Bitmap Heap Scan on items  (cost=125.42..16374.45 rows=4239 width=59)
               Recheck Cond: (object_id = 123::bigint)
               ->  Bitmap Index Scan on items_object_id_idx  (cost=0.00..124.36 rows=4239 width=0)
                     Index Cond: (object_id = 123::bigint)
(7 rows)

Other suggestions

Here is an attempt to explain why the two queries perform so differently.

The first query, SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1, is satisfied by any single row with a matching object_id, so the index on object_id is the natural choice. The query needs minimal I/O: an index scan to find the first matching entry, plus one heap read to fetch the full row.

The other query, SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1, requires all rows with a matching object_id to be sorted by another column, id, so that the row with the highest id can be returned. To use the index on object_id, it would have to: scan the index to find every matching object_id; for each match, fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the statistics for object_id, is: scan the index on id backwards (in the worst case in its entirety, as happened here, where no row matched); for each entry, fetch the row and check whether its object_id matches; return the first matching row, which will have the highest possible id. This alternative avoids sorting the rows, which I suppose is why the optimizer prefers it over using the index on object_id.
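The numbers in the slow plan show how the optimizer prices this alternative. It assumes the rows matching object_id = 123 are spread evenly along id, so a LIMIT 1 is expected to stop after reading roughly 1/918 of the backward scan. The Limit node's cost is therefore interpolated from the scan's startup cost, total cost, and estimated row count:

$$
\text{cost}_{\text{Limit}} = c_{\text{startup}} + \left(c_{\text{total}} - c_{\text{startup}}\right)\cdot\frac{1}{\hat{n}}
= 0.44 + \frac{1164670.11 - 0.44}{918} \approx 0.44 + 1268.70 = 1269.14
$$

This matches the Limit cost of 1269.14 in the slow plan, which is what makes the backward scan look cheap compared with fetching and sorting every match. The assumption breaks down here: no row has object_id = 123, so the scan read the whole index (Rows Removed by Filter: 27942522) and took about 874 seconds.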

An index on (object_id asc, id desc) allows yet another alternative: scan this new index for the first entry matching the given object_id value, which by definition has the highest id; fetch that one row and return it. This is clearly the most efficient approach.
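A minimal sketch of that fix, assuming the items table from the question. The index name items_object_id_id_idx is hypothetical. Because a B-tree can be read in either direction, a plain (object_id, id) index would serve ORDER BY id DESC equally well; id DESC simply mirrors the suggestion. On a large, busy table, CREATE INDEX CONCURRENTLY avoids blocking writes while the index is built, at the cost of a slower build, and it cannot run inside a transaction block.

```sql
-- Only so this snippet runs standalone; the real items table already exists.
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, data text NOT NULL, object_id bigint NOT NULL);

-- Composite index: entries ordered by object_id, then by id descending within each object_id.
-- Hypothetical name; consider CREATE INDEX CONCURRENTLY on a production table.
CREATE INDEX items_object_id_id_idx ON items (object_id, id DESC);

-- The original query: the planner can descend to object_id = 123, read the first
-- entry (the highest id for that object_id), fetch one heap row, and stop.
EXPLAIN
SELECT * FROM items WHERE object_id = 123 ORDER BY id DESC LIMIT 1;
```

On the real table the plan would be expected to show an Index Scan using the new index with an Index Cond on object_id and no Filter step. Since object_id is the leading column, the composite index can also serve lookups on object_id alone, so the single-column items_object_id_idx may become redundant; that is worth checking before dropping it.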

Licensed under: CC BY-SA with attribution

Not affiliated with dba.stackexchange