Postgres prefers slow Seq Scan over fast Index Scan
Question
Tables
I have the following two tables: books, which contains ~5 million rows:
Table "public.books"
Column | Type | Modifiers
---------+--------------------------+----------------------------------------------------
id | integer | not null default nextval('books_id_seq'::regclass)
run__id | integer |
time | timestamp with time zone | not null
venue | character varying | not null
base | character varying | not null
quote | character varying | not null
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
"run_books_index" UNIQUE, btree (run__id, id)
Foreign-key constraints:
"books_run__id_fkey" FOREIGN KEY (run__id) REFERENCES runs(id)
Referenced by:
TABLE "orders" CONSTRAINT "orders_book__id_fkey" FOREIGN KEY (book__id) REFERENCES books(id)
and orders, which contains ~3 billion rows:
Table "public.orders"
Column | Type | Modifiers
----------+------------------+-----------
book__id | integer | not null
is_buy | boolean | not null
qty | double precision | not null
price | double precision | not null
Indexes:
"orders_pkey" PRIMARY KEY, btree (book__id, is_buy, price)
Foreign-key constraints:
"orders_book__id_fkey" FOREIGN KEY (book__id) REFERENCES books(id)
Query
I want to run the following query:
SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1
Postgres suggests the following execution plan:
orderbooks=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2465.94..58122020.57 rows=762939 width=53) (actual time=1394110.723..2561879.775 rows=45216 loops=1)
Hash Cond: (o.book__id = b.id)
Buffers: shared hit=4080 read=18437586
-> Seq Scan on orders o (cost=0.00..47292761.72 rows=2885110272 width=21) (actual time=0.018..2265529.184 rows=2883798728 loops=1)
Buffers: shared hit=4073 read=18437586
-> Hash (cost=2448.52..2448.52 rows=1393 width=32) (actual time=0.024..0.024 rows=15 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
Buffers: shared hit=4
-> Index Scan using run_books_index on books b (cost=0.43..2448.52 rows=1393 width=32) (actual time=0.011..0.012 rows=15 loops=1)
Index Cond: (run__id = 1)
Buffers: shared hit=4
Planning time: 2.228 ms
Execution time: 2561882.272 ms
(13 rows)
I.e. it sequentially scans the ~3 billion rows in orders, which takes ~40 minutes.
If I disable sequential scans, Postgres suggests the following, much faster, plan:
orderbooks=> SET enable_seqscan = OFF;
SET
orderbooks=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.14..219271454.14 rows=762939 width=53) (actual time=0.024..15.234 rows=45216 loops=1)
Buffers: shared hit=707
-> Index Scan using run_books_index on books b (cost=0.43..2448.52 rows=1393 width=32) (actual time=0.011..0.014 rows=15 loops=1)
Index Cond: (run__id = 1)
Buffers: shared hit=4
-> Index Scan using orders_pkey on orders o (cost=0.70..156529.57 rows=87819 width=21) (actual time=0.007..0.551 rows=3014 loops=15)
Index Cond: (book__id = b.id)
Buffers: shared hit=703
Planning time: 0.302 ms
Execution time: 18.054 ms
(10 rows)
Question
Why does Postgres prefer the slow execution plan? I see that its estimated cost is lower than that of the fast plan, but how can that be, given the actual run times?
Notes
I ran VACUUM (VERBOSE, ANALYZE) shortly before trying to execute/analyze the queries.

If I don't request the column orders.qty in the output, then Postgres chooses the fast plan (without needing to disable sequential scans):
orderbooks=> EXPLAIN SELECT b.id, b.run__id, b.time, b.venue, b.base, b.quote, o.book__id, o.is_buy, o.price FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (cost=1.14..6473136.93 rows=762939 width=45)
-> Index Scan using run_books_index on books b (cost=0.43..2448.52 rows=1393 width=32)
Index Cond: (run__id = 1)
-> Index Only Scan using orders_pkey on orders o (cost=0.70..3766.96 rows=87819 width=13)
Index Cond: (book__id = b.id)
(5 rows)
Domain
The data structure I'm trying to model is a limit order book. A row in the books table contains information about a single order book, while a row in the orders table describes a single order present in the given book.
I have chosen not to allow multiple orders with the same price for the same order book -- hence the (book__id, is_buy, price) primary key on orders. The reason is that I don't need to distinguish between multiple different orders (e.g. from different people) at the same price. So if my input data, for a given order book, contains two same-priced orders, I simply convert them into a single order whose quantity is the sum of the two order quantities.
Versions
PostgreSQL 9.6.19 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1, 64-bit
Solution
There seem to be several things off:
- The estimates are wrong. Since you already ANALYZEd both tables, retry with a higher default_statistics_target.
- Since index scans seem to be estimated as too expensive, perhaps your setting of random_page_cost is too high. Also, perhaps effective_cache_size is too low.
- If you didn't select everything from books, a covering index might be an idea.
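These settings can be tried at the session level before changing anything globally. A sketch with illustrative values (they are assumptions, not recommendations -- tune them to your hardware):

```sql
-- Session-level experiments; persist only what proves itself.
SET random_page_cost = 1.1;          -- closer to seq_page_cost on SSD storage
SET effective_cache_size = '16GB';   -- roughly the RAM available for caching
SET default_statistics_target = 500; -- used by ANALYZE in this session

ANALYZE books;                       -- re-collect stats at the new target

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
```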
OTHER TIPS
To elaborate on your problem of bad estimates:
Postgres expects to get 1393 rows from the index scan on run_books_index, but actually finds only 15. That's off by a factor of ~100.
... (cost=0.43..2448.52 rows=1393 width=32) (actual time=0.011..0.012 rows=15 loops=1)
This is after you ran ANALYZE, so we cannot blame outdated statistics. Consequently, Postgres is misled into expecting far more rows from orders than it will eventually retrieve, which heavily favors a sequential scan.
Your alternative query only fetches book__id, is_buy, and price from table orders. Those can be read directly from the PK index orders_pkey with an Index Only Scan, which is as cheap as it gets, even when fetching a large part of the table.
A PK on (book__id, is_buy, price) makes me wonder about your relational model, though. Each book can only have one order (or two, considering is_buy) at the same price? Yet your query still finds ~3000 orders per book. I have a hard time making sense of that.
SELECT * is typically unnecessary and a drag on performance in its own right. Your alternative query seems smarter to begin with.
But since you fetch more columns than your multicolumn index covers, an index on just (book__id) would be more efficient: it is ~1/3 smaller in its minimum form and typically less bloated, too.
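A sketch of that narrower index (the index name is my invention; CONCURRENTLY avoids blocking writes on a table of this size, at the cost of a slower build):

```sql
-- A single-column btree is ~1/3 the width of the three-column PK
-- and can serve the join on book__id alone.
CREATE INDEX CONCURRENTLY orders_book_id_idx ON orders (book__id);
```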
Raising the statistics target gathers more data on "most common values", which also leads to better estimates for the rest. But gathering and using more statistics has a cost of its own. For just a few columns in a few big tables that are critical for query planning, consider targeting those selectively and leaving the global default_statistics_target at its default. Like:
ALTER TABLE books ALTER run__id SET STATISTICS 2000;
Bear in mind that more statistics data can only help with irregular data distribution.
You mentioned ~3 billion rows in orders, but not the cardinality of books, so 2000 is a shot in the dark, assuming that books is not much smaller (given the odd PK in orders). It's well above the default 100, but still below the maximum of 10000.
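To see what the planner currently believes about run__id, you can inspect the pg_stats view. A sketch:

```sql
-- Current planner statistics for books.run__id.
-- n_distinct > 0 is an absolute count of distinct values,
-- n_distinct < 0 is a fraction of the total row count.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'books' AND attname = 'run__id';
```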
Related:
- Inconsistent statistics on jsonb column with btree index
- Optimizing queries on a range of timestamps (two columns)
Consider the rest of Laurenz's advice, too. Those settings are typically important.
If none of this helps, especially if your table is very big and with an unfortunate data distribution, you might try to force your luck with a manual setting for the ratio of distinct values:
ALTER TABLE books ALTER run__id SET (n_distinct = -0.07);
A negative value between -1 and 0 implies a fixed ratio of distinct values to total rows. Your query plan shows 15 rows for a single run__id, and 1/15 ≈ 0.07, so this setting forces Postgres to always expect that frequency. See the details for n_distinct in the manual.
But forced estimates may come around to bite you later.
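Note that the override only takes effect at the next ANALYZE; a sketch, including how to undo it later:

```sql
ANALYZE books;  -- apply the forced n_distinct ratio to the statistics

-- To revert to automatically computed statistics later:
ALTER TABLE books ALTER run__id RESET (n_distinct);
ANALYZE books;
```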
Another alternative, especially if you cannot (or will not) make any of the suggested changes to the DB configuration: use a different query with a LATERAL subquery:
SELECT *
FROM books b
CROSS JOIN LATERAL (
SELECT *
FROM orders
WHERE book__id = b.id
) o
WHERE b.run__id = 1;
That's another workaround. It is typically more expensive for the purpose, but it heavily favors index scans on orders, which seems to be what matters most here.
Finally, Postgres has become gradually smarter about estimates and query plans, so upgrading to a current version might help, too. Version 9.6 is getting old.