Question

I have 19 years of Oracle and MySQL experience (DBA and dev), and I am new to Postgres, so I may be missing something obvious, but I cannot get this query to do what I want.

NOTE: This query is running on an EngineYard Postgres instance, and I am not immediately aware of the parameters it has set up. Also, the applicable_type and status columns in the details table are of the extension type citext.

The following query can take in excess of 60 seconds to return rows:

SELECT items.item_id, 
       CASE when items.sku is null then items.title else concat(items.title, ' (SKU: ', items.sku, ')') END title, 
       items.listing_status, items.updated_at, items.id, 
       items.sku, count(details.id) detail_count 
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id 
                                    and details.applicable_type = 'Item' 
                                    and details.status = 'Valid' 
                LEFT OUTER JOIN products ON products.id = items.product_id
WHERE "items"."user_id" = 3
GROUP BY items.id
ORDER BY title asc
LIMIT 25 OFFSET 0

The details table contains 6.5M rows. The LEFT OUTER JOIN to it does a sequential scan instead of using the index on applicable_id. Cardinality-wise, that column has about 120K distinct values across the 6.5M rows.
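
For reference, what the planner believes about that cardinality can be read from pg_stats (the table and column names below are the ones used in the query):

-- Planner's estimate of applicable_id cardinality; a negative n_distinct means a fraction of the row count
SELECT n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'details' AND attname = 'applicable_id';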

I have a btree index on details with the following columns:

applicable_id
applicable_type
status

but really, applicable_type and status have low cardinality.
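
In DDL form, the index described above is roughly the following (the index name is assumed here):

CREATE INDEX index_details_on_applicable_id_type_status
    ON details (applicable_id, applicable_type, status);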

My explain analyze looks like this:

Limit  (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1)
  ->  Sort  (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1)
      Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END)
      Sort Method: top-N heapsort  Memory: 30kB
      ->  HashAggregate  (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1)
          ->  Hash Right Join  (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1)
                Hash Cond: (details.applicable_id = items.id)
                ->  Seq Scan on details  (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1)
                      Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext))
                      Rows Removed by Filter: 942
                ->  Hash  (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 90kB
                      ->  Bitmap Heap Scan on items  (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1)
                            Recheck Cond: (user_id = 3)
                            ->  Bitmap Index Scan on index_items_on_user_id  (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1)
                                  Index Cond: (user_id = 3)

Total runtime: 28781.238 ms

Solution

Do you have an index on the expression that yields the title? Better yet, one on (user_id, title_expression).

If not, that might be an excellent thing to add: it would let Postgres nested-loop through the first 25 rows of an index scan, since it can't otherwise reasonably guess which 25 rows will be needed (hence the sequential scan you are currently getting on the joined table). A sketch follows.
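
A minimal sketch of such an index, assuming title and sku are plain text-like columns. Note that concat() is only STABLE, so the indexed expression is rewritten with the immutable || operator; the query's ORDER BY expression would have to be written the same way for the planner to match the index. The index name is made up.

-- Expression index on (user_id, title-with-sku); the indexed expression must be immutable
CREATE INDEX index_items_on_user_id_and_title_expr
    ON items (user_id,
              (CASE WHEN sku IS NULL THEN title
                    ELSE title || ' (SKU: ' || sku || ')' END));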

OTHER TIPS

I think you need an index on the applicable_id column only (without the applicable_type and status columns). You may also need to increase the default_statistics_target parameter (system-wide, or better, just for the applicable_id column) so that PostgreSQL makes a better guess about the number of rows coming out of the join.
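
Both suggestions in SQL form, as a sketch; the index name and the statistics target of 1000 are arbitrary choices here:

-- Single-column index on the join key
CREATE INDEX index_details_on_applicable_id ON details (applicable_id);

-- Raise the per-column statistics target, then re-analyze so the new target takes effect
ALTER TABLE details ALTER COLUMN applicable_id SET STATISTICS 1000;
ANALYZE details;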

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow