PostgreSQL query takes a long time (as it's not using the index)

https://dba.stackexchange.com/questions/284605

15-03-2021

Question

This query takes a long time, as I guess it's doing a full-table scan. It works ok in Oracle, just not in Postgres 12.

select count(*)
FROM EVENT 
WHERE PK_EVENT > (select CHECKPOINT1_PKEVENT FROM SYSTEM_CONTROL)

If, however, I run this one with a hardcoded value, it comes back in seconds.

select count(*)
FROM EVENT 
WHERE PK_EVENT > (1000755073)

Both the PK_EVENT and CHECKPOINT1_PKEVENT columns are defined as "int8", as they could get very large. EVENT.PK_EVENT is the primary key. I've tried rearranging the query, but so far nothing works. The count(*) returns only 2537, i.e. the number of rows generated today on our test Pg database.

Query plans :

using system_control:-

Finalize Aggregate  (cost=2909961.80..2909961.81 rows=1 width=8)
  InitPlan 1 (returns $0)
    ->  Seq Scan on system_control  (cost=0.00..1.01 rows=1 width=8)
  ->  Gather  (cost=2909960.58..2909960.79 rows=2 width=8)
        Workers Planned: 2
        Params Evaluated: $0
        ->  Partial Aggregate  (cost=2908960.58..2908960.59 rows=1 width=8)
              ->  Parallel Seq Scan on event  (cost=0.00..2866882.04 rows=16831415 width=0)
                    Filter: (pk_event > $0)
JIT:
  Functions: 8
  Options: Inlining true, Optimization true, Expressions true, Deforming true

using hardcoded value:-

Aggregate  (cost=11.42..11.43 rows=1 width=8)
  ->  Index Only Scan using event_pkey on event  (cost=0.57..11.36 rows=25 width=0)
        Index Cond: (pk_event > 1000755073)

No correct solution

OTHER TIPS

The entire query is planned up front, and at the time of planning it doesn't know what value will be found in CHECKPOINT1_PKEVENT. It makes the generic assumption that the inequality will match 1/3 of the rows, which is obviously quite wrong. When faced with this situation, I usually just have the client software run the queries separately, stuffing the result of one into a parameter for the other, assuming I can tolerate the prospect that the value may have changed between the two executions.
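As a rough sketch of running the two queries separately, here is how it could look in a psql session. The variable name last_pk is made up for illustration, and \gset is a psql meta-command rather than SQL, so other clients would instead fetch the value and pass it back as a literal or bound parameter. \gset expects the query to return exactly one row, which assumes SYSTEM_CONTROL holds a single row.

```sql
-- Step 1: fetch the checkpoint and store it in the psql variable "last_pk"
-- (hypothetical name; it is taken from the column alias).
SELECT checkpoint1_pkevent AS last_pk FROM system_control \gset

-- Step 2: psql substitutes the stored value as a literal before sending the
-- query, so the planner sees a constant, as in the hardcoded case.
SELECT count(*)
FROM event
WHERE pk_event > :last_pk;
```

The value can change between the two statements unless both run inside one transaction with a suitable isolation level.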

That said, it should probably be doing an index-only scan anyway. Do you have a very high setting for random_page_cost? What are your other planner settings? Has the table EVENT been vacuumed recently?
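One way to gather the information asked for above, as a sketch. It assumes the table is named event in a schema on the search path; the settings listed are just a common starting point, not a complete list of planner parameters.

```sql
-- Planner cost settings that influence the choice between
-- sequential scans and index scans.
SHOW random_page_cost;
SHOW seq_page_cost;
SHOW effective_cache_size;

-- When the table was last vacuumed and analyzed, manually or by autovacuum.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'event';

-- How many pages are marked all-visible; a low relallvisible relative to
-- relpages makes an index-only scan look more expensive to the planner.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'event';
```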

You should include the EXPLAIN output so we can see what's going on under the hood. What if you rewrote your query in this logically equivalent form? Does it make any difference?

SELECT COUNT(*)
FROM EVENT E
INNER JOIN SYSTEM_CONTROL S
    ON E.PK_EVENT > S.CHECKPOINT1_PKEVENT

The only solution I've found so far that performs well is to put the inner query "SELECT COUNT(*) FROM EVENT E WHERE PK_EVENT > (arg)" into a stored function and then pass in the id from SYSTEM_CONTROL. Thank you for the other suggestions :)
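The function itself was not posted, so the following is only a guess at what such a function might look like. The name count_events_after and the parameter name p_checkpoint are hypothetical. The likely reason it helps is that the query inside a PL/pgSQL function is planned with the actual parameter value on its first executions, so the planner can estimate the range much as it does for a hardcoded constant.

```sql
-- Hypothetical wrapper: counts the events whose key is above the given value.
CREATE OR REPLACE FUNCTION count_events_after(p_checkpoint bigint)
RETURNS bigint
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN (SELECT count(*) FROM event WHERE pk_event > p_checkpoint);
END;
$$;

-- Pass in the checkpoint stored in SYSTEM_CONTROL.
SELECT count_events_after(checkpoint1_pkevent)
FROM system_control;
```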

If you are using the * there is no use of an index (apart from the PK). The subquery has no WHERE clause, so it does a full table/index scan on SYSTEM_CONTROL.

If you only want to get the number of rows, you can change the query to this:

select count(PK_EVENT)
FROM EVENT 
WHERE PK_EVENT > (select CHECKPOINT1_PKEVENT FROM SYSTEM_CONTROL)

To improve the above query you can create an index on the EVENT table on PK_EVENT (in case it isn't already the primary key of that table). Also, please provide the output of the EXPLAIN command:

EXPLAIN select count(PK_EVENT)
    FROM EVENT 
    WHERE PK_EVENT > (select CHECKPOINT1_PKEVENT FROM SYSTEM_CONTROL)

Maybe you can try with this CTE:

WITH SYSTEM_CONTROL_CTE (CHECKPOINT1_PKEVENT )
AS
(SELECT CHECKPOINT1_PKEVENT  
 FROM   SYSTEM_CONTROL)
SELECT Count(E.PK_EVENT)
FROM   SYSTEM_CONTROL_CTE as SC 
    ,EVENT as E 
WHERE E.PK_EVENT > SC.CHECKPOINT1_PKEVENT

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange