Question

My tables (oversimplified for clarity):

documents:  registrations:
|id|        |id|document_id|date|
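
For anyone who wants to reproduce this, a minimal schema sketch consistent with the tables above might look like the following. The column types, the primary keys and the foreign key are assumptions made for illustration. Only the index name and its columns are taken from the EXPLAIN output further down.

```sql
-- Hypothetical DDL: types and constraints are assumed, not from the real schema.
CREATE TABLE documents (
    id bigint PRIMARY KEY
    -- ... other document columns omitted
);

CREATE TABLE registrations (
    id          bigint PRIMARY KEY,
    document_id bigint REFERENCES documents (id),
    date        date
);

-- This index name and column order appear in the plan
-- (Index Only Scan Backward using registrations_document_id_date_idx).
CREATE INDEX registrations_document_id_date_idx
    ON registrations (document_id, date);
```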

My simplified query:

SELECT *
FROM (SELECT documents.*,
             (SELECT max(date)
              FROM registrations
              WHERE registrations.document_id = documents.id) AS register_date
      FROM documents) AS my_documents_view -- it's supposed to be a view
ORDER BY register_date desc NULLS LAST
LIMIT 20;

When I order by the register_date field, the query takes about 80 seconds to execute.

EXPLAIN ANALYSE:

Limit  (cost=27237727.87..27237727.92 rows=20 width=192) (actual time=85124.599..85124.613 rows=20 loops=1)
  ->  Sort  (cost=27237727.87..27265594.16 rows=11146516 width=192) (actual time=85124.597..85124.600 rows=20 loops=1)
        Sort Key: ((SubPlan 2)) DESC NULLS LAST
        Sort Method: top-N heapsort  Memory: 33kB
        ->  Seq Scan on documents  (cost=0.00..26941123.09 rows=11146516 width=192) (actual time=0.074..77874.947 rows=11153930 loops=1)
              SubPlan 2
                ->  Result  (cost=2.19..2.29 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=11153930)
                      InitPlan 1 (returns $1)
                        ->  Limit  (cost=0.43..2.19 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=11153930)
                              ->  Index Only Scan Backward using registrations_document_id_date_idx on registrations  (cost=0.43..3.95 rows=2 width=4) (actual time=0.004..0.004 rows=1 loops=11153930)
                                    Index Cond: ((document_id = documents.id) AND (date IS NOT NULL))
                                    Heap Fetches: 10337268
Planning Time: 0.381 ms
Execution Time: 85124.722 ms

The complexity and cost are absurd. There are lots of rows in these two tables (millions, actually), but is it really so hard for the engine to order them? Are there any workarounds or suggestions to optimize it?

In the full query I have some additional filters, so it runs a bit faster, but it's still unacceptable for the project.

I tried playing with joins and indexes without any success.


Solution

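Try flattening the correlated subquery into a join. The plan in the question runs the subquery once per documents row (loops=11153930), whereas a join with GROUP BY lets the planner consider computing every maximum in one aggregation pass. Note that grouping by d.id while selecting d.* is only valid in PostgreSQL when documents.id is the primary key: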

SELECT *
FROM (SELECT d.*,
             max(r.date) AS register_date
      FROM documents AS d
      LEFT JOIN registrations AS r
          ON r.document_id = d.id
      GROUP BY d.id) AS my_documents_view
ORDER BY register_date desc NULLS LAST
LIMIT 20;
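
A variant worth comparing, sketched here under the assumption that registrations.document_id references documents.id, is to aggregate registrations on its own first and then join the result. The alias r and the derived-table layout are illustrative choices. This form does not depend on documents.id being the primary key, and whether it is actually faster would need to be checked with EXPLAIN ANALYSE on the real data.

```sql
-- Pre-aggregate registrations once, then attach the result to documents.
SELECT d.*,
       r.register_date
FROM documents AS d
LEFT JOIN (SELECT document_id,
                  max(date) AS register_date
           FROM registrations
           GROUP BY document_id) AS r
    ON r.document_id = d.id
-- Documents without registrations get NULL and sort last.
ORDER BY r.register_date DESC NULLS LAST
LIMIT 20;
```

Both forms should return the same rows as the original query, because the LEFT JOIN keeps documents that have no registrations and gives them a NULL register_date.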
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange