Question

Let me know if this should be posted on DBA.stackexchange.com instead...

I have the following query:

SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
  ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
  ON jobs.service_of_process_id = service_of_processes.id
WHERE
  (jobs.account_id = 250093
  OR court_cases.account_id = 250093)
ORDER BY
  court_cases.court_date DESC NULLS LAST,
  court_cases.id DESC
LIMIT 30
OFFSET 0;

But it takes a good 2-4 seconds to run, and in a web application this is unacceptable for a single query.

I ran EXPLAIN (ANALYZE, BUFFERS) on the query as suggested on the PostgreSQL wiki, and have put the results here: http://explain.depesz.com/s/Yn6

The table definitions for those tables involved in the query is here (including the indexes on foreign key relationships):

http://sqlfiddle.com/#!15/114c6

Is it having issues using the indexes because the WHERE clause is querying from two different tables? What kind of index or change to the query can I make to make this run faster?

These are the current sizes of the tables in question:

PSQL=# select count(*) from service_of_processes;
 count  
--------
 103787
(1 row)

PSQL=# select count(*) from jobs;
 count  
--------
 108995
(1 row)

PSQL=# select count(*) from court_cases;
 count 
-------
 84410
(1 row)

EDIT: I'm on Postgresql 9.3.1, if that matters.

Was it helpful?

Solution

or clauses can make optimizing a query difficult. One idea is to split the two parts of the query into two separate subqueries. This actually simplifies one of them a lot (the one on court_cases.account_id).

Try this version:

(SELECT cc.*
 FROM "court_cases" cc
 WHERE cc.account_id = 250093
 ORDER BY cc.court_date DESC NULLS LAST,
          cc.id DESC
 LIMIT 30
) UNION ALL
(SELECT cc.*
 FROM "court_cases" cc LEFT OUTER JOIN
      service_of_processes sop
      ON sop.court_case_id = cc.id LEFT OUTER JOIN
      jobs j
      ON j.service_of_process_id = sop.id
 WHERE (j.account_id = 250093 AND cc.account_id <> 250093)
 ORDER BY cc.court_date DESC NULLS LAST, id DESC
 LIMIT 30
)
ORDER BY court_date DESC NULLS LAST,
         id DESC
LIMIT 30 OFFSET 0;

And add the following indexes:

create index court_cases_accountid_courtdate_id on court_cases(account_id, court_date, id);
create index jobs_accountid_sop on jobs(account_id, service_of_process_id);

Note that the second query uses and cc.count_id <> 250093, which prevents duplicate records. This eliminates the need for distinct or for union (without union all).

OTHER TIPS

I'll try modifying the query as the following:

SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
  ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
  ON jobs.service_of_process_id = service_of_processes.id and jobs.account_id = 250093
WHERE
  (court_cases.account_id = 250093)
ORDER BY
  court_cases.court_date DESC NULLS LAST,
  court_cases.id DESC
LIMIT 30
OFFSET 0;

I think that the issue is in the fact that the where filter is not properly decomposed by query planner optimizer, a really strange performance bug

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top