Increase speed of SQL query execution

https://stackoverflow.com/questions/22710733

23-06-2023
|

Question

I have this SQL query that I run through JDO in java:

SELECT idPath, token, isTV, rel
FROM path
LEFT OUTER JOIN relation ON (path.idTokenN=relation.idTokenN)
WHERE path.idPath IN( SELECT DISTINCT path.idPath
                      FROM path
                      WHERE path.isTV='true'
                    )
  AND path.idPath IN( SELECT DISTINCT path.idPath
                      FROM path
                      GROUP BY path.idPath
                      HAVING COUNT(*) >= 2 AND COUNT(*) <= 3
                    )

To run it takes about 3 hours, how can I decrease the execution time?

EDIT

Using the MySQL DBMS

path table:

     idTokenN idPath  token      isTV
       1      p1      test1      true
       2      p1      test2      false
       3      p2      test3      true
       4      p3      test4      false
       5      p3      test5      false
       6      p4      test6      false
       7      p4      test7      true
       8      p4      test8      false
       9      p4      test9      false
       10     p5      test10     true
       11     p5      test11     false
       12     p5      test12     false

relation table:

      idTokenN  idTokenN2  rel
       10       11         test

Of course the data in my DB are many, this is just an example!

Solution

Optimizing a query depends on several factors. The most important is the database engine. The second are the characteristics of the data. Your question provides information on neither of these.

A very important piece of information is the size of the two tables, the number rows in each table, and the number of distinct values of idTokenN in each one. It is quite possible that the left outer join is determining the performance characteristics of the query.

The very first thing you can do is remove the distinct keyword. This is never needed in in subqueries, and some database engines may not ignore it.

Another step to optimizing the query is to remove the in subqueries. In some databases, these do not optimize well. They can be replaced by a join and aggregation subquery:

SELECT p.idPath, p.token, p.isTV, r.rel
FROM path p LEFT OUTER JOIN
     relation r
     ON p.idTokenN = r.idTokenN JOIN
     (select idPath, max(case when p.isTV = 'true' then 1 else 0 end) as HasTv,
             (case when COUNT(*) between 2 and 3 then 1 else 0 end) as Has2_3
      from path p
      group by idpath
     ) pf
     on p.idpath = pf.idpath and
        pf.HasTv = 1 and pf.Has2_3  = 1;

There are definitely other things you can do, but beyond this, they become database dependent.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow