SQL query performance optimisation - fetching max(B) for corresponding A

https://stackoverflow.com/questions/23561984

18-07-2023

Question

I have a database schema that looks like this (see http://sqlfiddle.com/#!2/4c9b4/1/0 ):

 create table t( id int,  dataA int, dataB int);
 insert into t select 1 ,1 ,1;
 insert into t select 2 ,1 ,2;
 insert into t select 3 ,1 ,3;
 insert into t select 4 ,2 ,1;
 insert into t select 5 ,2 ,2;
 insert into t select 6 ,2 ,4;
 insert into t select 7 ,3 ,1;
 insert into t select 8 ,3 ,2;
 insert into t select 9 ,4 ,1;

And an SQL query that fetches, for each "dataA", the row(s) with the maximum "dataB":

SELECT * FROM t a WHERE dataB = (SELECT MAX(dataB) FROM t b WHERE b.dataA = a.dataA)

It works, but it can take up to 90 seconds to run on my dataset.

How can I improve the performance of this query?

Solution

MySQL may be re-executing the correlated subquery for every row, even when the same dataA value repeats. The following statement computes max(dataB) only once per dataA and then joins that result back to the table, which should be faster.

select t.*
from t
join (select dataA, max(dataB) as maxDataB from t group by dataA) max_t
  on t.dataA = max_t.dataA and t.dataB = max_t.maxDataB;

EDIT: Here is your SQL fiddle: http://sqlfiddle.com/#!2/4c9b4/2.
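
To see what the join is doing, it can help to run the derived table on its own against the sample data from the question. The expected rows in the comments below were worked out by hand from those nine inserts. The ORDER BY clauses are not part of the answer; they are added here only so the output order is predictable.

```sql
-- Step 1: one row per dataA with its largest dataB.
select dataA, max(dataB) as maxDataB
from t
group by dataA
order by dataA;
-- dataA | maxDataB
--   1   |    3
--   2   |    4
--   3   |    2
--   4   |    1

-- Step 2: join back to t to recover the full rows.
select t.*
from t
join (select dataA, max(dataB) as maxDataB from t group by dataA) max_t
  on t.dataA = max_t.dataA and t.dataB = max_t.maxDataB
order by t.id;
-- id | dataA | dataB
--  3 |   1   |   3
--  6 |   2   |   4
--  8 |   3   |   2
--  9 |   4   |   1
```

If several rows share the maximum dataB for a given dataA, all of them are returned, which matches the behaviour of the original query.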

OTHER TIPS

MySQL does not handle aggregation particularly well. The first thing to try is a composite index on both columns:

create index t_dataA_dataB on t(dataA, dataB);

That will probably fix the problem. The second thing to try is the following trick:

select a.*
from t a
where not exists (select 1
                  from t a2
                  where a2.dataA = a.dataA and
                        a2.dataB > a.dataB
                 );

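This transforms "get me the max" into the equivalent "get me all rows from t where there are no rows with the same dataA and a bigger dataB". The two forms return the same rows as long as dataA and dataB are never NULL; a row with a NULL in either column is returned by the NOT EXISTS form but not by the original query.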
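
One way to check whether the index suggested above actually helps is to ask MySQL for the query plan before and after creating it. This is only a sketch: the exact EXPLAIN output depends on the MySQL version and on the data, so no expected output is shown.

```sql
-- Plan for the original correlated subquery.
explain
select * from t a
where dataB = (select max(dataB) from t b where b.dataA = a.dataA);

-- Plan for the NOT EXISTS form.
explain
select a.*
from t a
where not exists (select 1
                  from t a2
                  where a2.dataA = a.dataA and
                        a2.dataB > a.dataB
                 );

-- List the indexes on t to confirm that t_dataA_dataB exists.
show index from t;
```

In the EXPLAIN output, the "key" column shows which index (if any) is chosen for each table, and a "type" of ALL indicates a full table scan. If the inner table still shows ALL after the index has been created, the index is not being used for that query.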

Licensed under: CC-BY-SA with attribution
