MySQL subquery slows down drastically, but the queries work fine independently

https://dba.stackexchange.com/questions/14565

16-10-2019

Question

Query 1:

select distinct email from mybigtable where account_id=345

takes 0.1s

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

takes 0.2s

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

takes 22 minutes, and 90% of that time it is in the "preparing" state. Why does this take so much time?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

Solution

In Query 3, you are basically executing a subquery for every row of mybigtable against itself.
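One way to check this for yourself, as a sketch: prefix Query 3 with EXPLAIN. On MySQL 5.0 the inner SELECT would be expected to show up with a select_type of DEPENDENT SUBQUERY, which means it is evaluated per outer row instead of once. The actual plan depends on your indexes and data, so treat the comment below as what to look for, not as guaranteed output.

```sql
-- Ask the optimizer for its plan for Query 3.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable
WHERE account_id = 123
  AND email IN (SELECT DISTINCT email FROM mybigtable WHERE account_id = 345);
-- Check the select_type column: DEPENDENT SUBQUERY on the inner SELECT
-- indicates it is re-evaluated for each candidate outer row.
```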

To avoid this, you need to make two major changes:

MAJOR CHANGE #1 : Refactor the Query

Here is your original query

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)

You could try

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;

or maybe the count per email

select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;

MAJOR CHANGE #2 : Proper Indexing

I think you have this already since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id,email). Do SHOW CREATE TABLE mybigtable\G and make sure you have one. If you don't have it or if you are not sure, then create the index anyway:

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
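To confirm the index afterwards, here is a small sketch using standard statements only. SHOW INDEX lists the indexes on the table, and EXPLAIN on Query 1 should name account_id_email_ndx in its key column if the optimizer decides to use it. Whether it does depends on your data distribution.

```sql
-- List the indexes; look for Key_name = 'account_id_email_ndx' with
-- account_id at Seq_in_index 1 and email at Seq_in_index 2.
SHOW INDEX FROM mybigtable;

-- Check which index the optimizer picks for the inner query (Query 1).
EXPLAIN SELECT DISTINCT email FROM mybigtable WHERE account_id = 345;
```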

UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check for the right side being NULL, like this:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;
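One caveat worth checking, and this is an assumption about your data: if email can be NULL, NOT IN and the LEFT JOIN form are not equivalent. NOT IN matches nothing at all when the subquery returns a NULL, while the LEFT JOIN version still counts the unmatched rows. A quick check for the account on the right side of the join:

```sql
-- Count NULL emails for the account used on the right side of the join.
-- A nonzero result means NOT IN (subquery) and LEFT JOIN ... IS NULL
-- can return different counts.
SELECT COUNT(*) AS null_emails
FROM mybigtable
WHERE account_id = 345
  AND email IS NULL;
```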

UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

Here is a great YouTube Video where I learned to refactor queries and the book it was based on

OTHER TIPS

In MySQL 5.0, a subselect inside an IN clause is re-executed for every row of the outer query, which gives O(n^2) behavior. The short story is: don't use IN (SELECT).
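For illustration, MySQL of this generation effectively treats IN (SELECT ...) as a correlated EXISTS, roughly like the hand-written form below. This is a sketch of the idea, not the optimizer's exact internal output; the aliases t1 and t2 are just for the example.

```sql
-- Roughly how IN (SELECT ...) gets evaluated: the inner query refers to
-- the outer row (t1.email), so it has to be run again for each outer row.
SELECT COUNT(*) AS total
FROM mybigtable AS t1
WHERE t1.account_id = 123
  AND EXISTS (
        SELECT 1
        FROM mybigtable AS t2
        WHERE t2.account_id = 345
          AND t2.email = t1.email
      );
```

How expensive each inner probe is depends on the indexes available, but it is still one probe per outer row.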

1. Do you have an index on account_id?
2. The second problem may be the nested subqueries, which perform terribly in 5.0.
3. GROUP BY with a HAVING clause may be faster than DISTINCT.
4. What are you trying to do? It may be better done through joins, in addition to Item #3.

There is a lot of processing involved when handling an IN() subquery such as yours. You can read more about it here.

My first suggestion would be to rewrite the subquery as a JOIN instead. Something like this (not tested):

SELECT COUNT(*) AS total FROM mybigtable AS t1
 INNER JOIN 
   (SELECT DISTINCT email FROM mybigtable WHERE account_id=345) AS t2 
   ON t2.email=t1.email
WHERE account_id=123

Licensed under: CC-BY-SA with attribution
