What would cause a query to run slowly when used a subquery, but not when run separately?

https://stackoverflow.com/questions/723250

05-09-2019
|

Question

I have something similar to the following:

SELECT c.id
FROM contact AS c
WHERE c.id IN (SELECT s.contact_id 
    FROM sub_table AS s
        LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
    WHERE c2.phone LIKE '535%')
ORDER BY c.name

The problem is that the query takes a very very very long time (>2minutes), but if I take the subquery, run it separately, implode the ids and insert them into the main query, it runs in well less than 1 second, including the data retrival and implosion.

I have checked the explains on both methods and keys are being used appropriately and the same ways. The subquery doesn't return more than 200 IDs.

What could be causing the subquery method to take so much longer?

BTW, I know the query above can be written with joins, but the query I have can't be--this is just a simplified version.

Using MySQL 5.0.22.

Solution

Sounds suspiciously like MySQL bug #32665: Query with dependent subquery is too slow.

OTHER TIPS

What happens if you try it like this?

SELECT c.id
FROM contact AS c
INNER JOIN (SELECT s.contact_id 
    FROM sub_table AS s
        LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
    WHERE c2.phone LIKE '535%') subq ON subq.contact_id=c.id
ORDER BY c.name

Assuming that the result of s.contact_id is unique. You can add distinct to the subquery if it is not.

I always use uncorrelated subqueries this way rather than using the IN operator in the where clause.

Have you checked the Execution Plan for the query? This will usually show you the problem.

Can't you do another join instead of a subquery?

SELECT c.id
FROM contact AS c
JOIN sub_table AS s on c.id = s.contact_id
LEFT JOIN contact_sub AS cs ON (s.id = cs.sub_field)
WHERE cs.phone LIKE '535%'
ORDER BY c.name

Since the subquery is referring to a field sub_field in the outer select, it has to be run once for each row in the outer table - the results for the inner query will change with each row in the outer table.

It's a correlated subquery. It runs once for each row in the outer select. (I think. You have two tables with the same correlation name, I'm assuming that's a typo. That you say it can't be rewritten as a join means it's correlated. )

Ok, I'm going to give you something to try. You say that the subquery is not correlated, and that you still can't join on it. And that it you take the output of the subquery, and lexically substitute that for the subquery, the main query runs much faster.

So try this: make the subquery into a view: create view foo followed by the text of the subquery. Then rewrite the main query to get rid of the "IN" clause and instead join to the view.

How's the timing on that?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow