Question

I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).

I want to know if there is any gain from the following, and more pressing, will it even give the desired results.

select a.* from a where a.id in (select id from b where b.id = a.id)

as an alternative to:

select a.* from a where a.id in (select id from b)

Update: MySQL Can't be more specific sorry table a is effectively a join between 7 different tables. use of * is for examples

Edit, b doesn't get selected

Was it helpful?

Solution

Your question was about the difference between these two:

select a.* from a where a.id in (select id from b where b.id = a.id)

select a.* from a where a.id in (select id from b)

The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.

The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.

I would use the latter.

OTHER TIPS

Both queries you list are the equivalent of:

select a.* 
from a 
inner join b on b.id = a.id

Almost all optimizers will execute them in the same way.

You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.

YMMV, but I've often found using EXISTS instead of IN makes queries run faster.

SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)

Of course, without seeing the rest of the query and the context, this may not make the query any faster.

JOINing may be a more preferable option, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you more than likely go backwards in terms of optimization.

I would never use a subquery like this. A join would be much faster.

select a.*
from a 
join b on a.id = b.id

Of course don't use select * either (especially never use it when doing a join as at least one field is repeated) and it wastes network resources to send unnneeded data.

Have you looked at the execution plan?

How about

select a.* 
from a 
inner join b
on a.id = b.id

presumably the id fields are primary keys?

Select a.* from a
inner join (Select distinct id from b) c
on a.ID = c.AssetID

I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)

Since you are not selecting any other fields from B, I prefer to use the Where IN(Select...) Anyone would look at the query and know what you are trying to do (Only show in a if in b.).

your problem is most likely in the seven tables within "a"

make the FROM table contain the "a.id" make the next join: inner join b on a.id = b.id

then join in the other six tables.

you really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top