Question

I'm currently using MySQL 5.6.10.

My actual query is more complicated, but here is a simple way to reproduce the problem. I know the query below is useless (select id from x where id in (select id from x...)), but it proves my point.

I created this table:

CREATE  TABLE test (
  id INT NOT NULL AUTO_INCREMENT ,
  PRIMARY KEY (id));

Then ran this command 5 times - it created 50 rows in the table:

INSERT INTO test (id) VALUES(null),(null),(null),(null),(null),(null),(null),(null),(null),(null);

And then ran this explain:

EXPLAIN SELECT id FROM test WHERE 
       id in (SELECT id FROM test WHERE id < 5);

And got this: 4 Rows

Which makes perfect sense to me. But then if I add an OR to the WHERE clause with another IN, like this:

EXPLAIN SELECT id FROM test WHERE 
       id IN (SELECT id FROM test WHERE id < 5)
    OR id IN (SELECT id FROM test WHERE id > 45);

suddenly MySQL is looking at all 50 rows: 50 Rows

I know that the query could be re-written as SELECT id FROM test WHERE id < 5 OR id > 45, or a UNION etc, again, that's not the point. The point is MySQL is examining far too many rows.

If I run a FLUSH STATUS / SHOW STATUS LIKE "Handler%" on the first query, this is what I get:

Handler_read_key 5
Handler_external_lock 4
Handler_read_next 4
Handler_read_first 1

But if I do that to the second query, I get:

Handler_read_key 99
Handler_write 9
Handler_external_lock 6
Handler_read_next 59
Handler_read_first 2

Why the big difference? I wonder if it is the optimizer, and if so, is there some option I can include in the query that will prevent this "optimization"? This has real practical implications for a query I'm developing. Instead of examining only a few hundred rows, MySQL is examining 120,000.

Était-ce utile?

La solution

Generally speaking, RDBMS are not able to optimise subqueries as well as they can optimise proper table joins. As documented under Rewriting Subqueries as Joins (emphasis added):

Sometimes there are other ways to test membership in a set of values than by using a subquery. Also, on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries. One of these is the IN() construct:

For example, this query:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

Can be rewritten as:

SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;

In your abstract case (i.e. ignoring other obvious improvements one would make to this query in reality):

SELECT DISTINCT t1.*
FROM   test t1
  JOIN test t2 USING (id)
  JOIN test t3 USING (id)
WHERE  t2.id < 5
    OR t3.id > 45;

For which the execution plan is:

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
| ID | SELECT_TYPE | TABLE |  TYPE  | POSSIBLE_KEYS |   KEY   | KEY_LEN |       REF        | ROWS |                   EXTRA                   |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
|  1 | SIMPLE      | t1    | range  | PRIMARY       | PRIMARY |       4 | (null)           |    9 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY |       4 | db_2_129b4.t1.id |    1 | Using index; Distinct                     |
|  1 | SIMPLE      | t3    | eq_ref | PRIMARY       | PRIMARY |       4 | db_2_129b4.t1.id |    1 | Using index; Distinct                     |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+

See it on sqlfiddle.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top