Subqueries vs joins

https://stackoverflow.com/questions/141278

02-07-2019

Problem

I refactored a slow section of an application we inherited from another company to use an inner join instead of a subquery like:

where id in (select id from ... )

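The refactored query runs about 100 times faster (from ~50 seconds to ~0.3). I expected an improvement, but can anyone explain why it was so drastic? The columns used in the where clause were all indexed. Does SQL execute the query in the where clause once per row or something?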
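For reference, here is a minimal sketch of the two query shapes being compared. It uses Python's built-in `sqlite3` in place of MySQL, and the schema is hypothetical. `submission_tags`, `submission_id` and the index name `st_tag_id` come from the EXPLAIN output below, but `submissions` is an assumed name for the table aliased `s`, and the data is made up. The two forms return the same rows here only because each `(submission_id, tag_id)` pair appears once. If a submission could carry the same tag twice, the join would return duplicates and would need `SELECT DISTINCT`.

```python
import sqlite3

# Hypothetical schema loosely modeled on the EXPLAIN output in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE submission_tags (submission_id INTEGER, tag_id INTEGER);
    CREATE INDEX st_tag_id ON submission_tags (tag_id);
""")
conn.executemany("INSERT INTO submissions VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 101)])
# Made-up data: tag 7 on every third submission, tag 8 on every fifth.
conn.executemany("INSERT INTO submission_tags VALUES (?, ?)",
                 [(i, 7) for i in range(1, 101, 3)] +
                 [(i, 8) for i in range(1, 101, 5)])

# Original shape: filter the outer table with IN (subquery).
subquery_sql = """
    SELECT s.id FROM submissions s
    WHERE s.id IN (SELECT st.submission_id FROM submission_tags st
                   WHERE st.tag_id = ?)
    ORDER BY s.id
"""
# Refactored shape: inner join against the tag table.
join_sql = """
    SELECT s.id FROM submissions s
    JOIN submission_tags st ON st.submission_id = s.id
    WHERE st.tag_id = ?
    ORDER BY s.id
"""
via_in = [row[0] for row in conn.execute(subquery_sql, (7,))]
via_join = [row[0] for row in conn.execute(join_sql, (7,))]
print(via_in == via_join, len(via_in))  # True 34
```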

Update - EXPLAIN results:

The difference is in the second part of the "where id in ()" query:

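id  select_type         table            type  possible_keys  key        key_len  ref    rows  Extra
2   DEPENDENT SUBQUERY  submission_tags  ref   st_tag_id      st_tag_id  4        const  2966  Using where

vs. 1 indexed row with the join:

    select_type  table  type    possible_keys  key      key_len  ref                                     rows  Extra
    SIMPLE       s      eq_ref  PRIMARY        PRIMARY  4        newsladder_production.st.submission_id  1     Using index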

Solution

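A "correlated subquery" (i.e., one whose where condition depends on values obtained from the rows of the containing query) is executed once for each row. A non-correlated subquery (one whose where condition is independent of the containing query) is executed once, at the beginning. The SQL engine makes this distinction automatically.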
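The difference in execution counts can be illustrated with a toy model in plain Python. This is not a real query engine, and the table contents are hypothetical. The correlated form references the outer row, so it has to be re-evaluated for every outer row. The non-correlated form has no outer reference, so it can be evaluated once into a set and then probed. The `DEPENDENT SUBQUERY` line in the question's EXPLAIN output is MySQL reporting the first, per-row strategy.

```python
# Toy model (not a real query engine); all data here is hypothetical.
outer_ids = list(range(1, 1001))                     # outer table rows
inner_rows = [(i, i % 10) for i in range(1, 1001)]   # (submission_id, tag_id)
scans = {"correlated": 0, "uncorrelated": 0}         # full scans of inner_rows

def correlated_match(outer_id, tag_id):
    """EXISTS-style subquery: refers to the outer row, so it runs per row."""
    scans["correlated"] += 1
    return any(sid == outer_id and tid == tag_id for sid, tid in inner_rows)

def uncorrelated_ids(tag_id):
    """IN-style subquery with no outer reference: can run once up front."""
    scans["uncorrelated"] += 1
    return {sid for sid, tid in inner_rows if tid == tag_id}

TAG = 3
correlated = [oid for oid in outer_ids if correlated_match(oid, TAG)]
matches = uncorrelated_ids(TAG)                      # evaluated exactly once
uncorrelated = [oid for oid in outer_ids if oid in matches]

print(correlated == uncorrelated, len(correlated), scans)
# True 100 {'correlated': 1000, 'uncorrelated': 1}
```

Both strategies produce the same answer, but the per-row version repeats its inner work 1000 times instead of once. In a real engine, an index such as `st_tag_id` makes each repeated probe cheaper than a full scan, but the probe still happens once per outer row.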

But yes, the explain plan will give you the dirty details.

Other tips

You are running the subquery once for every row, whereas the join happens on indexes.

Here's an example of how subqueries are evaluated in MySQL 6.0.

The new optimizer will convert this kind of subquery into a join.

Run the explain-plan on each version, it will tell you why.

Before queries are run against the dataset, they are put through a query optimizer. The optimizer tries to organize the query so that it can remove as many tuples (rows) from the result set as quickly as possible. Often when you use subqueries (especially bad ones), the tuples can't be pruned out of the result set until the outer query starts to run.

Without seeing the query, it's hard to say what was so bad about the original, but my guess is that it was something the optimizer just couldn't improve much. Running EXPLAIN will show you the optimizer's method for retrieving the data.

Usually it's the result of the optimizer not being able to figure out that the subquery can be executed as a join. In that case it executes the subquery for each record in the table, rather than joining the table in the subquery against the table you are querying. Some of the more "enterprisey" databases are better at this, but they still miss it sometimes.

This question is somewhat general, so here's a general answer:

Basically, queries take longer when MySQL has tons of rows to sort through.

Do this:

Run an EXPLAIN on each of the queries (the JOIN'ed one, then the Subqueried one), and post the results here.

I think seeing the difference in MySQL's interpretation of those queries would be a learning experience for everyone.
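As a runnable stand-in for that exercise, SQLite's `EXPLAIN QUERY PLAN` (used here instead of MySQL's `EXPLAIN`, with the same hypothetical schema as the question's tables) prints a plan for each form. The exact wording of the plan text varies between SQLite versions, so no expected output is shown. The point is simply to put the two plans side by side.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE submission_tags (submission_id INTEGER, tag_id INTEGER);
    CREATE INDEX st_tag_id ON submission_tags (tag_id);
""")

queries = {
    "IN subquery": """SELECT s.id FROM submissions s
                      WHERE s.id IN (SELECT submission_id FROM submission_tags
                                     WHERE tag_id = 7)""",
    "JOIN": """SELECT s.id FROM submissions s
               JOIN submission_tags st ON st.submission_id = s.id
               WHERE st.tag_id = 7""",
}

plans = {}
for name, sql in queries.items():
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the text.
    plans[name] = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name)
    for detail in plans[name]:
        print("   ", detail)
```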

The where subquery has to run 1 query for each returned row. The inner join just has to run 1 query.

Look at the query plan for each query.

WHERE IN and JOIN can typically be implemented using the same execution plan, so typically there is zero speed-up from switching between them.

The optimizer didn't do a very good job. Usually the two forms can be transformed into each other without any difference, and the optimizer can do this.

The subquery was probably executing a "full table scan". In other words, it was not using the index and was returning far too many rows, which the WHERE clause of the main query then had to filter out.

Just a guess without details, of course, but that's the common situation.
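The difference between a full table scan and an index lookup is visible in the query plan. This sketch again uses SQLite as a stand-in for MySQL, with a hypothetical `submission_tags` table. SQLite reports a full scan as `SCAN ...` and an index lookup as `SEARCH ... USING INDEX ...`, although the rest of the wording differs slightly between versions.

```python
import sqlite3

def plan(conn, sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission_tags (submission_id INTEGER, tag_id INTEGER)")
conn.executemany("INSERT INTO submission_tags VALUES (?, ?)",
                 [(i, i % 50) for i in range(10_000)])

sql = "SELECT submission_id FROM submission_tags WHERE tag_id = 7"
before = plan(conn, sql)   # no index on tag_id yet: every row must be read
conn.execute("CREATE INDEX st_tag_id ON submission_tags (tag_id)")
after = plan(conn, sql)    # the equality filter can now use the index
print(before)
print(after)
```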

With a subquery, you have to re-execute the 2nd SELECT for each result, and each execution typically returns 1 row.

With a join, the 2nd SELECT returns a lot more rows, but you only have to execute it once. The advantage is that now you can join on the results, and joining relations is what a database is supposed to be good at. For example, maybe the optimizer can spot how to take better advantage of an index now.

It isn't so much the subquery as the IN clause, although joins are at the foundation of at least Oracle's SQL engine and run extremely quickly.

Taken from the Reference Manual (14.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOINS.
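A minimal sketch of that kind of rewrite is shown below: a `NOT IN (subquery)` anti-join and the equivalent `LEFT JOIN ... WHERE ... IS NULL`. It runs on SQLite rather than MySQL, and the tables `table1`/`table2` are hypothetical. One caveat: `NOT IN` returns no rows at all if the subquery yields a `NULL`, so the two forms are equivalent only when the joined column cannot be `NULL`. That is why `table2.id` is declared `NOT NULL` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in (2, 4, 6, 8)])

# Anti-join written with a subquery.
not_in = conn.execute(
    "SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2) ORDER BY id"
).fetchall()

# Same anti-join rewritten as LEFT JOIN ... IS NULL.
left_join = conn.execute(
    """SELECT table1.id FROM table1
       LEFT JOIN table2 ON table1.id = table2.id
       WHERE table2.id IS NULL
       ORDER BY table1.id"""
).fetchall()

print(not_in == left_join, [r[0] for r in left_join])
# True [1, 3, 5, 7, 9, 10]
```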

License: CC BY-SA with attribution

Not affiliated with StackOverflow