SQL LIKE performance with only the wildcard (%) as a value

https://stackoverflow.com/questions/1604707

05-07-2019

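Question

I am wondering how the performance of a query that uses the LIKE keyword with a wildcard as the value compares to having no WHERE clause at all.

Consider a WHERE clause such as "where a like '%'". This will match every possible non-null value of column 'a'. How does this compare to not having the WHERE clause at all?

The reason I ask is that I have an application with some fields in which the user may specify values to search on. In some cases the user wants all possible results. I am currently using a single query like this: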

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

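Values of '%' and '%' can be supplied to match all possible values for a and/or b. This is convenient because I can use a single named query in my application. I am wondering what the performance considerations are. Does the query optimizer reduce LIKE '%' to simply match everything? I realize that because I am using a named query (prepared statement), that may also affect the answer. The answer is likely to be database specific, so specifically: how would this work in Oracle, MS SQL Server and Derby?

The alternative approach would be to use 3 separate queries in addition to the current one, chosen by which fields the user entered as a wildcard.

a is a wildcard: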

SELECT * FROM TableName WHERE b LIKE ?

b is a wildcard:

SELECT * FROM TableName WHERE a LIKE ?

a and b are both wildcards:

SELECT * FROM TableName

No wildcards:

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

Obviously, having a single query is the simplest and the easiest to maintain. I would rather use just the one query if performance will still be good.

Solution 3

I was hoping there would be a textbook answer to this but it sounds like it will largely vary with different database types. Most of the responses indicated that I should run a test so that is exactly what I did.

My application primarily targets the Derby, MS SQL and Oracle databases. Since Derby can be run embedded and is easy to set up, I tested the performance on that first. The results were surprising. I tested the worst-case scenario against a fairly large table. I ran the test 1000 times and averaged the results.

Query 1:

SELECT * FROM TableName

Query 2 (With values of a="%" and b="%"):

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

Query 1 average time: 178ms

Query 2 average time: 181ms

So performance on Derby is almost the same between the two queries.
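A comparison like this can be sketched in a few lines. The following is only a minimal illustration, not the harness used for the numbers above: it assumes Python's built-in sqlite3 module as a stand-in database (not Derby), an invented 5000-row table, and a hypothetical helper named average_ms. Absolute timings will differ per engine and data set.

```python
import sqlite3
import time


def average_ms(conn, sql, params=(), runs=50):
    """Run the query `runs` times; return (average milliseconds, row count)."""
    total = 0.0
    rows = []
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        total += time.perf_counter() - start
    return total / runs * 1000.0, len(rows)


# Assumption: an in-memory SQLite table stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a TEXT NOT NULL, b TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?)",
    ((f"a{i}", f"b{i}") for i in range(5000)),
)

# Query 1: no WHERE clause. Query 2: both parameters bound to '%'.
plain_ms, plain_rows = average_ms(conn, "SELECT * FROM TableName")
like_ms, like_rows = average_ms(
    conn, "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?", ("%", "%")
)

print(f"no WHERE clause: {plain_ms:.3f} ms ({plain_rows} rows)")
print(f"LIKE '%' twice:  {like_ms:.3f} ms ({like_rows} rows)")
```

Both queries return the same rows here only because the columns are declared NOT NULL; the timing difference is whatever the engine under test happens to show.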

Other tips

SQL Server will generally see

WHERE City LIKE 'A%'

and treat it as

WHERE City >= 'A' AND City < 'B'

...and happily use an index seek if appropriate. I say 'generally', because I've seen it fail to do this simplification in certain cases.
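To see why that rewrite is safe, here is a small sketch that checks the two predicates return the same rows. It assumes Python's sqlite3 module rather than SQL Server, and a made-up Cities table. SQLite's LIKE is case-insensitive for ASCII by default, so the sample data deliberately contains no lowercase initial letters; on other collations the equivalence may not hold exactly.

```python
import sqlite3

# Assumption: in-memory SQLite with invented sample data, not SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (City TEXT)")
conn.executemany(
    "INSERT INTO Cities VALUES (?)",
    [("Albany",), ("Austin",), ("Aachen",), ("Boston",), ("Zurich",), (None,)],
)

# Prefix match written with LIKE.
like_rows = conn.execute(
    "SELECT City FROM Cities WHERE City LIKE 'A%' ORDER BY City"
).fetchall()

# The same prefix match written as a half-open range, which an index can seek on.
range_rows = conn.execute(
    "SELECT City FROM Cities WHERE City >= 'A' AND City < 'B' ORDER BY City"
).fetchall()

print(like_rows)
print(range_rows)
```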

If someone's trying to do:

WHERE City LIKE '%ville'

...then an index seek will be essentially impossible.

But something as simple as:

WHERE City LIKE '%'

will be considered equivalent to:

WHERE City IS NOT NULL
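The difference from having no WHERE clause shows up only when the column contains NULLs. A minimal sketch, assuming Python's sqlite3 module as a stand-in and an invented three-row table (one ordinary value, one empty string, one NULL):

```python
import sqlite3

# Assumption: in-memory SQLite with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (City TEXT)")
conn.executemany(
    "INSERT INTO Cities VALUES (?)",
    [("Nashville",), ("",), (None,)],
)


def count(where=""):
    """Count rows in Cities, optionally restricted by a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM Cities {where}").fetchone()[0]


all_rows = count()                               # every row, NULL included
like_rows = count("WHERE City LIKE '%'")         # NULL LIKE '%' is not true
not_null_rows = count("WHERE City IS NOT NULL")  # same rows as LIKE '%'

print(all_rows, like_rows, not_null_rows)
```

The empty string is matched by LIKE '%' because it is a non-null value; only the NULL row drops out.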

You can use whatever query analysis the DBMS offers (e.g. EXPLAIN for MySQL, SET SHOWPLAN_ALL ON for MS SQL (or use one of the other methods), EXPLAIN PLAN FOR for Oracle) to see how the query will be executed.
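As an illustration of reading a plan programmatically, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is an assumption made for the sake of a runnable example: SQLite is not one of the databases named above, and the exact plan text varies between versions, so the code only prints the plan's detail column. The table, the index name idx_city and the helper plan are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite with a hypothetical table and index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (City TEXT)")
conn.execute("CREATE INDEX idx_city ON Cities (City)")


def plan(sql):
    """Return the human-readable detail column of each query-plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# An equality predicate can use the index to seek directly to matching rows.
seek_plan = plan("SELECT City FROM Cities WHERE City = 'Nashville'")

# A leading wildcard gives the engine no prefix to seek on, so it must scan.
scan_plan = plan("SELECT City FROM Cities WHERE City LIKE '%ville'")

print(seek_plan)
print(scan_plan)
```

In SQLite's wording, a plan line containing SEARCH means an index seek and one containing SCAN means every entry is visited.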

Any DBMS worth its salt would strip out LIKE '%' clauses before even trying to run the query. I'm fairly certain I've seen DB2/z do this in its execution plans.

The prepared statement shouldn't make a difference since it should be turned into real SQL before it gets to the execution engine.

But, as with all optimization questions, measure, don't guess! DBAs exist because they constantly tune the DBMS based on actual data (which changes over time). At a bare minimum, you should time (and get the execution plans) for all variations with suitable static data to see if there's a difference.

I know that queries like:

select c from t where ((1 = 1) or (c = ?))

are optimized to remove the entire WHERE clause before execution, on DB2 anyway. Before you ask: the construct is useful where you need to remove the effect of the WHERE clause but still keep the parameter placeholder (I was using BIRT with JavaScript to modify the queries for wildcards).
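A small sketch of that construct, assuming Python's sqlite3 module as a stand-in (not DB2) and an invented table t. It only shows the result set; whether the engine removes the clause at plan time is database specific and is not demonstrated here.

```python
import sqlite3

# Assumption: in-memory SQLite with invented sample data, not DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("y",), (None,)])

# The always-true branch neutralizes the filter, yet the statement still
# accepts exactly one bound parameter.
all_rows = conn.execute(
    "select c from t where ((1 = 1) or (c = ?))", ("x",)
).fetchall()

# The same parameter with the plain predicate, for comparison.
filtered_rows = conn.execute("select c from t where (c = ?)", ("x",)).fetchall()

print(len(all_rows), filtered_rows)
```

Unlike LIKE '%', this form also returns the row where c is NULL, because the (1 = 1) branch is true regardless of the column value.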

Derby also offers tools for examining the actual query plan that was used, so you can run experiments using Derby and look at the query plan that Derby chose. You can run Derby with -Dderby.language.logQueryPlan=true, and Derby will write the query plan to derby.log, or you can use the RUNTIMESTATISTICS facility, as described here: http://db.apache.org/derby/docs/10.5/tuning/ctundepth853133.html

I'm not sure if Derby will strip out the A LIKE '%' ahead of time, but I also don't think that the presence of that clause will introduce much of a slowdown in the execution speed.

I'd be quite interested to see the actual query plan output that you get in your environment, with and without the A LIKE '%' clause in place.

Oracle 10gR2 does not appear to perform a special optimisation for this situation, but it does recognise that LIKE '%' excludes nulls.

create table like_test (col1)
as select cast(dbms_random.string('U',10) as varchar2(10))
from dual
connect by level <= 1000
/
insert into like_test values (null)
/
commit
/

exec dbms_stats.gather_table_stats(user,'like_test')

explain plan for
select count(*)
from   like_test
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from   like_test
where  col1 like '%'
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from   like_test
where  col1 is not null
/
select plan_table_output from table(dbms_xplan.display)
/

... giving ...

Plan hash value: 3733279756

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| LIKE_TEST |  1001 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------

... and ...

Plan hash value: 3733279756

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    10 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| LIKE_TEST |  1000 | 10000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" LIKE '%')

... and ...

Plan hash value: 3733279756

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    10 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| LIKE_TEST |  1000 | 10000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" IS NOT NULL)

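Note the cardinality (Rows) on the TABLE ACCESS FULL line: 1001 with no predicate, but 1000 for both LIKE '%' and IS NOT NULL, so the optimizer knows that LIKE '%' excludes the single NULL row.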

Depending on how the LIKE predicate is structured and on the field you're testing, you might need a full table scan. Semantically a '%' might imply a full table scan, but SQL Server does all sorts of optimization internally on queries. So the question becomes: does SQL Server optimize a LIKE predicate formed with '%' and throw it out of the WHERE clause?

One aspect that I think is missing from the discussion is the fact that the OP wants to use a prepared statement. At the time the statement is prepared, the database/optimizer will not be able to work out the simplifications others have mentioned and so won't be able to optimize away the a like '%' as the actual value will not be known at prepare time.

Therefore:

When using prepared statements, have four different statements available (no conditions, only a, only b, both) and use the appropriate one when needed.
See if you get better performance when you don't use a prepared statement and stick to just one statement (although then it would be pretty easy to not include 'empty' conditions).
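The first suggestion could look roughly like the sketch below. It is written in Python for brevity, and the names STATEMENTS and choose_statement are hypothetical; the four SQL strings are the ones from the question. It assumes that a bare '%' is the only input treated as "match everything".

```python
# Hypothetical lookup table: (a is wildcard, b is wildcard) -> statement text.
STATEMENTS = {
    (True, True): "SELECT * FROM TableName",
    (True, False): "SELECT * FROM TableName WHERE b LIKE ?",
    (False, True): "SELECT * FROM TableName WHERE a LIKE ?",
    (False, False): "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?",
}


def choose_statement(a, b):
    """Pick the statement and parameter tuple for the given search values."""
    a_wild = a == "%"
    b_wild = b == "%"
    sql = STATEMENTS[(a_wild, b_wild)]
    # Only bind the values whose predicates are still present in the SQL.
    params = tuple(
        value for value, wild in ((a, a_wild), (b, b_wild)) if not wild
    )
    return sql, params


print(choose_statement("%", "%"))
print(choose_statement("foo%", "%"))
print(choose_statement("foo%", "bar%"))
```

One design caveat: dropping a predicate is not perfectly equivalent to binding '%', because LIKE '%' excludes rows where the column is NULL while the shorter statement returns them.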

What if a column has a non-null blank value? Your query will probably match it.

If this is a query for a real-world application, then try using the free-text indexing features of most modern SQL databases. The performance issues will become insignificant.

A simple if statement along the lines of

if (A and B) search on a and b, else if (A) search on a, else if (B) search on b, else tell the user they didn't specify anything

is trivial to maintain and is much easier to understand than making assumptions about the LIKE operator. You are probably going to do that in the UI anyway when you display the results: "Your search for A found x" or "Your search for A B found...".

I'm not sure of the value of using a prepared statement with the kind of parameters you're describing. The reason is that you might fool the query optimizer into preparing an execution plan that would be completely wrong depending on which of the parameters were '%'.

For instance, if the statement were prepared with an execution plan using the index on column A, but the parameter for column A turned out to be '%' you may experience poor performance.

A WHERE clause with LIKE '%' as the only predicate will behave the same as no WHERE clause at all, except that rows where the column is NULL are excluded.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow