How do I optimize MySQL queries that contain constants?
10-07-2019
Question
NOTE: the original question turned out to be moot, but scroll to the bottom for something relevant.
I have a query I want to optimize that looks something like this:
select cols from tbl where col = "some run time value" limit 1;
I want to know which keys are being used, but whatever I pass to EXPLAIN, it optimizes the WHERE clause away ("Impossible WHERE noticed...") because I fed it a constant.
- Is there a way to tell MySQL not to do constant optimizations in EXPLAIN?
- Am I missing something?
- Is there a better way to get the info I need?
Edit: EXPLAIN seems to be giving me the query plan that results from constant values. As the query is part of a stored procedure (and IIRC query plans in sprocs are generated before they are called), this does me no good because the values are not constant. What I want is to find out what query plan the optimizer will generate when it doesn't know what the actual value will be.
Am I missing something?
Edit2: Asking around elsewhere, it seems that MySQL always regenerates query plans unless you go out of your way to make it re-use them, even in stored procedures. From this it would seem that my question is moot.
However, that doesn't make what I really wanted to know moot: how do you optimize a query that contains values that are constant within any specific query, but where I, the programmer, don't know in advance what value will be used? For example, say my client-side code is generating a query with a number in its WHERE clause. Sometimes the number will result in an impossible WHERE clause, other times it won't. How can I use EXPLAIN to examine how well optimized the query is?
The best approach I'm seeing right off the bat would be to run EXPLAIN on it for the full matrix of exist/non-exist cases. Really, that isn't a very good solution, as it would be both hard and error-prone to do by hand.
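For reference, one cell pair of that matrix done by hand might look like the sketch below. It assumes a hypothetical table tbl with a unique key on col and a stand-in column payload for the question's cols; the schema and the values 1 and 999999 are illustrative only. Comparing the key and Extra columns of the two EXPLAIN outputs shows how the plan differs between the exists and does-not-exist cases.

```sql
-- Hypothetical schema standing in for the question's tbl/col/cols.
CREATE TABLE tbl (
  id      INT NOT NULL PRIMARY KEY,
  col     INT NOT NULL,
  payload VARCHAR(50),
  UNIQUE KEY ux_tbl_col (col)
);

INSERT INTO tbl (id, col, payload) VALUES (1, 1, 'a'), (2, 2, 'b');

-- Case 1: the value exists, so EXPLAIN shows the access path
-- (a const lookup through ux_tbl_col).
EXPLAIN SELECT payload FROM tbl WHERE col = 1 LIMIT 1;

-- Case 2: the value does not exist. The unique-key lookup is done while
-- optimizing, so EXPLAIN reports that no row can match instead of a key.
EXPLAIN SELECT payload FROM tbl WHERE col = 999999 LIMIT 1;
```

The exact wording in the Extra column for the second case can vary between server versions.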
Solution
For example, say my client-side code is generating a query with a number in its WHERE clause.
Sometimes the number will result in an impossible WHERE clause, other times it won't.
How can I use explain to examine how well optimized the query is?
MySQL builds different query plans for different values of bound parameters.
In this article you can read a list of when the MySQL optimizer does what:
Action                                        When
Query parse                                   PREPARE
Negation elimination                          PREPARE
Subquery re-writes                            PREPARE
Nested JOIN simplification                    First EXECUTE
OUTER->INNER JOIN conversions                 First EXECUTE
Partition pruning                             Every EXECUTE
COUNT/MIN/MAX elimination                     Every EXECUTE
Constant subexpression removal                Every EXECUTE
Equality propagation                          Every EXECUTE
Constant table detection                      Every EXECUTE
ref access analysis                           Every EXECUTE
range/index_merge analysis and optimization   Every EXECUTE
Join optimization                             Every EXECUTE
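The PREPARE/EXECUTE split in the table can be seen with a server-side prepared statement. This is a minimal sketch, assuming a hypothetical table tbl with an index on col and a column payload (names are illustrative). According to the table, parsing happens once, while ref/range analysis and join optimization are redone on each EXECUTE with whatever value is bound at that moment.

```sql
-- Parsed (and partly rewritten) once, at PREPARE time.
PREPARE stmt FROM 'SELECT payload FROM tbl WHERE col = ? LIMIT 1';

-- Per the table above, access-path analysis and join optimization
-- happen again on every EXECUTE, using the currently bound value.
SET @v = 1;
EXECUTE stmt USING @v;

SET @v = 999999;
EXECUTE stmt USING @v;

-- Release the server-side statement handle.
DEALLOCATE PREPARE stmt;
```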
There is one more thing missing from this list: MySQL can rebuild a query plan on every JOIN iteration, a so-called "range checking for each record".
If you have a composite index on a table:
CREATE INDEX ix_table2_col1_col2 ON table2 (col1, col2)
and a query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 = t1.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound
MySQL will NOT use an index RANGE access from (t1.value1, t1.value2_lowerbound) to (t1.value1, t1.value2_upperbound). Instead, it will use an index REF access on (t1.value1) and just filter out the wrong values.
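To check this on your own server, a sketch like the following can be used. The table definitions are hypothetical; only the column names and the composite index come from the answer. Populate both tables with realistic data before running EXPLAIN, because on empty or tiny tables the optimizer may pick a different plan.

```sql
-- Hypothetical schema matching the column names used in the query above.
CREATE TABLE table1 (
  id                INT NOT NULL PRIMARY KEY,
  value1            INT NOT NULL,
  value2_lowerbound INT NOT NULL,
  value2_upperbound INT NOT NULL
);

CREATE TABLE table2 (
  id   INT NOT NULL PRIMARY KEY,
  col1 INT NOT NULL,
  col2 INT NOT NULL
);

CREATE INDEX ix_table2_col1_col2 ON table2 (col1, col2);

-- Equality on col1 plus a join-dependent range on col2.
-- The answer expects type = ref on ix_table2_col1_col2 for t2,
-- with the BETWEEN condition applied as a filter.
EXPLAIN
SELECT *
FROM table1 t1
JOIN table2 t2
  ON t2.col1 = t1.value1
 AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound;
```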
But if you rewrite the query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 <= t1.value1
AND t2.col1 >= t1.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound
then MySQL will recheck index RANGE access for each record from table1, and decide whether to use RANGE access on the fly.
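Whether MySQL actually chose this strategy can be read from EXPLAIN. This sketch assumes table1 and table2 exist with the columns used in the query and the composite index ix_table2_col1_col2 from the CREATE INDEX statement above. When the per-record recheck is used, the Extra column for t2 reads "Range checked for each record (index map: ...)", where the index map value depends on the schema.

```sql
-- Same rewritten query, prefixed with EXPLAIN.
-- Splitting the equality on col1 into <= and >= keeps MySQL from
-- settling on a fixed REF access, so it can re-evaluate RANGE access
-- for every row read from table1.
EXPLAIN
SELECT *
FROM table1 t1
JOIN table2 t2
  ON t2.col1 <= t1.value1
 AND t2.col1 >= t1.value1
 AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound;
```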
You can read about it in these articles in my blog:
- Selecting timestamps for a time zone - how to use coarse filtering to filter out timestamps without a timezone
- Emulating SKIP SCAN - how to emulate the SKIP SCAN access method in MySQL
- Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE - how to emulate Oracle's analytic functions in MySQL
- Advanced row sampling - how to select N records from each group in MySQL
All of these techniques employ RANGE CHECKING FOR EACH RECORD.
Returning to your question: there is no way to tell which plan MySQL will use for every given constant, since there is no plan before the constant is given.
Unfortunately, there is no way to force MySQL to use one query plan for every value of a bound parameter.
You can control the JOIN order and the indexes being chosen by using STRAIGHT_JOIN and FORCE INDEX clauses, but they will not force a certain access path on an index or forbid the IMPOSSIBLE WHERE.
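As an illustration of those two hints on the join from earlier in this answer, here is a sketch that assumes the table1/table2 columns used there and the composite index ix_table2_col1_col2. It pins the join order and the index, but as noted, MySQL still decides between REF and RANGE access within that index.

```sql
-- STRAIGHT_JOIN: table1 (left) is always read before table2 (right).
-- FORCE INDEX: table2 may only be accessed through the composite index
-- (a full table scan is treated as very expensive).
SELECT *
FROM table1 t1
STRAIGHT_JOIN table2 t2 FORCE INDEX (ix_table2_col1_col2)
  ON t2.col1 = t1.value1
 AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound;
```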
On the other hand, MySQL employs only NESTED LOOPS for all JOINs (in the MySQL versions this answer was written for; 8.0.18 and later can also use hash joins). That means that if you build the right JOIN order or choose the right indexes, MySQL will probably benefit from every IMPOSSIBLE WHERE.
OTHER TIPS
You are getting "Impossible WHERE noticed" because the value you specified is not in the column, not just because it is a constant. You could either 1) use a value that exists in the column, or 2) just write col = col:
explain select cols from tbl where col = col;
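Option 1 can be scripted so you never have to guess a value that exists. A sketch, assuming a hypothetical table tbl with columns col and payload; the user variable @existing is illustrative. Because @existing is not assigned inside the EXPLAIN statement itself, the optimizer should treat its value as a constant, so the plan reflects a lookup that actually finds a row.

```sql
-- Pick a value that is known to exist in col (skip NULLs, which never match "=").
SELECT col INTO @existing FROM tbl WHERE col IS NOT NULL LIMIT 1;

-- EXPLAIN the real query with that value, so the plan is not
-- short-circuited by a lookup that finds nothing.
EXPLAIN SELECT payload FROM tbl WHERE col = @existing LIMIT 1;
```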
How do you optimize a query with values that are constant only to the query, but where I, the programmer, don't know in advance what value will be used?
By using indexes on the specific columns (or even on a combination of columns, if you always query the given columns together). If you have indexes, the query planner will potentially use them.
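For the question's query, that could mean either of the indexes below. The table tbl and column col come from the question; other_col and both index names are hypothetical.

```sql
-- Single-column index for queries that filter on col alone.
CREATE INDEX ix_tbl_col ON tbl (col);

-- Composite index for queries that always filter on col and other_col together.
-- Its leftmost column (col) also serves queries that filter on col alone.
CREATE INDEX ix_tbl_col_other ON tbl (col, other_col);
```

If you create the composite index, the single-column one is usually redundant, so in practice you would pick one of the two.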
Regarding "impossible" values: the query planner can conclude that a given value is not in the table from several sources:
- if there is an index on the particular column, it can observe that the particular value is larger or smaller than any value in the index (min/max values take constant time to extract from indexes)
- if you are passing in the wrong type (for example, if you are asking for a numeric column to be equal to a text value)
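The constant-time MIN/MAX lookup mentioned in the first bullet is visible in EXPLAIN. A minimal sketch, assuming the hypothetical table tbl already has an index on col: the optimizer reads both ends of the index while planning, and EXPLAIN typically reports "Select tables optimized away" instead of a table access.

```sql
-- Both values come straight from the first and last entries of the index on col,
-- so no rows need to be scanned at execution time.
EXPLAIN SELECT MIN(col), MAX(col) FROM tbl;
```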
PS. In general, creating a query plan is not expensive, and it is better to re-create plans than to re-use them, since conditions might have changed since the plan was generated and a better query plan might exist.