Prohibit MySQL from using full table scan on a query

https://stackoverflow.com/questions/1617317

06-07-2019
|

Question

Is there any way I can prohibit MySQL from performing a full table scan when the result was not found using indexes?

For example this query:

SELECT *
FROM a
WHERE (X BETWEEN a.B AND a.C) 
ORDER BY a.B DESC 
LIMIT 1;

Is only efficient if X satisfies the condition and there is at least 1 row returned, but if the condition cannot be satisfied by any data in the table, full scan will be performed, which can be very costly.

I don't want to optimize this particular query, it is just an example.

EXPLAIN on this query with X in or outside of range:

id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a range long_ip  long_ip 8 \N 116183 100.00 Using where

STATUS VARIABLE show much better information. For X outside of range:

Handler_read_prev 84181
Key_read_requests 11047

In range:

Handler_read_key 1
Key_read_requests 12

If only there was a way to prevent Handler_read_prev from ever growing past 1.

UPDATE. I can't accept my own answer, because it doesn't really answer the question (HANDLER is a great feature, though). It seems to me that there is no general way to prevent MySQL from doing a full scan. Although, simple conditions like key='X' will be considered as "impossible where", more complex things like BETWEEN will not.

Solution 2

Here is what I came up with in the end:

HANDLER a OPEN;
HANDLER a READ BC <= (X);
HANDLER a CLOSE;

BC is the name of key (B,C). If we order the table by B DESC, then the result is guranteed to be equal to

SELECT *
FROM a
WHERE (X BETWEEN a.B AND a.C) 
ORDER BY a.B DESC 
LIMIT 1;

Now if X is not in the range of the table a, we just have to check that a.C is greater than X, if it's not, than X is definitely outside of the range and we don't need to look any further.

This is not very elegant though, and you will have to resort the table on each insert or update.

OTHER TIPS

You could write a "fully covered" subquery that only uses data that is available in indexes. Based on the returned primary key, you can look up the rows in the master table.

The following query is fully covered by indexes on (id), (B,id), and (C,id):

select *
from a
where id in (
    select id
    from a 
    where x <= C
    and id in (
        select id
        from a
        where B <= X 
    )
)
limit 1

Each SELECT uses one index: the innermost the index on (B,id); the middle SELECT uses the index on (C,id), and the outer SELECT uses the primary key.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow