MySQL index is not used when using SELECT ALL for binary lower

https://stackoverflow.com/questions/22181614

05-06-2023

Question

I have a keywords table like this:

+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255) | YES  | MUL | NULL    |                |
| country | varchar(2)   | YES  |     | NULL    |                |
+---------+--------------+------+-----+---------+----------------+

And I have a compound index on [name, country]:

+----------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table    | Non_unique | Key_name                           | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+----------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| keywords |          0 | PRIMARY                            |            1 | id          | A         |      377729 |     NULL | NULL   |      | BTREE      |
| keywords |          1 | index_keywords_on_name_and_country |            1 | name        | A         |      377729 |     NULL | NULL   | YES  | BTREE      |
| keywords |          1 | index_keywords_on_name_and_country |            2 | country     | A         |      377729 |     NULL | NULL   | YES  | BTREE      |
+----------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

I need to use BINARY LOWER to compare the name field, so my query will be like this:

SELECT keywords.* FROM `keywords` WHERE (BINARY LOWER(`name`) = BINARY LOWER('Apple') AND `country` = 'US');

But the problem is that it doesn't use the index. EXPLAIN gives me:

+------+-------------+----------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+----------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | keywords | ALL  | NULL          | NULL | NULL    | NULL | 366519 | Using where |
+------+-------------+----------+------+---------------+------+---------+------+--------+-------------+

However, if I select only some fields instead of SELECT *, it does use the index:

Explain SELECT keywords.id, keywords.name FROM `keywords` WHERE (BINARY LOWER(`name`) = BINARY LOWER('Apple') AND `country` = 'US');

+------+-------------+----------+-------+---------------+------------------------------------+---------+------+--------+--------------------------+
| id   | select_type | table    | type  | possible_keys | key                                | key_len | ref  | rows   | Extra                    |
+------+-------------+----------+-------+---------------+------------------------------------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | keywords | index | NULL          | index_keywords_on_name_and_country | 777     | NULL | 366519 | Using where; Using index |
+------+-------------+----------+-------+---------------+------------------------------------+---------+------+--------+--------------------------+

I'm using MySQL 5.5.

Any reason why this happens?

And is there a way to make my query use the index? Or how can I change my query and table so that the index speeds up the query?

Thanks

Solution

Why do you need to do the comparison to binary lower()? This seems like a very odd requirement for keywords.

In any case, you could do this with subqueries:

SELECT k.*
FROM (SELECT k.*
      FROM `keywords` k
      WHERE name = 'Apple' and country = 'US'
     ) k
WHERE (BINARY LOWER(`name`) = BINARY LOWER('Apple') AND `country` = 'US');

The inner subquery should use the index. The resulting scan should be on a small subset, so it should be fast.
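The same idea can probably be written without the derived table: keep the plain, index-friendly comparisons and add the BINARY LOWER test as an extra condition. This is a sketch, assuming the `name` column uses a case-insensitive collation, so that the plain equality matches a superset of the rows you want:

```sql
-- The plain comparisons on name and country are what the compound index can serve.
-- The BINARY LOWER condition then only filters the few rows that lookup returns.
EXPLAIN
SELECT k.*
FROM `keywords` k
WHERE k.`name` = 'Apple'
  AND k.`country` = 'US'
  AND BINARY LOWER(k.`name`) = BINARY LOWER('Apple');
```

If the optimizer cooperates, EXPLAIN should list index_keywords_on_name_and_country in the key column with a small rows estimate. Note that type index in the question's second EXPLAIN is still a scan of the whole index (rows is 366519 in both plans), not a lookup.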

OTHER TIPS

Yes, changing the character set (and collation) spoils the use of an index. The optimizer can't rely on the collation you specify sorting strings in the same order as they're stored in the index, so it doesn't use the index for lookups.
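If you really do need the byte-wise, lowercased comparison, one option is to store the lowercased value in its own column with a binary collation and index that column instead. MySQL 5.5 has no generated columns, so this sketch uses an ordinary column that your application (or triggers) would have to keep in sync. The column name `name_lower`, the index name, and the utf8 character set are assumptions; the key_len of 777 in the question is consistent with 3-byte utf8, but check SHOW CREATE TABLE first.

```sql
-- Hypothetical helper column holding the lowercased name, compared by code point.
ALTER TABLE `keywords`
  ADD COLUMN `name_lower` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NULL;

-- Backfill existing rows; new and updated rows must be maintained separately.
UPDATE `keywords` SET `name_lower` = LOWER(`name`);

-- Compound index on the helper column, mirroring the existing one.
ALTER TABLE `keywords`
  ADD INDEX `index_keywords_on_name_lower_and_country` (`name_lower`, `country`);

-- No function is applied to the indexed column, so an index lookup is possible.
SELECT keywords.*
FROM `keywords`
WHERE `name_lower` = LOWER('Apple')
  AND `country` = 'US';
```

One difference to be aware of: a comparison under utf8_bin ignores trailing spaces, while a comparison of BINARY values does not.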

If you use a case-insensitive COLLATION, you don't have to do this BINARY LOWER expression at all.

mysql> select 'apple' = 'Apple';
+-------------------+
| 'apple' = 'Apple' |
+-------------------+
|                 1 |
+-------------------+

The "ci" suffix in collations indicates case-insensitivity.

mysql> show session variables like 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

So just do the simple string comparison (provided you have set the collation order for this table to a ci collation):

SELECT keywords.* FROM `keywords` WHERE `name` = 'Apple' AND `country` = 'US';
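To check which collation the column has, and to change it if it isn't a ci collation, something like the following should work. The utf8 character set and utf8_general_ci are assumptions here; pick whatever matches your data. MODIFY has to restate the full column definition.

```sql
-- The Collation column of the output shows the collation of each string column.
SHOW FULL COLUMNS FROM `keywords`;

-- Switch name to a case-insensitive collation; this rebuilds the indexes on name.
ALTER TABLE `keywords`
  MODIFY `name` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL;
```

Keep in mind that utf8_general_ci also treats accented and unaccented letters as equal.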

Re your comment:

Comparing accented characters depends on the character set and collation.

mysql> SELECT 'Lé' = 'le';

mysql> SET NAMES latin1 COLLATE latin1_general_ci;

mysql> select 'lé' = 'Lé';
+---------------+
| 'lé' = 'Lé'   |
+---------------+
|             1 |
+---------------+

mysql> select 'lé' = 'Le';
+--------------+
| 'lé' = 'Le'  |
+--------------+
|            0 |
+--------------+

I can't find a Unicode collation in MySQL that treats accented characters as different but preserves case-insensitivity.
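Later versions did add one: MySQL 8.0 ships utf8mb4_0900_as_ci, which is accent-sensitive and case-insensitive. It isn't available in 5.5, so the following only applies after an upgrade. It is a sketch that repeats the latin1 test above:

```sql
-- Requires MySQL 8.0 or later; utf8mb4_0900_as_ci does not exist in 5.5.
SET NAMES utf8mb4 COLLATE utf8mb4_0900_as_ci;

-- Differs only in case.
SELECT 'lé' = 'Lé';

-- Differs in accent as well.
SELECT 'lé' = 'Le';
```

Under that collation the two comparisons should come out differently, as they do with latin1_general_ci.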

Licensed under: CC-BY-SA with attribution
