Question

I have a table with several million records on MySQL in an MyISAM table. Very simplified, it's like this:

CREATE TABLE `test` (
  `text` varchar(5) DEFAULT NULL,
  `number` int(5) DEFAULT NULL,
  KEY `number` (`number`) USING BTREE,
  KEY `text_number` (`text`,`number`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

It's populated with this data:

INSERT INTO `test` VALUES ('abcd', '1');
INSERT INTO `test` VALUES ('abcd', '2');
INSERT INTO `test` VALUES ('abcd', '3');
INSERT INTO `test` VALUES ('abcd', '4');
INSERT INTO `test` VALUES ('bbbb', '1');
INSERT INTO `test` VALUES ('bbbb', '2');
INSERT INTO `test` VALUES ('bbbb', '3');

When I run the following query:

EXPLAIN SELECT * FROM `test` WHERE (`text` = 'bbbb' AND `number` = 2)

It returns 'number' as the key to use. But the following query:

EXPLAIN SELECT * FROM `test` WHERE (`text` = 'bbbb' AND `number` = 1)

Returns 'text_number' as key to use, which would make more sense to me as this combined key matches exactly with the 2 columns in the WHERE. On these amount of records the performance isn't an issue, but on several million records the query which uses the 'text' index takes 4 seconds, and the one that uses 'text_number' index is finished in several milliseconds.

Is there a logical explaination for this? How can I change the index that MySQL uses the index? I know I can use USE INDEX but I want MySQL to be able to find the best plan to execute the query. This is on MySQL 5.1 and 5.5, same results.

Was it helpful?

Solution

If you expect some of the queries to be game changer for the performance it is good idea to use you own indexes, like:

EXPLAIN SELECT * FROM `test` USE INDEX (number) WHERE (`text` = 'bbbb' AND `number` = 1);
or
EXPLAIN SELECT * FROM `test` USE INDEX (text_number) WHERE (`text` = 'bbbb' AND `number` = 1);

Generally, you can relay on the built-in query optimizer for most of the queries, but for the crucial or problematic ones it is good idea to look closer.

OTHER TIPS

Two things that could improve your performance:

  1. ANALYZE TABLE to update your optimizer's index statistics so it knows the current key distribution. You can control the optimizer's statistics generation in a number of ways.
  2. Create a second covering index with the fields reversed. In the real world, this involves figuring out the optimal one or two covering indexes for a set of three or four columns, and that's something you arrive at through benchmarking and EXPLAIN analysis. Have a look at this answer for some advice on doing that.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top