Question

I have a table which grows by at least two million records per day on which I have to run stats on a daily basis. Since my stat queries can take upwards of three hours to run :O I'm trying to optimize the table somewhat. I thought I would utilize partitioning so that the query optimizer can take advantage of partition pruning, but when I run my queries all partitions are still being looked at.

I have created a test table, also available on mysql fiddle

CREATE TABLE `log_tests` (
  `_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `name` varchar(25) DEFAULT NULL,
  PRIMARY KEY (`_id`,`timestamp`),
  KEY `log_tests__timestamp` (`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (unix_timestamp(`timestamp`))
(PARTITION p201401 VALUES LESS THAN (unix_timestamp('2014-02-01 00:00:00')) ENGINE = MyISAM,
 PARTITION pNew VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */
;

INSERT INTO `log_tests` (`timestamp`, `name`) VALUES ('2014-01-10 01:01:01', '1');
INSERT INTO `log_tests` (`timestamp`, `name`) VALUES ('2014-01-11 01:01:01', '2');
INSERT INTO `log_tests` (`name`) VALUES ('3');
INSERT INTO `log_tests` (`name`) VALUES ('4');
INSERT INTO `log_tests` (`name`) VALUES ('5');

Now, when I run a SELECT with a WHERE clause restricted to timestamps before January 31st, both partitions are looked at instead of just the p201401 partition. For example, executing the following:

explain partitions select * from log_tests
where unix_timestamp(`timestamp`) < unix_timestamp('2014-01-31 00:00:00')

returns:

id | select_type | table     | partitions   | type | possible_keys | key  | key_len | ref  | rows | Extra
---------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | log_tests | p201401,pNew | ALL  | NULL          | NULL | NULL    | NULL | 5    | Using where

Any words of wisdom???


Solution

The problem is in how you write the query; the partitioning itself works.

When you do

explain partitions select * from log_tests
where unix_timestamp(`timestamp`) < unix_timestamp('2014-01-31 00:00:00')

you are applying a function to the column value. When a column is wrapped in a function, MySQL generally cannot prune partitions or use an index on that column, because it has to evaluate the function for every row before it can test the condition; the result is a full table scan. It may be easier to see if you imagine rand() in place of unix_timestamp(): then it's obvious that every row has to be evaluated.

If you change your query to

explain partitions select * from log_tests
where timestamp < '2014-01-31 00:00:00';

it correctly uses only one partition. See this fiddle.
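Since the stats run daily, the same idea extends to a bounded range. The following is only a sketch against the test table above; the dates and the COUNT(*) aggregate are placeholders, not your real stats query. It keeps the EXPLAIN PARTITIONS syntax from the question; on MySQL 5.7 and later, plain EXPLAIN already shows the partitions column.

```sql
-- Sketch: one day's worth of rows, comparing the bare column to constants.
-- The half-open range (>= start, < next day) avoids missing or
-- double-counting rows at midnight.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM log_tests
WHERE `timestamp` >= '2014-01-10 00:00:00'
  AND `timestamp` <  '2014-01-11 00:00:00';
```

If pruning applies, the partitions column should list only p201401, because both bounds fall below the partition's upper limit.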

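By the way, this holds for all queries, not only those on partitioned tables. Avoid applying functions to a column in the WHERE clause; compare the bare column against a constant instead and apply any conversion to the constant side. Otherwise MySQL generally cannot use an index on that column and ends up scanning the whole table.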
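To illustrate the rewrite, here are two common patterns on the same test table, each shown with the function on the column first and the equivalent form with the bare column second. These are sketches; the literal dates and the epoch value are made up for the example, and what FROM_UNIXTIME() returns depends on the session time zone.

```sql
-- Function on the column: every row has to be evaluated.
SELECT * FROM log_tests
WHERE DATE(`timestamp`) = '2014-01-10';

-- Equivalent range on the bare column.
SELECT * FROM log_tests
WHERE `timestamp` >= '2014-01-10 00:00:00'
  AND `timestamp` <  '2014-01-11 00:00:00';

-- Function on the column again, this time against an epoch value
-- (1391126400 is 2014-01-31 00:00:00 UTC, used purely as an example).
SELECT * FROM log_tests
WHERE UNIX_TIMESTAMP(`timestamp`) < 1391126400;

-- Move the conversion to the constant side instead.
SELECT * FROM log_tests
WHERE `timestamp` < FROM_UNIXTIME(1391126400);
```

Prefixing each statement with EXPLAIN PARTITIONS, as in the question, is a quick way to check which form touches fewer partitions on your own data.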

Licensed under: CC-BY-SA with attribution