I've a table with 4.2+M records and I wondered if partitioning could help me increasing the performance of my queries so I conducted a test. Having appropriate indices defined, I duplicated the table and just then I partitioned the second one. So now I have two identical tables; one without partitions and the other with.
Here is the structure of my table (simplified):
CREATE TABLE `cse` (
`id` bigint(20) unsigned NOT NULL,
`type` varchar(45) DEFAULT NULL,
`name` varchar(1000) NOT NULL,
`dt` datetime NOT NULL,
PRIMARY KEY (`id`,`dt`),
KEY `inx1` (`type`),
KEY `inx2` (`type`,`dt`),
KEY `inx3` (`dt`,`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And here's how I partition the duplicate:
ALTER TABLE cse_p PARTITION BY RANGE COLUMNS (dt) (
PARTITION p11_09 VALUES LESS THAN ('2011-09-01'),
PARTITION p11_10 VALUES LESS THAN ('2011-10-01'),
PARTITION p11_11 VALUES LESS THAN ('2011-11-01'),
PARTITION p11_12 VALUES LESS THAN ('2011-12-01'),
PARTITION p12_01 VALUES LESS THAN ('2012-01-01'),
PARTITION p12_02 VALUES LESS THAN ('2012-02-01'),
PARTITION p12_03 VALUES LESS THAN ('2012-03-01'),
PARTITION p12_04 VALUES LESS THAN ('2012-04-01'),
PARTITION p12_05 VALUES LESS THAN ('2012-05-01'),
PARTITION p12_06 VALUES LESS THAN ('2012-06-01'),
PARTITION p12_07 VALUES LESS THAN ('2012-07-01'),
PARTITION p12_08 VALUES LESS THAN ('2012-08-01'),
PARTITION p12_09 VALUES LESS THAN ('2012-09-01'),
PARTITION p12_10 VALUES LESS THAN ('2012-10-01'),
PARTITION p12_11 VALUES LESS THAN ('2012-11-01'),
PARTITION p12_12 VALUES LESS THAN ('2012-12-01'),
PARTITION p13_01 VALUES LESS THAN ('2013-01-01'),
PARTITION p13_02 VALUES LESS THAN ('2013-02-01'),
PARTITION p13_03 VALUES LESS THAN ('2013-03-01'),
PARTITION p13_04 VALUES LESS THAN ('2013-04-01'),
PARTITION p13_05 VALUES LESS THAN ('2013-05-01'),
PARTITION p13_06 VALUES LESS THAN ('2013-06-01'),
PARTITION p13_07 VALUES LESS THAN ('2013-07-01'),
PARTITION p13_08 VALUES LESS THAN ('2013-08-01'),
PARTITION p13_09 VALUES LESS THAN ('2013-09-01'),
PARTITION p13_10 VALUES LESS THAN ('2013-10-01'),
PARTITION p13_11 VALUES LESS THAN ('2013-11-01'),
PARTITION p13_12 VALUES LESS THAN ('2013-12-01'),
PARTITION p_rest VALUES LESS THAN (MAXVALUE)
);
And here's the cardinality of each partition (I know!):
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'cse_p';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| p11_09 | 1030353 |
| p11_10 | 577326 |
| p11_11 | 0 |
| p11_12 | 0 |
| p12_01 | 0 |
| p12_02 | 0 |
| p12_03 | 601575 |
| p12_04 | 766727 |
| p12_05 | 855438 |
| p12_06 | 262869 |
| p12_07 | 0 |
| p12_08 | 0 |
| p12_09 | 0 |
| p12_10 | 0 |
| p12_11 | 0 |
| p12_12 | 0 |
| p13_01 | 0 |
| p13_02 | 0 |
| p13_03 | 0 |
| p13_04 | 0 |
| p13_05 | 0 |
| p13_06 | 0 |
| p13_07 | 0 |
| p13_08 | 0 |
| p13_09 | 0 |
| p13_10 | 0 |
| p13_11 | 0 |
| p13_12 | 0 |
| p_rest | 0 |
+----------------+------------+
Having set the stem, I tested the two tables' performance with the following query:
EXPLAIN PARTITIONS
SELECT DATE(dt), name, COUNT(*) AS count
FROM cse
WHERE (type = 'A' OR type = 'B' OR type = 'C')
AND dt > '2012-04-01'
AND dt < '2012-05-01'
GROUP BY DATE(dt), name;
Here's the output of above query on the two tables:
cse
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+
| 1 | SIMPLE | cse | NULL | range | inx1,inx2,inx3 | inx2 | 143 | NULL | 4919 | Using index condition; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+
cse_p
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | cse_p | p12_05 | range | inx1,inx2,inx3 | inx2 | 143 | NULL | 7736 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+
The question (finally)
Why introducing partitions increased the number of rows swept while in both cases the same index is used?
[UPDATE]
As I've forgot to mention the version of MySQL, it's 5.6.16-1+sury.org~precise+1 - (Ubuntu)
.