How can incorporating partitions increase the number of rows swept?

https://stackoverflow.com/questions/22904172

28-06-2023

Question

I have a table with more than 4.2 million records, and I wondered whether partitioning could improve the performance of my queries, so I ran a test. With appropriate indexes defined, I duplicated the table and then partitioned the copy. So now I have two identical tables: one without partitions and one with.

Here is the structure of my table (simplified):

CREATE TABLE `cse` (
  `id` bigint(20) unsigned NOT NULL,
  `type` varchar(45) DEFAULT NULL,
  `name` varchar(1000) NOT NULL,
  `dt` datetime NOT NULL,
  PRIMARY KEY (`id`,`dt`),
  KEY `inx1` (`type`),
  KEY `inx2` (`type`,`dt`),
  KEY `inx3` (`dt`,`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And here's how I partition the duplicate:

ALTER TABLE cse_p PARTITION BY RANGE COLUMNS (dt) (
    PARTITION p11_09 VALUES LESS THAN ('2011-09-01'),
    PARTITION p11_10 VALUES LESS THAN ('2011-10-01'),
    PARTITION p11_11 VALUES LESS THAN ('2011-11-01'),
    PARTITION p11_12 VALUES LESS THAN ('2011-12-01'),
    PARTITION p12_01 VALUES LESS THAN ('2012-01-01'),
    PARTITION p12_02 VALUES LESS THAN ('2012-02-01'),
    PARTITION p12_03 VALUES LESS THAN ('2012-03-01'),
    PARTITION p12_04 VALUES LESS THAN ('2012-04-01'),
    PARTITION p12_05 VALUES LESS THAN ('2012-05-01'),
    PARTITION p12_06 VALUES LESS THAN ('2012-06-01'),
    PARTITION p12_07 VALUES LESS THAN ('2012-07-01'),
    PARTITION p12_08 VALUES LESS THAN ('2012-08-01'),
    PARTITION p12_09 VALUES LESS THAN ('2012-09-01'),
    PARTITION p12_10 VALUES LESS THAN ('2012-10-01'),
    PARTITION p12_11 VALUES LESS THAN ('2012-11-01'),
    PARTITION p12_12 VALUES LESS THAN ('2012-12-01'),
    PARTITION p13_01 VALUES LESS THAN ('2013-01-01'),
    PARTITION p13_02 VALUES LESS THAN ('2013-02-01'),
    PARTITION p13_03 VALUES LESS THAN ('2013-03-01'),
    PARTITION p13_04 VALUES LESS THAN ('2013-04-01'),
    PARTITION p13_05 VALUES LESS THAN ('2013-05-01'),
    PARTITION p13_06 VALUES LESS THAN ('2013-06-01'),
    PARTITION p13_07 VALUES LESS THAN ('2013-07-01'),
    PARTITION p13_08 VALUES LESS THAN ('2013-08-01'),
    PARTITION p13_09 VALUES LESS THAN ('2013-09-01'),
    PARTITION p13_10 VALUES LESS THAN ('2013-10-01'),
    PARTITION p13_11 VALUES LESS THAN ('2013-11-01'),
    PARTITION p13_12 VALUES LESS THAN ('2013-12-01'),
    PARTITION p_rest VALUES LESS THAN (MAXVALUE)
);

And here's the cardinality of each partition (I know!):

SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'cse_p';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| p11_09         |    1030353 |
| p11_10         |     577326 |
| p11_11         |          0 |
| p11_12         |          0 |
| p12_01         |          0 |
| p12_02         |          0 |
| p12_03         |     601575 |
| p12_04         |     766727 |
| p12_05         |     855438 |
| p12_06         |     262869 |
| p12_07         |          0 |
| p12_08         |          0 |
| p12_09         |          0 |
| p12_10         |          0 |
| p12_11         |          0 |
| p12_12         |          0 |
| p13_01         |          0 |
| p13_02         |          0 |
| p13_03         |          0 |
| p13_04         |          0 |
| p13_05         |          0 |
| p13_06         |          0 |
| p13_07         |          0 |
| p13_08         |          0 |
| p13_09         |          0 |
| p13_10         |          0 |
| p13_11         |          0 |
| p13_12         |          0 |
| p_rest         |          0 |
+----------------+------------+

Having set the stage, I tested the two tables' performance with the following query:

EXPLAIN PARTITIONS
SELECT DATE(dt), name, COUNT(*) AS count
FROM cse
WHERE (type = 'A' OR type = 'B' OR type = 'C')
AND dt > '2012-04-01'
AND dt < '2012-05-01'
GROUP BY DATE(dt), name;

Here's the output of the above query on the two tables:

cse

+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key  | key_len | ref  | rows | Extra                                                  |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+
|  1 | SIMPLE      | cse   | NULL       | range | inx1,inx2,inx3 | inx2 | 143     | NULL | 4919 | Using index condition; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+--------------------------------------------------------+

cse_p

+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key  | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | cse_p | p12_05     | range | inx1,inx2,inx3 | inx2 | 143     | NULL | 7736 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+----------------+------+---------+------+------+----------------------------------------------+

The question (finally)

Why did introducing partitions increase the number of rows swept, when the same index is used in both cases?

[UPDATE]

I forgot to mention the MySQL version: it's 5.6.16-1+sury.org~precise+1 - (Ubuntu).

Solution

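It looks like the non-partitioned table is using ICP (Index Condition Pushdown): its Extra column shows "Using index condition", while the partitioned table shows only "Using where". As noted in the linked documentation, MySQL 5.6 (the release that introduced ICP) doesn't support ICP on partitioned tables. The issue is resolved in 5.7. Can you confirm your MySQL version?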
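One way to check this explanation is to switch ICP off for the current session and look at the plan for the non-partitioned table again. This is only a sketch: it reuses the query from the question and assumes nothing beyond the standard optimizer_switch flag index_condition_pushdown.

```sql
-- Confirm the server version first.
SELECT VERSION();

-- Turn ICP off for this session only.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- Re-run the plan on the non-partitioned table.
EXPLAIN
SELECT DATE(dt), name, COUNT(*) AS count
FROM cse
WHERE (type = 'A' OR type = 'B' OR type = 'C')
AND dt > '2012-04-01'
AND dt < '2012-05-01'
GROUP BY DATE(dt), name;

-- Put the flag back to the server default.
SET SESSION optimizer_switch = 'index_condition_pushdown=default';
```

If ICP accounts for the difference, the Extra column for cse should read "Using where" instead of "Using index condition" while the flag is off, matching what cse_p reports.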

That should resolve the question itself, but more importantly there are other things that can be done with this query.

A three-column index on type, dt, and name may improve performance dramatically. The real killer in MySQL is usually the filesort; getting rid of filesorts can often help more than reducing the row count.
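A minimal sketch of such an index follows. The index name inx4 is hypothetical, and the 255-character prefix on name is an assumption copied from the existing inx3, since name is a varchar(1000) utf8 column and is too long to index in full on this setup.

```sql
-- Hypothetical index name; the name(255) prefix mirrors inx3.
ALTER TABLE cse_p
  ADD KEY `inx4` (`type`, `dt`, `name`(255));
```

Because the index stores only a prefix of name, MySQL still has to read the rows to get the full value, so it is worth re-running EXPLAIN afterwards to see whether the plan actually changes.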

If you don't need the results sorted, you may also be able to improve performance by adding ORDER BY NULL to the end of the query. In MySQL 5.6, GROUP BY implicitly orders by the same fields as the group, which can hurt performance when you don't need the ordering. Telling MySQL you don't care about the order (if you really don't) may prevent the filesort.
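Applied to the query from the question, that would look roughly like this (the only change is the final ORDER BY NULL line):

```sql
EXPLAIN PARTITIONS
SELECT DATE(dt), name, COUNT(*) AS count
FROM cse_p
WHERE (type = 'A' OR type = 'B' OR type = 'C')
AND dt > '2012-04-01'
AND dt < '2012-05-01'
GROUP BY DATE(dt), name
ORDER BY NULL;  -- suppress the implicit sort on the GROUP BY columns
```

If it takes effect, "Using filesort" should no longer appear in the Extra column; "Using temporary" may well remain, since the grouping itself can still need a temporary table.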

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow