Question

We are trying to move data from one table that currently has over 1B rows to a partitioned table. But as we started to do so we notice that half of the partitions are empty, why is that? Table is partitioned by the column that has UUIDv4 values.

Our table definition:

CREATE TABLE table1 (
  `col1` varchar(36) NOT NULL DEFAULT '',
  `col2` varchar(256) NOT NULL DEFAULT '',
  `col3` datetime NOT NULL DEFAULT NOW(),
  UNIQUE KEY `unq_col1_col2 (`col1`,`col2`),
  KEY `idx_col3` (`col3`)
)
ENGINE=InnoDB
PARTITION BY KEY(col1)
PARTITIONS 100;

INFORMATION_SCHEMA.PARTITIONS output:

PARTITION_NAME  TABLE_ROWS
p0  136532
p1  0
p2  339523
p3  0
p4  156955
p5  0
p6  305700
p7  0
p8  209548
p9  0
p10 329291
p11 0
p12 152995
p13 0
p14 388600
p15 0
p16 251903
p17 0
p18 364532
p19 0
p20 200799
p21 0
......
p89 0
p90 303628
p91 0
p92 215546
p93 0
p94 399165
p95 0
p96 210364
p97 0
p98 318675
p99 0

We are using Google Cloud SQL (MySQL) which is currently on version 5.7.32-google-log.

Was it helpful?

Solution

You have 100 partitions. So it's likely that your hash values modulus 100 are all even numbers.

This is a general risk of hashing. If the hashing function produces only values with a pattern like this, you can get imbalanced partitions.

One way to fix this is to use a prime number for the number of partitions. For example 101 is a prime number close to your desired number of partitions.

This helps because if the hash values modulus 100 produced a pattern, that pattern will be offset by 1 per window of 100 values. Thus it will be spread more evenly over the partitions.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top