Question

I have the following table in MySQL:

CREATE TABLE `ParaTable` (
    `id_1` INT(10) UNSIGNED NULL DEFAULT '0',
    `id_2` INT(10) UNSIGNED NULL DEFAULT '0',
    `id_3` TINYINT(3) UNSIGNED NULL DEFAULT '0',
    `id_4` TINYINT(3) UNSIGNED NULL DEFAULT '0',
    `id_5` INT(10) UNSIGNED NULL DEFAULT '0',
    `date` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX `id_1` (`id_1`),
    INDEX `id_2` (`id_2`),
    INDEX `date` (`date`),
    INDEX `id_3` (`id_3`),
    INDEX `id_4` (`id_4`),
    INDEX `id_5` (`id_5`),
    INDEX `multi_index` (`id_1`, `id_3`, `id_4`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;

It has about 70,000,000 entries in total. Even though the columns are nullable, none of the entries has a NULL in any column (the table structure is not the question here).

If I look into information_schema, I can see that the index length is 10272899072 and the data length is 3201302528.

This makes a total of 12,850MB, or about 12.54GB.

How is this number calculated?

The output of SHOW TABLE STATUS ... LIKE 'ParaTable' shows:

Rows: 68129609
Avg_row_length: 47
Data_length: 3201302528       (=3053MB)
Index_length: 10272899072     (=9797MB)
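
As a quick sanity check on these figures, the unit conversions and the reported average row length can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the `Rows` value reported by InnoDB is itself an estimate, so the derived average is approximate.

```python
# Figures copied from the SHOW TABLE STATUS output above.
rows = 68_129_609
data_length = 3_201_302_528
index_length = 10_272_899_072

MIB = 1024 * 1024  # the "MB" figures in the post are multiples of 1024 * 1024 bytes

data_mib = data_length / MIB
index_mib = index_length / MIB
total_mib = (data_length + index_length) / MIB
total_gib = total_mib / 1024

# Avg_row_length is Data_length divided by the (estimated) row count.
avg_row_length = data_length / rows

print(f"data:  {data_mib:.0f} MiB")
print(f"index: {index_mib:.0f} MiB")
print(f"total: {total_mib:.0f} MiB ({total_gib:.2f} GiB)")
print(f"average row length: {avg_row_length:.1f} bytes")
```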

I read about data storage sizes in the MySQL manual and have made the following rough calculation:

(int+int+tinyint+tinyint+int+timestamp)

4+4+1+1+4+4 = 18 bytes per row (+ 6 bits, because each column is nullable; I assume I can count these 6 bits as one more byte and be safe, see the manual) = 19 bytes per row.

(Even if each of the 6 bits takes 1 byte on the disk, which is unlikely I guess, that would be 24 bytes per row.)

18 bytes * 70,000,000 rows = 1260000000B = ~1200MB
(19 bytes * 70,000,000 rows = 1330000000B = ~1270MB)
(24 bytes * 70,000,000 rows = 1680000000B = ~1600MB)

I do not know how much space MySQL takes up for indexes (I can only take the value from SHOW TABLE STATUS, but how is it really calculated?). This is the missing link in my calculation of the total size needed. But even if the indexes have nothing to do with it, the Data_length alone seems way too high.

Why is the Avg_row_length 47 instead of my calculated 18-24 bytes? What am I missing here?


Solution

You've missed calculating all of InnoDB's overhead for storing these rows. You should have:

  4 (INT)
+ 4 (INT)
+ 1 (TINYINT)
+ 1 (TINYINT)
+ 4 (INT)
+ 4 (TIMESTAMP)
+ 1 (Null bitmap, rounded up to whole bytes)
+ 5 (Row header)
+ 6 (ROW_ID: Implicit cluster key, because you are missing a PRIMARY KEY)
+ 6 (TRX_ID: Transaction ID)
+ 7 (ROLL_PTR: Rollback/undo pointer)
= 43 bytes per row

Then you also need to account for page fill rates (pages aren't filled to 100% by design) which adds ~7% at an absolute minimum:

  43
* 1 / (15/16)
= 45.87 bytes per row

Additionally you will have overhead in allocated but unused space.

So getting ~47 bytes per row is actually not bad at all. The worst case would be for overhead to consume ~50% of the space, in which case the table would take ~86 bytes per row (43 * 2).
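
The row-size arithmetic above can be written out as a short Python sketch. The byte counts are the ones listed in this answer; the dictionary and variable names are just for illustration, and the two fill rates (15/16 and 50%) are the best and worst cases mentioned above, not measured values.

```python
# Column storage sizes in bytes, as listed in the answer.
column_bytes = {
    "id_1": 4,   # INT
    "id_2": 4,   # INT
    "id_3": 1,   # TINYINT
    "id_4": 1,   # TINYINT
    "id_5": 4,   # INT
    "date": 4,   # TIMESTAMP
}

# Per-row InnoDB overhead in bytes, as listed in the answer.
overhead_bytes = {
    "null_bitmap": 1,  # 6 nullable columns, rounded up to a whole byte
    "row_header": 5,
    "ROW_ID": 6,       # implicit cluster key (no PRIMARY KEY defined)
    "TRX_ID": 6,
    "ROLL_PTR": 7,
}

raw_row_bytes = sum(column_bytes.values()) + sum(overhead_bytes.values())

best_fill = 15 / 16   # pages are filled to at most 15/16 by design
worst_fill = 1 / 2    # worst case: ~50% of the space is overhead

best_case = raw_row_bytes / best_fill
worst_case = raw_row_bytes / worst_fill

print(f"raw row: {raw_row_bytes} bytes")
print(f"at 15/16 fill: {best_case:.2f} bytes per row")
print(f"at 50% fill: {worst_case:.0f} bytes per row")
```

The reported Avg_row_length of 47 sits just above the 15/16 figure, which fits the point about allocated but unused space.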

For each of your secondary keys, note that their space consumption will look like (to use id_1 as an example):

  4 (INT)
+ 1 (Null bitmap, rounded up to whole bytes)
+ 5 (Row header)
+ 6 (ROW_ID: Implicit cluster key)
= 16 bytes per row
* 1 / (15/16)
= 17.07 bytes per row
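
Applying the same per-entry formula to all seven indexes of the table gives a rough lower bound for Index_length. The sketch below is an extrapolation, not something the formula above spells out: it assumes a single null-bitmap byte is also enough for the three-column `multi_index`, it uses the estimated row count from SHOW TABLE STATUS, and it ignores non-leaf pages and allocated but unused space. All names in the code are illustrative.

```python
ROWS = 68_129_609                 # estimated row count from SHOW TABLE STATUS
REPORTED_INDEX_LENGTH = 10_272_899_072

COLUMN_BYTES = {"id_1": 4, "id_2": 4, "id_3": 1, "id_4": 1, "id_5": 4, "date": 4}

# Index definitions taken from the CREATE TABLE statement.
INDEXES = {
    "id_1": ["id_1"],
    "id_2": ["id_2"],
    "date": ["date"],
    "id_3": ["id_3"],
    "id_4": ["id_4"],
    "id_5": ["id_5"],
    "multi_index": ["id_1", "id_3", "id_4"],
}

NULL_BITMAP = 1   # assumed to be one byte per entry, also for multi_index
ROW_HEADER = 5
ROW_ID = 6        # implicit cluster key stored in every secondary index entry
FILL = 15 / 16    # best-case page fill rate


def entry_bytes(columns):
    """Bytes per secondary index entry before page fill overhead."""
    return sum(COLUMN_BYTES[c] for c in columns) + NULL_BITMAP + ROW_HEADER + ROW_ID


per_index = {name: entry_bytes(cols) for name, cols in INDEXES.items()}
index_bytes_per_row = sum(per_index.values())
estimated_index_length = index_bytes_per_row / FILL * ROWS

for name, size in per_index.items():
    print(f"{name}: {size} bytes per entry")
print(f"all indexes: {index_bytes_per_row} bytes per row before fill overhead")
print(f"estimated: {estimated_index_length / 1024**2:.0f} MiB, "
      f"reported: {REPORTED_INDEX_LENGTH / 1024**2:.0f} MiB")
```

The estimate comes out below the reported Index_length, which is what you would expect from a best-case fill rate; the difference would be the kind of additional overhead described above.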

It may be useful to read the following posts about InnoDB data structures to learn more:

Licensed under: CC-BY-SA with attribution