Why does pt-duplicate-key-checker suggest removing a composite index?

https://dba.stackexchange.com/questions/204076

29-12-2020

Question

Here's a snippet from the output of the Percona Toolkit utility pt-duplicate-key-checker, which searches for redundant indexes:

# Key myidx ends with a prefix of the clustered index
# Key definitions:
#   KEY `myidx` (`bar`,`foo`)
#   PRIMARY KEY (`foo`),
# Column types:
#         `bar` mediumint(8) unsigned not null default '0'
#         `foo` mediumint(8) unsigned not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `mydb`.`mytable` DROP INDEX `myidx`, ADD INDEX `myidx` (`bar`);

Why does the tool suggest this? Can't the original composite index be useful?

As far as I understand, an index on bar would be worth deleting given a PK (bar,foo), but that is not the case here.

Solution

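In InnoDB, the primary key is part of every secondary key: each secondary index entry implicitly carries the primary key columns after its own columns. An index on (bar) is therefore already stored as (bar, foo), which makes the explicit (bar, foo) index redundant in the tool's view.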
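A minimal sketch of how one might check this, assuming a hypothetical table mytable_sketch that mirrors the columns from the question (the table name and the value 42 are made up for illustration). No output is shown, since the exact plan depends on the server version and the data:

```sql
-- Hypothetical table mirroring the question's columns; only INDEX(bar) is declared.
CREATE TABLE mytable_sketch (
  foo MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  bar MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (foo),
  KEY myidx (bar)   -- InnoDB appends the PK column foo to each entry
) ENGINE=InnoDB;

-- Both selected columns should be available from myidx alone, so the plan
-- is expected to report "Using index" (a covering read) for this query.
EXPLAIN SELECT foo, bar FROM mytable_sketch WHERE bar = 42;
```

If the Extra column shows "Using index", the single-column key is already serving as a covering index for (bar, foo).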

OTHER TIPS

I disagree with the analysis given by the tool.

When I see INDEX(bar, foo), I assume there is some query that needs those two columns, in that order, to be in this composite index.

The fact that foo is the PK, and that INDEX(bar) is therefore effectively identical to the above index, is irrelevant.

When I see just INDEX(bar), I assume there is some query that needs (bar) without the PK column (foo).

When I see both, I will say that the shorter one is 'redundant' and recommend removing it.

Furthermore, "To shorten this duplicate clustered index" is wrong. INDEX(bar) is no 'shorter' than INDEX(bar, foo). And it is not a "clustered index"; only the PK is "clustered".

If it were UNIQUE(bar, foo), then I would recommend changing UNIQUE to INDEX. This is so that INSERTs won't have to do an unnecessary uniqueness check.
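As a sketch of that change, assuming a hypothetical table t with PRIMARY KEY (foo) and a unique key named uq_bar_foo (both names are invented here): because foo alone is already unique as the PK, every (bar, foo) pair is unique anyway, so the UNIQUE constraint adds nothing.

```sql
-- Hypothetical names: table t, unique key uq_bar_foo, new key idx_bar_foo.
-- Swap the UNIQUE key for a plain index on the same columns in one statement.
ALTER TABLE t
  DROP INDEX uq_bar_foo,
  ADD INDEX idx_bar_foo (bar, foo);
```

Doing both steps in a single ALTER TABLE avoids a window in which the table has no index on bar.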

Let's create a simple table and see what MySQL (5.7.20 MySQL Community Server) has to tell us:

mysql> create table test_dupe_key (foo int unsigned not null auto_increment primary key, bar int unsigned not null, random int unsigned not null default 0, key(bar, foo)) engine=InnoDB;
Query OK, 0 rows affected (0.08 sec)

mysql> show create table test_dupe_key\G
*************************** 1. row ***************************
       Table: test_dupe_key
Create Table: CREATE TABLE `test_dupe_key` (
  `foo` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `bar` int(10) unsigned NOT NULL,
  `random` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`foo`),
  KEY `bar` (`bar`,`foo`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> insert into test_dupe_key(bar) values (1), (2), (3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test_dupe_key;
+-----+-----+--------+
| foo | bar | random |
+-----+-----+--------+
|   1 |   1 |      0 |
|   2 |   2 |      0 |
|   3 |   3 |      0 |
+-----+-----+--------+
3 rows in set (0.00 sec)

Here is a simple query that can be read from the index:

mysql> explain format=json select foo, bar from test_dupe_key\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.60"
    },
    "table": {
      "table_name": "test_dupe_key",
      "access_type": "index",
      "key": "bar",
      "used_key_parts": [
        "bar",
        "foo"
      ],
      "key_length": "8",
      "rows_examined_per_scan": 3,
      "rows_produced_per_join": 3,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.60",
        "prefix_cost": "1.60",
        "data_read_per_join": "48"
      },
      "used_columns": [
        "foo",
        "bar"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

This shows that the key bar is used with both of its key parts, giving a key length of 8 bytes (2 x 4, one 4-byte int per column). Now let's swap the composite key for a single-column key and check again:

mysql> alter table test_dupe_key drop key bar, add key(bar);
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain format=json select foo, bar from test_dupe_key\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.60"
    },
    "table": {
      "table_name": "test_dupe_key",
      "access_type": "index",
      "key": "bar",
      "used_key_parts": [
        "bar"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 3,
      "rows_produced_per_join": 3,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.60",
        "prefix_cost": "1.60",
        "data_read_per_join": "48"
      },
      "used_columns": [
        "foo",
        "bar"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

Only the used key parts and the key length change, as expected; it is still an index read ("using_index": true), even though foo is no longer declared in the key. Let's see what happens if we turn this into a range query on the PK column combined with an equality constraint on the indexed column:

mysql> explain format=json select foo, bar from test_dupe_key where bar = 1 and foo > 0\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "test_dupe_key",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY",
        "bar"
      ],
      "key": "bar",
      "used_key_parts": [
        "bar",
        "foo"
      ],
      "key_length": "8",
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "16"
      },
      "used_columns": [
        "foo",
        "bar"
      ],
      "attached_condition": "((`stack_204076`.`test_dupe_key`.`bar` = 1) and (`stack_204076`.`test_dupe_key`.`foo` > 0))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

The query planner considered the PK for the query but chose the index on bar. Interestingly, compared with the previous index read, the single-column key now reports 2 key parts again and a key length of 8:

  "key": "bar",
  "used_key_parts": [
    "bar",
    "foo"
  ],
  "key_length": "8"

This tells us that MySQL has used the PK column that InnoDB automatically appends to the secondary index, so the single-column key(bar) serves this query much as the composite key(bar, foo) would.
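One hedged way to confirm that the second key part comes from the implicit PK suffix is to toggle the optimizer's use_index_extensions flag and repeat the EXPLAIN. This sketch assumes the test_dupe_key table from above with the single-column key(bar), and a server that supports the flag (MySQL 5.6 and later, as far as I know). No output is shown, because the exact plan may vary with version and data:

```sql
-- Check whether index extensions are enabled for this session (1 = on).
SELECT @@optimizer_switch LIKE '%use_index_extensions=on%' AS extensions_on;

-- Disable them for this session only, then repeat the range query.
-- With the flag off, the optimizer is expected to stop treating foo as
-- part of key bar, so used_key_parts and key_length should shrink.
SET SESSION optimizer_switch = 'use_index_extensions=off';
EXPLAIN FORMAT=JSON
SELECT foo, bar FROM test_dupe_key WHERE bar = 1 AND foo > 0;

-- Restore the server default for this flag.
SET SESSION optimizer_switch = 'use_index_extensions=default';
```

Comparing the two plans side by side shows how much of the composite index's benefit the single-column key gets for free.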

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange