Question

I am seeing some very strange behavior from the query optimizer with some tables on a MariaDB server.

First of all, we have our main table, let's call it HUGE, and on the same database we have another table, a clone of it, with a limited range of rows.

The HUGE table has this dispatch_time range

+--------------------+--------------------+
| min(dispatch_time) | max(dispatch_time) |
+--------------------+--------------------+
| 20070402114058     | 20201207000108     |
+--------------------+--------------------+

with this row count

+----------+
| count(*) |
+----------+
| 46683586 |
+----------+

and the CLONE has this range

+--------------------+--------------------+
| min(dispatch_time) | max(dispatch_time) |
+--------------------+--------------------+
| 20190101143607     | 20201207000108     |
+--------------------+--------------------+

with this row count

+----------+
| count(*) |
+----------+
| 10346027 |
+----------+

They have the same indexes

The HUGE one

+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name       | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 |          0 | PRIMARY        |            1 | order_id      | A         |    44735742 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          1 | Index_Customer |            1 | customer_id   | A         |    11183935 |     NULL | NULL   | YES  | BTREE      |         |               |
| table1 |          1 | Index_3        |            1 | dispatch_time | A         |    44735742 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The smaller CLONE

+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name       | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 |          0 | PRIMARY        |            1 | order_id      | A         |    10346027 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          1 | Index_Customer |            1 | customer_id   | A         |     2041159 |     NULL | NULL   | YES  | BTREE      |         |               |
| table1 |          1 | Index_3        |            1 | dispatch_time | A         |     8070853 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+----------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Now, the problem is with this specific, simple query.

On the HUGE table, if we run this

EXPLAIN SELECT * FROM   `table1` WHERE
     ( `dispatch_time` BETWEEN '20190201' AND '20190601' );
+------+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------+
| id   | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra                 |
+------+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------+
|    1 | SIMPLE      | table1    | range | Index_3       | Index_3 | 15      | NULL | 5201896 | Using index condition |
+------+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------+


SELECT SQL_NO_CACHE * FROM   `table1` WHERE
     ( `dispatch_time` BETWEEN '20190201' AND '20190601' );
1695926 rows in set (21.730 sec)

So far so good: the access type is range, it uses the index condition, and the execution time is acceptable. Everything is fine.

BUT on the smaller table, look what happens with the same query

EXPLAIN SELECT * FROM   `table1` WHERE
     ( `dispatch_time` BETWEEN '20190201' AND '20190601' );
+------+-------------+-----------+------+---------------+------+---------+------+----------+-------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+------+-------------+-----------+------+---------------+------+---------+------+----------+-------------+
|    1 | SIMPLE      | table1    | ALL  | Index_3       | NULL | NULL    | NULL | 10346027 | Using where |
+------+-------------+-----------+------+---------------+------+---------+------+----------+-------------+

SELECT SQL_NO_CACHE * FROM   `table1` WHERE
     ( `dispatch_time` BETWEEN '20190201' AND '20190601' );
1695926 rows in set (39.470 sec)

It does a full table scan: the access type is ALL and it is not using the index.

The query optimizer does not choose the index because of its cost. I ran an optimizer_trace, and this is the problematic part.
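(For reference, this is roughly how such a trace is captured on MariaDB 10.4+; a sketch of the standard session commands:)

```sql
-- Enable optimizer tracing for the current session
SET optimizer_trace = 'enabled=on';

-- Run the statement whose plan we want to inspect
SELECT * FROM `table1`
WHERE `dispatch_time` BETWEEN '20190201' AND '20190601';

-- Read back the trace JSON, then switch tracing off again
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```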

On the HUGE table it shows

{
    "index": "Index_3",
    "ranges": ["(20190201) <= (dispatch_time) <= (20190601)"],
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 5201896,
    "cost": 6.51e6,
    "chosen": true
}

and on the smaller CLONE

{
    "index": "Index_3",
    "ranges": ["(20190201) <= (dispatch_time) <= (20190601)"],
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 3375750,
    "cost": 4.23e6,
    "chosen": false,
    "cause": "cost"
}

I have come to the conclusion that the cardinality of the CLONE table is what keeps the query optimizer from using the index, but the question is: why?

Why would it take this path and do a full table scan on the smaller table when the index is there? How can I tell the optimizer to change the plan? If I use FORCE INDEX, it uses the index and the execution time is similar to the HUGE table's:

SELECT SQL_NO_CACHE * FROM   `table1` FORCE INDEX (Index_3) WHERE ( `dispatch_time` BETWEEN '20190201' AND '20190601' );

I have run ANALYZE TABLE ... PERSISTENT FOR ALL on this table multiple times; nothing changed. I have tried various tweaks, etc., but the execution plan never uses the index on the small CLONE table.

Does anyone have an idea?

Thank you.

Edited to add:

SHOW CREATE TABLE

HUGE ONE

table1 | CREATE TABLE `table1` (
  `order_id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) DEFAULT NULL,
  `client_id` int(11) DEFAULT NULL,
  `table_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `codename` int(11) DEFAULT NULL,
  `start_time` char(14) DEFAULT NULL,
  `dispatch_time` char(14) DEFAULT NULL,
  `change_time` char(14) DEFAULT NULL,
  `buffet_time` char(14) DEFAULT NULL,
  `receipt_time` char(14) DEFAULT NULL,
  `delivery_time` char(14) DEFAULT NULL,
  `client_time` char(14) DEFAULT NULL,
  `return_time` char(14) DEFAULT NULL,
  `expected_time` char(14) DEFAULT NULL,
  `completion_time` char(14) DEFAULT NULL,
  `total` double DEFAULT NULL,
  `promotion` double DEFAULT NULL,
  `takeaway` int(11) DEFAULT NULL,
  `esan` int(11) DEFAULT NULL,
  `destroy` int(11) DEFAULT NULL,
  `person` int(11) DEFAULT NULL,
  `valid` int(11) DEFAULT NULL,
  `returned` int(11) DEFAULT NULL,
  `invoice` int(11) DEFAULT NULL,
  `discount` double DEFAULT NULL,
  `discountS` double DEFAULT NULL,
  `policy` int(11) DEFAULT NULL,
  `packing` int(11) DEFAULT NULL,
  `production` int(11) DEFAULT NULL,
  `vitrine` int(11) DEFAULT NULL,
  `sch_start` char(14) DEFAULT NULL,
  `sch_finish` char(14) DEFAULT NULL,
  `batch_time` char(14) DEFAULT NULL,
  `comments` varchar(255) DEFAULT NULL,
  `preorder` int(11) DEFAULT NULL,
  PRIMARY KEY (`order_id`),
  KEY `Index_Customer` (`customer_id`),
  KEY `Index_3` (`dispatch_time`)
) ENGINE=InnoDB AUTO_INCREMENT=46739244 DEFAULT CHARSET=greek COMMENT='InnoDB free: 12288 kB'

CLONE ONE

table1 | CREATE TABLE `table1` (
  `order_id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) DEFAULT NULL,
  `client_id` int(11) DEFAULT NULL,
  `table_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `codename` int(11) DEFAULT NULL,
  `start_time` char(14) DEFAULT NULL,
  `dispatch_time` char(14) DEFAULT NULL,
  `change_time` char(14) DEFAULT NULL,
  `buffet_time` char(14) DEFAULT NULL,
  `receipt_time` char(14) DEFAULT NULL,
  `delivery_time` char(14) DEFAULT NULL,
  `client_time` char(14) DEFAULT NULL,
  `return_time` char(14) DEFAULT NULL,
  `expected_time` char(14) DEFAULT NULL,
  `completion_time` char(14) DEFAULT NULL,
  `total` double DEFAULT NULL,
  `promotion` double DEFAULT NULL,
  `takeaway` int(11) DEFAULT NULL,
  `esan` int(11) DEFAULT NULL,
  `destroy` int(11) DEFAULT NULL,
  `person` int(11) DEFAULT NULL,
  `valid` int(11) DEFAULT NULL,
  `returned` int(11) DEFAULT NULL,
  `invoice` int(11) DEFAULT NULL,
  `discount` double DEFAULT NULL,
  `discountS` double DEFAULT NULL,
  `policy` int(11) DEFAULT NULL,
  `packing` int(11) DEFAULT NULL,
  `production` int(11) DEFAULT NULL,
  `vitrine` int(11) DEFAULT NULL,
  `sch_start` char(14) DEFAULT NULL,
  `sch_finish` char(14) DEFAULT NULL,
  `batch_time` char(14) DEFAULT NULL,
  `comments` varchar(255) DEFAULT NULL,
  `preorder` int(11) DEFAULT NULL,
  PRIMARY KEY (`order_id`),
  KEY `Index_Customer` (`customer_id`),
  KEY `Index_3` (`dispatch_time`)
) ENGINE=InnoDB AUTO_INCREMENT=46739244 DEFAULT CHARSET=greek COMMENT='InnoDB free: 12288 kB'

Solution

It is, as you mentioned, due to the cardinality difference between the two tables. It sounds like you understand what cardinality is on the surface, but to answer your question "why?", let me provide a little more information first.

In short, cardinality is a measurement of uniqueness for a given value in a table: it measures the number of occurrences of a value relative to the total number of values in that table. The SQL engine stores statistics about the cardinalities of the values in your tables so that, later on, it can decide on the most efficient way to serve that data when queried.

When you write a query, the predicates (the values in the conditions of your WHERE and JOIN clauses) are used to filter the table's data. The SQL engine uses its statistics for those particular values to decide which kind of execution plan would be most performant. For a value with low cardinality (low uniqueness, so a high number of records contain that value), a full scan makes sense; for a value with high cardinality (high uniqueness, so not a lot of rows contain that value), an index seek on the B-tree is the better choice.
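On MariaDB you can inspect those stored statistics directly. A sketch, assuming engine-independent statistics are in use (the `mysql.*_stats` tables are populated by `ANALYZE ... PERSISTENT`):

```sql
-- Refresh the engine-independent statistics for the table
ANALYZE TABLE `table1` PERSISTENT FOR ALL;

-- Per-column statistics (min/max values, NULL ratio, average frequency)
SELECT * FROM mysql.column_stats
WHERE table_name = 'table1' AND column_name = 'dispatch_time';

-- Per-index statistics that feed the range-cost estimates
SELECT * FROM mysql.index_stats
WHERE table_name = 'table1' AND index_name = 'Index_3';
```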

That being said, the direct answer to your question "why?" is that an index seek over a large number of values (relatively speaking) is generally significantly slower than a full scan, because each matching entry in a secondary index requires an additional lookup back to the table row. The engine isn't perfect, though, and this is why index hints exist: you may know the data better than the engine does, and sometimes hinting against what it would normally do is the way to correct minor mistakes like this one.

OTHER TIPS

It's not the "wrong" plan; it is the "right" plan for the different data. To explain:

BETWEEN '20190201' AND '20190601'

That is only about 5% of the bigger dataset, but about 30% of the smaller dataset.

The Optimizer has two ways to handle a range like that -- use an index if the range is 'very' selective, or do a table scan, tossing the rows that don't apply.

When choosing between using an index versus doing a table scan, the cutoff is computed via a "cost" formula. But it boils down to a simple rule: "If more than about 20% of the table needs to be looked at, do a table scan."
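Plugging the row estimates from the two EXPLAIN outputs into that rule shows how it plays out (a back-of-the-envelope check using the optimizer's own estimates, not the actual row counts):

```sql
-- HUGE table: estimated 5,201,896 matching rows out of 46,683,586
SELECT 5201896 / 46683586 AS huge_selectivity;   -- ~0.11, under the ~20% cutoff -> range scan

-- CLONE table: estimated 3,375,750 matching rows out of 10,346,027
SELECT 3375750 / 10346027 AS clone_selectivity;  -- ~0.33, over the cutoff -> full table scan
```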

Hence, when using that range against those two datasets, the Optimizer is likely to pick a different query plan.

The cutoff is imprecise and is sometimes far from optimal. To avoid that, here are two suggestions:

Adjust the query:

  • For testing, shrink the BETWEEN range to a smaller percentage of the total so that the Optimizer will likely pick the same query plan

And/or adjust how you build the subset:

  • Pull out a small percentage of customers (not dates) when building the 'test' dataset.
  • Or a subset containing only Mondays.
  • Or maybe a dataset with a fraction of dates and a fraction of customers.

Any of those can lead to anomalies like the one in question, so it might be better to build multiple datasets to test against -- and run tests against each.
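For example, a customer-based subset could be built along these lines (a sketch; the table name `clone_sample` and the sampling modulus of 20 are made up for illustration):

```sql
-- Keep every 20th customer (~5% of customers) across the FULL date range,
-- so date selectivity in the sample resembles that of the original table
CREATE TABLE `clone_sample` LIKE `table1`;

INSERT INTO `clone_sample`
SELECT * FROM `table1`
WHERE `customer_id` MOD 20 = 0;

-- Refresh statistics so the optimizer sees the new distribution
ANALYZE TABLE `clone_sample` PERSISTENT FOR ALL;
```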

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange