MariaDB wants to use a bizarre and extremely inefficient query plan on Aria, but not on InnoDB (5 seconds vs. 1+ hour query)

dba.stackexchange https://dba.stackexchange.com/questions/273897

Question

I'm having trouble getting a query to perform as expected on my MariaDB 10.4.14 system.

We recently migrated the database from InnoDB to Aria so we could store the indexes separately on a fast PCI-e SSD and benefit from faster index seeks.

While Aria so far has been a major performance increase, there is just one query that absolutely does not want to co-operate.

This query runs as efficiently as ever right after running ANALYZE TABLE on the table, but because the table is receiving active inserts, it takes only about 10 minutes for the query planner to lose its grip again.
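For reference, the statistics refresh described above might look like the following. This is a sketch that assumes the four table names used in the query below. The PERSISTENT FOR ALL variant collects MariaDB's engine-independent statistics; whether those would keep the plan stable under active inserts on this workload is an assumption and was not tested here.

```sql
-- Refresh the storage engine's own index statistics for the tables in the query.
ANALYZE TABLE data, urls, file, source;

-- Alternative (assumption, untested on this workload): also collect
-- engine-independent statistics. These are stored in the mysql.table_stats,
-- mysql.column_stats and mysql.index_stats tables and only change when
-- ANALYZE TABLE is run again. Whether the optimizer consults them depends
-- on the use_stat_tables setting.
ANALYZE TABLE data PERSISTENT FOR ALL;

-- Check how the server is currently configured to use those statistics.
SHOW VARIABLES LIKE 'use_stat_tables';
```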

We did not have these issues on InnoDB.

Anyways, the query is as follows:

SELECT
  data.*,
  urls.url,
  file.timestamp,
  source.location
FROM data
JOIN urls ON data.urlid = urls.id
JOIN file ON data.fileid = file.id
JOIN source ON data.sourceid = source.id
WHERE urlid IN (
  SELECT id FROM urls WHERE vurl LIKE REVERSE('%.example.com')
)
ORDER BY timestamp DESC;

What I expect it to look like:

+------+-------------+--------+--------+-----------------------+---------+---------+--------------------+------+-----------------------------------------------------------+
| id   | select_type | table  | type   | possible_keys         | key     | key_len | ref                | rows | Extra                                                     |
+------+-------------+--------+--------+-----------------------+---------+---------+--------------------+------+-----------------------------------------------------------+
|    1 | PRIMARY     | urls   | range  | PRIMARY,vurl          | vurl    | 765     | NULL               | 4    | Using where; Using index; Using temporary; Using filesort |
|    1 | PRIMARY     | urls   | eq_ref | PRIMARY               | PRIMARY | 8       | data.urls.id       | 1    |                                                           |
|    1 | PRIMARY     | data   | ref    | sourceid,urlid,fileid | urlid   | 8       | data.urls.id       | 4    | Using where                                               |
|    1 | PRIMARY     | file   | ref    | PRIMARY               | PRIMARY | 8       | data.data.fileid   | 1    |                                                           |
|    1 | PRIMARY     | source | eq_ref | PRIMARY               | PRIMARY | 2       | data.data.sourceid | 1    |                                                           |
+------+-------------+--------+--------+-----------------------+---------+---------+--------------------+------+-----------------------------------------------------------+

What it does look like (without running ANALYZE TABLE first):

+------+-------------+--------+--------+-----------------------+----------+---------+------------------+---------+---------------------------------+
| id   | select_type | table  | type   | possible_keys         | key      | key_len | ref              | rows    | Extra                           |
+------+-------------+--------+--------+-----------------------+----------+---------+------------------+---------+---------------------------------+
|    1 | PRIMARY     | source | ALL    | PRIMARY               | NULL     | NULL    | NULL             | 15      | Using temporary; Using filesort |
|    1 | PRIMARY     | data   | ref    | sourceid,urlid,fileid | sourceid | 2       | data.source.id   | 197803  | Using where                     |
|    1 | PRIMARY     | urls   | eq_ref | PRIMARY               | PRIMARY  | 8       | data.data.urlid  | 1       |                                 |
|    1 | PRIMARY     | urls   | eq_ref | PRIMARY,vurl          | PRIMARY  | 8       | data.data.urlid  | 1       | Using where                     |
|    1 | PRIMARY     | file   | ref    | PRIMARY               | PRIMARY  | 8       | data.data.fileid | 7907384 |                                 |
+------+-------------+--------+--------+-----------------------+----------+---------+------------------+---------+---------------------------------+

Again, this behaviour only appeared once we started using Aria; it did not occur on the InnoDB tables we migrated from. I'm more than happy with a FORCE INDEX or IGNORE INDEX kind of fix, but due to the complexity of the query I'm having trouble finding the right hints to use.
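To make the kind of fix being asked about concrete, here is a sketch of where an index hint would go in this query. It is illustrative only and was not verified against this data set. The hint sits directly after the table name it applies to. Note that index hints affect which index is used for a table, not the order in which tables are joined, so a hint alone may not bring back the expected plan.

```sql
-- Sketch only: shows hint placement, not a verified fix.
SELECT
  data.*,
  urls.url,
  file.timestamp,
  source.location
FROM data IGNORE INDEX (sourceid)  -- discourage the sourceid lookup seen in the bad plan
JOIN urls ON data.urlid = urls.id
JOIN file ON data.fileid = file.id
JOIN source ON data.sourceid = source.id
WHERE urlid IN (
  SELECT id FROM urls WHERE vurl LIKE REVERSE('%.example.com')
)
ORDER BY timestamp DESC;

-- The opposite approach would be FORCE INDEX on the index the good plan uses:
--   FROM data FORCE INDEX (urlid)
```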

Could anyone help me out with this? Any idea what could cause this? (Is this a bug?)

EDIT: The table layout is as follows:

CREATE TABLE `data` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `entryid` INT(10) UNSIGNED NOT NULL DEFAULT 0,
    `sourceid` SMALLINT(5) UNSIGNED NOT NULL DEFAULT 1,
    `urlid` BIGINT(20) UNSIGNED NOT NULL DEFAULT 1,
    `fileid` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `entryid_sourceid_urlid` (`entryid`, `sourceid`, `urlid`),
    INDEX `entryid` (`entryid`),
    INDEX `sourceid` (`sourceid`),
    INDEX `urlid` (`urlid`),
    INDEX `fileid` (`fileid`)
)
COLLATE='utf8_bin'
ENGINE=InnoDB;
 
CREATE TABLE `file` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `file` BLOB NOT NULL DEFAULT '',
    `hash` BINARY(28) NOT NULL DEFAULT '\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0' COMMENT 'SHA-224 hash of file in binary',
    `timestamp` INT(10) UNSIGNED NULL DEFAULT NULL,
    `last` INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (`id`, `timestamp`),
    INDEX `hash` (`hash`(5)) USING HASH,
    INDEX `timestamp` (`timestamp`),
    INDEX `last` (`last`)
)
COLLATE='utf8_bin'
ENGINE=Aria
 PARTITION BY RANGE (`timestamp`)
(PARTITION `p2017` VALUES LESS THAN (1514761200) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2018` VALUES LESS THAN (1546297200) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2019` VALUES LESS THAN (1577833200) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2020` VALUES LESS THAN (1609455600) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2021` VALUES LESS THAN (1640991600) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2022` VALUES LESS THAN (1672527600) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2023` VALUES LESS THAN (1704063600) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2024` VALUES LESS THAN (1735686000) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2025` VALUES LESS THAN (1767222000) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2026` VALUES LESS THAN (1798758000) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2027` VALUES LESS THAN (1830294000) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2028` VALUES LESS THAN (1861916400) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2029` VALUES LESS THAN (1893452400) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2030` VALUES LESS THAN (1924988400) DATA DIRECTORY = '/data/mysql' ENGINE = Aria,
 PARTITION `p2031` VALUES LESS THAN (1956524400) DATA DIRECTORY = '/data/mysql' ENGINE = Aria);
 
CREATE TABLE `source` (
    `id` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
    `source` TINYTEXT NOT NULL DEFAULT '' COLLATE 'utf8_bin',
    `last` INT(10) UNSIGNED NULL DEFAULT 0 COMMENT 'Last received item from this source',
    `updated` TIMESTAMP NULL DEFAULT current_timestamp() ON UPDATE current_timestamp() COMMENT 'Last time the last value was updated',
    PRIMARY KEY (`id`),
    UNIQUE INDEX `source` (`source`) USING HASH
)
COLLATE='utf8_bin'
ENGINE=InnoDB;
 
CREATE TABLE `urls` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `url` VARCHAR(254) NOT NULL COLLATE 'utf8_bin',
    `vurl` VARCHAR(254) AS (reverse(`url`)) VIRTUAL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `url` (`url`),
    INDEX `vurl` (`vurl`)
)
COLLATE='utf8_bin'
ENGINE=InnoDB;

Solution

I resolved this issue by changing my query to use STRAIGHT_JOIN for the file and source tables:

SELECT
  data.*,
  urls.url,
  file.timestamp,
  source.location
FROM data
JOIN urls ON data.urlid = urls.id
STRAIGHT_JOIN file ON data.fileid = file.id
STRAIGHT_JOIN source ON data.sourceid = source.id
WHERE urlid IN (
  SELECT id FROM urls WHERE vurl LIKE REVERSE('%.example.com')
)
ORDER BY timestamp DESC;

I found that not many people talk about STRAIGHT_JOIN, so I initially missed it when searching.

The MySQL Reference Manual says the following about it:

STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
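A small way to see the effect described in that quote is to compare EXPLAIN output for the different forms. The statements below are a sketch using the data, file and source tables from the question; the selected columns are picked only for illustration, and the actual plans will depend on the data.

```sql
-- Plain JOIN: the optimizer is free to read either table first.
EXPLAIN
SELECT data.id, file.timestamp
FROM data
JOIN file ON data.fileid = file.id;

-- STRAIGHT_JOIN as a join operator: data (left) is read before file (right).
EXPLAIN
SELECT data.id, file.timestamp
FROM data
STRAIGHT_JOIN file ON data.fileid = file.id;

-- STRAIGHT_JOIN as a SELECT modifier: all tables are joined in the order
-- in which they are listed in the FROM clause.
EXPLAIN
SELECT STRAIGHT_JOIN data.id, file.timestamp, source.id AS source_id
FROM data
JOIN file ON data.fileid = file.id
JOIN source ON data.sourceid = source.id;
```

In the EXPLAIN output, rows are listed in the order the tables are read, so the forced order should be visible directly.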

On the MariaDB knowledge base, there is an article which mentions it: https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/

I highly recommend giving it a read, as this page describes some methods for forcing different query plans.

Licensed under: CC-BY-SA with attribution