How to optimize this select with join and order-by?
08-10-2020
Question
We have two tables:
CREATE TABLE `messages` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` int(10) unsigned DEFAULT '0',
`user_id` int(11) DEFAULT '0',
....
`subject_id` int(11) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
KEY `user_id` (`user_id`),
KEY `created` (`created`),
KEY `text_id` (`text_id`) USING BTREE,
KEY `subject_id` (`subject_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=237542180 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
The second one:
CREATE TABLE `users` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`email` char(150) DEFAULT NULL,
`reg_time` int(10) unsigned DEFAULT '0',
`password` char(255) DEFAULT NULL,
...................
`moderation` int(1) unsigned NOT NULL DEFAULT '0',
`tag` varchar(255) DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `email` (`email`),
KEY `created` (`reg_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=123585 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
messages has ~49M records, users has 13k. DB engine: Aurora (MySQL-Compatible) 5.6.10a
The very slow query is:
SELECT messages.*, users.administrator_group_id FROM messages
LEFT JOIN users ON messages.user_id = users.id
ORDER BY messages.id desc LIMIT 0,20
If I run this query without ORDER BY, it takes 14-16 seconds. With ORDER BY, it takes longer than 5 minutes.
I am considering changing the business logic to avoid this query and to limit the record set from messages, e.g. by message date, but I would like to know whether there is any way to speed it up on the same hardware as it is.
Solution
I have never used Aurora, and there might be differences from MySQL, but there is a method that very often works in MySQL for similar issues where the execution plan is not optimal, i.e. when it does the join first and then has to do the ORDER BY of the big intermediate result set.
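To check whether the original query really sorts after the join, it can be prefixed with EXPLAIN. This is only a sketch of the check. The plan that Aurora actually reports may differ, and no output is shown here because none was given in the question.

```sql
-- Ask the optimizer for the plan of the original slow query.
-- If the Extra column of the first row shows "Using temporary; Using filesort",
-- the ORDER BY is being applied to the joined intermediate result
-- instead of being satisfied by the PRIMARY KEY index on messages.id.
EXPLAIN
SELECT messages.*, users.administrator_group_id
FROM messages
LEFT JOIN users ON messages.user_id = users.id
ORDER BY messages.id DESC
LIMIT 0, 20;
```

It is also worth noting which table appears first in the EXPLAIN output, since that is the table the join starts from.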
Instead of joining the 2 tables directly, we try to first LIMIT the results in a derived table and then JOIN back. This way the index will be used for the ORDER BY - LIMIT, and then it will only have to do N seeks (20 in this case) in the 2nd table:
SELECT
m.*,
u.administrator_group_id
FROM
( SELECT id
FROM messages
ORDER BY id DESC
LIMIT 20
) AS mi
JOIN
messages AS m ON m.id = mi.id
LEFT JOIN
users AS u ON m.user_id = u.id
ORDER BY
mi.id DESC ;
And a variation:
SELECT
m.*,
u.administrator_group_id
FROM
( SELECT mi.*
FROM messages AS mi
ORDER BY mi.id DESC
LIMIT 20
) AS m
LEFT JOIN
users AS u ON m.user_id = u.id
ORDER BY
m.id DESC ;
Try both and check the execution plan and performance of each. On any reasonable hardware, a query that just gets 20 rows from a table or two and uses indexes should be really efficient: in the millisecond range, not seconds or minutes.
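One way to check the execution plan is to run EXPLAIN on the rewritten query, shown here for the first version. The remarks in the comments describe what one would hope to see in stock MySQL 5.6. They are assumptions, not output measured on Aurora.

```sql
-- Plan for the "limit first, join back" rewrite.
-- Hoped-for result (an assumption, to be verified on the real server):
--   * the derived table row (select_type DERIVED) reads messages
--     with key = PRIMARY and no "Using filesort";
--   * the outer rows for m and u use eq_ref lookups on their PRIMARY keys,
--     so only about 20 seeks are done per table.
EXPLAIN
SELECT
    m.*,
    u.administrator_group_id
FROM
    ( SELECT id
      FROM messages
      ORDER BY id DESC
      LIMIT 20
    ) AS mi
  JOIN
    messages AS m ON m.id = mi.id
  LEFT JOIN
    users AS u ON m.user_id = u.id
ORDER BY
    mi.id DESC;
```

If the derived table row still shows a large rows estimate together with "Using filesort", the inner ORDER BY is not using the primary key and the rewrite is unlikely to help as it stands.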