How do we make this query fast in MySQL 5.5 (group by + order by multiple columns with 2 left joins)?

StackOverflow https://stackoverflow.com/questions/22433503

Question

Table type: InnoDB MySQL 5.5 (Debian 7.0)

We have this query:

SELECT SQL_NO_CACHE p.*, count(ul.user_id) AS like_count FROM post p
force index (added_utc_date_and_time_sort_idx)
LEFT OUTER JOIN user_post_liked ul ON p.id = ul.post_id
LEFT OUTER JOIN user_post u ON u.post_id = p.id
GROUP BY p.id
ORDER BY p.added_utc_date ASC, p.added_utc_time ASC, p.hash ASC LIMIT 0,10;

The post table has two indexes: added_utc_date_and_time_sort_idx (added_utc_date, added_utc_time, hash) and PRIMARY (id).

EXPLAIN shows a temporary table and a filesort. The query takes about 4 seconds with only 20K rows/200MB of data, which is far too slow: we expect 2 million+ rows, and at this rate that would mean 400+ seconds per query:

id, select_type, table, type, possible_keys,    key,              key_len, ref,        rows,  Extra
1,  SIMPLE,      p,     ALL,  NULL,             NULL,             NULL,    NULL,       24576, "Using temporary; Using filesort"
1,  SIMPLE,      ul,    ref,  PRIMARY,          PRIMARY,          764,     posta.p.id, 1,     "Using index"
1,  SIMPLE,      u,     ref,  fk_user_post_idx, fk_user_post_idx, 764,     posta.p.id, 1,     "Using index"

What we want is for MySQL to use the index to order the rows instead of doing a filesort, since we only read the first 10 results.

Solution

Your problem is the combination of GROUP BY with ORDER BY. There is no single index that would cover both!

To make it work without sorting, you'd need one index that satisfies the order required for the GROUP BY (p.id) as well as the order required for the ORDER BY (p.added_utc_date ASC, p.added_utc_time ASC, p.hash ASC). These two order requirements don't share a common prefix, hence you cannot have a single index that supports both.

If, however, you could live with ORDER BY p.id, p.added_utc_date, ... then you could create an index for that ordering, and it should serve both clauses (though I don't know whether MySQL is smart enough to use it that way).
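
As a rough sketch of that variant: the index name id_then_added_idx is made up for illustration, and the column list simply follows the ORDER BY with id in front. Note that this returns rows sorted by id first, which is a different result from the original date/time ordering. Also, because id is listed as the primary key, the existing PRIMARY index may already provide this order, so check EXPLAIN before adding anything.

```sql
-- Hypothetical index: leads with id so GROUP BY and ORDER BY share a prefix.
CREATE INDEX id_then_added_idx
    ON post (id, added_utc_date, added_utc_time, hash);

-- Same joins and aggregate as in the question, but ordered by id first.
SELECT p.*, COUNT(ul.user_id) AS like_count
FROM post p
LEFT OUTER JOIN user_post_liked ul ON ul.post_id = p.id
LEFT OUTER JOIN user_post u ON u.post_id = p.id
GROUP BY p.id
ORDER BY p.id ASC, p.added_utc_date ASC, p.added_utc_time ASC, p.hash ASC
LIMIT 0, 10;
```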

To make a long story short: the execution plan you are thinking of is impossible for your query.
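
If rewriting the query is an option, a commonly used workaround is to pick the first 10 posts in a derived table, where the sort index can apply, and only then join and aggregate. This is a different query rather than a different plan for the original one, so the statement above still holds. A minimal sketch, assuming post.id is unique (it is listed as the primary key) so that each group corresponds to exactly one post row; it has not been run against the schema from the question, and the plan should be confirmed with EXPLAIN:

```sql
SELECT p.*, COUNT(ul.user_id) AS like_count
FROM (
    -- Only the post table is involved here, so the index on
    -- (added_utc_date, added_utc_time, hash) can supply the order.
    SELECT *
    FROM post
    ORDER BY added_utc_date ASC, added_utc_time ASC, hash ASC
    LIMIT 0, 10
) AS p
LEFT OUTER JOIN user_post_liked ul ON ul.post_id = p.id
LEFT OUTER JOIN user_post u ON u.post_id = p.id
GROUP BY p.id
-- The outer sort now touches at most 10 groups.
ORDER BY p.added_utc_date ASC, p.added_utc_time ASC, p.hash ASC;
```

Because both joins are LEFT joins, every post survives the join, so limiting before joining selects the same 10 posts as limiting afterwards (up to ties in the sort columns).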

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow