Question

I have been trying to create an index in MySQL, but keep getting temporary and filesort whenever I run an explain on my query.

A simplified version of my tables looks like:

ordered_products
    op_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    op_orderid INT UNSIGNED NOT NULL
    op_orderdate TIMESTAMP NOT NULL
    op_productid INT UNSIGNED NOT NULL

products
    p_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    p_productname VARCHAR(128) NOT NULL
    p_enabled TINYINT NOT NULL

The 'ordered_products' table currently has more than 1,000,000 rows and is a record of all products that have been ordered, as well as the orders that they belong to. This table grows rapidly.

The 'products' table currently has around 3,000 rows and contains a list of products that are for sale.

The site displays a list of the top products for a given period (normally the last 3 days) and my query looks like:

SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid
FROM ordered_products op
LEFT JOIN products p ON op.op_productid=p.p_id
WHERE op.op_orderdate>='2014-03-08 00:00:00'
AND p.p_enabled=1
GROUP BY op.op_productid
ORDER BY ProductCount DESC, p.p_productname ASC

When I run that query, it normally takes around 800 milliseconds (0.8 seconds) to execute, which is ridiculous. We've remedied this with caching, however whenever the cache expires, we have a slowdown. I need to fix this.

I have tried to index the tables, but no matter what I try, I can't avoid temporary and filesort. The output from EXPLAIN is:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  p   index   PRIMARY,idx_enabled_id_name idx_enabled_id_name 782 \N  1477    Using where; Using index; Using temporary; Using filesort
1   SIMPLE  op  ref idx_pid_oid_date    idx_pid_oid_date    4   test_store.p.p_id   9   Using where; Using index

If I remove the GROUP BY, the filesort disappears, however I need it to ensure the ProductCount value shows me every product count rather than a total sum of all products.

If I remove the GROUP BY and the ORDER BY ProductCount, both temporary and filesort disappear, but now I am left with a very bad result set.

Can anyone please help me solve this? I have tried a multitude of different indexes, and have tried rewriting the SQL numerous times, but can never succeed.

Any help would be greatly appreciated.

Was it helpful?

Solution

You can't get rid of the temp table and filesort while you are using ORDER BY on a calculated column ProductCount. There's no index for the calculated column, so it has to do do the sorting at the time of the query.

I tried experimentally to reproduce your results. I can put an index on op_productid and then the optimizer might use it to perform the GROUP BY.

mysql> EXPLAIN SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid 
FROM ordered_products op FORCE INDEX (op_productid) STRAIGHT_JOIN products p 
  ON op.op_productid=p.p_id 
WHERE op.op_orderdate>='2014-03-08 00:00:00' AND p.p_enabled=1 
GROUP BY op.op_productid ORDER BY null;

In my case, I had to use STRAIGHT_JOIN and FORCE INDEX to override the optimizer. But that might be due to my test environment, where I have only 1 or 2 rows per table for testing, and it throws off the optimizer's choices. In your real data, it might make a more sensible choice.

Also, don't use LEFT JOIN if you have conditions in the WHERE clause that make the join implicitly an inner join. Learn the types of joins and how they work -- don't always use LEFT JOIN by default.

+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
|  1 | SIMPLE      | op    | index | op_productid  | op_productid | 4       | NULL |    5 | Using where |
|  1 | SIMPLE      | p     | ALL   | PRIMARY       | NULL         | NULL    | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+

Your only alternative is to store a denormalized table, where the counts are persisted. Then if your cache fails, it isn't an expensive query to refresh the cache.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top