Question

I am trying to optimize a query on a mysql table I've created. I expect that there will be many many rows in the table. Looking at this question the accepted answer and the top voted answer suggests two different approaches. I wrote these two queries and want to know which one is more performant.

SELECT uv.* 
FROM UserVisit uv INNER JOIN 
    (SELECT ID,MAX(visitDate) visitDate 
     FROM UserVisit GROUP BY ID) last 
    ON (uv.ID = last.ID AND uv.visitDate = last.visitDate);

Running this with EXPLAIN yields:

+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                            | rows | Extra       |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL                           |    2 |             |
|  1 | PRIMARY     | uv         | eq_ref | PRIMARY       | PRIMARY | 11      | last.playscanID,last.visitDate |    1 |             |
|  2 | DERIVED     | UserVisit  | index  | NULL          | PRIMARY | 11      | NULL                           |    4 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
3 rows in set (0.01 sec)

And the other query:

SELECT lastVisits.* 
FROM ( SELECT * FROM UserVisit ORDER BY visitDate DESC ) lastVisits 
GROUP BY lastVisits.ID

Running that with EXPLAIN yields:

+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using temporary; Using filesort |
|  2 | DERIVED     | UserVisit  | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using filesort                  |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

I am uncertain how to interpret the result of the two EXPLAINs.
Which of these queries can I expect to be faster and why?

EDIT: This is the way UserVisit table looks:

+----------------+---------------------+------+-----+---------+-------+
| Field          | Type                | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ID             | bigint(20) unsigned | NO   | PRI | NULL    |       |
| visitDate      | date                | NO   | PRI | NULL    |       |
| visitTime      | time                | NO   |     | NULL    |       |
| analysisResult | decimal(3,2)        | NO   |     | NULL    |       |
+----------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Was it helpful?

Solution

Firstly, you might want to read the manual on EXPLAIN. It's a dense read, but it should provide most of the information you want.

Secondly, as Strawberry says, the second query works by accident. The behaviour may change in future versions, and your query would not return an error, just different data. That's nearly always a bad thing.

Finally, the EXPLAIN suggests that version 1 will be faster. In EXTRA, it's saying it's using an index, which is much faster than filesort. Without a schema, it's hard to be sure, but I think you will also benefit from a compound key on ID and visitdate.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top