Question

I am trying to understand the performance of an SQL query in MySQL. With indexes only on the PK, the query failed to complete in over 10 minutes. I have added indexes on all the columns used in the WHERE clauses (timestamp, hostname, path, type) and the query now completes in approximately 50 seconds. However, this still seems a long time for what does not look like an overly complex query.

So, I'd like to understand what it is about the query that is causing this. My assumption is that my inner subquery is in some way causing an explosion in the number of comparisons necessary.

There are two tables involved:

storage (~5,000 rows / 4.6MB ) and machines (12 rows, <4k)

The query is as follows:

SELECT T.hostname, T.path, T.used_pct, 
      T.used_gb, T.avail_gb, T.timestamp, machines.type AS type
      FROM storage AS T
      JOIN machines ON T.hostname = machines.hostname
      WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
                            WHERE st.hostname = T.hostname AND
                                              st.path = T.path)
      AND (machines.type = 'nfs')
      ORDER BY used_pct DESC

An EXPLAIN EXTENDED for the query returns the following:

id       select_type        table     type       possible_keys        key          key_len    ref                            rows     filtered      Extra
1        PRIMARY            machines  ref        hostname,type        type         768        const                          1        100.00        Using where; Using temporary; Using filesort
1        PRIMARY            T         ref        fk_hostname          fk_hostname  768        monitoring.machines.hostname   4535     100.00        Using where
2        DEPENDENT SUBQUERY st        ref        fk_hostname,path     path         1002       monitoring.T.path              648      100.00        Using where

I noticed that the Extra column for row 1 includes 'Using filesort'. The question "MySQL explain Query understanding" states that "Using filesort is a sorting algorithm where MySQL isn't able to use an index for sorting and therefore can't do the complete sort in memory."

What is the nature of this query which is causing slow performance?

Why is it necessary for MySQL to use 'filesort' for this query?


Solution

Indexes don't need to be populated afterwards; they are complete as soon as you create them and are kept up to date on every write. That's why inserts and updates become slower the more indexes you have on a table.

Your query runs fast after the first time because the whole result is stored in the query cache. On MySQL versions that still have the query cache (it was removed in 8.0), you can see how fast the query is without it by running

SELECT SQL_NO_CACHE T.hostname ...

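MySQL uses filesort when it cannot read rows in the required order from an index, usually for ORDER BY. In your case it comes from ORDER BY used_pct DESC: the join starts from machines, but used_pct belongs to storage, so MySQL collects the joined rows in a temporary table and sorts them (hence 'Using temporary; Using filesort' on the first row). Despite the name, a filesort does not necessarily go to disk, and on a result this small the sort itself is unlikely to be the bottleneck.

So, why is your query slow? Two things caught my eye.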

1) Your subquery

  WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
                        WHERE st.hostname = T.hostname AND
                                          st.path = T.path)

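is a dependent (correlated) subquery, so it is evaluated again for every row of T. Going by the row estimates in your EXPLAIN, that is roughly 4535 outer rows times 648 rows examined per subquery lookup, or about 2.9 million row reads, which is the explosion you suspected. Try an index on timestamp if you don't have one already (btw, I discourage naming columns after keywords or data types). If that alone doesn't help, try rewriting your query. There are two excellent examples in the MySQL manual: The Rows Holding the Group-wise Maximum of a Certain Column.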
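
As a rough sketch of the kind of rewrite the manual describes, the correlated subquery can be replaced by a derived table that computes the latest timestamp per (hostname, path) once and is then joined back to storage. Table and column names are taken from the question; the alias latest and the column alias max_ts are made up for illustration, and this has not been run against your data.

```sql
-- Sketch only: "latest" and "max_ts" are hypothetical names.
-- The derived table is computed once, instead of once per row of T.
SELECT T.hostname, T.path, T.used_pct,
       T.used_gb, T.avail_gb, T.`timestamp`, machines.type AS type
FROM storage AS T
JOIN machines ON T.hostname = machines.hostname
JOIN ( SELECT hostname, path, MAX(`timestamp`) AS max_ts
       FROM storage
       GROUP BY hostname, path ) AS latest
  ON  latest.hostname = T.hostname
  AND latest.path     = T.path
  AND latest.max_ts   = T.`timestamp`
WHERE machines.type = 'nfs'
ORDER BY T.used_pct DESC;
```

A composite index covering (hostname, path, timestamp) may help the GROUP BY in the derived table, but with long VARCHAR columns it could run into the storage engine's index length limit, so treat that as something to test rather than a given.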

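2) This is a minor issue, but it seems you are joining on CHAR/VARCHAR columns (the key_len of 768 for hostname suggests a long string key). Joins on integer IDs are generally faster, because the keys are smaller and cheaper to compare.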
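
To illustrate the second point, here is what the join might look like with an integer key. This assumes a schema change that is not in the question: a hypothetical integer column machines.id and a matching hypothetical column storage.machine_id holding the same value for each row.

```sql
-- Sketch only: machines.id and storage.machine_id are assumed columns,
-- not part of the schema shown in the question.
SELECT T.hostname, T.path, T.used_pct
FROM storage AS T
JOIN machines AS m ON T.machine_id = m.id   -- integer comparison instead of VARCHAR
WHERE m.type = 'nfs';
```

Whether this is worth the migration depends on how much of the remaining time is spent in the join itself; the subquery in point 1 is the more likely culprit.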

Licensed under: CC-BY-SA with attribution