Question

I'm trying to improve my query so that it doesn't take so long. Is there anything I can try?

I'm using InnoDB.

My table:

mysql> describe hunted_place_review_external_urls;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| worker_id    | varchar(255) | YES  | MUL | NULL    |                |
| queued_at    | bigint(20)   | YES  | MUL | NULL    |                |
| external_url | varchar(255) | NO   |     | NULL    |                |
| place_id     | varchar(63)  | NO   | MUL | NULL    |                |
| source_id    | varchar(63)  | NO   |     | NULL    |                |
| successful   | tinyint(1)   | NO   |     | 0       |                |
+--------------+--------------+------+-----+---------+----------------+

My query:

mysql> select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;

1 row in set (4.00 sec)

mysql> select count(*) from hunted_place_review_external_urls where worker_id is null;
+----------+
| count(*) |
+----------+
|    19121 |
+----------+
1 row in set (0.00 sec)

Why is it taking 4s even though I have an index on queued_at and worker_id?

Here's the EXPLAIN of this query:

mysql> explain select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table                             | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | hunted_place_review_external_urls | index | worker_id     | queued_at | 9       | NULL |   67 | Using where |
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

It becomes much faster when I remove the order by queued_at part:

mysql> select * from hunted_place_review_external_urls where worker_id is null limit 1;

1 row in set (0.00 sec)

It also becomes much faster when the count(*) is smaller:

mysql> select count(*) from hunted_place_review_external_urls where worker_id is null;
+----------+
| count(*) |
+----------+
|    10    |
+----------+
1 row in set (0.00 sec)

mysql> select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;

1 row in set (0.00 sec)

My queued_at values are timestamps in milliseconds, such as 1398210069531.

Solution

From the docs:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

...snip...

The key used to fetch the rows is not the same as the one used in the ORDER BY:

SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

And:

With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column.

Your query plan confirms that your slow query is using the queued_at key. If you remove the ORDER BY, the query plan should use the worker_id key instead. One possible reason for the difference in speed is the difference in which key is being used.

As Peter Zaitsev says in MySQL Performance Blog: ORDER BY ... LIMIT Performance Optimization:

It is very important to have ORDER BY with LIMIT executed without scanning and sorting full result set, so it is important for it to use index...

For example if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use index on (date_created) to get result set very fast.

Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;

In this case index by date_created may also work but it might not be the most efficient – If it is rare category large portion of table may be scanned to find 10 rows. So index on (category_id, date_created) will be better idea.

You could try, per this suggestion, creating a composite index (worker_id, queued_at) for use with this specific query. If for some reason you can't add another index, you could also try forcing your ordered query to use the worker_id index, to narrow the result set before sorting.
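
A minimal sketch of that composite index follows. The index name idx_worker_queued is made up for illustration; the table and column names come from the question. Re-running EXPLAIN afterwards shows whether the optimizer actually picks the new index.

```sql
-- Hypothetical index name. The WHERE column comes first and the
-- ORDER BY column second.
CREATE INDEX idx_worker_queued
    ON hunted_place_review_external_urls (worker_id, queued_at);

-- Check which key the optimizer chooses now.
EXPLAIN
SELECT *
  FROM hunted_place_review_external_urls
 WHERE worker_id IS NULL
 ORDER BY queued_at ASC
 LIMIT 1;
```

With worker_id pinned by the WHERE clause, the matching index entries are already in queued_at order, so the first entry read should be the row you want.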

It would be great if you could rewrite this query so that you could find the single row you want without the ORDER BY, since the ORDER BY has to be resolved before LIMIT 1 can pick a row. Without knowing more about your broader goals, I can't say whether that is possible. What about splitting the task into the following two queries?

select MIN(queued_at) from hunted_place_review_external_urls where worker_id is null into @var;

select * from hunted_place_review_external_urls where worker_id is null and queued_at = @var;

Or as a subquery, if you don't have issues with duplicate values?

select * from hunted_place_review_external_urls where queued_at in (select MIN(queued_at) from hunted_place_review_external_urls where worker_id is null);
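
If duplicate queued_at values are a concern, one variation is to keep the worker_id filter on the outer query as well and cap the result at one row. This is a sketch rather than a tested replacement, and which of several tied rows comes back is arbitrary.

```sql
-- The outer filter keeps rows that share the minimum queued_at
-- but already have a worker_id out of the result.
SELECT *
  FROM hunted_place_review_external_urls
 WHERE worker_id IS NULL
   AND queued_at = (SELECT MIN(queued_at)
                      FROM hunted_place_review_external_urls
                     WHERE worker_id IS NULL)
 LIMIT 1;
```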

Other tips

MySQL is using the queued_at index to avoid a "Using filesort" operation. It appears that MySQL is reading a large part of the table before it finds a matching row, and that's taking four seconds.

MySQL is using the index to get the row with the lowest value of queued_at first, then visiting the underlying row to check whether worker_id is NULL or not. MySQL works through the index in order, from the lowest value of queued_at upward.

Because the rows come out of the index already sorted, MySQL can stop at the first row that satisfies the WHERE clause; that is what LIMIT 1 buys here.

Note that this is only fast if a matching row sits near the start of the index. If the rows with a NULL worker_id have the highest queued_at values (plausible for a queue where older rows have already been claimed), MySQL has to step over nearly every row before it reaches one. The rows estimate of 67 in the EXPLAIN output suggests the optimizer expected to find a match quickly. (At least MySQL is avoiding what could be an expensive "Using filesort" operation to get the rows ordered.)

Your other queries exhibit better performance because they have different access plans, which likely use indexes to limit the number of rows that need to be checked.
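
One way to check how many rows a query really touches is to compare the session's handler counters after running it. The sketch below assumes your account is allowed to run FLUSH STATUS (it normally needs the RELOAD privilege); the query itself is the one from the question.

```sql
-- Reset this session's status counters.
FLUSH STATUS;

-- Run the slow query.
SELECT *
  FROM hunted_place_review_external_urls
 WHERE worker_id IS NULL
 ORDER BY queued_at ASC
 LIMIT 1;

-- Inspect how many index/row reads the query performed.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

A large Handler_read_next value would indicate that MySQL stepped through many index entries before returning the single row.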


To improve performance of this particular query, you could try adding an index:

... ON hunted_place_review_external_urls (worker_id, queued_at);

If that's not an option, you could attempt to influence the optimizer to use a different index, with an index hint:

  select *
    from hunted_place_review_external_urls USE INDEX (worker_id)
   where worker_id is null
   order by queued_at asc
   limit 1;

Note that the USE INDEX hint references the name of the index, not the name of the column. (FORCE INDEX takes the same form if the optimizer ignores the weaker hint.) From the EXPLAIN output, it appears there is an index named "worker_id". I'm going to guess that this index is on the column named "worker_id", but that's just a guess.
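
To replace that guess with a fact, the index definitions can be listed directly. SHOW INDEX reports each index's Key_name alongside the Column_name it covers.

```sql
-- Lists every index on the table, one row per indexed column.
SHOW INDEX FROM hunted_place_review_external_urls;
```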


As an aside, this doesn't have anything to do with the queued_at column being defined as a BIGINT vs an INT or SMALLINT or VARCHAR.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow