Pergunta

I have a question about this query:

SELECT * 
  FROM runs 
 WHERE (NOW() BETWEEN began_at 
                  AND finished_at)

Do you think it makes sense to create composite index for began_at and finished_at columns? Or it makes sense to create index only for began_at?

Foi útil?

Solução

Your style is very uncommon.

Most people would probably write WHERE began_at < NOW() AND finished_at > NOW()

However. I would recommend putting an index on both fields.

A combined key wont be of use to you because you it would only speed up searcher for specific date combinations.

Well this is not entirely true because if you use betree a combined key will help you but not as good as if you index them seperately. Combined keys are very good if you search combinations of fields with equality (=) operator. SIngle field indexes perform better in ragen requests.

You can google a bit for "multidimensional range search".

The reason is that all matching fields in one field can be basically found in log(n) time in btrees. So your overall runtime will be O(k*log(n)) which is O(log(n)).

Multidimensional Range queries have a runtime of O(sqrt(n)) which is higher. However there are better implementations as well which also acheav logarithmic runtime. However they are not fully implemented in mysql, so it will be worse or awful depending on the version.

So let me sum up:

  • Equality comparisions on single fields: hash index (runtime O(1))

  • Range search on single fields: btree index on single fields ( O(log(n)) )

  • Equality search on multiple fields: combined hash key (runtime O(1))

those cases are a clear thing...

  • Range search on multiple fields: seperate btree indexes ( O(log(n)) )

this is where its not so clear. with current versions its clearly better to index seperately because of the reasons given above. With a perfect implementation for that use case you could achieve better performance with combined keys but there is no system in know of which supports it. mysql supports loose indexes (which you need for that) since version 5.0, but only very limited and the query optimizer only utilizes them in rare cases afaik. don't know about newer versions like 5.3 or something.

however with mysql implementing loose indexes combined keys on fields where you do range requests or sorting in different directions become more and more relevant.

Outras dicas

Due to the use of inequalities, and not equalities, a composite index isn't going to do any much better (if not worse) than two individual indexes.

I'd advocate for leaning towards two individual indexes on both began_at and finished_at.

References for Loose index scan:

http://www.mysqlperformanceblog.com/2006/05/09/descending-indexing-and-loose-index-scan/

http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html

The "Index Merge" strategy could come into play from MySQL 5 onwards: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html - which also suggests that separate indexes might be better.

However, I have never been able to get it to work for me :)

The correct answer to this sort of question is always: "it depends". Try it both ways and benchmark and compare execution plans. Note that the answer also changes as the amount of data in your tables changes and the query workload changes. Indexes on an evolving system are almost never fire and forget.

Good question, but I'd actually start with indexes on both began_at and finished_at, because one reasonable way you might rewrite this query is like so

SELECT *
  FROM runs
 WHERE began_at < NOW()
   AND finished_at > NOW()

Which makes it a bit more clear (to me) that each column needs its own index.

I've never gotten MySQL to use use two individual indices for begin_at or finished_at when trying to optimize this kind of query. Apparently, other people are saying that a composite index won't work either, so it may just not be possible to get MySQL to optimize this query using an index.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top