Question

I've got a table such as:

CREATE TABLE `order` (
    `id` bigint(10) unsigned NOT NULL,
    `second_id` bigint(10) unsigned NOT NULL,
    `timestamp` bigint(10) unsigned NOT NULL,
    `country` char(2) DEFAULT NULL,
    `qty1` int(10) unsigned NOT NULL,
    `qty2` int(10) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    KEY `timestamp_second_id_country` (`timestamp`,`second_id`,`country`),
    KEY `timestamp_second_id` (`timestamp`,`second_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It has one row per second_id, per timestamp, per country.

I need a query that gets the quantities qty1 and qty2 for a certain second_id within a certain timeframe (ignoring the country), so something like:

SELECT timestamp, SUM(qty1) AS qty1, SUM(qty2) AS qty2
FROM `order`
WHERE second_id = "<ID>" 
AND timestamp >= <min date>
AND timestamp < <max date>
GROUP BY timestamp 
ORDER BY timestamp DESC

Since the table contains about 12 million rows, this query takes ages (around 25 seconds), so I added the timestamp_second_id KEY to fix that, but unfortunately that doesn't seem to do it... Well, almost.

This is this query's EXPLAIN:

+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys                                   | key                 | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+----------+-------------+
|  1 | SIMPLE      | order | index | timestamp_second_id_country,timestamp_second_id | timestamp_second_id | 16      | NULL | 12185418 | Using where |
+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+----------+-------------+

So it all looks good, it finds the 2 possible keys, uses the correct one, but it's still super slow... The funny part appears when I use FORCE INDEX (timestamp_second_id), the EXPLAIN then becomes:

+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+---------+-----------------------+
| id | select_type | table | type  | possible_keys                                   | key                 | key_len | ref  | rows    | Extra                 |
+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+---------+-----------------------+
|  1 | SIMPLE      | order | range | timestamp_second_id_country,timestamp_second_id | timestamp_second_id | 16      | NULL | 3465998 | Using index condition |
+----+-------------+-------+-------+-------------------------------------------------+---------------------+---------+------+---------+-----------------------+

So basically, it uses the same INDEX as before, but now the "Extra" uses an "index condition" and the query is quite fast (~1 sec).
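
For reference, a forced-index version of the query would look roughly like the sketch below. The literal values (42 and the two Unix timestamps) are made up for illustration and stand in for the <ID>, <min date> and <max date> placeholders.

```sql
-- Sketch of the forced-index query; 42 and the timestamps are hypothetical values.
SELECT `timestamp`, SUM(qty1) AS qty1, SUM(qty2) AS qty2
FROM `order` FORCE INDEX (timestamp_second_id)
WHERE second_id = 42
  AND `timestamp` >= 1400000000
  AND `timestamp` < 1400086400
GROUP BY `timestamp`
ORDER BY `timestamp` DESC;
```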

So my question... Why isn't the query as fast without the FORCE INDEX as with it, even though looking at the EXPLAIN the same INDEX is being used? Is there any way to get this done without forcing the INDEX?

(Note that I also tried other queries, like an inner query for grouping the timestamps, inside another query which selects the second_id and timestamp via a WHERE).

Solution

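A composite index can be used for searching only up to and including the first column that is compared with a range condition. In your index (timestamp, second_id), the range column comes first, so MySQL essentially has to walk every index entry in the timestamp range and check second_id on each one.

So, put your equality columns first, followed by one range column. Leave out columns that the WHERE clause doesn't filter on. (You can append additional columns after the filtered ones to make a covering index, though they won't be used in the search itself.)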

So, given your SELECT:

SELECT timestamp, SUM(qty1) AS qty1, SUM(qty2) AS qty2
FROM `order`
WHERE second_id = "<ID>" 
AND timestamp >= <min date>
AND timestamp < <max date>
GROUP BY timestamp 
ORDER BY timestamp DESC

Your index should be on the following columns, in order:

(second_id, timestamp)

This is because the search condition on second_id is an equality (=), while the condition on timestamp is a range.
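
As a rough sketch, such an index could be added like this. The index name second_id_timestamp is made up for illustration; any unused name would do.

```sql
-- Hypothetical index name; equality column first, range column second.
ALTER TABLE `order`
    ADD INDEX second_id_timestamp (second_id, `timestamp`);
```

Re-running EXPLAIN on the SELECT afterwards should show whether the optimizer now picks this index without a FORCE INDEX hint.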

For a covering index, it would be:

(second_id, timestamp, qty1, qty2)
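
A minimal sketch of creating the covering variant and checking the plan follows. The index name and the literal values in the WHERE clause are assumptions made for illustration only.

```sql
-- Hypothetical index name; qty1 and qty2 are appended only so the index covers the query.
ALTER TABLE `order`
    ADD INDEX second_id_timestamp_covering (second_id, `timestamp`, qty1, qty2);

-- Check the plan; 42 and the timestamps are made-up example values.
EXPLAIN
SELECT `timestamp`, SUM(qty1) AS qty1, SUM(qty2) AS qty2
FROM `order`
WHERE second_id = 42
  AND `timestamp` >= 1400000000
  AND `timestamp` < 1400086400
GROUP BY `timestamp`
ORDER BY `timestamp` DESC;
```

If the covering index is used, the Extra column of the EXPLAIN output should include "Using index", meaning the query is answered from the index alone without reading the table rows.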

OTHER TIPS

SELECT timestamp, SUM(qty1) AS qty1, SUM(qty2) AS qty2
FROM `order`
WHERE second_id = "<ID>" 
AND timestamp >= <min date>
AND timestamp < <max date>
GROUP BY timestamp 
ORDER BY timestamp DESC;

KEY covering_index (second_id,timestamp,qty1,qty2)

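Try this index. It is called a covering index, because it contains every column the query reads, so MySQL can answer the query from the index alone.

For reference, see the post below on covered indexes and loose index scans. Generally speaking, covering indexes perform well for queries that contain range conditions.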


http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/

Licensed under: CC-BY-SA with attribution