Question

I'm trying to optimize a report query, as most of report queries this one incorporates aggregation. Since the size of table is considerable and growing, I need to tend to its performance.

For example, I have a table with three columns: id, name, action. And I would like to count the number of actions each name has done:

SELECT name, COUNT(id) AS count
FROM tbl
GROUP BY name;

As simple as it gets, I can't run it in a acceptable time. It might take 30 seconds and there's no index, whatsoever, I can add which is taken into account, nevertheless improves it.

When I run EXPLAIN on the above query, it never uses any of indices of the table, i.e. an index on name.

Is there any way to improve the performance of aggregation? Why the index is not used?

[UPDATE]

Here's the EXPLAIN's output:

+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
|  1 | SIMPLE      | tbl   | ALL  | NULL          | NULL | NULL    | NULL | 4025567 |   100.00 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+

And here is the table's schema:

CREATE TABLE `tbl` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `name` varchar(1000) NOT NULL,
  `action` int unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `inx` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Was it helpful?

Solution

The problem with your query and use of indexes is that you refer to two different columns in your SELECT statement yet only have one column in your indexes, plus the use of a prefix on the index.

Try this (refer to just the name column):

SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;

With the following index (no prefix):

tbl (name)

Don't use a prefix on the index for this query because if you do, MySQL won't be able to use it as a covering index (will still have to hit the table).

If you use the above, MySQL will scan through the index on the name column, but won't have to scan the actual table data. You should see USING INDEX in the explain result.

This is as fast as MySQL will be able to accomplish such a task. The alternative is to store the aggregate result separately and keep it updated as your data changes.

Also, consider reducing the size of the name column, especially if you're hitting index size limits, which you most likely are hence why you're using the prefix. Save some room by not using UTF8 if you don't need it (UTF8 is 3 bytes per character for index).

OTHER TIPS

It's a very common question and key for solution lies in fact, that your table is growing.

So, first way would be: to create index by name column if it isn't created yet. But: this will solve your issue for a time.

More proper approach would be: to create separate statistics table like

tbl_counts
+------+-------+
| name | count |
+------+-------+

And store your counts separately. When changing (insert/update or delete) your data in tbl table - you'll need to adjust corresponding row inside tbl_counts table. This way allows you to get rid of performing COUNT query at all - but will need to add some logic inside tbl table.

To maintain integrity of your statistics table you can either use triggers or do that inside application. This method is good if performance of COUNT query is much more important for you than your data changing queries (but overhead from changing tbl_counts table won't be too high)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top