Question

I have a 3 large tables (10k, 10k, and 100M rows) and am trying to do a simple count on a join of them, where all the joined columns are indexed. Why does the COUNT(*) take so long, and how can I speed it up (without triggers and a running summary)?

mysql> describe SELECT COUNT(*) FROM `metaward_alias` INNER JOIN `metaward_achiever` ON (`metaward_alias`.`id` = `metaward_achiever`.`alias_id`) INNER JOIN `metaward_award` ON (`metaward_achiever`.`award_id` = `metaward_award`.`id`) WHERE `metaward_award`.`owner_id` = 8;
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
| id | select_type | table             | type   | possible_keys                                         | key                        | key_len | ref                             | rows | Extra       |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | metaward_award    | ref    | PRIMARY,metaward_award_owner_id                       | metaward_award_owner_id    | 4       | const                           | 1552 |             | 
|  1 | SIMPLE      | metaward_achiever | ref    | metaward_achiever_award_id,metaward_achiever_alias_id | metaward_achiever_award_id | 4       | paul.metaward_award.id          | 2498 |             | 
|  1 | SIMPLE      | metaward_alias    | eq_ref | PRIMARY                                               | PRIMARY                    | 4       | paul.metaward_achiever.alias_id |    1 | Using index | 
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
3 rows in set (0.00 sec)

But actually running the query takes about 10 minutes, and I'm on MyISAM so the tables are fully locked down for that duration

Was it helpful?

Solution

I guess the reason is that you do a huge join over three tables (without applying where clause first, the result would be 10k * 10k * 100M = 1016 rows). Try to reorder joins (for example start with metaward_award, then join only metaward_achiever see how long that takes, then try to plug metaward_alias, possibly using subquery to force your preferred evaluation order).

If that does not help you might have to denormalize your data, for example by storing number of aliases for particular metaward_achiever. Then you'd get rid of one join altogether. Maybe you can even cache the sums for metaward_award, depending on how and how often is your data updated.

Other thing that might help is getting all your database content into RAM :-)

OTHER TIPS

Make sure you have indexes on:

metaward_alias      id
metaward_achiever   alias_id
metaward_achiever   award_id
metaward_award      id
metaward_award      owner_id

I'm sure many people will also suggest to count on a specific column, but in MySql this doesn't make any difference for your query.

UPDATE:

You could also try to set the condition on the main table instead of one of the joined tables. That would give you the same result, but it could be faster (I don't know how clever MySql is):

SELECT COUNT(*) FROM `metaward_award` 
   INNER JOIN `metaward_achiever` 
      ON (`metaward_achiever`.`award_id` = `metaward_award`.`id`) 
   INNER JOIN `metaward_alias` 
      ON (`metaward_alias`.`id` = `metaward_achiever`.`alias_id`) 
WHERE `metaward_award`.`owner_id` = 8

10 minutes is way too long for that query. I think you must have a really small key cache. You can get its size in bytes with:

SELECT @@key_buffer_size

First off, you should run ANALYZE TABLE or OPTIMIZE TABLE. They'll sort your index and can slightly improve the performance.

You should also see if you can use more compact types for your columns. For instance, if you're not going to have more than 16 millions owners or awards or aliases, you can change your INT columns into MEDIUMINT (UNSIGNED, of course). Perhaps even SMALLINT in some cases? That will reduce your index footprint and you'll fit more of it in the cache.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top