Question

Is there any way to optimize the following query?

EXPLAIN EXTENDED
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    SELECT keyword_id
    FROM career_article_keyword
    LEFT JOIN career_keywords ck USING (keyword_id)
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC

The main task: given a particular keyword_id (CURRENT_KID), I need to find all keywords that have ever been attached to the same article as CURRENT_KID, and sort the result by how often those keywords are used.

The tables are defined as:

mysql> show create table career_article_keyword;
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                  | Create Table                                                                                                                                                                                                                                                                                                                                               |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| career_article_keyword | CREATE TABLE `career_article_keyword` (
  `article_id` int(11) unsigned NOT NULL,
  `keyword_id` int(11) NOT NULL,
  UNIQUE KEY `article_id` (`article_id`,`keyword_id`),
  CONSTRAINT `career_article_keyword_ibfk_1` FOREIGN KEY (`article_id`) REFERENCES `career` (`menu_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table career_keywords;
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table           | Create Table                                                                                                                                                                                                         |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| career_keywords | CREATE TABLE `career_keywords` (
  `keyword_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` varchar(250) NOT NULL,
  PRIMARY KEY (`keyword_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The output of EXPLAIN scares me:

http://o7.no/J6ThIs

On a large data set this query could kill everything :) Can I make it faster somehow?

thanks.


Solution

Looking at your EXPLAIN output, I was concerned that your use of subqueries had resulted in a suboptimal use of indexes. I felt (without any justification - and on this I may very well be wrong) that rewriting using JOIN might lead to a more optimised query.

To do that, we need to understand what it is your query is intended to do. It would have helped if your question had articulated it, but after a little head-scratching I decided your query was trying to fetch a list of all other keywords that appear in any article that contains some given keyword, together with a count of all articles in which those keywords appear.

Now let's rebuild the query in stages:

  1. Fetch "any article that contains some given keyword" (not worrying about duplicates):

    SELECT ca2.article_id
    FROM
           career_article_keyword AS ca2
    WHERE
          ca2.keyword_id = 9;
    
  2. Fetch "all other keywords that appear in [the above]"

    SELECT ca1.keyword_id
    FROM
           career_article_keyword AS ca1
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ca1.keyword_id;
    
  3. Fetch "[the above], together with a count of all articles in which those keywords appear"

    SELECT ca1.keyword_id, COUNT(DISTINCT ca0.article_id) AS cnt
    FROM
           career_article_keyword AS ca0
      JOIN career_article_keyword AS ca1 USING (keyword_id)
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ca1.keyword_id
    ORDER BY cnt DESC;
    
  4. Finally, we want to add to the output the matching keyword itself from the career_keywords table:

    SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
    FROM
           career_keywords        AS ck 
      JOIN career_article_keyword AS ca0 USING (keyword_id)
      JOIN career_article_keyword AS ca1 USING (keyword_id)
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ck.keyword_id -- equal to ca1.keyword_id due to join conditions
    ORDER BY cnt DESC;
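
A note on the COUNT(DISTINCT ...) introduced in step 3: joining ca0 to ca1 on keyword_id repeats each ca0 row once for every matching ca1 row, so a plain COUNT would over-count. As a possible variant (a sketch only, untested against your data), the co-occurring keywords can first be reduced to a distinct list in a derived table; the alias `co` is made up for illustration. Because UNIQUE KEY (`article_id`,`keyword_id`) guarantees at most one row per article and keyword, COUNT(*) then gives the number of articles. Whether this is any faster depends on your data and MySQL version, so compare the two with EXPLAIN.

```sql
SELECT ck.keyword_id, ck.keyword, COUNT(*) AS cnt
FROM (
        -- distinct keywords that share at least one article with keyword 9
        SELECT DISTINCT ca1.keyword_id
        FROM
               career_article_keyword AS ca1
          JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
        WHERE
              ca1.keyword_id <> 9
          AND ca2.keyword_id =  9
     ) AS co  -- hypothetical alias
  JOIN career_keywords        AS ck  ON (ck.keyword_id  = co.keyword_id)
  JOIN career_article_keyword AS ca0 ON (ca0.keyword_id = co.keyword_id)
GROUP BY ck.keyword_id, ck.keyword
ORDER BY cnt DESC;
```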
    

One thing that is immediately clear is that your original query referenced career_keywords twice, whereas this rewritten query references that table only once. That alone might account for some of the cost, so try removing the second reference (the one in your first subquery): it is entirely redundant there, because that subquery only returns keyword_id, which career_article_keyword already provides.
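
As a sketch of that suggestion, this is the original query with the LEFT JOIN to career_keywords dropped from the first subquery and nothing else changed. The result should be the same, since a LEFT JOIN on the primary key of career_keywords neither adds nor removes rows; whether the plan improves is something to confirm with EXPLAIN.

```sql
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    -- no join to career_keywords here: only keyword_id is needed
    SELECT keyword_id
    FROM career_article_keyword
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC;
```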

Looking back over this query, we can see that joins are being performed on the following columns:

  • career_keywords.keyword_id in ck JOIN ca0

    This table defines PRIMARY KEY (`keyword_id`), so there is a good index which can be used for this join.

  • career_article_keyword.article_id in ca1 JOIN ca2

    This table defines UNIQUE KEY `article_id` (`article_id`,`keyword_id`) and, since article_id is the leftmost column in this index, there is a good index which can be used for this join.

  • career_article_keyword.keyword_id in ck JOIN ca0 and ca0 JOIN ca1

    There is no index that can be used efficiently for this join: the only index defined on this table has another column, article_id, to the left of keyword_id, so MySQL cannot look up keyword_id entries in that index without first knowing the article_id. I suggest you create a new index which has keyword_id as its leftmost column.

    (The need for this index could equally have been ascertained directly from looking at your original query, where your two outermost queries perform joins on that column.)
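
A minimal sketch of such an index follows. The name `idx_keyword_article` is made up for illustration, and adding article_id as a second column is an assumption here (it may let the joins be answered from the index alone); an index on keyword_id by itself would also satisfy the leftmost-column requirement.

```sql
-- Hypothetical index name; the important part is that keyword_id comes first.
ALTER TABLE career_article_keyword
  ADD INDEX idx_keyword_article (keyword_id, article_id);

-- Re-check the plan afterwards: the "key" column of the EXPLAIN output
-- shows which index, if any, MySQL actually chose for each table.
EXPLAIN
SELECT article_id
FROM career_article_keyword
WHERE keyword_id = 9;
```

After adding the index, re-run EXPLAIN on the full query as well, since the optimizer may still prefer a different plan on small tables.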

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow