Question

I need to optimize the following query:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
AND blogposts.title LIKE '%{de}%'
AND blogposts.visible = 1
AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0 , 50

EXPLAIN SELECT gives me the following result:

id  select_type  table      type    possible_keys  key      key_len  ref                   rows  Extra
1   SIMPLE       articles   ALL     blogpost_id    NULL     NULL     NULL                  6915  Using temporary; Using filesort
1   SIMPLE       blogposts  eq_ref  PRIMARY        PRIMARY  4        articles.blogpost_id  1     Using where

Why does it read articles first and then blogposts? Is it because blogposts has more entries? And how can I improve the query so that the articles table can use an index?

Update: An index is set on blogposts.date_created. Removing the blogposts.title LIKE condition and the date_published <= NOW() condition makes no difference.

When I remove "articles.id AS articleid" from the select list, it can use the blogpost_id index on articles. That sounds strange to me. Does anyone know why? (I actually need that column.)

The new explain looks like this:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  articles    index blogpost_id blogpost_id    4    NULL    6915    Using index; Using temporary; Using filesort
1   SIMPLE  blogposts   eq_ref  PRIMARY PRIMARY 4   articles.blogpost_id    1   Using where

Solution

I took a closer look at the query and you might be able to redesign it. Here is what I mean:

The LIMIT 0,50 portion of the query seems to be applied last, only after the join and the sort have processed every matching row.

You can improve the layout of the query by doing the following:

Step 1) Create an inline query to gather only keys. In this case, the id for blogposts.

Step 2) Impose any WHERE, ORDER BY and GROUP BY clauses on the inline query bringing keys.

Step 3) Impose the LIMIT clause as the last step of making the inline query.

Step 4) Join the inline query with the blogposts table, in the event you need additional columns from blogposts, to form a blogposts temp table.

Step 5) Join this new blogposts temp table with the articles table.

Steps 1-3 are meant to create a temp table with at most 50 rows that contains only the blogposts id. Then, perform all JOINs dead last.

With these steps applied to your original query, you should have this:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM
(
  SELECT B.*
  FROM
  (
    SELECT id FROM blogposts
    WHERE date_published <= NOW()
    AND deleted = 0 AND visible = 1
    AND title LIKE '%{de}%'
    ORDER BY date_created DESC
    LIMIT 0,50
  ) A
  INNER JOIN blogposts B USING (id)
) blogposts
INNER JOIN articles
ON blogposts.id = articles.blogpost_id;

Since you edited the question and stated that you will remove LIKE, now your query should look more like this:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM
(
  SELECT B.*
  FROM
  (
    SELECT id FROM blogposts
    WHERE date_published <= NOW()
    AND deleted = 0 AND visible = 1
    ORDER BY date_created DESC
    LIMIT 0,50
  ) A
  INNER JOIN blogposts B USING (id)
) blogposts
INNER JOIN articles
ON blogposts.id = articles.blogpost_id;

If the columns hidden behind [things omitted] include nothing from blogposts other than the keys, then your query should look like this:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM
(
  SELECT id FROM blogposts
  WHERE date_published <= NOW()
  AND deleted = 0 AND visible = 1
  ORDER BY date_created DESC
  LIMIT 0,50
) blogposts
INNER JOIN articles
ON blogposts.id = articles.blogpost_id;

CAVEAT

Make sure you build an index that involves the columns deleted, visible, and date_created as follows:

ALTER TABLE blogposts ADD INDEX deleted_visible_date_created (deleted,visible,date_created);
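
A quick way to check that the new index is in place and actually chosen, as a sketch against the table and index names used above (the exact EXPLAIN output depends on your MySQL version and data):

```sql
-- Confirm the index exists and see its column order (Seq_in_index).
SHOW INDEX FROM blogposts
WHERE Key_name = 'deleted_visible_date_created';

-- Re-check the plan of the key-gathering subquery on its own.
-- What to hope for: key = deleted_visible_date_created
-- and no "Using filesort" in the Extra column.
EXPLAIN
SELECT id FROM blogposts
WHERE date_published <= NOW()
AND deleted = 0 AND visible = 1
ORDER BY date_created DESC
LIMIT 0,50;
```

If the optimizer still picks a different key, compare the rows estimates of both plans before forcing anything.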

Give it a try!

Other tips

Your where condition blogposts.title LIKE '%{de}%' will cause a full table scan on the blogposts table. It's likely MySQL figures scanning 6915 articles is more efficient.

As to how to improve it, you might add an index on blogposts using date_created or date_published and add a bounded range to the WHERE condition (something other than an open-ended <= NOW()).
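
One way to read the range suggestion, as a sketch only: bound date_published on both sides so an index on that column has a narrow slice to scan. The 30-day window is an assumed value that is not from the question, and it changes which rows qualify, so it only fits if older posts may be excluded. The LIKE filter is left out, as in the question's update.

```sql
-- Sketch: the 30-day lower bound is a hypothetical value.
SELECT articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
  AND blogposts.visible = 1
  AND blogposts.date_published >  NOW() - INTERVAL 30 DAY
  AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0, 50;
```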

AND blogposts.title LIKE '%{de}%' -- Useless for optimization (leading wild card)

WHERE blogposts.deleted = 0 AND blogposts.visible = 1 -- Yuck: If they should not be shown, get them out of the table.

AND blogposts.date_published <= NOW() ORDER BY blogposts.date_created DESC LIMIT 0 , 50 -- That leads to needing INDEX(date_published) (However, the LIKE and flags may prevent this index from being used.)

Please provide SHOW CREATE TABLE ...; SHOW TABLE STATUS ...;
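
For reference, the requested output could be gathered like this, assuming the two table names from the question:

```sql
-- Full table definitions, including every index.
SHOW CREATE TABLE blogposts;
SHOW CREATE TABLE articles;

-- Storage engine, approximate row counts, data and index sizes.
SHOW TABLE STATUS LIKE 'blogposts';
SHOW TABLE STATUS LIKE 'articles';
```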

Thank you very much :)

I gave it a try and found that your last comment, the one about the index, was actually the thing I needed. Sometimes a small improvement is all it takes... ;) Comparison:

Old Query:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
AND blogposts.title LIKE '%{de}%'
AND blogposts.visible = 1
AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0 , 50

New query:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM
(
    SELECT B.*
    FROM
    (
        SELECT id FROM blogposts
        WHERE date_published <= NOW()
        AND deleted = 0 AND visible = 1
        AND title LIKE '%{de}%'
        ORDER BY date_created DESC
        LIMIT 0,50
    ) A
    INNER JOIN blogposts B USING (id)
) blogposts
INNER JOIN articles
ON blogposts.id = articles.blogpost_id

Now without the index mentioned:

Old Explain Result:

id  select_type  table      type    possible_keys  key      key_len  ref                   rows  Extra
1   SIMPLE       articles   ALL     blogpost_id    NULL     NULL     NULL                  6915  Using temporary; Using filesort
1   SIMPLE       blogposts  eq_ref  PRIMARY        PRIMARY  4        articles.blogpost_id  1     Using where

New Explain Result:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    50   
1   PRIMARY     articles    ref     blogposts_id    blogposts_id    4   blogposts.id    1    
2   DERIVED     <derived3>  ALL     NULL    NULL    NULL    NULL    50   
2   DERIVED     B   eq_ref  PRIMARY     PRIMARY     4   A.id    1    
3   DERIVED     blogposts   ALL     deleted,visible,date_published  deleted     1       28198   Using filesort

Explain result with index on old query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  blogposts   ref     PRIMARY,deleted,visible,date_published,deleted_vis...   deleted_visible_date_created    2   const,const     27771   Using where
1   SIMPLE  articles    ref     blogposts_id    blogposts_id    4   db.blogposts.id     1

Explain result with index on new query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    50   
1   PRIMARY     articles    ref     blogposts_id    blogposts_id    4   blogposts.id    1    
2   DERIVED     <derived3>  ALL     NULL    NULL    NULL    NULL    50   
2   DERIVED     B   eq_ref  PRIMARY     PRIMARY     4   A.id    1    
3   DERIVED     blogposts   ref     deleted,visible,date_published,deleted_visible_dat...   deleted_visible_date_created    2       27771   Using where

Speed on old query without/with index: 0.1835/0.0037

Speed on new query without/with index: 0.1883/0.0035

Since the difference between the old and the new query is only marginal, I prefer to keep the old query, but with the index. I will keep the rewrite in mind in case the old query becomes too slow someday :)

What I would be interested to know is how you got the idea to define the index like this. When I posted this question, I think I also tried an index starting with deleted and visible, but I used date_published instead of date_created as the last column (as it's in the WHERE clause)...

Thanks :)

UPDATE by RolandoMySQLDBA 2011-05-19 13:03

What gave it away to me was the WHERE and ORDER BY clauses.

In the WHERE clause, deleted (0) and visible (1) are compared against static values. In the ORDER BY clause, date_created is like a moving target among all the rows with deleted=0 and visible=1. So I placed the columns with static values first in the index and the moving target last. That is part of the underlying principle of refactoring SQL statements: you need indexes that support your WHERE, GROUP BY, and ORDER BY clauses.
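
To see the column-order reasoning in a plan, here is a sketch that compares the index suggested above with the variant ending in date_published. The index name deleted_visible_date_published is made up for this experiment, and FORCE INDEX is used only so both plans can be inspected side by side.

```sql
-- Hypothetical comparison index: equality columns, then the range column.
ALTER TABLE blogposts
  ADD INDEX deleted_visible_date_published (deleted, visible, date_published);

-- Rows come out of this index in date_published order, so the
-- ORDER BY date_created should still require "Using filesort".
EXPLAIN
SELECT id FROM blogposts FORCE INDEX (deleted_visible_date_published)
WHERE deleted = 0 AND visible = 1 AND date_published <= NOW()
ORDER BY date_created DESC
LIMIT 0,50;

-- Rows with deleted=0 and visible=1 are already stored in date_created
-- order here, so the sort can be skipped and LIMIT can stop early.
EXPLAIN
SELECT id FROM blogposts FORCE INDEX (deleted_visible_date_created)
WHERE deleted = 0 AND visible = 1 AND date_published <= NOW()
ORDER BY date_created DESC
LIMIT 0,50;

-- Remove the experimental index afterwards.
ALTER TABLE blogposts DROP INDEX deleted_visible_date_published;
```

In the second plan the date_published condition is checked row by row ("Using where"), which is cheap when most visible posts are already published.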

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange