Question

My query is currently taking roughly 3 seconds, which I'm sure can be optimized. I just can't figure out how to optimize it.

My app has a reasonably big products table (roughly 500,000 records). Each product can be listed on one of 50 domains (listed in a domains table). The links between products and domains are stored in the domains_products table (which has approximately 1,400,000 records). The slow query is in my app's admin section, where I need to be able to see products that are NOT listed on any domain.

Stripped to the bare bones with all unrelated joins removed, the query in question is:

SELECT    `products`.*
FROM      `products`
LEFT JOIN `domains_products`
ON        `domains_products`.`product_id` = `products`.`id`
WHERE     `products`.`deleted` = 'N'
AND       `domains_products`.`domain_id` IS NULL
ORDER BY  `products`.`id` ASC

In this form, the query takes more than 3 seconds and returns a little over 3,000 products (which is correct). If I remove either WHERE clause, the query takes 0.12 seconds (but obviously does not return the correct results).

Both tables use the InnoDB engine. The products table has a primary key on the id column and an index on the deleted column. The domains_products table has only two columns, product_id and domain_id; the primary key spans both columns, and each column also has its own index. All relevant columns are NOT NULL.

EXPLAIN gives me this:

id select_type table            type possible_keys key        key_len ref         rows   Extra
1  SIMPLE      products         ref  deleted       deleted    1       const       188616 Using where
1  SIMPLE      domains_products ref  product_id    product_id 4       products.id 1      Using where; Using index; Not exists

Note that although MySQL has discovered the correct keys, it doesn't actually seem to be using them.

The profiler says this:

Status               Time
Starting             62 µs
Checking Permissions 7 µs
Checking Permissions 5 µs
Opening Tables       38 µs
System Lock          13 µs
Init                 37 µs
Optimizing           17 µs
Statistics           1,3 ms
Preparing            25 µs
Executing            5 µs
Sorting Result       5 µs
Sending Data         3,3 s
End                  28 µs
Query End            8 µs
Closing Tables       25 µs
Freeing Items        297 µs
Logging Slow Query   4 µs
Cleaning Up          5 µs

Note that it seems to be hanging on Sending Data. I've tried replacing the join with a NOT IN:

SELECT `products`.*
FROM   `products`
WHERE  `products`.`deleted` = 'N'
AND    `products`.`id` NOT IN (
    SELECT `product_id`
    FROM   `domains_products`
)
ORDER BY `products`.`id` ASC

This query gives the exact same results, but takes 3.8 seconds.

Can anyone point me in the right direction to optimize this query?

Solution

It seems that the problem is with the "deleted" column. I'm guessing that almost all of the items in the products table are marked with "N", which makes the index on the "deleted" column pretty useless in this case.
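One way to check that guess is to look at how the values of deleted are distributed. A minimal sketch, using only the table and column names from the question (the row_count alias is just for illustration):

```sql
-- Count rows per value of `deleted` to see how selective the index is.
SELECT   `deleted`, COUNT(*) AS `row_count`
FROM     `products`
GROUP BY `deleted`;
```

If the 'N' rows turn out to be a large share of the table (the EXPLAIN output in the question estimates 188,616 rows for deleted = 'N'), the index on deleted narrows the search very little, and most of the work ends up in probing domains_products once per remaining product.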

One thing you can do is create another table, say deleted_domains_products, that stores the product_id (and the domain_id if you want). Then create a trigger so that every time an entry is deleted from domains_products, it inserts an entry into that table. That gives you a much smaller set to query against. When you're done, you can truncate that table for the next time, so it should always be pretty quick.
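A rough sketch of that idea in MySQL follows. The column types (INT), the trigger name, and the table aliases are assumptions, since the question does not show the actual schema; adjust them to match your tables.

```sql
-- Side table that records links removed from domains_products.
-- Column types are assumed; match them to the real domains_products columns.
CREATE TABLE `deleted_domains_products` (
    `product_id` INT NOT NULL,
    `domain_id`  INT NOT NULL,
    PRIMARY KEY (`product_id`, `domain_id`)
) ENGINE=InnoDB;

-- Hypothetical trigger name. INSERT IGNORE skips links already recorded.
CREATE TRIGGER `trg_domains_products_after_delete`
AFTER DELETE ON `domains_products`
FOR EACH ROW
    INSERT IGNORE INTO `deleted_domains_products` (`product_id`, `domain_id`)
    VALUES (OLD.`product_id`, OLD.`domain_id`);

-- Only check products that recently lost a link, instead of every product.
SELECT   p.*
FROM     `products` AS p
JOIN     (SELECT DISTINCT `product_id`
          FROM   `deleted_domains_products`) AS d
ON       d.`product_id` = p.`id`
WHERE    p.`deleted` = 'N'
AND      NOT EXISTS (SELECT 1
                     FROM   `domains_products` AS dp
                     WHERE  dp.`product_id` = p.`id`)
ORDER BY p.`id` ASC;

-- Reset the side table once the results have been handled.
TRUNCATE TABLE `deleted_domains_products`;
```

Keep in mind that this only catches products whose links were deleted after the trigger was created. Products that were never listed on any domain will not show up in the side table, so this would probably have to be combined with a one-off run of the original query.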

Other tips

Try to create the following indexes and then rerun the query:

  1. domains_products (product_id, domain_id)
  2. products (id, deleted)

Let us know how it goes.
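As a sketch, the two indexes above could be created like this. The index names are made up for illustration. Note that if the existing primary key on domains_products is already defined as (product_id, domain_id) in that order, the first index would likely be redundant.

```sql
-- Hypothetical index names; the column lists come from the suggestion above.
ALTER TABLE `domains_products`
    ADD INDEX `idx_product_domain` (`product_id`, `domain_id`);

ALTER TABLE `products`
    ADD INDEX `idx_id_deleted` (`id`, `deleted`);

-- Re-check the plan afterwards to see whether the optimizer picks them up.
EXPLAIN
SELECT    `products`.*
FROM      `products`
LEFT JOIN `domains_products`
ON        `domains_products`.`product_id` = `products`.`id`
WHERE     `products`.`deleted` = 'N'
AND       `domains_products`.`domain_id` IS NULL
ORDER BY  `products`.`id` ASC;
```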

Try this, and let me know how long it takes.

SELECT `products`.*
FROM   `products`
WHERE  `products`.`deleted` = 'N'
AND    NOT EXISTS (SELECT 1 
               FROM `domains_products` 
               WHERE `domains_products`.`product_id` = `products`.`id`
              )
ORDER BY `products`.`id` ASC;
Licensed under: CC-BY-SA with attribution