Pergunta

i'm trying to get optimize an very old query that i can't wrap my head around. the result that i want to archive is that i want to recommend the visitor on a web shop what other customers have shown interest in, i.e. what else they have bought together with the product that the visitor is looking at.

i have a subquery but it's very slow, takes ~15s on ~8 000 000 rows.

the layout is that all products that are put in a users basket are kept in a table wsBasket and separated by a basketid (which in another table is associated with a member).

in this example i want to list all the most popular products that users have bought together with productid 427, but not list the productid 427 itself.

SELECT productid, SUM(quantity) AS qty 
FROM wsBasket 
WHERE basketid IN 
    (SELECT basketid 
     FROM wsBasket 
     WHERE productid=427) AND productid!=427 
GROUP by productid 
ORDER BY qty 
DESC LIMIT 0,4;

any help is much appreciated! hope this makes any sense at all to at least someone :)

 

UPDATE 1: thanks for your comments guys here are my answers, they didn't fit in the comments-field.

Using EXPLAIN on the above query i got the fllowing. Please note, I do not have any indexes on the table (except for primary key on the id-field), i want to modify the query to benefit from indexes and place indexes on the right keys.

+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type        | table    | type | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
|  1 | PRIMARY            | wsBasket | ALL  | NULL          | NULL | NULL    | NULL | 2821 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | wsBasket | ALL  | NULL          | NULL | NULL    | NULL | 2821 | Using where                                  |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
Foi útil?

Solução

Two obvious indexes to add: one on basketid and a second on productid: then retry the query and a new EXPLAIN to see that the indexes are being used

Outras dicas

As well as ensuring that suitable indexes exist on productid and basketid, you will often benefit from structuring your query as a simple join rather than a subquery, especially in MySQL.

SELECT b1.productid, SUM(b1.quantity) AS qty
FROM wsBasket AS b0
JOIN wsBasket AS b1 ON b1.basketid=b0.basketid
WHERE b0.productid=427 AND b1.productid<>427
GROUP BY b1.productid
ORDER BY qty DESC
LIMIT 4

For me, on a possibly-similar dataset, the join resulted in two select_type: SIMPLE rows in the EXPLAIN output, whereas the subquery method spat out a horrible-for-performance DEPENDENT SUBQUERY. Consequently the join was well over an order of magnitude faster.

The two fields you mainly use for searching in this query are productid and basketid.

When you search for records having productid equal to 427, Database has no clue where to find this record. It doesn't even know that if it does find one matching, that there will not be another matching one, so it has to look through the entire table, potentially thousands of records.

An index is a separate file that is sorted, and contains only the field/s you're interested in sorting on. so creating an index saves a immense amount of time!

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top