Question

I'm not sure how to go about doing this efficiently in MySQL and would appreciate any help.

The goal is to select the 50 top-selling items, with at most one item from each user. I'm used to doing this with either CTEs or DISTINCT ON, but of course those aren't options in MySQL. I'm hoping for a single-query solution, and I'd like to avoid using stored procedures.

The basic schema is a table of items posted by users, and a table of sales with a field determining the score of that particular sale.

CREATE TABLE items (
    item_id INT PRIMARY KEY,
    user_id INT NOT NULL
);
CREATE TABLE sales (
    item_id INT NOT NULL,
    score INT NOT NULL
);
-- Create some sample data
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2), (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);

The result of the query against this sample data should be

+---------+---------+-------------+
| user_id | item_id | total_score |
+---------+---------+-------------+
|       2 |       4 |           5 |
|       1 |       3 |           3 |
|       3 |       6 |           3 |
+---------+---------+-------------+
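
The totals in this table can be checked by summing the sample sales per item. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the query is plain aggregate SQL and the name per_item_totals is illustrative):

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INT PRIMARY KEY, user_id INT NOT NULL);
CREATE TABLE sales (item_id INT NOT NULL, score INT NOT NULL);
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2),
                         (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
""")

# Total score of every item, listed per user with the best item first.
per_item_totals = conn.execute("""
    SELECT items.user_id, items.item_id, SUM(sales.score) AS total_score
    FROM items
    JOIN sales ON sales.item_id = items.item_id
    GROUP BY items.user_id, items.item_id
    ORDER BY items.user_id, total_score DESC
""").fetchall()

for row in per_item_totals:
    print(row)
```

The first row printed for each user is that user's best item: item 3 for user 1, item 4 for user 2, and item 6 for user 3, which are the rows expected in the result.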

Here's the PostgreSQL solution:

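SELECT user_id, item_id, total_score
FROM (
    -- DISTINCT ON keeps the first row per user_id, so the inner ORDER BY
    -- must start with user_id and then rank that user's items by score.
    SELECT DISTINCT ON (items.user_id)
        items.user_id,
        items.item_id,
        SUM(sales.score) AS total_score
    FROM items
    JOIN sales ON (sales.item_id = items.item_id)
    GROUP BY items.item_id
    ORDER BY items.user_id, total_score DESC
) AS best_per_user
ORDER BY total_score DESC
LIMIT 50;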

Here's the MySQL solution I've come up with, but it's quite ugly. I tried doing essentially the same thing using a temporary table, but in the process learned that MySQL doesn't allow joining to a temporary table multiple times in the same query.

SELECT items_scores.user_id, items_scores.item_id, items_scores.total_score
FROM (
    SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
    FROM items
    JOIN sales ON
        sales.item_id = items.item_id
    GROUP BY items.item_id
    ) AS items_scores
WHERE items_scores.total_score =
    (
    SELECT MAX(t.total_score)
    FROM (
        SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
        FROM items
        JOIN sales ON
            sales.item_id = items.item_id
        GROUP BY items.item_id
        ) AS t
    WHERE t.user_id = items_scores.user_id
    )
ORDER BY items_scores.total_score DESC

No correct solution

OTHER TIPS

MySQL query for it:

select user, item, total_score
from (
    select sum(sales.score) as total_score, items.user_id as user, items.item_id as item
    from sales
    inner join items on sales.item_id = items.item_id
    group by item, user
    order by total_score desc
) as t
group by user
limit 50;

Output:

+------+------+-------------+
| user | item | total_score |
+------+------+-------------+
|    1 |    3 |           3 |
|    2 |    4 |           5 |
|    3 |    6 |           3 |
+------+------+-------------+
3 rows in set (0.00 sec)

Some explanation

MySQL documentation says:

However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

In our subquery, the nonaggregated columns are user_id and item_id. Both appear in the GROUP BY, so they are the same for every row in a group, and no ORDER BY influences the aggregation: we want all the scores in each group to be summed. We then sort the output by total_score and save it as a derived table.

Finally, we run a SELECT on this derived table, GROUP BY user, and LIMIT the output to 50. Note that this outer step relies on MySQL returning the first row of each group for the nonaggregated columns item and total_score, which the documentation quoted above does not guarantee. It also requires the ONLY_FULL_GROUP_BY SQL mode to be disabled.
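
Because the documentation quoted above leaves the choice of nonaggregated values indeterminate, a deterministic variant may be preferable where window functions are available (MySQL 8.0 or later). The following is a hedged sketch rather than the query from this answer: it ranks each user's items with ROW_NUMBER() and keeps only the top one. It runs against Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, and it needs SQLite 3.25 or later for window functions). The function name top_item_per_user and the aliases s, ranked, and rn are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0+ (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INT PRIMARY KEY, user_id INT NOT NULL);
CREATE TABLE sales (item_id INT NOT NULL, score INT NOT NULL);
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2),
                         (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
""")

def top_item_per_user(conn, limit=50):
    """Return (user_id, item_id, total_score) for each user's best item."""
    return conn.execute("""
        SELECT user_id, item_id, total_score
        FROM (
            SELECT s.user_id, s.item_id, s.total_score,
                   -- Rank items within each user; item_id breaks score ties.
                   ROW_NUMBER() OVER (
                       PARTITION BY s.user_id
                       ORDER BY s.total_score DESC, s.item_id
                   ) AS rn
            FROM (
                SELECT items.user_id, items.item_id,
                       SUM(sales.score) AS total_score
                FROM items
                JOIN sales ON sales.item_id = items.item_id
                GROUP BY items.user_id, items.item_id
            ) AS s
        ) AS ranked
        WHERE rn = 1
        ORDER BY total_score DESC, user_id
        LIMIT ?
    """, (limit,)).fetchall()

rows = top_item_per_user(conn)
for row in rows:
    print(row)
```

With the sample data this returns item 4 for user 2 (score 5), item 3 for user 1 (score 3), and item 6 for user 3 (score 3), in that order. The tie-breakers on item_id and user_id are a design choice of this sketch so that equal scores always sort the same way.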

Licensed under: CC-BY-SA with attribution