Question

I have a table which counts occurrences of one specific action by different users on different objects:

CREATE TABLE `Actions` (
    `object_id` int(10) unsigned NOT NULL,
    `user_id` int(10) unsigned NOT NULL,
    `actionTime` datetime
);

Every time a user performs this action, a row is inserted. I can count how many actions were performed on each object, and order objects by 'activity':

SELECT object_id, count(object_id) AS action_count 
FROM `Actions` 
GROUP BY object_id 
ORDER BY action_count;

How can I limit the results to the top n objects? The LIMIT clause is applied before the aggregation, so it leads to wrong results. The table is potentially huge (millions of rows) and I probably need to count tens of times per minute, so I'd like to do this as efficient as possible.

edit: Actually, Machine is right, and I was wrong with the time at which LIMIT is applied. My query returned the correct results, but the GUI presenting them to me threw me off...this kind of makes this question pointless. Sorry!

Was it helpful?

Solution

Actually... LIMIT is applied last, after a eventual HAVING clause. So it should not give you incorrect results. However, since LIMIT is applied last, it will not provide any faster execution of your query, since a temporary table will have to be created and sorted in order of action count before chopping off the result. Also, remember to sort in descending order:

SELECT object_id, count(object_id) AS action_count 
FROM `Actions` 
GROUP BY object_id 
ORDER BY action_count DESC
LIMIT 10;

You could try adding an index to object_id for optimization. In that way only the index will need to be scanned instead of the Actions table.

OTHER TIPS

How about:

SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count 
FROM `Actions` 
GROUP BY object_id 
ORDER BY action_count
)
LIMIT 15

Also, if you have some measure of what must be the minimum number of actions to be included (e.g. the top n ones are surely more than 1000), you can increase the efficiency by adding a HAVING clause:

SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count 
FROM `Actions` 
GROUP BY object_id 
HAVING action_count > 1000
ORDER BY action_count
)
LIMIT 15

I know this thread is 2 years old but stackflow still finds it relevant so here goes my $0.02. ORDER BY clauses are computationally very expensive so they should be avoided in large tables. A trick I used (in part from Joe Celko's SQL for Smarties) is something like:

SELECT COUNT(*) AS counter, t0.object_id FROM (SELECT COUNT(*), actions.object_id FROM actions GROUP BY id) AS t0, (SELECT COUNT(*), actions.object_id FROM actions GROUP BY id) AS t1 WHERE t0.object_id < t1.object_id  GROUP BY object_id HAVING counter < 15

Will give you the top 15 edited objects without sorting. Note that as of v5, mysql will only cache result sets for exactly duplicate (whitespace incl) queries so the nested query will not get cached. Using a view would resolve that problem.

Yes, it's three queries instead of two and and the only gain is the not having to sort the grouped query but if you have a lot of groups, it will be faster.

Side note: the query is really handy for median functions w/o sorts

SELECT * FROM (SELECT object_id, count(object_id) AS action_count 
        FROM `Actions` 
        GROUP BY object_id 
        ORDER BY action_count) LIMIT 10;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top