Question

I have some forum data of the form

post(author, thread_id, text)

For each author, I would like to select 10 distinct thread_ids associated with that author (there may be more than 10, and the number will vary by author).

I'm thinking of using GROUP BY to group on 'author', but I can't work out how to apply a LIMIT to each group, or how to expand each group back into up to 10 rows.


Solution

Here's a solution to "top N per group" type queries.

Note that you have to choose which 10 threads you want for a given author. For this example, I'm assuming you want the most recent threads (with thread_id being an auto-increment value), and that the table has a primary key, post.post_id, to break ties.

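SELECT p1.*
FROM post p1 LEFT OUTER JOIN post p2
 ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id 
   OR (p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id)))
-- one group per p1 row; grouping by author would collapse each author to a single row
GROUP BY p1.post_id
-- count only matched p2 rows, i.e. same-author posts that rank higher than p1
HAVING COUNT(p2.post_id) < 10;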

Re your follow-up question in the comment, here's the explanation:

Every row in an author's top 10 has at most 9 other rows by the same author ranked above it. So for each author's post (p1), we count how many posts (p2) from the same author have a greater thread_id. If that count is less than 10, then that author's post (p1) belongs in the result.

I added a term to resolve ties with the post_id.
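
To check the counting logic on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the sample rows and the limit N = 2 (standing in for 10) are assumptions for illustration only. Note that the join above ranks posts, so two posts in the same thread each take a slot; since the question asks for distinct thread_ids, the second query applies the same join to the DISTINCT (author, thread_id) pairs instead. That variant is my reading of the question, not something the answer above states.

```python
import sqlite3

# Hypothetical schema and rows: author, thread_id and text come from the
# question; post_id is the primary key the answer assumes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    " post_id INTEGER PRIMARY KEY, author TEXT, thread_id INTEGER, text TEXT)"
)
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 10, "a"),
        (2, "alice", 11, "b"),
        (3, "alice", 12, "c"),
        (4, "alice", 12, "d"),  # second post in thread 12
        (5, "bob", 10, "e"),
    ],
)

N = 2  # stands in for the 10 used in the answer

# Top N posts per author: keep p1 if fewer than N same-author posts rank higher.
TOP_POSTS = """
SELECT p1.author, p1.thread_id, p1.post_id
FROM post p1 LEFT OUTER JOIN post p2
  ON p1.author = p2.author
 AND (p1.thread_id < p2.thread_id
      OR (p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
GROUP BY p1.post_id, p1.author, p1.thread_id
HAVING COUNT(p2.post_id) < ?
ORDER BY p1.author, p1.thread_id DESC, p1.post_id DESC
"""

# Top N distinct threads per author: same join, applied to the distinct
# (author, thread_id) pairs, so no tie-breaker is needed.
TOP_THREADS = """
WITH t AS (SELECT DISTINCT author, thread_id FROM post)
SELECT t1.author, t1.thread_id
FROM t t1 LEFT OUTER JOIN t t2
  ON t1.author = t2.author AND t1.thread_id < t2.thread_id
GROUP BY t1.author, t1.thread_id
HAVING COUNT(t2.thread_id) < ?
ORDER BY t1.author, t1.thread_id DESC
"""

top_posts = conn.execute(TOP_POSTS, (N,)).fetchall()
top_threads = conn.execute(TOP_THREADS, (N,)).fetchall()

print(top_posts)    # alice's two newest posts are both in thread 12
print(top_threads)  # alice's two newest distinct threads are 12 and 11
```

With this sample data the two queries disagree for alice, which is the case to watch for if you really need distinct threads rather than posts.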

Licensed under: CC-BY-SA with attribution