문제

I am fairly new to SQL, working on some practice problems. I have a sample Twitter database and I am trying to find the top 3 users in each location based on number of followers.

Here are the tables I am using:

id_follower_location

        id       | followers | location 
-----------------+-----------+----------
 id28929238      |         1 | Toronto
 id289292338     |         1 | California
 id2892923838    |         2 | Rome
 .
 .

locations

           location       
----------------------
 Bay Area, California
 London
 Nashville, TN
.
.

I've been able to find the "top" user by:

create view top1id as 
  select location, 
    (select id_followers_location.id from id_followers_location 
      where id_followers_location.location = locations.location 
      order by followers desc limit 1
    ) as id 
  from locations;

create view top1 as 
  select location, id, 
    (select followers from id_followers_location 
      where id_followers_location.id = top1id.id
    ) as followers 
  from top1id;

The only way I have been able to come up with so solve this is by finding out the "Top 1st", the "Top 2nd", the "Top 3rd" and then using union to combine it. Is that the correct/only way of doing this? Or is there a better way?

도움이 되었습니까?

해결책

Top n

With rank() you get at least 3 rows (fewer if fewer exist). If there are ties among the top 3 ranks, more rows may be returned. See:

If you want exactly 3 rows per location (fewer if fewer exist), you have to break ties. One way is to use row_number() instead of rank().

SELECT *
FROM (
   SELECT id, location
        , row_number() OVER (PARTITION BY location ORDER BY followers DESC) AS rn
   FROM   id_follower_location
   ) r
WHERE  rn <= 3
ORDER  BY location, rn;

You may want to add ORDER BY to the outer query to guarantee sorted output.
If there are more than three valid candidates, you get an arbitrary pick from ties - unless you add more ORDER BY items in the OVER clause to break ties.

Top 1

As for your query to get the top 1 row: there is a much simpler and faster way in PostgreSQL:

SELECT DISTINCT ON (location)
       id, location           -- add additional columns freely
FROM   id_follower_location
ORDER  BY location, followers DESC;

Details for this query technique in this closely related answer:

다른 팁

You can do this with windowing functions: http://www.postgresql.org/docs/9.1/static/tutorial-window.html

For example (not tested may need slight syntax fixes):

SELECT follower_ranks.id, follower_ranks.location 
FROM (
    SELECT id, location, 
      RANK() OVER (PARTITION BY location ORDER BY followers DESC) 
    FROM id_follower_location
) follower_ranks 
WHERE follower_ranks.rank <= 3;
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top