Question

I'm playing around with the Lahman Baseball Database in a MySQL instance. I want to find the players who topped home runs (HR) for each year. The Batting table has the following (relevant parts) of its schema:

+-----------+----------------------+------+-----+---------+-------+
| Field     | Type                 | Null | Key | Default | Extra |
+-----------+----------------------+------+-----+---------+-------+
| playerID  | varchar(9)           | NO   | PRI |         |       |
| yearID    | smallint(4) unsigned | NO   | PRI | 0       |       |
| HR        | smallint(3) unsigned | YES  |     | NULL    |       |
+-----------+----------------------+------+-----+---------+-------+

For each year, every player has an entry (between hundreds and 12k per year, going back to 1871). Getting the top N hitters for a single year is easy:

SELECT playerID,yearID,HR
FROM Batting
WHERE yearID=2009
ORDER BY HR DESC LIMIT 3;
+-----------+--------+------+
| playerID  | yearID | HR   |
+-----------+--------+------+
| pujolal01 |   2009 |   47 |
| fieldpr01 |   2009 |   46 |
| howarry01 |   2009 |   45 |
+-----------+--------+------+

But I'm interested in finding the top 3 from every year. I've found solutions like this, describing how to select the top from a category and I've tried to apply it to my problem, only to end up with a query that never returns:

SELECT
    b.yearID, b.playerID, b.HR
FROM
    Batting AS b
LEFT JOIN
    Batting b2
    ON
    (b.yearID=b2.yearID AND b.HR <= b2.HR)
GROUP BY b.yearID HAVING COUNT(*) <= 3;

Where have I gone wrong?

Was it helpful?

Solution

Something like this should work:

SELECT b.playerID, b.yearID, b.HR
FROM Batting b 
WHERE HR >= (
    SELECT b2.HR 
    FROM Batting b2 
    WHERE b2.yearID=b1.yearID
    ORDER BY b2.HR DESC
    LIMIT 2, 1
)
ORDER BY b.yearID DESC, b.HR DESC;

Explanation: select all rows which have >= number of home runs as the third highest for that year. This won't break ties. So if there's more than one batter with the same number of home runs, they'll all show up.

The results are the ordered from the most recent year, sub-ordered by rank for each year.

Note: LIMIT is a 0-based offset, so 2, 1 means starting after the second row grab one row, i.e.: the third row.

OTHER TIPS

Wow, random. I happened to be doing the same query (for salaries though) on the Lahman Baseball DB using an article on emulating Oracle Analytic functions. This version of the query is snappy, but not so intuitive.

select *
from (

select 
    b.yearID as year,
    b.teamID as team,
    m.nameFirst as first,
    m.nameLast as last,
    find_in_set(b.HR, x.teamRank) as rank,
    b.HR as HR


from 
    Batting b
    inner join Master m on m.playerID = b.playerID
    inner join (select yearID, group_concat(distinct HR order by HR desc) as teamRank from Batting group by yearID) x on x.yearID = b.yearID

) x

where 
    rank <= 10 and rank > 0 

order by    
    year desc, rank

Or the top 5 HR totals per team for the year 2010...

select *
from (

select 
    b.yearID as year,
    b.teamID as team,
    m.nameFirst as first, 
    m.nameLast as last, 
    b.HR as HR,
    find_in_set(b.HR, x.teamRank) as rank

from 
    Batting b
    inner join Master m on m.playerID = b.playerID
    inner join (select teamID, group_concat(distinct HR order by HR desc) as teamRank from Batting where yearID = 2010 group by teamID) x on x.teamID = b.teamID
where 
    b.yearID = 2010
) x

where 
    rank <= 5 and rank > 0 

order by    
    team, rank

limit 12

Shows these results...

year    team    first   last        HR  rank
2010    ARI Mark    Reynolds    32  1
2010    ARI Chris   Young       27  2
2010    ARI Kelly   Johnson     26  3
2010    ARI Adam    LaRoche     25  4
2010    ARI Justin  Upton       17  5
2010    ATL Brian   McCann      21  1
2010    ATL Jason   Heyward     18  2
2010    ATL Troy    Glaus       16  3
2010    ATL Martin  Prado       15  4
2010    ATL Eric    Hinske      11  5
2010    BAL Luke    Scott       27  1
2010    BAL Ty      Wigginton   22  2
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top