Question

I have a panel data set: that is, times, ids, and values. I would like to do a ranking based on value for each date. I can achieve the sort very simply by running:

select * from tbl order by date, value

The issue I have is once the table is sorted in this way, how do I retrieve the row number of each group (that is, for each date I would like there to be a column called ranking that goes from 1 to N).

Example:

Input:

Date, ID, Value
 d1, id1, 2
 d1, id2, 1
 d2, id1, 10
 d2, id2, 11

Output:

Date, ID, Value, Rank
 d1, id2, 1, 1
 d1, id1, 2, 2
 d2, id1, 10, 1
 d2, id2, 11, 2
Was it helpful?

Solution

Absent window functions, you can order tbl and use user variables to compute rank over your partitions ("date" values) yourself:

SELECT "date",                                                -- D) Desired columns
       id,
       value,
       rank
  FROM (SELECT "date",                                        -- C) Rank by date
               id,
               value,
               CASE COALESCE(@partition, "date")
                 WHEN "date" THEN @rank := @rank + 1
                 ELSE             @rank := 1
               END AS rank,
               @partition := "date" AS dummy
          FROM (SELECT @rank := 0 AS rank,                    -- A) User var init
                       @partition := NULL AS partition) dummy
               STRAIGHT_JOIN
               (  SELECT "date",                              -- B) Ordering query
                         id,
                         value
                    FROM tbl
                ORDER BY date, value) tbl_ordered;

Update

So, what is that query doing?

We are using user variables to "loop" through a sorted result set, incrementing or resetting a counter (@rank) depending upon which contiguous segment of the result set (tracked in @partition) we're in.

In query A we initialize two user variables. In query B we get the records of your table in the order we need: first by date and then by value. A and B together make a derived table, tbl_ordered, that looks something like this:

rank | partition | "date" |  id  | value 
---- + --------- + ------ + ---- + -----
  0  |   NULL    |   d1   |  id2 |    1
  0  |   NULL    |   d1   |  id1 |    2
  0  |   NULL    |   d2   |  id1 |   10
  0  |   NULL    |   d2   |  id2 |   11

Remember, we don't really care about the columns dummy.rank and dummy.partition — they're just accidents of how we initialize the variables @rank and @partition.

In query C we loop through the derived table's records. What we're doing is more-or-less what the following pseudocode does:

rank      = 0
partition = nil

foreach row in fetch_rows(sorted_query):
  (date, id, value) = row

  if partition is nil or partition == date:
    rank += 1
  else:
    rank = 1

  partition = date

  stdout.write(date, id, value, rank, partition)

Finally, query D projects all columns from C except for the column holding @partition (which we named dummy and do not need to display).

OTHER TIPS

I know this is an old question but here is a shorter answer:

SELECT w.*, if(
      @preDate = w.date,
      @rank := @rank + 1,
      @rank := (@preDate :=w.date) = w.date
    ) rank
FROM tbl w
JOIN (SELECT @preDate := '' )a
ORDER BY date, value

Would this do the trick?

select [DATE],ID,Value, 
(DENSE_RANK()  OVER (   
   PARTITION BY ID
 ORDER BY Date) )AS [DenseRank],    
ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [Date] DESC) AS RN     
from SomeTable 
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top