SQL Server: How to use dense_rank on one column, based on order by another column
21-06-2021
Question
I have got a table in SQL Server 2008 where I need alternating values for one column, say column alt. Duplicates in that column always need the same value, hence I was thinking about using the dense_rank function for this column alt via % 2.
But there are also zip codes in that table, and I need to order the data by zip code before assigning the alternating values.
So once the alternating values based on column alt have been assigned and the data is then ordered by zip code, the values really need to alternate (apart from the duplicates in the alt column, of course).
Currently the alt values do get alternating values, but when I order by zip code I get sequences such as 0,0,0 from the dense_rank function, and those are the problem.
I tried using a temp table, but didn't get the expected result with a select * into #txy ordered by zip and then doing the dense_rank on that table, because the order of a temp table isn't guaranteed.
Any ideas are greatly appreciated!
Cheers, Stevo
Edit:
Sample Code:
CREATE TABLE [xy.TestTable](
    [BaseForAlternatingValue] [char](10),
    [zip] [varchar](5)
) ON [PRIMARY]
GO
INSERT INTO [xy.TestTable]
    ([BaseForAlternatingValue]
    ,[zip])
VALUES
    ('cccccccccc','99999'),
    ('bbbbbbbbbb','22222'),
    ('aaaaaaaaaa','12345'),
    ('dddddddddd','33333'),
    ('aaaaaaaaaa','12345'),
    ('bbbbbbbbbb','22222')
GO
select (DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue)) % 2 as AlternatingValue
    , BaseForAlternatingValue
    , zip
from [xy.TestTable]
order by zip
Result:
AlternatingValue  BaseForAlternatingValue  zip
1                 aaaaaaaaaa               12345
1                 aaaaaaaaaa               12345
0                 bbbbbbbbbb               22222
0                 bbbbbbbbbb               22222
0                 dddddddddd               33333
1                 cccccccccc               99999
The problem now is that, when ordered by zip code, the following two rows both contain the same alternating value (0). When ordered by zip code the result should really have alternating values, but these alternating values should be based on the column BaseForAlternatingValue.
0                 bbbbbbbbbb               22222
0                 dddddddddd               33333
The expected outcome should be:
AlternatingValue  BaseForAlternatingValue  zip
1                 aaaaaaaaaa               12345
1                 aaaaaaaaaa               12345
0                 bbbbbbbbbb               22222
0                 bbbbbbbbbb               22222
1                 dddddddddd               33333
0                 cccccccccc               99999
Only the AlternatingValue of the last two result rows is different: the alternating value needs to alternate between different zip codes. Before, it was 0 for the third-last row and also 0 for the second-last row.
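To see where the repeated 0 comes from, it can help to look at the raw rank before the % 2 is applied. A minimal sketch against the sample table above; the alias RawRank is made up for illustration and is not part of the original query:

```sql
-- RawRank is a hypothetical alias; table and columns come from the sample code.
SELECT DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue)     AS RawRank,
       DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue) % 2 AS AlternatingValue,
       BaseForAlternatingValue,
       zip
FROM [xy.TestTable]
ORDER BY zip;
-- With the six sample rows, RawRank in zip order is 1, 1, 2, 2, 4, 3:
-- 'dddddddddd' ranks after 'cccccccccc' alphabetically but comes before it by zip,
-- so rank 2 (parity 0) and rank 4 (parity 0) end up next to each other.
```

The rank follows the alphabetical order of BaseForAlternatingValue, while the output follows zip, so the parities only alternate when both orders happen to agree.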
As for Mikael's question below, "And what if you have add row ('cccccccccc','12345'). What would the expected output be then?"
The expected output would then be:
AlternatingValue  BaseForAlternatingValue  zip
1                 aaaaaaaaaa               12345
1                 aaaaaaaaaa               12345
0                 cccccccccc               12345
1                 bbbbbbbbbb               22222
1                 bbbbbbbbbb               22222
0                 dddddddddd               33333
0                 cccccccccc               99999
So in summary: I need alternating values for the column BaseForAlternatingValue, but this alternating should be visible when ordering by zip code. (and duplicates in BaseForAlternatingValue need the same "alternating" value)
----------------
In the end I found a simpler and relatively nice solution:
1) using a temp table with an insert into and order by and using id values (id values will reflect the order by clause)
2) finding out the smallest id for a given BaseForAlternatingValue
3) finding out the count of distinct BaseForAlternatingValues with an id smaller than that
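The three steps could look roughly like this in T-SQL. This is only a sketch of how the description reads, not the asker's actual code: the names #ordered, id, firsts and first_id are invented here, and adding 1 to the count is an assumption made so that the parity matches the expected output shown above.

```sql
-- Hypothetical names: #ordered, id, firsts, first_id.
CREATE TABLE #ordered (
    id INT IDENTITY(1,1) PRIMARY KEY,
    BaseForAlternatingValue CHAR(10),
    zip VARCHAR(5)
);

-- Step 1: with INSERT ... SELECT ... ORDER BY, the identity values are
-- computed in the order given by the ORDER BY clause.
INSERT INTO #ordered (BaseForAlternatingValue, zip)
SELECT BaseForAlternatingValue, zip
FROM [xy.TestTable]
ORDER BY zip, BaseForAlternatingValue;

-- Step 2: smallest id per BaseForAlternatingValue.
WITH firsts AS (
    SELECT BaseForAlternatingValue, MIN(id) AS first_id
    FROM #ordered
    GROUP BY BaseForAlternatingValue
)
-- Step 3: count the distinct bases that start at a smaller id.
-- The + 1 is an assumption so the first base gets 1, as in the expected output.
SELECT ((SELECT COUNT(*)
         FROM firsts f2
         WHERE f2.first_id < f.first_id) + 1) % 2 AS AlternatingValue,
       o.BaseForAlternatingValue,
       o.zip
FROM #ordered o
JOIN firsts f
  ON f.BaseForAlternatingValue = o.BaseForAlternatingValue
ORDER BY o.id;

DROP TABLE #ordered;
```

Because every row of a given base picks up the count belonging to that base's first id, duplicates share one value even when they appear under different zip codes.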
Solution
Try using ROW_NUMBER as a direct replacement for DENSE_RANK. DENSE_RANK will give multiple rows the same value where they tie for a rank; ROW_NUMBER will not.
DENSE_RANK reference
ROW_NUMBER reference
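A quick way to compare the two functions is to run them side by side on the sample table from the question. This is just an illustrative sketch; the aliases row_num and dense_rnk are made up:

```sql
-- row_num and dense_rnk are hypothetical aliases for illustration.
SELECT BaseForAlternatingValue,
       zip,
       ROW_NUMBER() OVER (ORDER BY BaseForAlternatingValue) AS row_num,
       DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue) AS dense_rnk
FROM [xy.TestTable]
ORDER BY BaseForAlternatingValue;
-- For the six sample rows (a, a, b, b, c, d):
--   row_num   runs 1, 2, 3, 4, 5, 6  (every row gets its own number)
--   dense_rnk runs 1, 1, 2, 2, 3, 4  (tied rows share a rank, no gaps)
```

Note that with ROW_NUMBER alone the two 'aaaaaaaaaa' rows would get different values after % 2, so it does not by itself satisfy the requirement that duplicates share a value.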
EDIT
This is ugly but appears to produce the correct result.
The first CTE determines the output order of the rows and calculates the "alternating value".
The second determines the first instance of each BaseForAlternatingValue in the output result set.
The output query returns the rows in the right order with the first "alternating value" for each BaseForAlternatingValue.
;WITH cte
AS
(
    SELECT BaseForAlternatingValue, zip,
           ROW_NUMBER() OVER (ORDER BY zip, BaseForAlternatingValue) AS rn,
           DENSE_RANK() OVER (ORDER BY zip, BaseForAlternatingValue) % 2 AS av
    FROM [xy.TestTable]
)
,rnCTE
AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY BaseForAlternatingValue ORDER BY rn) AS rn2
    FROM cte
)
SELECT rn.av AS AlternatingValue,
       r.BaseForAlternatingValue, r.zip
FROM cte r
JOIN rnCTE rn
  ON rn.BaseForAlternatingValue = r.BaseForAlternatingValue
 AND rn.rn2 = 1
ORDER BY zip, BaseForAlternatingValue
Other tips
I know this is irrelevant now, as this question has long-since been solved.
You can do this with a single cte and a join:
with mins as (
    -- smallest zip at which each base value appears
    select min(zip) min_zip,
           BaseForAlternatingValue
    from [xy.TestTable]
    group by BaseForAlternatingValue
)
select dense_rank() over (order by m.min_zip, t.BaseForAlternatingValue) % 2 AlternatingValue,
       t.BaseForAlternatingValue,
       t.zip
from [xy.TestTable] t
join mins m on m.BaseForAlternatingValue = t.BaseForAlternatingValue
order by t.zip, t.BaseForAlternatingValue;
Alternate solution for SQL Server 2012 with a single cte:
with mins as (
    -- windowed min() attaches each base's smallest zip to every row
    select min(zip) over (partition by BaseForAlternatingValue) min_zip,
           BaseForAlternatingValue,
           zip
    from [xy.TestTable]
)
select dense_rank() over (order by min_zip, BaseForAlternatingValue) % 2 AlternatingValue,
       BaseForAlternatingValue,
       zip
from mins
order by zip;
The idea is that if you can guarantee that there are never 2 of the same base with different zips, you can dense_rank ordered by zip first and then base. Since your ordering only depends on the minimum zip for each base, you can get that using min(), or in 2012 min() over (partition by), to remove the join.
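To make the idea easier to check, the intermediate values can be listed per base. A small sketch using the sample table from the question; the alias rnk is made up for illustration:

```sql
-- rnk is a hypothetical alias; mins is the same CTE as in the first query.
WITH mins AS (
    SELECT MIN(zip) AS min_zip,
           BaseForAlternatingValue
    FROM [xy.TestTable]
    GROUP BY BaseForAlternatingValue
)
SELECT BaseForAlternatingValue,
       min_zip,
       DENSE_RANK() OVER (ORDER BY min_zip, BaseForAlternatingValue)     AS rnk,
       DENSE_RANK() OVER (ORDER BY min_zip, BaseForAlternatingValue) % 2 AS AlternatingValue
FROM mins
ORDER BY rnk;
-- For the six original sample rows:
--   aaaaaaaaaa  12345  1  1
--   bbbbbbbbbb  22222  2  0
--   dddddddddd  33333  3  1
--   cccccccccc  99999  4  0
```

If the extra row ('cccccccccc','12345') from the question is added, the minimum zip for 'cccccccccc' becomes 12345, it moves to rank 2, and both of its rows get 0, which agrees with the expected output given for that case.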