SQL Server: How to use dense_rank on one column, based on order by another column

https://stackoverflow.com/questions/11556125

21-06-2021
Question

I have a table in SQL Server 2008 where I need alternating values derived from one column, say column alt. Duplicates in that column must always get the same value, so I was thinking of using the dense_rank function on column alt and taking the result % 2.

But the table also contains zip codes, and I need to order the data by zip code before assigning the alternating values.

So basically, once the alternating values based on column alt have been assigned and the data is ordered by zip code, the values really need to alternate from row to row (apart from duplicates in the alt column, of course).

Currently the alt values do get alternating values, but once I order by zip code I get runs such as 0,0,0 from the dense_rank function, and those are the problem.

I tried using a temp table, filling it with something like

select * into #txy from ... order by zip

and then running dense_rank on that table, but I didn't get the expected result because the order of a temp table isn't guaranteed.

Any ideas are greatly appreciated!

Cheers, Stevo

Edit:

Sample Code:

CREATE TABLE [xy.TestTable](
[BaseForAlternatingValue] [char](10),
[zip] [varchar](5)
) ON [PRIMARY]
GO


INSERT INTO [xy.TestTable]
       ([BaseForAlternatingValue]
       ,[zip])
 VALUES
       ('cccccccccc','99999'),
       ('bbbbbbbbbb','22222'),
       ('aaaaaaaaaa','12345'),
       ('dddddddddd','33333'),
       ('aaaaaaaaaa','12345'),
       ('bbbbbbbbbb','22222')
GO

select (DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue)) % 2 as AlternatingValue
    , BaseForAlternatingValue
    , zip
    from [xy.TestTable]
    order by zip


Result:
AlternatingValue    BaseForAlternatingValue zip
1                      aaaaaaaaaa            12345
1                      aaaaaaaaaa            12345
0                      bbbbbbbbbb            22222
0                      bbbbbbbbbb            22222
0                      dddddddddd            33333
1                      cccccccccc            99999

The problem now is that, when ordered by zip code, the following two rows both get the same alternating value (0). When ordered by zip code the result should really alternate, but the alternating values should still be based on the column BaseForAlternatingValue.

0                      bbbbbbbbbb            22222
0                      dddddddddd            33333

The expected outcome should be:

AlternatingValue    BaseForAlternatingValue zip
1                      aaaaaaaaaa            12345
1                      aaaaaaaaaa            12345
0                      bbbbbbbbbb            22222
0                      bbbbbbbbbb            22222
1                      dddddddddd            33333
0                      cccccccccc            99999

Only the AlternatingValue of the last two result rows differs from the current result: the value needs to alternate between different zip codes. Before, it was 0 for the third-to-last row and also 0 for the second-to-last row.

As for Mikael's question below, "And what if you have add row ('cccccccccc','12345'). What would the expected output be then?"

The expected output would then be:

AlternatingValue    BaseForAlternatingValue zip
1                      aaaaaaaaaa            12345
1                      aaaaaaaaaa            12345
0                      cccccccccc            12345
1                      bbbbbbbbbb            22222
1                      bbbbbbbbbb            22222
0                      dddddddddd            33333
0                      cccccccccc            99999

So in summary: I need alternating values for the column BaseForAlternatingValue, but this alternating should be visible when ordering by zip code. (and duplicates in BaseForAlternatingValue need the same "alternating" value)

----------------

In the end I found a simpler and relatively nice solution:

1) using a temp table with an insert into and order by, and using id values (the id values will reflect the order by clause)
2) finding out the smallest id for a given BaseForAlternatingValue
3) finding out the count of distinct BaseForAlternatingValues with an id smaller than that
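
The three steps above can be sketched roughly as follows. This is a reconstruction, not the asker's actual code: the temp table name #ordered, the IDENTITY column, the CTE name firstIds and the "+ 1" (added only so the parity starts at 1, as in the expected output) are all assumptions.

```sql
-- Step 1: hypothetical temp table; the IDENTITY values follow the ORDER BY of the INSERT
CREATE TABLE #ordered (
    id INT IDENTITY(1,1) PRIMARY KEY,
    BaseForAlternatingValue CHAR(10),
    zip VARCHAR(5)
);

INSERT INTO #ordered (BaseForAlternatingValue, zip)
SELECT BaseForAlternatingValue, zip
FROM [xy.TestTable]
ORDER BY zip, BaseForAlternatingValue;

-- Step 2: smallest id per BaseForAlternatingValue
WITH firstIds AS (
    SELECT BaseForAlternatingValue, MIN(id) AS first_id
    FROM #ordered
    GROUP BY BaseForAlternatingValue
)
-- Step 3: count the distinct bases that first appear earlier, then take the parity
SELECT ((SELECT COUNT(*)
         FROM firstIds f2
         WHERE f2.first_id < f.first_id) + 1) % 2 AS AlternatingValue,
       o.BaseForAlternatingValue,
       o.zip
FROM #ordered o
JOIN firstIds f ON f.BaseForAlternatingValue = o.BaseForAlternatingValue
ORDER BY o.zip, o.BaseForAlternatingValue;

DROP TABLE #ordered;
```

Because the final SELECT has its own ORDER BY, the result does not rely on the storage order of the temp table, only on the id values assigned during the insert.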

Solution

Try using ROW_NUMBER as a direct replacement for DENSE_RANK. DENSE_RANK will give multiple rows the same value where they tie for a rank - ROW_NUMBER will not.

DENSE_RANK reference ROW_NUMBER reference
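
To see the difference on the sample table, here is a minimal sketch that puts both functions side by side (the column aliases dr and rn are just illustrative):

```sql
-- dr repeats for rows that share a BaseForAlternatingValue; rn never repeats
SELECT BaseForAlternatingValue,
       zip,
       DENSE_RANK() OVER (ORDER BY BaseForAlternatingValue) AS dr,
       ROW_NUMBER() OVER (ORDER BY BaseForAlternatingValue) AS rn
FROM [xy.TestTable]
ORDER BY BaseForAlternatingValue;
```

Rows with the same BaseForAlternatingValue share a dr value but get distinct rn values, so taking ROW_NUMBER % 2 on its own would give duplicates different alternating values.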

EDIT

This is ugly but appears to produce the correct result. The first CTE determines the output order of the rows and calculates the "alternating value".
The second determines the first instance of each BaseForAlternatingValue in the output result set.
The output query returns the rows in the right order, each with the "alternating value" of the first instance of its BaseForAlternatingValue.

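;WITH cte
AS
(
-- rn fixes the output order; av is the candidate alternating value per (zip, base) group
SELECT BaseForAlternatingValue, zip,
       ROW_NUMBER() OVER (ORDER BY zip, BaseForAlternatingValue) AS rn,
       DENSE_RANK() OVER (ORDER BY zip, BaseForAlternatingValue) % 2 AS av
FROM [xy.TestTable]
)
,rnCTE
AS
(
-- rn2 = 1 marks the first occurrence of each BaseForAlternatingValue in output order
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY BaseForAlternatingValue ORDER BY rn) AS rn2
FROM cte
)
-- every row takes the av of the first occurrence of its BaseForAlternatingValue
SELECT rn.av AS AlternatingValue,
       r.BaseForAlternatingValue, r.zip
FROM cte r
JOIN rnCTE rn
ON rn.BaseForAlternatingValue = r.BaseForAlternatingValue
AND rn.rn2 = 1
ORDER BY r.zip, r.BaseForAlternatingValue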

Other tips

I know this is irrelevant now, as this question has long since been solved.

You can do this with a single cte and a join:

-- [xy.TestTable] is bracketed as in the question's CREATE TABLE (the dot is part of the name)
with mins as (
    select min(zip) as min_zip,
        BaseForAlternatingValue
    from [xy.TestTable]
    group by BaseForAlternatingValue
)
select dense_rank() over (order by m.min_zip, t.BaseForAlternatingValue) % 2 as AlternatingValue,
    t.BaseForAlternatingValue,
    t.zip
from [xy.TestTable] t
join mins m on m.BaseForAlternatingValue = t.BaseForAlternatingValue
order by t.zip, t.BaseForAlternatingValue;

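Alternate solution with a single cte and no join (written for SQL Server 2012, although min() over (partition by ...) without an order by should also be accepted by SQL Server 2005 and 2008):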

with mins as (
    select min(zip) over (partition by BaseForAlternatingValue) as min_zip,
        BaseForAlternatingValue,
        zip
    from [xy.TestTable]
)
select dense_rank() over (order by min_zip, BaseForAlternatingValue) % 2 as AlternatingValue,
    BaseForAlternatingValue,
    zip
from mins
order by zip, BaseForAlternatingValue;

The idea is that if you could guarantee that the same base never appears with two different zips, you could simply dense_rank ordered by zip first and then base. Without that guarantee, the ordering only depends on the minimum zip for each base, which you can get using min() with a group by and a join, or using min() over (partition by) to remove the join.
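
To inspect what drives the ranking, a small sketch that lists each base once with its minimum zip and the resulting group number (the aliases min_zip and grp are illustrative, and the bracketed table name assumes the question's CREATE TABLE):

```sql
-- One row per BaseForAlternatingValue: its smallest zip and its dense rank
SELECT BaseForAlternatingValue,
       MIN(zip) AS min_zip,
       DENSE_RANK() OVER (ORDER BY MIN(zip), BaseForAlternatingValue) AS grp
FROM [xy.TestTable]
GROUP BY BaseForAlternatingValue
ORDER BY min_zip, BaseForAlternatingValue;
```

Taking grp % 2 gives the alternating value that every row of that base receives in the queries above.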

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow