Question

I am working with SQL Server 2008.

Suppose I have a table like this:

Code   Value
-----------------------
4      240
4      299
4      210
2      NULL
2      3
6      30
6      80
6      10
4      240
2      30

How can I find the median of Value grouped by the Code column, to get a result set like this:

Code   Median
-----------------------
4      240
2      16.5
6      30

I really like this solution for median, but unfortunately it doesn't include Group By: https://stackoverflow.com/a/2026609/106227

Solution

The solution using rank works nicely when each group has an odd number of members, i.e. when the median is itself one of the values in the sample. When a group has an even number of members, the rank method falls down, e.g.

1
2
3
4

The median here is 2.5 (i.e. half the group is smaller and half the group is larger), but the rank method will return 3. To get around this, take the top value from the bottom half of the group and the bottom value from the top half of the group, and average the two.

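WITH CTE AS
(   SELECT  Code,
            Value, 
            -- Half1 = 1 marks the lower half (ascending order)
            Half1 = NTILE(2) OVER(PARTITION BY Code ORDER BY Value), 
            -- Half2 = 1 marks the upper half (descending order)
            Half2 = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        -- average of the top of the lower half and the bottom of the upper half
        Median = (MAX(CASE WHEN Half1 = 1 THEN Value END) + 
                  MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;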

Example on SQL Fiddle
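
To try the query above locally, the sample data from the question could be loaded as follows. This is only a sketch: the table name T is taken from the query, but the column types are an assumption, since the question does not state them.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE T
(   Code  INT NOT NULL,
    Value INT NULL
);

-- The ten sample rows from the question, in the order given.
INSERT INTO T (Code, Value)
VALUES  (4, 240), (4, 299), (4, 210),
        (2, NULL), (2, 3),
        (6, 30), (6, 80), (6, 10),
        (4, 240), (2, 30);
```

With this data the NTILE query should give 240 for Code 4, 16.5 for Code 2 (the NULL row is filtered out, leaving 3 and 30) and 30 for Code 6, matching the result set requested in the question.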


In SQL Server 2012 and later you can use PERCENTILE_CONT:

SELECT  DISTINCT
        Code,
        Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM    T;

Example on SQL Fiddle

Other tips

SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:

WITH RankedTable AS (
    SELECT Code, Value, 
        ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
        COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1;

To elaborate a bit on this solution, consider the output of the RankedTable CTE for the first eight rows of the sample data:

Code   Value   Rnk    Cnt
---------------------------
4      240     2      3   -- Median
4      299     3      3
4      210     1      3
2      NULL    1      2
2      3       2      2   -- Median
6      30      2      3   -- Median
6      80      3      3
6      10      1      3

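From this result set, returning only the rows where Rnk equals Cnt / 2 + 1 (integer division) gives one row per group. For a group with an odd number of rows, that row holds the median; for a group with an even number of rows, it holds the upper of the two middle values rather than their average. Note also that NULL values are ranked and counted like any other row (they sort first), as in the Code 2 group above.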
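
If an averaged median is wanted for even-sized groups, one possible variation on the ROW_NUMBER approach is sketched below. It is not part of the original answer: the NULL filter, the Median alias and the two-row selection are additions, and MyTable is assumed to have the same Code and Value columns as above.

```sql
WITH RankedTable AS (
    SELECT Code, Value,
        ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
        COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
    WHERE Value IS NOT NULL          -- leave NULLs out of the ranking and the count
)
SELECT Code,
       AVG(1.0 * Value) AS Median    -- 1.0 * forces decimal rather than integer averaging
FROM RankedTable
-- Odd Cnt: both expressions name the single middle row.
-- Even Cnt: they name the two middle rows, which are then averaged.
WHERE Rnk IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
GROUP BY Code;
```

For a group of three rows both expressions evaluate to 2, so only the middle row is kept; for a group of four they evaluate to 2 and 3, so the two middle rows are averaged.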

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow