SQL Server: Only last entry in GROUP BY

https://stackoverflow.com/questions/438938

22-07-2019
|

Question

I have the following table in MSSQL2005

id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0

And now i want to group based on the business_key returning the complete entry with the highest id. So my expected result is:

business_key | result
1 | 1
2 | 1
3 | 1
4 | 0

I bet that there is a way to achieve that, i just can't see it at the moment.

Solution

An alternative solution, which may give you better performance (test both ways and check the execution plans):

SELECT
     T1.id,
     T1.business_key,
     T1.result
FROM
     dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
     T2.business_key = T1.business_key AND
     T2.id > T1.id
WHERE
     T2.id IS NULL

This query assumes that the ID is a unique value (at least for any given business_key) and that it is set to NOT NULL.

OTHER TIPS

select
  drv.business_key,
  mytable.result
from mytable
  inner join
  (
    select 
      business_key, 
      max(id) as max_id
    from mytable
    group by
      business_key
  ) as drv on
    mytable.id = drv.max_id

Try this

select  business_key, 
        result
from    myTable
where   id in 
        (select max(id)
        from    myTable
        group by business_key)

EDIT: I created the table to test my code. I include it below in case anybody else wants to test it.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[myTable](
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL
) ON [PRIMARY]
go

insert into myTable values(1,1,0);
insert into myTable values(2,1,1);
insert into myTable values(3,2,1);
insert into myTable values(4,3,1);
insert into myTable values(5,4,1);
insert into myTable values(6,4,0);

select  * from mytable

select business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from mytable) x
where id = max_id

This is an older post but was relevant to something I was doing currently (2013). If you get a larger dataset (typical in most DBs), the performance of the various queries (looking at execution plans) says a lot. First we create a "TALLY table" to randomly generate numbers, then use an arbitrary formula to create data for the "MyTable":

CREATE TABLE #myTable(
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL,
    PRIMARY KEY (Id)
) ON [PRIMARY];

; WITH
    -- Tally table Gen            Tally Rows:     X2                X3
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 4            ,    8
t2 AS (SELECT 1 N FROM t1 x, t1 y),            -- 16            ,    64
t3 AS (SELECT 1 N FROM t2 x, t2 y),            -- 256            ,    4096
t4 AS (SELECT 1 N FROM t3 x, t3 y),            -- 65536        ,    16,777,216
t5 AS (SELECT 1 N FROM t4 x, t4 y),            -- 4,294,967,296,    A lot
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
          FROM t5 x, t5 y)

INSERT INTO #MyTable 
SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
FROM Tally
WHERE N < 500000

Next we run three different types of queries to review the performance (turn on "Actual Execution Plan" if you are using SQL Server Management Studio):

SET STATISTICS IO ON
SET STATISTICS TIME ON
----- Try #1 
select  'T1' AS Qry, id, business_key, 
        result
from    #myTable
where   id in 
        (select max(id)
        from    #myTable
        group by business_key)

---- Try #2 
select 'T2' AS Qry, id, business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from #mytable) x
where id = max_id

---- Try #3 
;with cteRowNumber as (
    select id, 
        business_key, 
        result,
           row_number() over(partition by business_key order by id desc) as RowNum
        from #mytable
)

SELECT 'T3' AS Qry, id, business_key, 
       result
FROM cteRowNumber
WHERE RowNum = 1

Cleanup:

IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL 
    DROP TABLE #myTable;
    SET STATISTICS IO OFF
SET STATISTICS TIME OFF

You will find, looking at the execution plans, "Try 1" has the best "Query Cost" and lowest CPU time but "Try 3" has the least reads and CPU time is not too bad. I would recommend using a CTE method for the least reads

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow