Performant query using an analytic function to select records with two date columns

https://stackoverflow.com/questions/10542226

07-06-2021

Question

I'm looking for a performant way to write a SQL query.

I've got a table with columns (id, fname, lname, accountid, creation_date, update_date). For each group of records that share the same fname, lname and accountid, I have to find the one with the most recent date, based on greatest(max(creation_date), max(update_date)). Note that update_date can be null.

I expect I'll need to use analytic functions.

I have these cases:

(id,fname,lname,accountid,creation_date,update_date)
(1,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(2,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')

I want to choose the last inserted record: (3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')

(id,fname,lname,accountid,creation_date,update_date)
(3,'b','a','2','07/01/2009 10:59:43','07/01/2010 10:59:43')
(4,'b','a','2','07/01/2011 10:59:43',null)
(5,'b','a','2','07/01/2009 10:59:43','07/01/2009 10:59:43')

I want to choose the most recent row across both columns (creation_date, update_date), which is (4,'b','a','2','07/01/2011 10:59:43',null)

(id,fname,lname,accountid,creation_date,update_date)
(6,'c','g','4','07/01/2010 10:59:43',null)
(7,'c','g','4','07/01/2011 10:59:43',null)
(8,'c','g','4','07/01/2009 10:59:43',null)

I want to choose the most recent row across both columns (creation_date, update_date), which is (7,'c','g','4','07/01/2011 10:59:43',null)

(id,fname,lname,accountid,creation_date,update_date)
(9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')
(10,'k','t','2','07/01/2011 10:59:43',null)
(11,'k','t','2','07/01/2009 10:59:43','07/01/2009 10:59:43')

I want to choose the most recent row across both columns (creation_date, update_date), which is (9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')
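The selection rule in these cases can be pinned down with a small plain-Python reference. This is a sketch, not a SQL solution: it treats a null update_date as equal to creation_date, takes the later of the two dates as each row's effective date, and breaks ties by the larger id, which assumes that "last inserted" means "highest id". The Row and pick_latest names, and reading 07/01 as month/day, are assumptions made for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Row(NamedTuple):
    id: int
    fname: str
    lname: str
    accountid: str
    creation_date: datetime
    update_date: Optional[datetime]


def ts(s: Optional[str]) -> Optional[datetime]:
    # Parse the question's 'MM/DD/YYYY HH:MI:SS' text (month/day order is assumed).
    return None if s is None else datetime.strptime(s, "%m/%d/%Y %H:%M:%S")


# The question's sample rows, exactly as given (id 3 appears in two groups).
RAW = [
    (1, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (2, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2010 10:59:43"),
    (4, "b", "a", "2", "07/01/2011 10:59:43", None),
    (5, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
    (6, "c", "g", "4", "07/01/2010 10:59:43", None),
    (7, "c", "g", "4", "07/01/2011 10:59:43", None),
    (8, "c", "g", "4", "07/01/2009 10:59:43", None),
    (9, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2012 10:59:43"),
    (10, "k", "t", "2", "07/01/2011 10:59:43", None),
    (11, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
]
ROWS = [Row(i, f, l, a, ts(c), ts(u)) for i, f, l, a, c, u in RAW]


def pick_latest(rows):
    """Return the sorted ids of the winning row per (fname, lname, accountid)."""
    best = {}  # group -> (sort key, row)
    for r in rows:
        group = (r.fname, r.lname, r.accountid)
        # A null update_date counts as creation_date.
        upd = r.update_date if r.update_date is not None else r.creation_date
        # Effective date is the later of the two; ties go to the larger id.
        sort_key = (max(r.creation_date, upd), r.id)
        if group not in best or sort_key > best[group][0]:
            best[group] = (sort_key, r)
    return sorted(row.id for _, row in best.values())


print(pick_latest(ROWS))  # [3, 4, 7, 9]
```

It returns the four rows the question expects, which makes it handy for checking any SQL candidate against the same data.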

Solution

You should use one of the analytic functions rank() or row_number(). My own preference is rank(), but it only really works well if you are partitioning by a unique index: when several rows tie on the ORDER BY columns, rank() gives all of them rank 1. Something like the following, assuming there is a unique index on fname, lname, accountid, creation_date:

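select *
  from ( select a.*
              -- rank rows within each (fname, lname, accountid) group, newest first
              , rank() over ( partition by fname, lname, accountid
                      order by creation_date desc
                              -- Oracle sorts nulls first under desc; among rows with the
                              -- same creation_date, prefer the ones that were updated
                              , update_date desc nulls last ) as rnk
           from my_table a )
 where rnk = 1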

This ranks the rows of each fname, lname, accountid combination by creation_date and then by update_date, newest first. Filtering on rnk = 1 then selects the row with the maximum creation_date (and, among those, the maximum update_date).
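To see how rank() and row_number() differ on the fully tied rows 1-3, here is a sketch that runs the query above against the question's sample data. It assumes SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for Oracle, stores the dates as ISO-8601 text so they sort chronologically, and reads 07/01 as month/day; the helpers iso and top_rows are hypothetical.

```python
import sqlite3
from datetime import datetime


def iso(s):
    # '07/01/2010 10:59:43' -> '2010-07-01 10:59:43' (month/day order assumed).
    # ISO-8601 text sorts in chronological order.
    if s is None:
        return None
    return datetime.strptime(s, "%m/%d/%Y %H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")


# The question's sample rows; id 3 appears twice there, so id is not declared a key.
RAW = [
    (1, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (2, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2010 10:59:43"),
    (4, "b", "a", "2", "07/01/2011 10:59:43", None),
    (5, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
    (6, "c", "g", "4", "07/01/2010 10:59:43", None),
    (7, "c", "g", "4", "07/01/2011 10:59:43", None),
    (8, "c", "g", "4", "07/01/2009 10:59:43", None),
    (9, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2012 10:59:43"),
    (10, "k", "t", "2", "07/01/2011 10:59:43", None),
    (11, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER, fname TEXT, lname TEXT, "
            "accountid TEXT, creation_date TEXT, update_date TEXT)")
con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
                [(i, f, l, a, iso(c), iso(u)) for i, f, l, a, c, u in RAW])


def top_rows(fn):
    # fn is "rank" or "row_number"; it is pasted into the SQL, so pass trusted names only.
    # SQLite already sorts NULLs last under DESC, so no NULLS LAST clause is needed.
    sql = f"""
        SELECT id
          FROM (SELECT a.*,
                       {fn}() OVER (PARTITION BY fname, lname, accountid
                                    ORDER BY creation_date DESC, update_date DESC) AS rnk
                  FROM my_table a)
         WHERE rnk = 1
         ORDER BY id"""
    return [row[0] for row in con.execute(sql)]


print(top_rows("rank"))        # [1, 2, 3, 4, 7, 10]
print(top_rows("row_number"))  # one of ids 1-3, then 4, 7, 10
```

rank() keeps all three tied rows of the a/a group, while row_number() keeps exactly one row per group but picks an arbitrary one among ties. The k/t group is won by id 10 rather than id 9: because creation_date is compared first, row 9's later update_date only acts as a tie-breaker.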

OTHER TIPS

It sounds like you want something like this:

SELECT *
  FROM (SELECT a.*,
               rank() over (partition by fname, lname, accountid
                                order by coalesce( update_date, creation_date ) desc,
                                         -- break ties in favour of the highest id
                                         id desc) rnk
          FROM your_table a)
 WHERE rnk = 1

I'm guessing a bit at why you want the row with id 3 returned rather than the rows with id 1 or 2, since all three rows have the same update_date and creation_date. My guess is that you want to break the tie by choosing the row with the largest id value. If you want to implement a different rule, you'll need to tell us what that rule is.
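The query above can be checked against all four sample groups with a sketch like the following. It assumes SQLite 3.25 or later (through Python's sqlite3 module) in place of Oracle, with dates converted to ISO-8601 text and 07/01 read as month/day; iso and RAW are hypothetical helper names. The `id DESC` term is what makes row 3 win the fully tied a/a group.

```python
import sqlite3
from datetime import datetime


def iso(s):
    # '07/01/2010 10:59:43' -> '2010-07-01 10:59:43' (month/day order assumed).
    if s is None:
        return None
    return datetime.strptime(s, "%m/%d/%Y %H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")


# The question's sample rows; id 3 appears twice there, so id is not declared a key.
RAW = [
    (1, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (2, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "a", "a", "2", "07/01/2010 10:59:43", "07/01/2010 10:59:43"),
    (3, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2010 10:59:43"),
    (4, "b", "a", "2", "07/01/2011 10:59:43", None),
    (5, "b", "a", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
    (6, "c", "g", "4", "07/01/2010 10:59:43", None),
    (7, "c", "g", "4", "07/01/2011 10:59:43", None),
    (8, "c", "g", "4", "07/01/2009 10:59:43", None),
    (9, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2012 10:59:43"),
    (10, "k", "t", "2", "07/01/2011 10:59:43", None),
    (11, "k", "t", "2", "07/01/2009 10:59:43", "07/01/2009 10:59:43"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (id INTEGER, fname TEXT, lname TEXT, "
            "accountid TEXT, creation_date TEXT, update_date TEXT)")
con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)",
                [(i, f, l, a, iso(c), iso(u)) for i, f, l, a, c, u in RAW])

QUERY = """
SELECT id
  FROM (SELECT a.*,
               rank() OVER (PARTITION BY fname, lname, accountid
                            ORDER BY coalesce(update_date, creation_date) DESC,
                                     id DESC) AS rnk
          FROM your_table a)
 WHERE rnk = 1
 ORDER BY id
"""
winners = [row[0] for row in con.execute(QUERY)]
print(winners)  # [3, 4, 7, 9]
```

These are exactly the four rows the question asks for.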
I am also assuming that the update_date, if it exists, will always be greater than or equal to the creation_date. It would be odd if a row were updated before it was created.
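If that assumption might not hold, ordering by the later of the two dates removes the dependency. In Oracle that would be greatest(creation_date, coalesce(update_date, creation_date)); the coalesce is needed because greatest() returns null when any argument is null. The sketch below uses SQLite's two-argument max(), which also returns NULL if any argument is NULL, as a stand-in, and adds a hypothetical row 12 whose update_date precedes its creation_date to show where the two orderings disagree. The winner helper is a hypothetical name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (id INTEGER, fname TEXT, lname TEXT, "
            "accountid TEXT, creation_date TEXT, update_date TEXT)")
con.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        # The question's k/t group, dates rewritten as ISO-8601 text (month/day assumed).
        (9, "k", "t", "2", "2009-07-01 10:59:43", "2012-07-01 10:59:43"),
        (10, "k", "t", "2", "2011-07-01 10:59:43", None),
        (11, "k", "t", "2", "2009-07-01 10:59:43", "2009-07-01 10:59:43"),
        # Hypothetical row: updated "before" it was created.
        (12, "k", "t", "2", "2013-07-01 10:59:43", "2008-07-01 10:59:43"),
    ],
)


def winner(order_expr):
    # order_expr is pasted into the SQL, so pass trusted expressions only.
    sql = f"""
        SELECT id
          FROM (SELECT a.*,
                       rank() OVER (PARTITION BY fname, lname, accountid
                                    ORDER BY {order_expr} DESC, id DESC) AS rnk
                  FROM your_table a)
         WHERE rnk = 1"""
    return [row[0] for row in con.execute(sql)]


# Relies on update_date >= creation_date: row 12 is judged by its 2008 update_date.
print(winner("coalesce(update_date, creation_date)"))  # [9]
# Later of the two dates; coalesce keeps max() from returning NULL for never-updated rows.
print(winner("max(creation_date, coalesce(update_date, creation_date))"))  # [12]
```

For data that respects the assumption, both orderings pick the same rows; they only diverge on rows like the hypothetical row 12.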

Licensed under: CC-BY-SA with attribution
