Performant query using analytic functions to select records with two date columns
07-06-2021
Question
I'm looking for a performant way to write a SQL query.
I've got a table with columns (id, fname, lname, accountid, creation_date, update_date), and for each group of records that share the same fname, lname, accountid I have to find the one with the most recent date, based on greatest(max(creation_date), max(update_date)) (note that update_date can be null).
I expect that I need to use analytic functions.
I have these cases:
(id,fname,lname,accountid,creation_date,update_date)
(1,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(2,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
I want to choose the last inserted record: (3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(id,fname,lname,accountid,creation_date,update_date)
(3,'b','a','2','07/01/2009 10:59:43','07/01/2010 10:59:43')
(4,'b','a','2','07/01/2011 10:59:43',null)
(5,'b','a','2','07/01/2009 10:59:43','07/01/2009 10:59:43')
I want to choose the most recent one on both columns (creation_date,update_date) which is (4,'b','a','2','07/01/2011 10:59:43',null)
(id,fname,lname,accountid,creation_date,update_date)
(6,'c','g','4','07/01/2010 10:59:43',null)
(7,'c','g','4','07/01/2011 10:59:43',null)
(8,'c','g','4','07/01/2009 10:59:43',null)
I want to choose the most recent one on both columns (creation_date,update_date) which is (7,'c','g','4','07/01/2011 10:59:43',null)
(id,fname,lname,accountid,creation_date,update_date)
(9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')
(10,'k','t','2','07/01/2011 10:59:43',null)
(11,'k','t','2','07/01/2009 10:59:43','07/01/2009 10:59:43')
I want to choose the most recent one on both columns (creation_date,update_date) which is (9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')
Solution
You should use the analytic functions rank() or row_number(). My own particular preference is toward rank(), but it only really works well if you are partitioning and ordering by columns covered by a unique index, because otherwise tied rows all receive rank 1. Something like the following, assuming there is a unique index on fname, lname, accountid, creation_date:
    select *
      from ( select a.*
                  , rank() over ( partition by fname, lname, accountid
                                      order by creation_date desc
                                             , update_date desc ) as rnk
               from my_table a )
     where rnk = 1
This orders each combination of fname, lname, accountid by creation_date, update_date, both descending. Filtering on where rnk = 1 then selects the row with the maximum creation_date, update_date.
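To illustrate why the unique-index caveat matters, here is a minimal sketch that runs the query above against the three tied rows from the question's first case. It uses Python's standard-library sqlite3 module purely as a convenient test bed (an assumption: the thread does not say which database is in use, and SQLite 3.25 or later is needed for window functions). The dates are rewritten as ISO-8601 text so that they sort correctly as strings, which is also an assumption.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, fname TEXT, lname TEXT,"
    " accountid TEXT, creation_date TEXT, update_date TEXT)"
)

# First case from the question: three rows with identical dates.
# Dates are stored as ISO-8601 text (an assumption) so they sort as strings.
stamp = "2010-07-01 10:59:43"
rows = [
    (1, "a", "a", "2", stamp, stamp),
    (2, "a", "a", "2", stamp, stamp),
    (3, "a", "a", "2", stamp, stamp),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same query as in the answer, with the analytic function left as a parameter.
QUERY = """
select id
  from ( select a.*
              , {func}() over ( partition by fname, lname, accountid
                                    order by creation_date desc
                                           , update_date desc ) as rnk
           from my_table a )
 where rnk = 1
 order by id
"""


def top_ids(func):
    """Return the ids numbered 1 when ranking with `func`."""
    return [row[0] for row in conn.execute(QUERY.format(func=func))]


rank_ids = top_ids("rank")              # ties share rank 1
row_number_ids = top_ids("row_number")  # exactly one row is numbered 1

print(rank_ids)             # [1, 2, 3]
print(len(row_number_ids))  # 1
```

With rank(), every tied row comes back, so a tie-breaker column in the order by (or row_number()) is needed when exactly one row per group is required. Note also that because creation_date is the leading sort key, this ordering would pick id 10 rather than id 9 in the question's fourth case, where the latest date is an update_date.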
Other tips
It sounds like you want something like:
    SELECT *
      FROM (SELECT a.*,
                   rank() over (partition by fname, lname, accountid
                                    order by coalesce( update_date, creation_date ) desc,
                                             id desc) rnk
              FROM your_table a)
     WHERE rnk = 1
- I'm guessing a bit at why you would want the row with an id of 3 returned rather than the rows with an id of 1 or 2, since all 3 rows have the same update_date and creation_date. My guess is that you want to break the tie by choosing the row with the largest id value. If you want to implement a different rule, you'll need to tell us what rule that is.
- I am also assuming that the update_date, if it exists, will always be greater than or equal to the creation_date. It would be odd if a row were updated before it was created.
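As a quick check of the two assumptions above, the following sketch loads the four cases from the question and runs the coalesce-based query against them. It uses Python's standard-library sqlite3 module as a stand-in database (an assumption: the thread does not name the database, and SQLite 3.25 or later is needed for window functions). The helper ts is hypothetical, the dates are rewritten as ISO-8601 text so they sort as strings, and the inner table is given the alias a so that a.* resolves.

```python
import sqlite3


def ts(year):
    """ISO-8601 stand-in for the question's '07/01/<year> 10:59:43' values."""
    return f"{year}-07-01 10:59:43"


# In-memory SQLite database; window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, fname TEXT, lname TEXT,"
    " accountid TEXT, creation_date TEXT, update_date TEXT)"
)

# The four cases from the question (id 3 appears twice there, so id is
# deliberately not declared as a primary key here).
rows = [
    (1, "a", "a", "2", ts(2010), ts(2010)),
    (2, "a", "a", "2", ts(2010), ts(2010)),
    (3, "a", "a", "2", ts(2010), ts(2010)),
    (3, "b", "a", "2", ts(2009), ts(2010)),
    (4, "b", "a", "2", ts(2011), None),
    (5, "b", "a", "2", ts(2009), ts(2009)),
    (6, "c", "g", "4", ts(2010), None),
    (7, "c", "g", "4", ts(2011), None),
    (8, "c", "g", "4", ts(2009), None),
    (9, "k", "t", "2", ts(2009), ts(2012)),
    (10, "k", "t", "2", ts(2011), None),
    (11, "k", "t", "2", ts(2009), ts(2009)),
]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)", rows)

# Order by the update date when present, else the creation date,
# and break ties by the largest id.
QUERY = """
SELECT fname, id
  FROM (SELECT a.*,
               rank() OVER (PARTITION BY fname, lname, accountid
                            ORDER BY coalesce(update_date, creation_date) DESC,
                                     id DESC) AS rnk
          FROM your_table a)
 WHERE rnk = 1
 ORDER BY fname
"""

winners = conn.execute(QUERY).fetchall()
print(winners)  # [('a', 3), ('b', 4), ('c', 7), ('k', 9)]
```

These are the rows the question asks for in each case. If update_date could ever be earlier than creation_date, ordering by GREATEST(creation_date, COALESCE(update_date, creation_date)) instead would take the later of the two dates; the COALESCE is still needed because GREATEST returns NULL when any argument is NULL.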