Find the greatest difference between each unique record with different timestamps

https://dba.stackexchange.com/questions/278849

09-03-2021
|

Question

I have a PostgreSQL table with the following schema and data:

CREATE TABLE IF NOT EXISTS T(
     id uuid PRIMARY KEY,
     username varchar(15),
     person varchar(10),
     tweets int,
     followers int,
     following int,
     likes int,
     created_at date) 
;
 
 id                                   | username     | person     | tweets | followers | following | likes | created_at
 :----------------------------------- | :----------- | :--------- | -----: | --------: | --------: | ----: | :---------
 3fa34100-d688-4051-a687-ec49d05e7212 | renok        | null       |    110 |         6 |         0 |     0 | 2020-10-10
 bab9ceb9-2770-49ea-8489-77e5d763a223 | Lydia_C      | test user2 |    515 |      1301 |      1852 |  1677 | 2020-10-10
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null       |   2730 |      1087 |      1082 |  1339 | 2020-10-10
 eef80836-e140-4adc-9598-8b612ab1825b | TP_s         | null       |   1835 |       998 |       956 |  1832 | 2020-10-10
 fd3ff8c7-0994-40b6-abe0-915368ab9ae5 | DKSnr4       | null       |    580 |       268 |       705 |   703 | 2020-10-10
 3fa34100-d688-4051-a687-ec49d05e7312 | renok        | null       |    119 |         6 |         0 |     0 | 2020-10-12
 bab9ceb9-2770-49ea-8489-77e5d763a224 | Lydia_C      | test user2 |    516 |      1301 |      1852 |  1687 | 2020-10-12
 4649077a-9188-4821-a1ec-3b38608ea44B | Kingston_Sav | null       |   2737 |      1090 |      1084 |  1342 | 2020-10-12
 eef80836-e140-4adc-9598-8b612ae1835c | TP_s         | null       |   1833 |       998 |       957 |  1837 | 2020-10-12
 fd3ff8c7-0994-40b6-abe0-915368ab7ab5 | DKSnr4       | null       |    570 |       268 |       700 |   703 | 2020-10-12

I intend to get the biggest difference between the most recent date and the next most recent date for each unique username and the find the username with the largest margin (difference) for example..In the above table the most recent date is 2020-10-12 and the next most recent date is 2020-10-10.

So I want to get something like this

 id                                   | username     | person | tweets | followers | following | likes | created_at | prev_followers | gain
 :----------------------------------- | :----------- | :----- | -----: | --------: | --------: | ----: | :--------- | -------------: | ---:
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null   |   2737 |      1090 |      1084 |  1342 | 2020-10-12 |           1087 |    3

Solution

Many ways lead to Rome. The below should be a good one (fast and flexible) to "find the username with the largest margin"

Assuming all involved columns are defined NOT NULL. And each username can only have one entry per day. Else you have to do more.

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
        , d1.followers, username  -- added tiebreaker
LIMIT  1;

The CTE named cte attaches rank numbers with dense_rank() (not rank(), not row_number()). Then join the latest day (rnk = 1) with the one before (rnk = 2) and calculate the gain. Obviously, users must have entries for both days to qualify. Finally order by the gain and take the first row.

Note the added ORDER BY expressions to try and break possible ties: there can be multiple users with the same gain, so you have to define how to deal with that. One way is to add tiebreakes. In my example, a user with a smaller absolute number of followers is preferred (higher relative gain), and the alphabetically first wins if that's still ambiguous.

Or you return all "winners":

Again, many ways ... Postgres 13 added the standard SQL clause WITH TIES for that purpose exactly:

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

Detailed explanation for WITH TIES:

Greater than or equal to ALL() and equal to MAX() speed

OTHER TIPS

WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at )
SELECT *
FROM cte2
WHERE gain IS NOT NULL
ORDER BY gain DESC LIMIT 1;

fiddle

whenever there are two records with the same difference margin..It always shows one instead of two.

WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at ),
cte3 AS ( SELECT *, RANK() OVER (ORDER BY gain DESC) rnk
          FROM cte2
          WHERE gain IS NOT NULL )
SELECT *
FROM cte3
WHERE rnk = 1;

fiddle

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange