Question

I recently asked a question: Update using a subquery with aggregates and groupby in Postgres
It turns out I was approaching my issue with flawed logic.

In the same scenario as the question above, instead of updating all the rows to have the max quantity, I'd like to delete the rows that don't have the max quantity (as well as any duplicate max-quantity rows).

Essentially, I need to convert the statement below into a DELETE that preserves only the largest quantity per item_name. I'm guessing I'll need NOT EXISTS here, but I'm not sure how to combine that with aggregate functions.

UPDATE transaction t
SET    quantity = sub.max_quantity
FROM  (
     SELECT item_name, max(quantity) AS max_quantity
     FROM   transaction
     GROUP  BY 1
) sub
WHERE t.item_name = sub.item_name
AND   t.quantity IS DISTINCT FROM sub.max_quantity;
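For context, the update above can be exercised on a toy table. This is a sketch using Python's bundled sqlite3 module; the `UPDATE ... FROM` join is rewritten as a correlated subquery for portability, and the table and column names match the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-in for the question's table ("transaction" is a
# keyword, so it is double-quoted as an identifier).
cur.execute('CREATE TABLE "transaction" (item_name TEXT, quantity INTEGER)')
cur.executemany(
    'INSERT INTO "transaction" VALUES (?, ?)',
    [("apple", 1), ("apple", 5), ("apple", 5), ("pear", 2), ("pear", 3)],
)

# Set every row to the maximum quantity for its item_name --
# equivalent to the Postgres UPDATE ... FROM above, but written as a
# correlated subquery so it runs on any SQLite version.
cur.execute("""
    UPDATE "transaction"
    SET    quantity = (SELECT max(quantity)
                       FROM   "transaction" t2
                       WHERE  t2.item_name = "transaction".item_name)
""")
```

After the update, every apple row holds quantity 5 and every pear row holds 3, which is the blanket overwrite the asker now wants to replace with a delete.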

Solution

Since there can be peers sharing the same maximum quantity, the safe route is a subquery with the window function row_number(), which keeps exactly one row per item_name:

DELETE FROM transaction t
USING (
   SELECT some_unique_id, row_number() OVER (PARTITION BY item_name
                                             ORDER BY quantity DESC) AS rn
   FROM   transaction
   ) sub
WHERE t.some_unique_id = sub.some_unique_id
AND   sub.rn > 1;

Where some_unique_id can be any unique column or combination of columns (mirrored in the join condition). Note that a GROUP BY is not needed in the subquery; the window function does the per-item_name partitioning on its own.
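To see the delete in action, here is a sketch using Python's bundled sqlite3 module. SQLite has no DELETE ... USING, so the join is rewritten as `rowid IN (...)`, with SQLite's implicit rowid standing in for some_unique_id; the data values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE "transaction" (item_name TEXT, quantity INTEGER)')
cur.executemany(
    'INSERT INTO "transaction" VALUES (?, ?)',
    [("apple", 1), ("apple", 5), ("apple", 5), ("pear", 2), ("pear", 3)],
)

# Number the rows in each item_name partition, highest quantity first,
# then delete everything but the first row of each partition. Ties on
# the max quantity get rn > 1, so exactly one row per item survives.
cur.execute("""
    DELETE FROM "transaction"
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   row_number() OVER (PARTITION BY item_name
                                      ORDER BY quantity DESC) AS rn
            FROM   "transaction"
        )
        WHERE rn > 1
    )
""")
```

After the delete, exactly one row per item_name remains, each holding that item's maximum quantity.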

This ends up being very similar to this question from today:
Delete rows with duplicates on two fields

If your table is big and you are going to delete a large part of it, consider the more advanced advice here:
How to delete duplicate entries?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow