Delete duplicate records and update the mapping table
-
04-03-2021 - |
Question
I have store_table_map and item_price_map. I want to delete duplicate records in store_table_map and retain the latest row based on updated_date because i want to put constraints in store_id and item_id. But I also want first to update the item_price_map as seen below before deleting old duplicate records.
Current Table
Expected Result
Here's the script
CREATE TABLE store_table_map (
map_id bigserial NOT NULL,
store_id int8 NOT NULL,
item_id int8 NOT NULL,
updated_date date NOT NULL DEFAULT now()
);
INSERT INTO store_table_map (store_id,item_id,updated_date) VALUES
(1,123,'2020-07-09')
,(1,123,'2020-07-10')
,(2,456,'2020-07-08')
,(2,456,'2020-07-10')
;
CREATE TABLE item_price_table_map (
map_id bigserial NOT NULL,
store_table_map_id int8 NOT NULL,
price float8 NOT NULL,
updated_date date NOT NULL DEFAULT now()
);
INSERT INTO item_price_table_map (store_table_map_id,price,updated_date) VALUES
(1,99.0,'2020-07-09')
,(2,199.0,'2020-07-10')
,(3,299.0,'2020-07-08')
,(4,399.0,'2020-07-10')
;
postgres version: PostgreSQL 9.2.24
La solution
You first need to create the mapping from the old map_id to the new one, this can be done using distinct on()
and a self join:
select m1.map_id old_map_id, m2.new_map_id
from store_table_map m1
join (
select distinct on (store_id, item_id) store_id, item_id, map_id as new_map_id
from store_table_map
order by store_id, item_id, updated_date desc
) m2 on (m1.store_id, m1.item_id) = (m2.store_id, m2.item_id)
The above returns:
old_map_id | new_map_id
-----------+-----------
1 | 2
2 | 2
3 | 4
4 | 4
This can be used in an UPDATE statement:
update item_price_table_map ip
set store_table_map_id = t.new_map_id
from (
select m1.map_id old_map_id, m2.new_map_id
from store_table_map m1
join (
select distinct on (store_id, item_id) store_id, item_id, map_id as new_map_id
from store_table_map
order by store_id, item_id, updated_date desc
) m2 on (m1.store_id, m1.item_id) = (m2.store_id, m2.item_id)
) t
where t.old_map_id = ip.store_table_map_id
and ip.store_table_map_id <> t.new_map_id;
After that you can remove the rows with the now unused map_ids
delete from store_table_map stm
where not exists (select *
from item_price_table_map ip
where ip.store_table_map_id = stm.map_id);
(I could only test this on Postgres 12, but I think it should work on 9.2 too)
Licencié sous: CC-BY-SA avec attribution
Non affilié à dba.stackexchange