What is a good strategy for updating a lot of rows in PostgreSQL?
07-03-2021
Question
I have an `inventory` table which contains `item_id` and the quantity remaining of the item (and also some other metadata). An administrator updates the inventory by uploading a CSV file which contains the `item_id` and the quantity remaining. I can see two approaches:
- Run an `update` statement for each row in the CSV file. If my CSV file contains 1 million rows, I will end up sending 1 million update statements from my application server to the database server.
- Construct 1 million update statements and send them in a single batch (JDBC allows batched statements).
At first glance, approach 2 looks like the better solution. But can 1 million statements really be batched? And what happens if one of the statements fails for some reason?
Solution
The usual way is to import the CSV file into a staging table: either an `unlogged` table which you create once and reuse, or a temp table which you create immediately before each import.
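A minimal sketch of the unlogged variant (the table name and the `truncate` before each load are my assumptions, not part of the original answer):

```sql
-- Created once and reused. Unlogged tables skip WAL, so bulk loads
-- into them are fast; their contents are lost after a crash, which
-- is acceptable for throwaway staging data.
create unlogged table inventory_import (
    item_id  integer primary key,
    quantity integer
);

-- Before each import, clear out the previous load.
truncate inventory_import;
```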
Something along these lines:
create temp table inventory_import (item_id integer primary key, quantity integer);
copy inventory_import from '/path/to/file.csv' ... ;
update inventory i
set quantity = im.quantity
from inventory_import im
where i.item_id = im.item_id;
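This also addresses the failure question from above: the whole load is a single `update` statement, so it is atomic; if anything fails, no rows are changed. As a further sketch (my extension, not part of the original answer), if the CSV may contain items that do not yet exist in `inventory`, an upsert can replace the update, assuming `inventory.item_id` has a primary key or unique constraint:

```sql
-- Insert new items, update existing ones in a single atomic statement.
insert into inventory (item_id, quantity)
select item_id, quantity
from inventory_import
on conflict (item_id) do update
    set quantity = excluded.quantity;  -- excluded = the row we tried to insert
```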
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange