Can I update old data and insert new data, if it does not already exist, in a single query?

https://dba.stackexchange.com/questions/266846

01-03-2021

Question

I am trying to wrap my head around how to update a table when a freshly fetched list of the same row type no longer contains one or more of the original rows. The list of services comes from an API and is usually the same set of services. Every now and again, however, the list changes: a service becomes inactive and no longer shows up in the results of the API call, so it should be updated in our database to have Active set to FALSE. Likewise, if a new service comes on board, it should be added to the current list of available services.

I had in mind to just drop all records and add the new ones fetched from the API. However, I have used the Ids of the existing services in other tables and still need to reference them, so I threw that idea out the window, and now I am in a bit of a bind.

Services (currently in DB)

Name  | Id | Active
Test1 | 3  | true
Test2 | 4  | true
Test3 | 5  | true

I want a query or trigger of some sort that runs when the fetched list is inserted into the Services table and follows these rules:

If an existing 'Name' is found, skip the insert and move on to the next item in the array.
If a new 'Name' comes up that is not found in the DB, add it as a new row. Example: [Test4 5 true]
If the newly fetched list from the API does not contain one of the existing 'Name's (that is, Test1, Test2 or Test3), update that existing row to set the Active column to false. So if the new list does not have Test3, the existing Test3 row would be updated to show Active as false.

Solution

I think you will want something like the following, which I tested in a fiddle. It's based on Common Table Expressions (CTEs) and the fact that, with PostgreSQL, a CTE can contain not only SELECTs but also INSERTs, UPDATEs and DELETEs.

First, your service table:

CREATE TABLE service
(
  name VARCHAR (10) NOT NULL PRIMARY KEY,
  id INTEGER NOT NULL,
  active BOOLEAN NOT NULL
);

Populate it with your data:

INSERT INTO service VALUES 
('Test1', 3, true), ('Test2', 4, true), ('Test3', 5, true);

Now, you receive your data from your API. I'll assume that you put it into some sort of temporary table. The keyword TEMPORARY just means that the table will be dropped at the end of your session. I've tested with both TEMPORARY and normal tables in the fiddle and the results are the same, so we'll go with TEMPORARY:

CREATE TEMPORARY TABLE api
(
  name VARCHAR (10) NOT NULL PRIMARY KEY,
  id INTEGER NOT NULL
  -- active BOOLEAN NOT NULL 
);

I've assumed that your API doesn't know the status of the service, so it only has two fields: the name and the id.

Populate it:

INSERT INTO api VALUES
('Test2', 4), ('Test3', 5), ('Test4', 6);
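
If the API response arrives as a JSON array rather than as literal rows, the table could be filled with something like the following in place of the INSERT above. This is only a sketch: the payload and its keys "name" and "id" are assumptions, not taken from the real API.

```sql
-- Hypothetical payload: the keys "name" and "id" are assumed for illustration.
-- json_to_recordset expands a JSON array of objects into rows; the column
-- definition list after AS tells PostgreSQL which keys to read and their types.
INSERT INTO api (name, id)
  SELECT x.name, x.id
  FROM json_to_recordset(
    '[{"name":"Test2","id":4},{"name":"Test3","id":5},{"name":"Test4","id":6}]'::json
  ) AS x(name VARCHAR(10), id INTEGER);
```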

Notice that service Test1 is missing and that service Test4 is an additional service.
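
Before changing anything, it can help to preview the difference between the two tables. The query below is a minimal sketch using the service and api tables defined above; the planned_action column and its labels are made up for illustration.

```sql
-- Preview how the incoming list differs from the stored one.
-- A FULL OUTER JOIN keeps unmatched rows from both sides, so a NULL on
-- either side tells us whether a service is new or has gone missing.
SELECT COALESCE(s.name, a.name) AS name,
       CASE
         WHEN s.name IS NULL THEN 'new: will be inserted'
         WHEN a.name IS NULL THEN 'missing: will be deactivated'
         ELSE 'unchanged'
       END AS planned_action
FROM service s
FULL OUTER JOIN api a ON a.name = s.name
ORDER BY 1;
```

With the sample rows above, this should report Test1 as missing, Test4 as new, and Test2 and Test3 as unchanged.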

Because PostgreSQL can perform INSERTs and UPDATEs within CTEs, we can now do the following:

WITH cte1 (nom) AS 
(
  INSERT INTO service  (name, id, active) 
    SELECT a.name, a.id, true FROM api a 
    WHERE a.name NOT IN (SELECT name FROM service) RETURNING name
),
cte2 (nom2) AS
(
  UPDATE service s SET active = false 
    WHERE s.name NOT IN (SELECT name FROM api) RETURNING s.name 
)
SELECT * FROM service;

The first CTE

INSERT INTO service  (name, id, active) 
  SELECT a.name, a.id, true FROM api a 
  WHERE a.name NOT IN (SELECT name FROM service) RETURNING name

inserts new services from the api table into the service table and the second:

UPDATE service s SET active = false 
  WHERE s.name NOT IN (SELECT name FROM api) RETURNING s.name

sets active = false for every service in the service table that isn't present in the api table.

Now, the result of the SELECT * FROM service at the end of this query is:

name    id  active
Test1   3   t
Test2   4   t
Test3   5   t

So, you might think "Drat, it hasn't worked!" - but in fact, it has worked!

If you then run, as a separate statement,

SELECT * FROM service 
ORDER BY name;

and you get:

name    id  active
Test1   3   f
Test2   4   t
Test3   5   t
Test4   6   t

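So, we can see that service Test1's active field has been set to false and that service Test4 has been added. The reason this doesn't show up in the SELECT immediately after the CTEs has to do with snapshots: in PostgreSQL, all the sub-statements in a WITH query and the main query are executed with the same snapshot, so they cannot see one another's effects on the target tables. The first SELECT therefore shows the service table as it was when the statement started, while the second, run as a separate statement, shows the table after the changes. Within a single statement, RETURNING is the only way to pass the changed rows from the CTEs to the main query.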
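
If you want the new state back from the same statement, one option is to build it from the RETURNING output. This is a sketch rather than part of the tested fiddle: it assumes the tables are still in their starting state, and the CTE names inserted and deactivated are made up for illustration.

```sql
WITH inserted AS
(
  INSERT INTO service (name, id, active)
    SELECT a.name, a.id, true FROM api a
    WHERE a.name NOT IN (SELECT name FROM service)
    RETURNING name, id, active
),
deactivated AS
(
  UPDATE service s SET active = false
    WHERE s.name NOT IN (SELECT name FROM api)
    RETURNING s.name, s.id, s.active
)
-- Rows untouched by this statement, read from the pre-statement snapshot
SELECT name, id, active FROM service
  WHERE name NOT IN (SELECT name FROM deactivated)
UNION ALL
-- Rows as they look after the UPDATE
SELECT name, id, active FROM deactivated
UNION ALL
-- Rows added by the INSERT
SELECT name, id, active FROM inserted
ORDER BY name;
```

The main query still reads service from the old snapshot, so it takes the changed and new rows from the two CTEs instead of from the table itself.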

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange