Question

I have a task to implement a 'rollback' (not the usual database rollback) function for a batch of entries from different tables. For example:

def rollback(cursor, entries):
    # entries is a dict of such form: 
    # {'table_name1': [id1, id2, ...], 'table_name2': [id1, id2, ...], ...}

I need to delete the entries in each table_name, but these entries may have relationships between them, so it's a bit complex. My idea is to do it in several steps:

  1. Find out all columns from all tables that are nullable.
  2. Update all the entries, setting every nullable column to NULL. After this step there should be no circular dependencies (if there still are, I think the rows could not have been inserted in the first place).
  3. Find out their dependencies and make a topological sort.
  4. Delete one by one.

My questions are:

  1. Does the idea make sense?
  2. Has anyone done something similar before? And how?
  3. How do I query the meta tables for step 3? I'm quite new to PostgreSQL.

Any ideas and suggestions would be appreciated.


Solution

(1) and (2) are not right. It's quite likely that there will be columns defined NOT NULL REFERENCES othertable(othercol) - there are in any normal schema.

What I think you need to do is to sort the foreign key dependency graph to find an ordering that allows you to DELETE, table-by-table, the data you need to remove. Be aware that circular dependencies are possible due to deferred foreign key constraints, so you need to demote/ignore DEFERRABLE INITIALLY DEFERRED constraints; you can temporarily violate those so long as it's all consistent again at COMMIT time.
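The ordering step described above can be sketched with a standard topological sort (Kahn's algorithm). This is a minimal sketch, not the answerer's actual code: the table names and the `(child, parent, is_deferred)` edge triples are placeholders for what you would pull out of the catalogs, and deferred constraints are simply dropped from the graph, which is what breaks otherwise-circular chains.

```python
from collections import defaultdict, deque

def delete_order(tables, fk_edges):
    """fk_edges: (child, parent, is_deferred) triples meaning `child` has a
    foreign key referencing `parent`; rows in `child` must be deleted before
    rows in `parent`. Deferred constraints are ignored here -- they may be
    violated mid-transaction and are only checked at COMMIT."""
    parents_of = defaultdict(list)      # child -> parents it references
    refcount = {t: 0 for t in tables}   # how many tables still reference t
    for child, parent, is_deferred in fk_edges:
        if is_deferred or child == parent:
            continue  # deferrable and self-referencing FKs are skipped
        parents_of[child].append(parent)
        refcount[parent] += 1
    # Tables nobody references can be emptied first.
    queue = deque(t for t in tables if refcount[t] == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for parent in parents_of[t]:
            refcount[parent] -= 1
            if refcount[parent] == 0:
                queue.append(parent)
    if len(order) != len(tables):
        raise ValueError("circular non-deferrable foreign keys; cannot order")
    return order
```

The returned list is a valid deletion order: each table is emptied before any table it references.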

Even then you might run into issues. What if a client used SET CONSTRAINTS to make a DEFERRABLE INITIALLY IMMEDIATE constraint DEFERRED during a transaction? You'd then fail to cope with the circular dependency. To handle this your code must issue SET CONSTRAINTS ALL DEFERRED before proceeding.
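Putting that together, the delete pass might look like the sketch below. It assumes a DB-API cursor (e.g. psycopg2) and that every table has an integer primary key column named `id`; neither is given in the question, so adjust to your schema.

```python
def delete_in_order(cursor, entries, table_order):
    """entries: {'table_name': [id1, id2, ...]};
    table_order: a topological order, children before parents."""
    # Defer everything deferrable up front, so a client-side SET CONSTRAINTS
    # cannot surprise us with a mid-transaction circular dependency.
    cursor.execute("SET CONSTRAINTS ALL DEFERRED")
    for table in table_order:
        ids = entries.get(table)
        if ids:
            # Table names cannot be bound as query parameters; `table` must
            # come from a trusted source (here, the catalog query itself).
            cursor.execute(
                f'DELETE FROM "{table}" WHERE id = ANY(%s)', (list(ids),)
            )
```

All deletes run in one transaction, so the deferred constraints are re-checked at COMMIT.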

You will need to look at the information_schema or the PostgreSQL-specific system catalogs to work out the dependencies. It might be worth a look at the pg_dump source code too, since it tries to order dumped tables to avoid dependency conflicts. You'll be particularly interested in the pg_constraint catalog, or its information_schema equivalents information_schema.referential_constraints, information_schema.constraint_table_usage and information_schema.constraint_column_usage.
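For the pg_constraint route, one way (an untested sketch, not verbatim from the answer) to pull the foreign-key edge list is below: `contype = 'f'` selects foreign keys, `conrelid`/`confrelid` are the referencing and referenced tables, and `condeferred` marks INITIALLY DEFERRED constraints.

```python
# SQL kept as a Python constant so it can feed the topological sort directly.
FK_EDGE_QUERY = """
SELECT child.relname   AS child_table,
       parent.relname  AS parent_table,
       con.condeferred AS is_deferred
FROM pg_constraint con
JOIN pg_class child  ON child.oid  = con.conrelid
JOIN pg_class parent ON parent.oid = con.confrelid
WHERE con.contype = 'f'
"""
```

Usage: `cursor.execute(FK_EDGE_QUERY)` and feed `cursor.fetchall()` to the sort as `(child, parent, is_deferred)` triples.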

You can use either the information_schema or pg_catalog. Don't use both. information_schema is SQL-standard and more portable, but can be slow to query and doesn't have all the information pg_catalog contains. On the flip side, pg_catalog's schema isn't guaranteed to remain compatible across major versions (like 9.1 to 9.2) - though it generally does - and its use isn't portable.
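If you prefer the portable route, an information_schema equivalent of the edge query might look roughly like this (an untested sketch): `table_constraints` gives the table the foreign key is defined on, and `constraint_table_usage` gives the table it references.

```python
# Portable variant of the FK edge query, kept as a Python constant.
FK_EDGE_QUERY_STD = """
SELECT tc.table_name  AS child_table,
       ctu.table_name AS parent_table,
       tc.initially_deferred = 'YES' AS is_deferred
FROM information_schema.referential_constraints rc
JOIN information_schema.table_constraints tc
  ON tc.constraint_name   = rc.constraint_name
 AND tc.constraint_schema = rc.constraint_schema
JOIN information_schema.constraint_table_usage ctu
  ON ctu.constraint_name   = rc.constraint_name
 AND ctu.constraint_schema = rc.constraint_schema
"""
```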

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow