Question

It is not clear to me what strategy Postgres takes when columns are set to NULL:

UPDATE tbl SET
  col1 = NULL,
  col2 = NULL
WHERE created < current_date - INTERVAL '1 year';

The documentation at https://www.postgresql.org/docs/current/mvcc.html is a bit lengthy and technical, so I cannot reliably deduce:

Is setting a column to NULL performed in place, or are the affected rows/pages copied?

It looks like any UPDATE has to create a new row version to satisfy MVCC semantics, but what if setting a column to NULL is a special case?

For GDPR compliance I am thinking of nulling out all historical personal data, and I am trying to understand the implications of a massive periodic UPDATE ... SET x = NULL. Should I plan to run VACUUM after that?


Solution

PostgreSQL never performs an UPDATE by modifying the existing data in place. If you set columns to NULL, a new row version will be created just as with any other UPDATE, and the previous row versions will remain until VACUUM reclaims them.
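One way to see this for yourself is to watch the system columns ctid (the physical location of the row version) and xmin (the ID of the transaction that created it) before and after the UPDATE. The following is only a minimal sketch: the table mvcc_demo and its contents are made up for illustration, and it assumes each statement runs in its own transaction (autocommit).

```sql
-- Hypothetical scratch table, for illustration only.
CREATE TEMP TABLE mvcc_demo (
    id   integer PRIMARY KEY,
    col1 text,
    col2 text
);

INSERT INTO mvcc_demo VALUES (1, 'alice', 'alice@example.com');

-- ctid is (block number, item number) of the current row version,
-- xmin is the transaction that wrote it.
SELECT ctid, xmin, id, col1, col2 FROM mvcc_demo;

-- Setting columns to NULL is an ordinary UPDATE.
UPDATE mvcc_demo SET col1 = NULL, col2 = NULL WHERE id = 1;

-- Both ctid and xmin should now differ from the first SELECT:
-- a new row version was written and the old one is a dead tuple.
SELECT ctid, xmin, id, col1, col2 FROM mvcc_demo;
```

If setting a column to NULL were done in place, ctid and xmin would stay the same across the two SELECT statements.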

But be warned that

  • VACUUM will only remove the old row version if there is no long-running transaction that still might need the old data.
  • VACUUM will not overwrite the data, so the old value will still be on disk until the space is reused.
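Both conditions can be checked from the statistics views. The queries below are a sketch rather than a complete monitoring solution; they assume the table is called tbl as in the question, and the LIMIT and the 60-character query prefix are arbitrary choices.

```sql
-- Oldest open transactions first. A transaction that has been open
-- for a long time can keep VACUUM from removing dead row versions.
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Estimated number of dead row versions still present in the table,
-- and when it was last vacuumed manually or by autovacuum.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'tbl';
```

Note that n_dead_tup is an estimate maintained by the statistics system, so treat it as an indicator and not as an exact count.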

Concerning the GDPR, the wording is:

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay

The term “erasure” is nowhere defined in that law, so it is subject to interpretation. My bet is that few enough people understand the inner workings of PostgreSQL well enough to contest that DELETE is erasure. And it would take a data forensics expert with advanced PostgreSQL knowledge to retrieve such data. Once VACUUM has run, it is nigh impossible to do that. If I were called to court as an expert witness, I would say that anybody who has run DELETE in the database has taken all possible steps to erase the data.

If you feel paranoid, schedule a regular VACUUM on the table in question, and make sure that you have no long-running transactions. Any worry beyond that is silly.
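As a rough sketch of what such a periodic job could look like, using the table and column names from the question: how you schedule it (cron with psql, pg_cron, or something else) is up to you and not shown here, and the scale factor of 0.01 is only an example value, not a recommendation.

```sql
-- Step 1: null out the personal data (statement from the question).
UPDATE tbl SET
  col1 = NULL,
  col2 = NULL
WHERE created < current_date - INTERVAL '1 year';

-- Step 2: reclaim the dead row versions right away.
-- VACUUM cannot run inside a transaction block, so issue it as a
-- separate statement after the UPDATE has committed.
VACUUM (VERBOSE) tbl;

-- Optional alternative: make autovacuum react sooner on this table.
-- With 0.01, autovacuum triggers once roughly 1% of the rows
-- (plus the configured threshold) are dead.
ALTER TABLE tbl SET (autovacuum_vacuum_scale_factor = 0.01);
```

The VERBOSE output reports how many dead row versions were removed and how many could not be removed yet, which is a quick way to notice a long-running transaction holding things back.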

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange