Tombstone Table vs Deleted Flag in database synchronization & soft-delete scenarios

https://dba.stackexchange.com/questions/14402

16-10-2019

Question

I need to keep track of deleted items for client synchronization needs.

In general, which is better? (a) Add a tombstone table and a trigger that tracks when a row is deleted from the server database, basically adding a new row to the tombstone table with the data from the deleted item. (b) Keep the items in the original table and flag them as deleted, typically with a column of type bit to indicate that a row is deleted and another column to track when the delete occurred.
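To make the tombstone-plus-trigger option concrete, here is a minimal sketch. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the server database, and the names items, items_tombstone, items_after_delete and deletes_since are hypothetical, not from the question. Trigger syntax will differ on other platforms.

```python
import sqlite3

# Assumption: SQLite stands in for the server database; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items_tombstone (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Copy each row into the tombstone table as it is deleted.
CREATE TRIGGER items_after_delete AFTER DELETE ON items
BEGIN
    INSERT INTO items_tombstone (id, name) VALUES (OLD.id, OLD.name);
END;
""")

conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)
conn.execute("DELETE FROM items WHERE id = 2")  # fires the trigger
conn.commit()


def deletes_since(conn, last_sync):
    """Return ids deleted at or after last_sync ('YYYY-MM-DD HH:MM:SS', UTC)."""
    rows = conn.execute(
        "SELECT id FROM items_tombstone WHERE deleted_at >= ? ORDER BY id",
        (last_sync,),
    )
    return [r[0] for r in rows]


# A client that last synced at the epoch asks which rows it should drop.
deleted_ids = deletes_since(conn, "1970-01-01 00:00:00")
live_ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
```

A syncing client would call something like deletes_since with its last sync timestamp and remove those ids locally. The flag-based option would answer the same question with a filter on the original table instead.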

Solution

In general it is better to know the specific requirements and not make design decisions based on what works best in most situations. Either could be preferable. Here are some specifics to gather:

How fast do deletes need to be?
How fast do un-deletes need to be?
How often will deleted data be queried and will it be queried with data that has not been deleted?
How fast do queries of deleted data need to be?
Do you need to preserve only deleted items or changes as well?
Do you need to keep the table/indexes on the primary table small?
What partitioning and/or change tracking technologies are available on the database platform?
How much disk space is available?
Will the deleting occur on the fly or in batch operations?

OTHER TIPS

Maybe you should combine the two methods on purpose. Why? Running both side by side lets you measure which one works better for your workload.

Let's use this table (MySQL dialect):

CREATE TABLE mydata
(
    id int not null auto_increment,
    firstname varchar(16) not null,
    lastname varchar(16) not null,
    zipcode char(5) not null,
    ...
    deleted tinyint not null default 0,
    KEY (deleted,id),
    KEY (deleted,lastname,firstname,id),
    KEY (deleted,zipcode,id),
    KEY (lastname,firstname),
    KEY (zipcode),
    PRIMARY KEY (id)
);

Please note that, with the exception of the PRIMARY KEY, every index you create should start with the deleted flag and end with the id.

Let's create the tombstone table

CREATE TABLE mytomb SELECT id FROM mydata WHERE 1=2;
ALTER TABLE mytomb ADD PRIMARY KEY (id);

If your table already has a deleted flag, you can populate the tombstone table from it:

INSERT INTO mytomb SELECT id FROM mydata WHERE deleted = 1;

OK now the data and tombstone are prepped. How do you perform deletes?

Let's say you are deleting every person in the 07305 zipcode. You would run the following:

INSERT IGNORE INTO mytomb SELECT id FROM mydata WHERE deleted=0 AND zipcode='07305';
UPDATE mydata SET deleted=1 WHERE deleted=0 AND zipcode='07305';
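Because the delete is now two statements, the tombstone table and the flag can drift apart if one statement fails. One way to guard against that is to run both in a single transaction. The following is a sketch only, using Python's sqlite3 module as a stand-in for MySQL (SQLite spells it INSERT OR IGNORE rather than INSERT IGNORE); the trimmed-down schema and the soft_delete_zip helper are assumptions for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is a trimmed-down mydata/mytomb.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mydata (
    id INTEGER PRIMARY KEY,
    lastname TEXT NOT NULL,
    zipcode TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE mytomb (id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO mydata (id, lastname, zipcode) VALUES (?, ?, ?)",
    [(1, "Adams", "07305"), (2, "Baker", "07304"), (3, "Clark", "07305")],
)
conn.commit()


def soft_delete_zip(conn, zipcode):
    """Record tombstones and set the flag atomically; return rows flagged."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT OR IGNORE INTO mytomb "
            "SELECT id FROM mydata WHERE deleted = 0 AND zipcode = ?",
            (zipcode,),
        )
        cur = conn.execute(
            "UPDATE mydata SET deleted = 1 WHERE deleted = 0 AND zipcode = ?",
            (zipcode,),
        )
    return cur.rowcount


flagged = soft_delete_zip(conn, "07305")
tomb_ids = [r[0] for r in conn.execute("SELECT id FROM mytomb ORDER BY id")]
flag_ids = [
    r[0] for r in conn.execute("SELECT id FROM mydata WHERE deleted = 1 ORDER BY id")
]
```

Running the helper a second time for the same zipcode flags nothing new, since both statements filter on deleted = 0.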

OK this seems like a lot of overhead either way you look at it.

Now, do you want to see all the deleted data? Here are two different ways:

SELECT * FROM mydata WHERE deleted=1;
SELECT B.* FROM mytomb A INNER JOIN mydata B USING (id);

As a rule of thumb, if the number of ids in mytomb is greater than 5% of the row count of mydata, expect a full table scan. Otherwise, expect an index scan with a lookup for each row. Benchmark both queries and look up their EXPLAIN plans.
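To see which side of the 5% figure a table sits on, you can compute the tombstone-to-row ratio directly. This is a small sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, the tombstone_ratio helper is hypothetical, and the 5% threshold is the rule of thumb quoted above rather than a guaranteed optimizer cutoff.

```python
import sqlite3


def tombstone_ratio(conn):
    """Fraction of mydata rows that have an entry in mytomb."""
    tomb = conn.execute("SELECT COUNT(*) FROM mytomb").fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM mydata").fetchone()[0]
    return tomb / total if total else 0.0


# Assumption: SQLite stands in for MySQL; minimal columns only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mydata (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0);
CREATE TABLE mytomb (id INTEGER PRIMARY KEY);
""")
# 100 rows, the first 10 of them soft-deleted.
conn.executemany(
    "INSERT INTO mydata (id, deleted) VALUES (?, ?)",
    [(i, 1 if i <= 10 else 0) for i in range(1, 101)],
)
conn.execute("INSERT INTO mytomb SELECT id FROM mydata WHERE deleted = 1")
conn.commit()

ratio = tombstone_ratio(conn)
above_threshold = ratio > 0.05  # rule-of-thumb cutoff from the text
```

The ratio only tells you where to look. The actual plan still has to be confirmed with EXPLAIN on your own server and data.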

Now, do you want to see every live (non-deleted) person in zipcode 07304? Here are two different ways:

SELECT * FROM mydata WHERE deleted=0 AND zipcode='07304';
SELECT A.* FROM mydata A LEFT JOIN mytomb B USING (id) WHERE B.id IS NULL AND A.zipcode='07304';
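When both structures are maintained, the flag filter (deleted = 0 for live rows) and the anti-join against the tombstone table should return the same live rows, and it is worth checking that they do before comparing timings. A minimal sketch, again assuming SQLite via Python's sqlite3 as a stand-in for MySQL and made-up sample rows:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample rows are invented for the check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mydata (
    id INTEGER PRIMARY KEY,
    zipcode TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE mytomb (id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO mydata (id, zipcode, deleted) VALUES (?, ?, ?)",
    [(1, "07304", 0), (2, "07304", 1), (3, "07305", 0), (4, "07304", 0)],
)
conn.execute("INSERT INTO mytomb SELECT id FROM mydata WHERE deleted = 1")
conn.commit()

# Way 1: filter on the deleted flag.
live_by_flag = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM mydata WHERE deleted = 0 AND zipcode = ? ORDER BY id",
        ("07304",),
    )
]

# Way 2: anti-join, keeping rows with no matching tombstone.
live_by_tomb = [
    r[0]
    for r in conn.execute(
        "SELECT A.id FROM mydata A LEFT JOIN mytomb B USING (id) "
        "WHERE B.id IS NULL AND A.zipcode = ? ORDER BY A.id",
        ("07304",),
    )
]
```

If the two lists ever differ, the flag and the tombstone table have drifted apart, and any benchmark between them would be comparing different result sets.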

How about mass deletes? Here are two different ways:

DELETE FROM mydata WHERE deleted=1;
DELETE B.* FROM mytomb A INNER JOIN mydata B USING (id); DELETE FROM mytomb;
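The tombstone-driven purge is two statements, so it is reasonable to wrap it in one transaction. The sketch below assumes SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite has no multi-table DELETE, so an IN subquery is swapped in for the join; the purge_tombstoned helper is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; minimal columns only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mydata (
    id INTEGER PRIMARY KEY,
    zipcode TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE mytomb (id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO mydata (id, zipcode, deleted) VALUES (?, ?, ?)",
    [(1, "07305", 1), (2, "07304", 0), (3, "07305", 1)],
)
conn.execute("INSERT INTO mytomb SELECT id FROM mydata WHERE deleted = 1")
conn.commit()


def purge_tombstoned(conn):
    """Hard-delete every tombstoned row, then empty the tombstone table."""
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "DELETE FROM mydata WHERE id IN (SELECT id FROM mytomb)"
        )
        conn.execute("DELETE FROM mytomb")
    return cur.rowcount


purged = purge_tombstoned(conn)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM mydata ORDER BY id")]
tomb_count = conn.execute("SELECT COUNT(*) FROM mytomb").fetchone()[0]
```

Note that once the tombstone table is emptied, clients that have not yet synced can no longer learn about those deletes, so the purge schedule has to fit the sync requirements.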

CONCLUSION

Now, I am not saying you should keep both methods permanently. Running both over time reveals which method is faster in terms of overall operability. You must decide which method works best for you based on benchmarks for querying live data, querying deleted data, and mass deletes.

Licensed under: CC-BY-SA with attribution
