Soft delete best practices (PHP/MySQL)

https://stackoverflow.com/questions/5020568

14-11-2019

Question

Problem

In a web application dealing with products and orders, I want to maintain information and relationships between former employees (users) and the orders they handled. I want to maintain information and relationships between obsolete products and orders which include these products.

However, I want employees to be able to de-clutter the administration interfaces, for example by removing former employees, obsolete products, obsolete product groups etc.

I'm thinking of implementing soft-deletion. So, how does one usually do this?

My immediate thoughts

My first thought is to stick a "flag_softdeleted TINYINT NOT NULL DEFAULT 0" column in every table of objects that should be soft deletable. Or maybe use a timestamp instead?

Then I provide a "Show deleted" or "Undelete" button in each relevant GUI. Clicking this button includes soft-deleted records in the result. Each deleted record has a "Restore" button. Does this make sense?

Your thoughts?

Also, I'd appreciate any links to relevant resources.

Solution

That's how I do it. I have an is_deleted field which defaults to 0. Then queries just check WHERE is_deleted = 0.
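A minimal sketch of that flag approach. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the products table and the helper names are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"  # 0 = live, 1 = soft-deleted
)
conn.executemany(
    "INSERT INTO products (name) VALUES (?)", [("Widget",), ("Gadget",)]
)


def soft_delete(conn, product_id):
    # Flag the row instead of issuing a DELETE.
    conn.execute("UPDATE products SET is_deleted = 1 WHERE id = ?", (product_id,))


def active_products(conn):
    # Every ordinary query filters on the flag.
    rows = conn.execute(
        "SELECT name FROM products WHERE is_deleted = 0 ORDER BY id"
    )
    return [name for (name,) in rows]


soft_delete(conn, 1)
print(active_products(conn))
```

The row for the deleted product is still in the table, so orders that reference it keep a valid relationship.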

I try to stay away from any hard-deletes as much as possible. They are necessary sometimes, but I make that an admin-only feature. That way we can hard-delete, but users can't...

Edit: In fact, you could use this to have multiple "layers" of soft-deletion in your app. So each could be a code:

0 -> Not Deleted
1 -> Soft Deleted, shows up in lists of deleted items for management users
2 -> Soft Deleted, does not show up for any user except admin users
3 -> Only shows up for developers.

Having the other 2 levels will still allow managers and admins to clean up the deleted lists if they get too long. And since the front-end code just checks for is_deleted = 0, it's transparent to the frontend...
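The layered codes above could be sketched roughly as follows, again with Python's sqlite3 standing in for MySQL. The role names, the items table, and the rule that a higher role also sees the lower levels are assumptions; the answer only says which role each level is meant for.

```python
import sqlite3

# Highest is_deleted code that each (hypothetical) role may see.
MAX_VISIBLE_LEVEL = {"user": 0, "manager": 1, "admin": 2, "developer": 3}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO items (name, is_deleted) VALUES (?, ?)",
    [("live", 0), ("trashed", 1), ("purged", 2), ("buried", 3)],
)


def frontend_items(conn):
    # The front end never needs to know about the levels.
    rows = conn.execute("SELECT name FROM items WHERE is_deleted = 0 ORDER BY id")
    return [name for (name,) in rows]


def deleted_items(conn, role):
    # The "deleted items" list for a role: levels 1 up to that role's maximum.
    rows = conn.execute(
        "SELECT name FROM items WHERE is_deleted BETWEEN 1 AND ? ORDER BY id",
        (MAX_VISIBLE_LEVEL[role],),
    )
    return [name for (name,) in rows]


print(frontend_items(conn))
print(deleted_items(conn, "manager"))
print(deleted_items(conn, "admin"))
```

A manager who "cleans up" the deleted list would bump a row from 1 to 2, which removes it from the manager view without destroying it.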

OTHER TIPS

Using soft-deletes is a common thing to implement, and they are dead useful for lots of things, like:

Saving a user's data when they delete something
Saving your own data when you delete something
Keeping a record of what really happened (a kind of audit)
etc.

There is one thing I want to point out that almost everyone misses, and it always comes back to bite you in the rear. The users of your application do not have the same understanding of a delete as you do.

There are different degrees of deletions. The typical user deletes stuff when (s)he

Made a mistake and wants to remove the bad data
Doesn't want to see something on the screen anymore

The problem is that if you don't record the intention of the delete, your application cannot distinguish between erroneous data (that should never have been created) and historically correct data.

Have a look at the following data:

PRICES | item | price | deleted |
       +------+-------+---------+
       |   A  |  101  |    1    |
       |   B  |  110  |    1    |
       |   C  |  120  |    0    |
       +------+-------+---------+

Some user doesn't want to show the price of item B, since they don't sell that item anymore, so he deletes it. Another user created a price for item A by mistake, so he deleted it and created the price for item C, as intended. Now, can you show me a list of the prices for all products? No, because either you have to display potentially erroneous data (A), or you have to exclude all but current prices (C).

Of course the above can be dealt with in any number of ways. My point is that YOU need to be very clear about what YOU mean by a delete, and make sure that there is no way for the users to misunderstand it. One way would be to force the user to make a choice (hide/delete).
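One possible way to record the intention is to replace the boolean with a status column, sketched here with Python's sqlite3 standing in for MySQL. The status values 'active', 'hidden' and 'erroneous' are assumptions chosen to match the hide/delete choice; the rows mirror the PRICES example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " item TEXT PRIMARY KEY,"
    " price INTEGER NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'active'"
    " CHECK (status IN ('active', 'hidden', 'erroneous')))"
)
conn.executemany(
    "INSERT INTO prices (item, price, status) VALUES (?, ?, ?)",
    [
        ("A", 101, "erroneous"),  # created by mistake, should never be shown
        ("B", 110, "hidden"),     # correct, but the item is no longer sold
        ("C", 120, "active"),
    ],
)


def current_prices(conn):
    rows = conn.execute(
        "SELECT item, price FROM prices WHERE status = 'active' ORDER BY item"
    )
    return rows.fetchall()


def historical_prices(conn):
    # Everything that was ever a real price, whether or not it is still shown.
    rows = conn.execute(
        "SELECT item, price FROM prices WHERE status <> 'erroneous' ORDER BY item"
    )
    return rows.fetchall()


print(current_prices(conn))
print(historical_prices(conn))
```

With the intention stored, the "prices for all products" question has an answer: B and C, without the mistaken entry for A.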

If I had existing code that hits that table, I would add the column and change the name of the table. Then I would create a view with the same name as the current table which selects only the active records. That way none of the existing code would break and you could have the soft delete column. If you want to see the deleted records, you select from the base table; otherwise you use the view.
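A rough sketch of the rename-plus-view idea, using Python's sqlite3 as a stand-in for MySQL. The names products, products_all and legacy_product_names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)", [("Widget",), ("Gadget",)]
)


def legacy_product_names(conn):
    # Stands in for existing code that knows nothing about soft deletes.
    rows = conn.execute("SELECT name FROM products ORDER BY id")
    return [name for (name,) in rows]


# Migration: add the flag, rename the base table, put a view in its place.
conn.execute("ALTER TABLE products ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0")
conn.execute("ALTER TABLE products RENAME TO products_all")
conn.execute(
    "CREATE VIEW products AS "
    "SELECT id, name FROM products_all WHERE is_deleted = 0"
)

# New, soft-delete-aware code works against the base table.
conn.execute("UPDATE products_all SET is_deleted = 1 WHERE id = 1")

print(legacy_product_names(conn))
```

This sketch only covers reads. Writes through a view behave differently across databases (SQLite views are read-only without INSTEAD OF triggers), so existing INSERT and UPDATE statements would need checking against the actual database before relying on this.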

I've always just used a deleted column as you mentioned. There's really not much more to it than that. Instead of deleting the record, just set the deleted field to true.

Some components I build allow the user to view all deleted records and restore them; others just display all records where deleted = 0.

Your idea does make sense and is used frequently in production, but to implement it you will need to update quite a bit of code to account for the new field. Another option could be to archive (move) the "soft-deleted" records to a separate table or database. This is done frequently as well and makes the issue one of maintenance rather than (re)programming. (You could have a table trigger react to the delete to archive the deleted record.)
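The trigger-based archiving could look something like the sketch below. It uses Python's sqlite3 as a stand-in, and MySQL's trigger syntax differs in detail; the table and trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    CREATE TABLE products_archive (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy each row into the archive just before it is deleted.
    CREATE TRIGGER products_archive_on_delete
    BEFORE DELETE ON products
    BEGIN
        INSERT INTO products_archive (id, name) VALUES (OLD.id, OLD.name);
    END;

    INSERT INTO products (name) VALUES ('Widget'), ('Gadget');
    """
)

# Application code keeps issuing plain DELETE statements.
conn.execute("DELETE FROM products WHERE id = 1")

live = [name for (name,) in conn.execute("SELECT name FROM products")]
archived = [name for (name,) in conn.execute("SELECT name FROM products_archive")]
print(live, archived)
```

The trade-off is that rows in other tables which reference the deleted record now point at the archive rather than the live table, so foreign keys and joins need a decision of their own.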

I would do the archiving to avoid a major update to production code. But if you want to use a deleted-flag field, make it a timestamp so that it gives you useful information beyond a boolean (NULL = not deleted). You might also want to add a DeletedBy field to track the user responsible for deleting the record. Together the two fields tell you who deleted what and when. (The two-field solution can also be used in an archive table/database.)
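A sketch of the timestamp-plus-DeletedBy variant, with Python's sqlite3 standing in for MySQL. The column names deleted_at and deleted_by and the user id 42 are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " deleted_at TEXT,"      # NULL = not deleted
    " deleted_by INTEGER)"   # id of the user who deleted the row
)
conn.executemany(
    "INSERT INTO products (name) VALUES (?)", [("Widget",), ("Gadget",)]
)


def soft_delete(conn, product_id, user_id):
    # Record when and by whom; leave rows that are already deleted untouched.
    conn.execute(
        "UPDATE products SET deleted_at = ?, deleted_by = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), user_id, product_id),
    )


def restore(conn, product_id):
    conn.execute(
        "UPDATE products SET deleted_at = NULL, deleted_by = NULL WHERE id = ?",
        (product_id,),
    )


def active_products(conn):
    rows = conn.execute(
        "SELECT name FROM products WHERE deleted_at IS NULL ORDER BY id"
    )
    return [name for (name,) in rows]


soft_delete(conn, 1, user_id=42)
print(active_products(conn))
```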

The most common scenario I've come across is what you describe: a tinyint or even a bit representing a status of IsActive or IsDeleted. Depending on whether this is considered "business" or "persistence" data, it may be baked into the application/domain logic as transparently as possible, for example directly in stored procedures and not known to the application code. But it sounds like this is legitimate business information for your needs, so it would need to be known throughout the code. (So users can view deleted records, as you suggest.)

Another approach I've seen is to use a combination of two timestamps to show a "window" of activity for a given record. It's a little more code to maintain it, but the benefit is that something can be scheduled to soft-delete itself at a pre-determined time. Limited-time products can be set that way when they're created, for example. (To make a record active indefinitely one could use a max value (or just some absurdly distant future date) or just have the end date be null if you're ok with that.)
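The two-timestamp window might be queried as in this sketch (Python's sqlite3 standing in for MySQL). The column names active_from and active_until, the use of NULL for "no end", and the sample dates are assumptions; dates are stored as ISO strings so that plain comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " active_from TEXT NOT NULL,"
    " active_until TEXT)"  # NULL = active indefinitely
)
conn.executemany(
    "INSERT INTO products (name, active_from, active_until) VALUES (?, ?, ?)",
    [
        ("Evergreen", "2020-01-01", None),
        # Limited-time product: its end date is set when it is created.
        ("Summer special", "2020-06-01", "2020-09-01"),
    ],
)


def active_products(conn, on_date):
    # A row is active when on_date falls inside its [from, until) window.
    rows = conn.execute(
        "SELECT name FROM products "
        "WHERE active_from <= ? AND (active_until IS NULL OR active_until > ?) "
        "ORDER BY id",
        (on_date, on_date),
    )
    return [name for (name,) in rows]


print(active_products(conn, "2020-07-15"))
print(active_products(conn, "2020-10-01"))
```

Soft-deleting a record "now" then amounts to setting active_until to the current time, and no scheduled job is needed for products that expire on their own.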

Then of course there's the further consideration of things being deleted and undeleted from time to time, and of tracking some kind of audit for that. The flag approach knows only the current status; the timestamp approach knows only the most recent window. Anything as complex as an audit trail should definitely be stored separately from the records in question.
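One way a separate audit trail could sit next to the flag is sketched below, with Python's sqlite3 standing in for MySQL. The product_audit table, its columns, and the actor ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );

    -- One row per delete or restore, kept apart from the products themselves.
    CREATE TABLE product_audit (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id),
        action TEXT NOT NULL CHECK (action IN ('delete', 'restore')),
        actor_id INTEGER NOT NULL,
        happened_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    INSERT INTO products (name) VALUES ('Widget');
    """
)


def set_deleted(conn, product_id, actor_id, deleted):
    # Change the flag and append the audit row in a single transaction.
    with conn:
        conn.execute(
            "UPDATE products SET is_deleted = ? WHERE id = ?",
            (int(deleted), product_id),
        )
        conn.execute(
            "INSERT INTO product_audit (product_id, action, actor_id) "
            "VALUES (?, ?, ?)",
            (product_id, "delete" if deleted else "restore", actor_id),
        )


def audit_trail(conn, product_id):
    rows = conn.execute(
        "SELECT action, actor_id FROM product_audit "
        "WHERE product_id = ? ORDER BY id",
        (product_id,),
    )
    return rows.fetchall()


set_deleted(conn, 1, actor_id=7, deleted=True)
set_deleted(conn, 1, actor_id=8, deleted=False)
set_deleted(conn, 1, actor_id=7, deleted=True)
print(audit_trail(conn, 1))
```

The products table still answers "is it deleted right now?", while the audit table keeps the full sequence of deletes and restores.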

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow