Question

In a relational database, what is the best way to handle removing an object from the object graph while still retaining referential integrity? At some point this has to happen, either through a soft or a hard delete.

For example, when a product is removed, what is the best approach to make sure that the orders containing that product are still relevant, and likewise the invoices containing those orders?

Solution

There are basically 3 "standard solutions":

Solution 1

You still need the product (as in your case, because invoices reference it). This means the data is VALID and the only change is that the product goes "out of stock" or "out of portfolio". Your business processes will often still need it, for example to handle RMA situations or IRS-related matters, so the product must not be deleted. "Removed" is just a different state of the product, and your DB data model needs to reflect that state.
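As a minimal sketch of Solution 1, the "state" can be an ordinary column on the product row. The table names (product, order_line), the status values, and the discontinue_product helper are all assumptions made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE product (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        -- hypothetical state column; the allowed values are an assumption
        status TEXT NOT NULL DEFAULT 'active'
               CHECK (status IN ('active', 'discontinued'))
    );
    CREATE TABLE order_line (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO product (id, name) VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO order_line (product_id, quantity) VALUES (1, 3);
""")


def discontinue_product(conn, product_id):
    """'Remove' a product by changing its state; the row and its references stay."""
    conn.execute(
        "UPDATE product SET status = 'discontinued' WHERE id = ?", (product_id,)
    )


discontinue_product(conn, 1)

# Only active products are offered for sale...
sellable = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE status = 'active' ORDER BY id")]

# ...but existing order lines still resolve to the discontinued product.
history = conn.execute("""
    SELECT p.name, p.status, ol.quantity
    FROM order_line ol JOIN product p ON p.id = ol.product_id
""").fetchall()
```

Because the row is never deleted, the foreign key from order_line keeps pointing at something real, and anything that must hide discontinued products simply filters on the state.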

If you are concerned about performance, do some profiling first. If need be, you have a multitude of optimization options, which are usually RDBMS-dependent. One such technique is partitioning; each RDBMS that supports it has its own mechanics, which differ in flexibility.

Solution 2

You don't need any of the data at all. Just do a cascaded delete and be done with it.
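A rough sketch of a cascaded delete, assuming the dependent rows are truly disposable. The product_review table is a hypothetical example of such data (not something from the question), and SQLite stands in for whatever RDBMS you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # without this SQLite ignores ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- hypothetical dependent table whose rows have no value without the product
    CREATE TABLE product_review (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id) ON DELETE CASCADE,
        body       TEXT NOT NULL
    );
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO product_review (product_id, body)
        VALUES (1, 'ok'), (1, 'great'), (2, 'fine');
""")

# Deleting the parent row removes its dependent rows in the same statement.
conn.execute("DELETE FROM product WHERE id = 1")

remaining_reviews = conn.execute(
    "SELECT product_id, body FROM product_review").fetchall()
```

The cascade is declared once on the foreign key, so referential integrity holds without any application code having to remember the dependents.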

Solution 3

You only need historical data, and no future business process will ever need this entity (i.e. the product) again. In this case a common solution is to have archive tables which are filled before doing a cascaded delete on the active/productive tables. A slight variant of this scheme is to copy the needed information into the dependent rows (the invoice in your case) and then delete just the active/productive row (the product in your case).
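The archive-table scheme could look roughly like this. The product_archive table, its archived_at column, and the archive_and_delete helper are assumptions for illustration; dependent tables are left out to keep the sketch short, and in a real schema they would be archived in the same transaction before the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL
    );
    -- hypothetical archive table: same columns plus a timestamp
    CREATE TABLE product_archive (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL,
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO product VALUES (1, 'Widget', 499), (2, 'Gadget', 1299);
""")


def archive_and_delete(conn, product_id):
    """Copy the row into the archive, then delete it, as one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT INTO product_archive (id, name, price_cents) "
            "SELECT id, name, price_cents FROM product WHERE id = ?",
            (product_id,),
        )
        conn.execute("DELETE FROM product WHERE id = ?", (product_id,))


archive_and_delete(conn, 1)

active = conn.execute("SELECT id, name FROM product").fetchall()
archived = conn.execute(
    "SELECT id, name, price_cents FROM product_archive").fetchall()
```

Doing the copy and the delete in a single transaction means a failure cannot leave the product both archived and still active, or deleted without an archive copy.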

Conclusion

Complex systems deal with a lot of different business processes/use cases and thus tend to employ all of the above techniques. Each has its place, depending on the specific business processes/use cases involved.

OTHER TIPS

Here is an answer I received from an unnamed source. He is pretty well respected, and out of respect I am not going to post his name.

I am not going to accept my own answer here, or bypass the bounty, but am just showing his answer.

"With a full-featured RDBMS you can partition the table on the "deleted_or_not" column and that will result in all of the live production rows to be stored compactly. If you don't want deprecated data to show up in reports, simply give the full table an obscure name, such as customers_including_deleted_rows and create a view "customers" (containing only the live rows) from which most of the application code queries. This assumes, of course, that there is some value to having the old data around."
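A small sketch of the view part of that suggestion, using the table, view, and column names from the quote. The column types and the 0/1 encoding of deleted_or_not are assumptions. Partitioning is not shown, since it is RDBMS-specific and SQLite (used here only because it ships with Python) does not offer it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_including_deleted_rows (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- assumed encoding: 0 = live, 1 = deleted
        deleted_or_not INTEGER NOT NULL DEFAULT 0 CHECK (deleted_or_not IN (0, 1))
    );
    -- Most application code queries this view and never sees deleted rows.
    CREATE VIEW customers AS
        SELECT id, name
        FROM customers_including_deleted_rows
        WHERE deleted_or_not = 0;
    INSERT INTO customers_including_deleted_rows (id, name)
        VALUES (1, 'Alice'), (2, 'Bob');
""")

# A "delete" only flips the flag on the full table.
conn.execute(
    "UPDATE customers_including_deleted_rows SET deleted_or_not = 1 WHERE id = 2")

live = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
everything = conn.execute(
    "SELECT COUNT(*) FROM customers_including_deleted_rows").fetchone()[0]
```

Reports that read from customers see only live rows, while the old data stays available under the longer table name for anything that needs it.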

Licensed under: CC-BY-SA with attribution