Deletion / invalidation approaches for reference data

https://stackoverflow.com/questions/5369271

27-10-2019
|

Question

Based on the discussion I found here: Database: To delete or not to delete records, I want to focus on reference data in particular, add a few thoughts on that, and ask for your preferred approach in general, or based on which criteria you make the decision which of the approaches available you go for.

Let's assume the following data structure for a 'request database' for customers, whereas requests may be delivered via various channels (phone, mail, fax, ..; our 'reference data table I want to mainly focus on'):

Request (ID, Text, Channel_ID)
Channel(ID, Description)

Let's, for the beginning, assume the following data within those two tables:

Request:

ID    | Text                                         | Channel_ID
===============================================================  
1     | How much is product A currently?             | 1 
2     | What about my inquiry from 2011/02/13?       | 1
3     | Did you receive my payment from 2011/03/04?  | 2

Channel:

ID    | Description
===============================================================  
1     | Phone
2     | Mail
3     | Fax

So, how do you attack this assuming the following requirements:

Channels may change over time. That means: Their descriptions may change. New ones may be added, only valid starting from some particular data. Channels may be invalidated (by some particular date)
For reporting and monitoring purposes, it needs to be possibly to identify using which channel a request was originally filed.
For new requests, only the currently 'valid' channels should be allowed, whereas for pre-existing ones, also the channels that were valid at that particular date should be allowed.

In my understanding, that clearly asks for a richer invalidation approach that goes beyond a deletion flag, probably something incorporating a 'ValidFrom / ValidTo' approach for the reference data table.

On the other hand, this involves several difficulties during data capture of requests, because for new requests, you only display they currently available channels, whereas for maintenance of pre-existing ones, all channels available as of the creation of this record need to be displayed. This might not only be complicated from a development point of view, but may also be non-intuitive to the users.

How do you commonly set up your data model for reference data that might chance over time? How do you create your user interface then? Which further parameters do you take into account for proper database design?

Solution

In such cases I usually create another table, for example, channel_versions that duplicates all fields from channel and has extra create_date column(and it's own PK of course). For channel I define after insert/update triggers that copy new values into channel_versions. Now all requests from Request table refer to records from channel_versions. For new requests you need to get the most recent version of channel from channel_versions . For old requests you always know how channel looked when the request was fulfilled.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow