History table implementation: “Tuple-versioning” vs Effective Date
Question
I'm designing a database that will have history tables (SCD Type 4, I guess) for auditing purposes.
The question is: what are the advantages of tuple versioning (start and end dates) over just an effective date?
1) Tuple versioning
CREATE TABLE HistoryTable (
    Column1 Type1,
    : :
    ColumnN TypeN,
    StartDate DATETIME,
    EndDate DATETIME
)
2) Effective date
CREATE TABLE HistoryTable (
    Column1 Type1,
    : :
    ColumnN TypeN,
    EffectiveDate DATETIME
)
For (1), so far I see only disadvantages: I have to update the end date on the previous history record when inserting the new/current one.
For (2), I only need to insert the new record, without finding and modifying the previous one.
From a query point of view, both approaches look much the same.
I know that SQL Server 2016 has temporal tables, but they are not available to us yet.
Solution
Tuple versioning
Requires two writes for each new piece of data: an UPDATE to set the end date of the old row and an INSERT for the new row.
The index(es) will most likely contain both interval dates, making them wider and marginally less efficient.
Historical queries (AS OF <date>) are simpler.
DELETE can be a logical delete without further code: setting the end date on the live row is enough.
The end of the live row must be marked with a "magic" value, typically 9999-12-31, or with NULL. NULL complicates the code. A magic value often morphs into several magic values, which become difficult to handle.
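To make the points above concrete, here is a minimal T-SQL sketch of tuple versioning. The table dbo.PriceHistory, its columns, and the sample values are hypothetical, and 9999-12-31 is assumed as the single magic value for the live row. Treat it as an illustration, not a definitive implementation.

```sql
-- Hypothetical history table; names and types are illustrative assumptions.
CREATE TABLE dbo.PriceHistory (
    ProductId INT           NOT NULL,
    Price     DECIMAL(10,2) NOT NULL,
    StartDate DATETIME2(3)  NOT NULL,
    EndDate   DATETIME2(3)  NOT NULL,  -- '9999-12-31' marks the live row
    CONSTRAINT PK_PriceHistory PRIMARY KEY (ProductId, StartDate)
);

DECLARE @ProductId INT           = 42,
        @NewPrice  DECIMAL(10,2) = 19.99,
        @Now       DATETIME2(3)  = SYSUTCDATETIME();

-- Record a new version: close the live row, then insert its successor.
BEGIN TRANSACTION;

UPDATE dbo.PriceHistory
SET    EndDate   = @Now
WHERE  ProductId = @ProductId
  AND  EndDate   = '9999-12-31';

INSERT INTO dbo.PriceHistory (ProductId, Price, StartDate, EndDate)
VALUES (@ProductId, @NewPrice, @Now, '9999-12-31');

COMMIT TRANSACTION;

-- AS OF query: a plain half-open range predicate, no aggregate needed.
DECLARE @AsOf DATETIME2(3) = '2015-06-01';

SELECT ProductId, Price
FROM   dbo.PriceHistory
WHERE  ProductId = @ProductId
  AND  StartDate <= @AsOf
  AND  EndDate   >  @AsOf;

-- Logical delete: close the live row and insert nothing.
UPDATE dbo.PriceHistory
SET    EndDate   = SYSUTCDATETIME()
WHERE  ProductId = @ProductId
  AND  EndDate   = '9999-12-31';
```

Using a half-open interval (StartDate inclusive, EndDate exclusive) means the closing timestamp of one row can be reused as the opening timestamp of the next without the two rows overlapping.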
Effective date
Requires MAX(), TOP (1) or similar functionality on every single read.
Adding a single date column to the indexes causes slightly less bloat than adding two.
Logical DELETE is not possible without adding a placeholder "has expired" row to the table.
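For comparison, the same operations with an effective date might look like the sketch below. The table dbo.PriceHistoryEff and the IsDeleted flag used for the placeholder row are assumptions made for illustration. Other ways of marking the "has expired" row would work equally well.

```sql
-- Hypothetical table; IsDeleted is an assumed flag for the placeholder row.
CREATE TABLE dbo.PriceHistoryEff (
    ProductId     INT           NOT NULL,
    Price         DECIMAL(10,2) NULL,      -- NULL on the placeholder row
    IsDeleted     BIT           NOT NULL
        CONSTRAINT DF_PriceHistoryEff_IsDeleted DEFAULT (0),
    EffectiveDate DATETIME2(3)  NOT NULL,
    CONSTRAINT PK_PriceHistoryEff PRIMARY KEY (ProductId, EffectiveDate)
);

-- Recording a new version is a single INSERT; the previous row is untouched.
INSERT INTO dbo.PriceHistoryEff (ProductId, Price, EffectiveDate)
VALUES (42, 19.99, SYSUTCDATETIME());

DECLARE @AsOf DATETIME2(3) = '2015-06-01';

-- AS OF for one key: the latest row at or before the requested date.
-- The caller still has to check IsDeleted on the returned row.
SELECT TOP (1) ProductId, Price, IsDeleted
FROM   dbo.PriceHistoryEff
WHERE  ProductId = 42
  AND  EffectiveDate <= @AsOf
ORDER BY EffectiveDate DESC;

-- AS OF for every key: rank versions per key and keep the newest one,
-- dropping keys whose newest version is the "has expired" placeholder.
SELECT v.ProductId, v.Price
FROM (
    SELECT ProductId, Price, IsDeleted,
           ROW_NUMBER() OVER (PARTITION BY ProductId
                              ORDER BY EffectiveDate DESC) AS rn
    FROM   dbo.PriceHistoryEff
    WHERE  EffectiveDate <= @AsOf
) AS v
WHERE v.rn = 1
  AND v.IsDeleted = 0;

-- Logical delete: insert the placeholder row.
INSERT INTO dbo.PriceHistoryEff (ProductId, Price, IsDeleted, EffectiveDate)
VALUES (42, NULL, 1, SYSUTCDATETIME());
```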
For both
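Data changes are always recorded as INSERTs of new rows; the data columns of an existing row are never UPDATEd (the only update is the end-date stamp in tuple versioning).
Primary keys expand to include the date, with implications for referencing tables.
It is best to store the dates in UTC to avoid time zone and DST issues.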
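The key and UTC points can be sketched as follows. All table and column names here are hypothetical, and the separate anchor table is only one possible way to give referencing tables something unique to point at, not the only design.

```sql
-- Hypothetical schema; all names are assumptions for illustration.
-- Anchor table: one row per customer, so CustomerId is unique here.
CREATE TABLE dbo.Customer (
    CustomerId INT NOT NULL CONSTRAINT PK_Customer PRIMARY KEY
);

-- Versioned rows: the key must include the date, because the same
-- CustomerId appears once per version.
CREATE TABLE dbo.CustomerHistory (
    CustomerId INT           NOT NULL
        CONSTRAINT FK_CustomerHistory_Customer
        REFERENCES dbo.Customer (CustomerId),
    Name       NVARCHAR(100) NOT NULL,
    StartDate  DATETIME2(3)  NOT NULL
        CONSTRAINT DF_CustomerHistory_StartDate
        DEFAULT (SYSUTCDATETIME()),   -- UTC, rather than local GETDATE()
    CONSTRAINT PK_CustomerHistory PRIMARY KEY (CustomerId, StartDate)
);

-- A referencing table cannot declare a foreign key on CustomerId alone
-- against dbo.CustomerHistory, since that column is not unique there.
-- In this sketch it points at the anchor table instead.
CREATE TABLE dbo.CustomerOrder (
    OrderId    INT NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerId INT NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        REFERENCES dbo.Customer (CustomerId)
);
```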
I've tried both. Tuple versioning is easier to support overall.
It is relatively simple to reproduce the functionality of SQL Server's temporal tables by using triggers. Having a separate history table eliminates a lot of the complexity that arises when history and live values are commingled in one table.
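A trigger-based approximation could look roughly like this. The tables dbo.Product and dbo.ProductHistory, the ValidFrom column, and the trigger name are all hypothetical. It covers only the basic mechanics, not everything the built-in feature does. The GO lines are batch separators for tools such as SSMS or sqlcmd, needed because CREATE TRIGGER must start its own batch.

```sql
-- Hypothetical live table: holds only current values.
CREATE TABLE dbo.Product (
    ProductId INT           NOT NULL CONSTRAINT PK_Product PRIMARY KEY,
    Price     DECIMAL(10,2) NOT NULL,
    ValidFrom DATETIME2(3)  NOT NULL
        CONSTRAINT DF_Product_ValidFrom DEFAULT (SYSUTCDATETIME())
);

-- Hypothetical history table: holds closed versions only.
CREATE TABLE dbo.ProductHistory (
    ProductId INT           NOT NULL,
    Price     DECIMAL(10,2) NOT NULL,
    StartDate DATETIME2(3)  NOT NULL,
    EndDate   DATETIME2(3)  NOT NULL,
    CONSTRAINT PK_ProductHistory PRIMARY KEY (ProductId, StartDate)
);
GO

CREATE TRIGGER dbo.trg_Product_History
ON dbo.Product
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip the re-entrant call caused by the UPDATE further down
    -- (only relevant if RECURSIVE_TRIGGERS is ON for the database).
    IF TRIGGER_NESTLEVEL(@@PROCID) > 1 RETURN;

    DECLARE @Now DATETIME2(3) = SYSUTCDATETIME();

    -- 'deleted' holds the pre-change image of every affected row;
    -- copy it to history with its validity period closed at @Now.
    INSERT INTO dbo.ProductHistory (ProductId, Price, StartDate, EndDate)
    SELECT d.ProductId, d.Price, d.ValidFrom, @Now
    FROM   deleted AS d;

    -- For updates, restart the validity period of the live row.
    -- On DELETE, 'inserted' is empty, so this touches nothing.
    UPDATE p
    SET    ValidFrom = @Now
    FROM   dbo.Product AS p
    JOIN   inserted    AS i ON i.ProductId = p.ProductId;
END;
GO
```

With this layout the live table never needs a magic end date, and an AS OF query reads the history table with a range predicate and falls back to the live table for the current period.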