Question

Should dates for a temporal database stored in one or 2 tables ? If one doesn't this violate normalisation ?

PERSON1 DATE11 DATE21 INFO11 INFO21 DEPRECATED
PERSON2 DATE21 DATE22 INFO21 INFO22 CURRENT
PERSON1 DATE31 DATE32 INFO31 INFO32 CURRENT

DATE1 and DATE2 Columns indicate that INFO1 and INFO2 are true for the period between DATE1 and DATE2. If DATE < TODAY, the facts are deprecated and shouldn't show any more in the user interface but they shouldn't be deleted for historical purpose. For example INFO11 and INFO21 are now deprecated.

Should I split this table ? Should I store the state (deprecated or current) in the table ?

To clarify the question further more, Deprecated is the term used by the Business, if you prefer "not current", the problem is not semantic, it's not about sql queries either, I just want to know which design violates or best suits Normalisation rules (I know normalisation is not always the way to go, that is not my question either).

Was it helpful?

Solution

"I want to know which design violates Normalisation rules"

Depends on which set of normalization rules you want to go by.

The first and most likely violation of normal forms, and in Date's book it is a violation of first NF, is your end-dates in the rows that hold "current" information (making abstraction of the possibility of future-dated information): you violate 1NF if you make that attribute nullable.

Violations of BCNF may obviously occur as a consequence of your choice of keys (as it is the case in nontemporal database designs too - the temporal aspect makes no difference here). Wrt "choice of keys": if you use separate start- and end-dates (and SQL kind of leaves you no other choice), then most likely you should declare TWO keys: one that includes the start date, and one that includes the end-date.

Another design issue is the multiple data columns. This issue is discussed quite at large in "Temporal Data and the Relational Model" : if INFO1 and INFO2 can change independently of one another, it might be better to decompose your tables to hold just one attribute, in order to avoid an "explosion of rows count" that might otherwise occur if you have to create a new complete row every time one single attribute in the row changes. In that case, your design as you gave it constitutes a violation of SIXTH normal form, as (that normal form is) defined in "Temporal Data and the Relational Model".

OTHER TIPS

Normalization is a Relational database concept - it does not apply as well to temporal databases. That's not to say that you cannot store temporal data in a relational database. You definitely can.

But if you are going with Temporal Database Design, then the concepts of Temporal Normalization apply rather than Relational normalization.

You have not indicated the meaning of the dates. Do they refer to (a) the period when the stated fact was true in real-life, or (b) to the period when the stated fact was believed to be true by the holder of the database ? If (b), then I would never do it this way. Move the updated line to an archive table/log immediately when the update is done. If (a), then the following statement is questionable :

"the facts are deprecated and shouldn't show any more in the user interface"

If a fact doesn't "need to show up in the user interface" anymore, then it doesn't need to be in the database anymore either. Keeping such facts there achieves only one thing : deteriorate general performance for all the rest.

If you really need these historical statements of fact to suit your requirements, then chances are that your so-called "deprecated facts" are still very much relevant to the business, and therefore not "deprecated" at all. Assumming that for this reason, there are very little "genuinely deprecated" facts in your database, your design is good. Just keep the number of "genuinely deprecated facts" small by periodically removing them from the operational database.

(PS) To say that your design is good, doesn't mean you won't run into any problems. SQL is extremely ill-suited to handle this kind of information elegantly. "Temporal Data and the Relational Model" is an excellent treatment of the subject. Another book, the one from Snodgrass, is often praised too, though not by me. That one is something of a cookbook with recipes for dealing with these problems in SQL, as proven by the following conversation on SO about this book :

(Q) "Why would I read that ?" (A) "Because the trigger you asked for is on page 135."

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top