Question

I am trying to implement transaction-time tables on SQL Server 2016, following the book by Richard Snodgrass (Developing Time-Oriented Database Applications in SQL).

A table with a sequenced primary key enforced through a trigger could be implemented like this:

CREATE TABLE test_table (
a [tinyint] NOT NULL,
b [date] NOT NULL,
c [float] NOT NULL,
[tt_start] [datetime2](7) NOT NULL,
[tt_end] [datetime2](7) NOT NULL)
GO

CREATE TRIGGER Seq_Primary_Key_tt ON test_table FOR INSERT, UPDATE, DELETE AS
  BEGIN
    IF (( EXISTS ( SELECT * FROM test_table AS b1
    WHERE 1 < (SELECT COUNT(b2.a) FROM
        test_table AS b2
        WHERE b1.a = b2.a AND
        b1.b = b2.b AND
        b1.tt_start < b2.tt_end AND b2.tt_start < b1.tt_end) )))
    BEGIN
        RAISERROR('Transaction violates sequenced constraint', 1, 2)
        ROLLBACK TRANSACTION
    END
 END

 GO

The trigger is pretty slow on larger tables, so multiple INSERTs lead to long waiting times. Furthermore, many operations, such as updates, require more than one statement (e.g. an INSERT followed by an UPDATE). This doesn't work with this trigger, because the trigger fires immediately after the INSERT and fails, even though the operation as a whole would be perfectly valid once the UPDATE had also run.

That is why Snodgrass writes that the constraints/assertions (or here, the trigger) have to be DEFERRABLE INITIALLY DEFERRED. That way, the constraints would only be checked after all operations have finished, which would also improve the performance of multiple INSERTs.

However, as far as I can see, SQL Server doesn't implement DEFERRABLE INITIALLY DEFERRED. How could a similar constraint or trigger be implemented? Or could the "temporal tables" feature be a replacement for what I am trying to do?


The solution

From the comments, your requirement is:

For a point in time, only one row can be valid.

To enforce this, we only need the point in time where the row was changed/will become effective.

It's a misconception that, in order to query and maintain the integrity of time-dependent data, we must have a close/end date1. Unless we are defining the duration of a contract or a true interval, it is not needed2, and it requires a good deal of transactional logic to ensure invalid rows are not inserted.

For your case, it looks like columns A and B form a composite primary key. Working from this pattern we would set things up like so:

/* Need an entity to maintain the parent key for any time-independent relationships */
CREATE TABLE TestEntity
(
  ColumnA  TINYINT  NOT NULL
 ,ColumnB  DATE     NOT NULL
  /* Any immutable columns would go here */
 ,CONSTRAINT PK_TestEntity PRIMARY KEY (ColumnA, ColumnB)
)
GO

CREATE TABLE TestEntityVersion
(
  ColumnA    TINYINT       NOT NULL
 ,ColumnB    DATE          NOT NULL
 ,tt_start   DATETIME2(7)  NOT NULL
 ,ColumnC    FLOAT         NOT NULL
 ,CONSTRAINT FK_TestEntityVersion_VersionOf_TestEntity FOREIGN KEY (ColumnA, ColumnB) REFERENCES TestEntity (ColumnA, ColumnB)
 ,CONSTRAINT PK_TestEntityVersion PRIMARY KEY (ColumnA, ColumnB, tt_start)
)
GO
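To make the pattern concrete, a minimal usage sketch (values are made up): the parent row is inserted once, and each subsequent change is just a new version row. The composite primary key on (ColumnA, ColumnB, tt_start) rejects duplicate timestamps for the same entity.

```sql
-- Hypothetical usage: register the entity once...
INSERT INTO TestEntity (ColumnA, ColumnB)
VALUES (1, '2020-01-01');

-- ...then record one version row per change. No end date is needed;
-- the PRIMARY KEY (ColumnA, ColumnB, tt_start) prevents two versions
-- of the same entity at the same point in time.
INSERT INTO TestEntityVersion (ColumnA, ColumnB, tt_start, ColumnC)
VALUES (1, '2020-01-01', SYSUTCDATETIME(), 42.0);
```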

To get a full picture of the entity as of a point in time, we would use the following query:

SELECT
  E.ColumnA
 ,E.ColumnB
 ,EV.tt_start
 ,EV.ColumnC
FROM
  TestEntity E
LEFT JOIN
  TestEntityVersion EV
    ON EV.ColumnA = E.ColumnA
        AND EV.ColumnB = E.ColumnB
        AND EV.tt_start =
          (
            SELECT
              MAX(tt_start)
            FROM
              TestEntityVersion
            WHERE
              ColumnA = E.ColumnA
                AND ColumnB = E.ColumnB
                AND tt_start <= '2020-12-12 13:56:23.2352342'
          )

The primary key guarantees that only one row will be returned. No additional constraints/triggers/functions are required.

For older versions of SQL Server (I believe 2014 and earlier) the above query will result in two seeks against TestEntityVersion (although the data is usually only read once from disk). Newer versions will perform only one seek, returning the most recent row with a TOP operator. In either case, I've found the performance to be acceptable as long as the data is properly normalized and the tables are kept narrow.
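The single-seek plan can also be requested explicitly. A sketch of an equivalent formulation using OUTER APPLY with TOP (1), which returns the same "latest version as of a point in time" result as the correlated MAX query above:

```sql
SELECT
  E.ColumnA
 ,E.ColumnB
 ,EV.tt_start
 ,EV.ColumnC
FROM
  TestEntity E
OUTER APPLY
  (
    -- Latest version at or before the point in time of interest;
    -- a single seek on the (ColumnA, ColumnB, tt_start) primary key.
    SELECT TOP (1)
      V.tt_start
     ,V.ColumnC
    FROM
      TestEntityVersion V
    WHERE
      V.ColumnA = E.ColumnA
        AND V.ColumnB = E.ColumnB
        AND V.tt_start <= '2020-12-12 13:56:23.2352342'
    ORDER BY
      V.tt_start DESC
  ) EV
```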

If people really, really, really have to have an end date, you should derive it and use it for display purposes only. This is easily done with a windowing function and can be incorporated into a view:

  SELECT
    ColumnA
   ,ColumnB
   ,tt_start
   ,LEAD(tt_start, 1, '9999-12-31 23:59:59.9999999')
      OVER
       (
         PARTITION BY
           ColumnA
          ,ColumnB
         ORDER BY
           tt_start
       ) AS tt_end
   ,ColumnC
  FROM
    TestEntityVersion
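If the derived end date is needed in several places, the query can be wrapped in a view (the view name here is hypothetical):

```sql
-- Display-only end date: derived from the next version's start,
-- never stored, so it can never contradict the version rows.
CREATE VIEW TestEntityVersionWithEnd
AS
SELECT
  ColumnA
 ,ColumnB
 ,tt_start
 ,LEAD(tt_start, 1, '9999-12-31 23:59:59.9999999')
    OVER (PARTITION BY ColumnA, ColumnB ORDER BY tt_start) AS tt_end
 ,ColumnC
FROM
  TestEntityVersion
GO
```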

Additional Considerations

There may be additional requirements that you need to impose, such as:

  1. Don't insert the row if the values don't change from the prior version
  2. Don't insert a row if a row with a greater datetime value exists

Usually these can be handled through your stored procedure logic, but if you are going to have several procedures inserting rows into the table, you can build a function and use that to enforce the constraint. Anchor Modeling has some good examples of those (although the rest of that particular use pattern is awful).
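One way to enforce rule 2 declaratively (a sketch; the function and constraint names are hypothetical) is a scalar function wrapped in a CHECK constraint, so every insert path is covered regardless of which procedure performs it:

```sql
-- Returns 1 if a version with a later tt_start already exists for the key.
CREATE FUNCTION dbo.fn_LaterVersionExists
(
  @ColumnA  TINYINT
 ,@ColumnB  DATE
 ,@tt_start DATETIME2(7)
)
RETURNS BIT
AS
BEGIN
  IF EXISTS (SELECT 1
             FROM dbo.TestEntityVersion
             WHERE ColumnA = @ColumnA
               AND ColumnB = @ColumnB
               AND tt_start > @tt_start)
    RETURN 1;
  RETURN 0;
END
GO

-- Rule 2: reject version rows that arrive "out of order".
ALTER TABLE dbo.TestEntityVersion
  ADD CONSTRAINT CK_TestEntityVersion_NoLaterVersion
  CHECK (dbo.fn_LaterVersionExists(ColumnA, ColumnB, tt_start) = 0);
```

Note that a scalar UDF in a CHECK constraint runs once per inserted/updated row, so this trades some insert performance for centralized enforcement.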

These answers on SO may also be of benefit to you, as they were to me:

Storing time-series data, relational or non?

Historical / auditable database

1 It's sort of baffling to me that book authors who have tried to tackle this subject (including Hugh Darwen and CJ Date, sadly) have done such a poor job conceptualizing this particular problem space, forcing everything into a mindset of intervals instead of points in time. This results in a lot of unnecessary work to ensure intervals are contiguous and do not overlap. Implementation of their "solutions" always results in subpar query performance and unnecessary overhead on inserts.

2 For this case, we would incorporate the rules into our insert/update procedures.

Other tips

The trigger is pretty slow on larger tables.

You don't have any indexes, and you aren't limiting the check to the rows affected by the statement using the INSERTED virtual table. See: Use the inserted and deleted tables
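A sketch of both fixes applied to the trigger from the question: a supporting index for the overlap check, and a join against the inserted virtual table so only keys touched by the current statement are examined. (I've also raised the RAISERROR severity to 16; at severity 1 the original would log a message but never surface as an error to the client.)

```sql
-- Supporting index for the overlap check on the question's table.
CREATE INDEX IX_test_table_a_b
  ON test_table (a, b)
  INCLUDE (tt_start, tt_end);
GO

CREATE OR ALTER TRIGGER Seq_Primary_Key_tt ON test_table
FOR INSERT, UPDATE, DELETE AS
BEGIN
  -- Only examine keys affected by this statement (a pure DELETE
  -- cannot introduce an overlap, so an empty inserted table is fine).
  IF EXISTS
    (
      SELECT 1
      FROM inserted AS i
      JOIN test_table AS b1
        ON b1.a = i.a AND b1.b = i.b
      WHERE 1 < (SELECT COUNT(*)
                 FROM test_table AS b2
                 WHERE b1.a = b2.a
                   AND b1.b = b2.b
                   AND b1.tt_start < b2.tt_end
                   AND b2.tt_start < b1.tt_end)
    )
  BEGIN
    RAISERROR('Transaction violates sequenced constraint', 16, 2);
    ROLLBACK TRANSACTION;
  END
END
GO
```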

many operations like updates require more than one operation

In scenarios like this you're better-off managing the updates through a stored procedure, rather than trying to handle everything with a trigger. For instance if you insert an adjoining interval, you'd probably want to coalesce them.
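A sketch of such a procedure against the question's original test_table (procedure and parameter names are hypothetical): the insert and the coalescing of an adjoining interval happen in one transaction, so the sequenced constraint only ever sees the finished state.

```sql
CREATE OR ALTER PROCEDURE dbo.UpsertInterval
  @a TINYINT
 ,@b DATE
 ,@c FLOAT
 ,@tt_start DATETIME2(7)
 ,@tt_end   DATETIME2(7)
AS
BEGIN
  SET NOCOUNT ON;
  SET XACT_ABORT ON;   -- any error rolls back the whole unit of work
  BEGIN TRANSACTION;

  -- Coalesce: if an interval with the same key and value ends exactly
  -- where the new one begins, extend it instead of inserting a row.
  IF EXISTS (SELECT 1 FROM test_table
             WHERE a = @a AND b = @b AND c = @c AND tt_end = @tt_start)
    UPDATE test_table
    SET tt_end = @tt_end
    WHERE a = @a AND b = @b AND c = @c AND tt_end = @tt_start;
  ELSE
    INSERT INTO test_table (a, b, c, tt_start, tt_end)
    VALUES (@a, @b, @c, @tt_start, @tt_end);

  COMMIT TRANSACTION;
END
```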

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange