Question

I am looking to design a database that keeps track of every set of changes so that I can refer back to them in the future. So for example:

Database A

+====+======+==========+
| ID | Name | Property |
| 1  | Kyle | 30       |

If I change the row's 'property' field to 50, it should update the row to:

1    Kyle    50

But it should save the fact that the row's property was 30 at some point in time. Then, if the row is updated again to 70:

1    Kyle    70

Both facts, that the row's property was 30 and that it was 50, should be preserved, so that with some query I could retrieve:

1    Kyle    30
1    Kyle    50

It should recognize that these were the "same entry", just at different points in time.

Edit: This history will need to be presented to the user at some point, so ideally there should be an understanding of which rows belong to the same "revision cluster".

What is the best way to approach the design of this database?


Solution

One way is to have a MyTableNameHistory table for every table in your database, with a schema identical to that of MyTableName except that the History table's primary key has one additional column, effectiveUtc, of type DateTime. For example, if you have a table named Employee:

Create Table Employee
(
  employeeId integer Primary Key Not Null,
  firstName varChar(20) null,
  lastName varChar(30) Not null,
  HireDate smallDateTime null,
  DepartmentId integer null
);

Then the History table would be

Create Table EmployeeHistory
(
  employeeId integer Not Null,
  effectiveUtc DateTime Not Null,
  firstName varChar(20) null,
  lastName varChar(30) Not null,
  HireDate smallDateTime null,
  DepartmentId integer null,
  Primary Key (employeeId, effectiveUtc)
);

Then you can put a trigger on the Employee table so that every time you insert, update, or delete a row in Employee, a new record is inserted into EmployeeHistory with the same values for all the regular fields and the current UTC datetime in the effectiveUtc column.
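Here is a minimal, runnable sketch of that trigger setup. It illustrates the idea under assumptions rather than reproducing the answer's exact code: it uses SQLite through Python's sqlite3 module instead of SQL Server, a cut-down Employee table with the question's name/property columns, ISO-8601 text timestamps, a hypothetical utc_now() SQL function registered from Python (so a test can inject a fixed clock), and a NULL-valued "tombstone" row to record deletes.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def utc_now_iso() -> str:
    """Current UTC time as ISO-8601 text, which sorts chronologically."""
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def open_db(clock: Callable[[], str] = utc_now_iso) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Allow triggers to call an application-defined function
    # (ignored by SQLite versions that predate this pragma).
    conn.execute("PRAGMA trusted_schema = ON")
    # Hypothetical helper: expose the clock to SQL as utc_now().
    conn.create_function("utc_now", 0, clock)
    conn.executescript("""
        CREATE TABLE Employee (
            employeeId INTEGER PRIMARY KEY NOT NULL,
            name       TEXT NOT NULL,
            property   INTEGER
        );
        -- Same columns as Employee, plus effectiveUtc in the primary key.
        -- name is nullable here so a delete can be stored as a NULL row.
        CREATE TABLE EmployeeHistory (
            employeeId   INTEGER NOT NULL,
            effectiveUtc TEXT    NOT NULL,
            name         TEXT,
            property     INTEGER,
            PRIMARY KEY (employeeId, effectiveUtc)
        );
        CREATE TRIGGER Employee_after_insert AFTER INSERT ON Employee
        BEGIN
            INSERT INTO EmployeeHistory
            VALUES (NEW.employeeId, utc_now(), NEW.name, NEW.property);
        END;
        CREATE TRIGGER Employee_after_update AFTER UPDATE ON Employee
        BEGIN
            INSERT INTO EmployeeHistory
            VALUES (NEW.employeeId, utc_now(), NEW.name, NEW.property);
        END;
        -- Assumption: a delete is recorded as a tombstone with NULL data.
        CREATE TRIGGER Employee_after_delete AFTER DELETE ON Employee
        BEGIN
            INSERT INTO EmployeeHistory
            VALUES (OLD.employeeId, utc_now(), NULL, NULL);
        END;
    """)
    return conn
```

Inserting Kyle with property 30 and then updating it to 50 and 70 leaves three EmployeeHistory rows (30, 50, 70) while Employee holds only the latest. With a real clock, two changes to the same employee stamped with the same timestamp would collide on the primary key, which is a limitation of using a timestamp as the only version key.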

Then, to find the values at any point in the past, select the history record whose effectiveUtc is the latest value at or before the as-of datetime you are interested in:

 Select * from EmployeeHistory h
 Where employeeId = @EmployeeId
   And effectiveUtc =
    (Select Max(effectiveUtc)
     From EmployeeHistory
     Where employeeId = h.employeeId
        And effectiveUtc <= @AsOfUtcDate);
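The same as-of lookup as a runnable sketch, assuming SQLite via Python's sqlite3, ISO-8601 UTC text timestamps (which compare correctly as strings), and a hypothetical row_as_of helper. It uses <= so that a change stamped exactly at the as-of instant counts as already in effect.

```python
import sqlite3
from typing import Optional, Tuple

# The as-of lookup in SQLite syntax, with named parameters.
AS_OF_SQL = """
    SELECT h.employeeId, h.name, h.property, h.effectiveUtc
    FROM EmployeeHistory AS h
    WHERE h.employeeId = :employee_id
      AND h.effectiveUtc = (
          SELECT MAX(effectiveUtc)
          FROM EmployeeHistory
          WHERE employeeId = h.employeeId
            AND effectiveUtc <= :as_of_utc)
"""


def create_history(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE EmployeeHistory (
            employeeId   INTEGER NOT NULL,
            effectiveUtc TEXT    NOT NULL,  -- ISO-8601 UTC text
            name         TEXT,
            property     INTEGER,
            PRIMARY KEY (employeeId, effectiveUtc)
        )""")


def row_as_of(conn: sqlite3.Connection, employee_id: int,
              as_of_utc: str) -> Optional[Tuple]:
    """Return the history row in force at as_of_utc, or None if none yet."""
    return conn.execute(
        AS_OF_SQL, {"employee_id": employee_id, "as_of_utc": as_of_utc}
    ).fetchone()
```

Dropping the MAX subquery and selecting every row for the employee ordered by effectiveUtc gives the full list of versions, which is the "revision cluster" the question asks for.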

Other answers

To add to Charles' answer, I would use an Entity-Attribute-Value model instead of creating a separate history table for every other table in your database.

Basically, you would create one History table like so:

Create Table History
(
  tableId varChar(64) Not Null,
  recordId varChar(64) Not Null,
  changedAttribute varChar(64) Not Null,
  newValue varChar(64) Not Null,
  effectiveUtc DateTime Not Null,
  Primary Key (tableId, recordId, changedAttribute, effectiveUtc)
);

Then you would create a History record any time you create or modify data in one of your tables.

To follow your example, when you add 'Kyle' to your Employee table, you would create two records (one for each non-id attribute), and then you would create a new record every time a property changes:

History 
+==========+==========+==================+==========+==============+
| tableId  | recordId | changedAttribute | newValue | effectiveUtc |
| Employee | 1        | Name             | Kyle     | N            |
| Employee | 1        | Property         | 30       | N            |
| Employee | 1        | Property         | 50       | N+1          |
| Employee | 1        | Property         | 70       | N+2          |
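A minimal sketch of the per-attribute logging, assuming SQLite via Python's sqlite3 and a hypothetical record_changes helper that the application calls with the row's old and new values (the answer doesn't say whether this happens in application code or in triggers). Only attributes whose value actually changed get a row, which reproduces the four rows in the table above for the Kyle example.

```python
import sqlite3
from typing import Any, Dict


def create_history(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE History (
            tableId          TEXT NOT NULL,
            recordId         TEXT NOT NULL,
            changedAttribute TEXT NOT NULL,
            newValue         TEXT NOT NULL,
            effectiveUtc     TEXT NOT NULL,
            PRIMARY KEY (tableId, recordId, changedAttribute, effectiveUtc)
        )""")


def record_changes(conn: sqlite3.Connection, table_id: str, record_id: Any,
                   old: Dict[str, Any], new: Dict[str, Any],
                   effective_utc: str) -> int:
    """Write one History row per attribute whose value differs; return count."""
    rows = [
        # Values are stored as text, like the varChar columns above.
        # newValue is NOT NULL, so this sketch assumes no value is None.
        (table_id, str(record_id), attr, str(value), effective_utc)
        for attr, value in new.items()
        if attr not in old or old[attr] != value
    ]
    conn.executemany("INSERT INTO History VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```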

Alternatively, as a_horse_with_no_name suggested in this comment, if you don't want to store a new History record for every field change, you can store grouped changes (such as changing Name to 'Kyle' and Property to 30 in the same update) as a single record. In this case, you would need to express the collection of changes in JSON or some other blob format. This would merge the changedAttribute and newValue fields into one (changedValues). For example:

History 
+==========+==========+==================================+==============+
| tableId  | recordId | changedValues                    | effectiveUtc |
| Employee | 1        | {"Name": "Kyle", "Property": 30} | N            |
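A sketch of the grouped JSON variant, which also shows how the history can be replayed into full row versions for display (the "revision cluster" from the question). Assumptions: SQLite via Python's json and sqlite3 modules, hypothetical record_change_set and revisions helpers, and a primary key of (tableId, recordId, effectiveUtc) now that changedAttribute is gone.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def create_history(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE History (
            tableId       TEXT NOT NULL,
            recordId      TEXT NOT NULL,
            changedValues TEXT NOT NULL,  -- JSON object of changed fields
            effectiveUtc  TEXT NOT NULL,
            PRIMARY KEY (tableId, recordId, effectiveUtc)
        )""")


def record_change_set(conn: sqlite3.Connection, table_id: str,
                      record_id: Any, changes: Dict[str, Any],
                      effective_utc: str) -> None:
    """Store all fields changed by one update as a single JSON record."""
    conn.execute("INSERT INTO History VALUES (?, ?, ?, ?)",
                 (table_id, str(record_id),
                  json.dumps(changes, sort_keys=True), effective_utc))


def revisions(conn: sqlite3.Connection, table_id: str,
              record_id: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Replay change sets in time order, returning the full row after each."""
    state: Dict[str, Any] = {}
    result = []
    for changed, at in conn.execute(
            "SELECT changedValues, effectiveUtc FROM History "
            "WHERE tableId = ? AND recordId = ? ORDER BY effectiveUtc",
            (table_id, str(record_id))):
        # Build a new dict so earlier snapshots are not mutated.
        state = {**state, **json.loads(changed)}
        result.append((at, state))
    return result
```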

This is perhaps more difficult than creating a History table for every other table in your database, but it has multiple benefits:

  • Adding new fields to tables in your database won't require adding the same fields to another table.
  • Fewer tables are needed.
  • It's easier to correlate updates to different tables over time.

One architectural benefit of this design is that you are decoupling the concerns of your app and your history/audit capabilities. This design would work just as well as a microservice using a relational or even NoSQL database that is separate from your application database.

The best way depends on what you're doing. You want to look more deeply into slowly changing dimensions:

https://en.wikipedia.org/wiki/Slowly_changing_dimension

In Postgres 9.2 and later, don't miss the tsrange type, either. It lets you merge start_date and end_date into a single column, index it with a GiST index, and add an exclusion constraint to prevent overlapping date ranges.
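The slowly-changing-dimension variant most relevant here is usually called Type 2: each version of a row carries the time range during which it was current. SQLite has no range type, so this sketch (SQLite via Python's sqlite3, with a hypothetical EmployeeVersion table and helper names) keeps validFrom/validTo as two columns with half-open [validFrom, validTo) ranges, the same bound convention PostgreSQL's two-argument range constructors use by default. Unlike a PostgreSQL exclusion constraint, nothing here enforces non-overlap; the helper relies on closing the open range and inserting the new one in a single transaction.

```python
import sqlite3
from typing import Optional, Tuple


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE EmployeeVersion (
            employeeId INTEGER NOT NULL,
            name       TEXT    NOT NULL,
            property   INTEGER,
            validFrom  TEXT    NOT NULL,  -- inclusive lower bound
            validTo    TEXT,              -- exclusive upper bound; NULL = current
            PRIMARY KEY (employeeId, validFrom)
        )""")
    return conn


def save_version(conn: sqlite3.Connection, employee_id: int, name: str,
                 prop: int, at_utc: str) -> None:
    """Close the current version at at_utc and open a new one from at_utc."""
    with conn:  # commits both statements together, or rolls back on error
        conn.execute(
            "UPDATE EmployeeVersion SET validTo = ? "
            "WHERE employeeId = ? AND validTo IS NULL",
            (at_utc, employee_id))
        conn.execute(
            "INSERT INTO EmployeeVersion VALUES (?, ?, ?, ?, NULL)",
            (employee_id, name, prop, at_utc))


def version_as_of(conn: sqlite3.Connection, employee_id: int,
                  at_utc: str) -> Optional[Tuple[str, int]]:
    """Return (name, property) of the version whose range contains at_utc."""
    return conn.execute(
        "SELECT name, property FROM EmployeeVersion "
        "WHERE employeeId = ? AND validFrom <= ? "
        "AND (validTo IS NULL OR ? < validTo)",
        (employee_id, at_utc, at_utc)).fetchone()
```

Because every version of Kyle shares the same employeeId, listing all of his rows ordered by validFrom gives the revision history directly, without a separate revision number or "live" flag.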


Edit:

there should be an understanding of which rows belong to the same "revision cluster"

In this case you want date ranges, one way or another, in your tables rather than revision numbers or live flags; otherwise you'll end up duplicating related data all over the place.

On a separate note, consider keeping the audit tables separate from the live data rather than storing everything in the same table. It's harder to implement and manage, but it makes queries on the live data far more efficient.


See this related post, too: Temporal database design, with a twist (live vs draft rows)

One way to log all changes is to create so-called audit triggers. Such triggers log every change to the table they are attached to into a separate log table, which can then be queried to see the history of the changes.

Details on the implementation here.
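This is not the implementation the answer links to, just an independent minimal sketch of the idea, assuming SQLite via Python's sqlite3 and a hypothetical add_audit_triggers helper. It creates one shared audit_log table plus per-table triggers that record the operation and the old and new row images, rendered as text with SQLite's quote() function.

```python
import sqlite3
from typing import Sequence

AUDIT_LOG_DDL = """
    CREATE TABLE IF NOT EXISTS audit_log (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT NOT NULL,
        operation  TEXT NOT NULL,  -- INSERT / UPDATE / DELETE
        row_key    TEXT NOT NULL,
        old_row    TEXT,           -- NULL for INSERT
        new_row    TEXT,           -- NULL for DELETE
        changed_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now'))
    );
"""


def add_audit_triggers(conn: sqlite3.Connection, table: str, key: str,
                       columns: Sequence[str]) -> None:
    """Create INSERT/UPDATE/DELETE triggers that copy row images to audit_log.

    Identifiers are pasted into the SQL text, so they must come from trusted
    code, never from user input.
    """
    def image(alias: str) -> str:
        # quote() renders each value as an SQL literal; join them with commas.
        return " || ',' || ".join(f"quote({alias}.{c})" for c in columns)

    conn.executescript(AUDIT_LOG_DDL + f"""
        CREATE TRIGGER {table}_audit_insert AFTER INSERT ON {table} BEGIN
            INSERT INTO audit_log (table_name, operation, row_key, new_row)
            VALUES ('{table}', 'INSERT', NEW.{key}, {image('NEW')});
        END;
        CREATE TRIGGER {table}_audit_update AFTER UPDATE ON {table} BEGIN
            INSERT INTO audit_log (table_name, operation, row_key, old_row, new_row)
            VALUES ('{table}', 'UPDATE', NEW.{key}, {image('OLD')}, {image('NEW')});
        END;
        CREATE TRIGGER {table}_audit_delete AFTER DELETE ON {table} BEGIN
            INSERT INTO audit_log (table_name, operation, row_key, old_row)
            VALUES ('{table}', 'DELETE', OLD.{key}, {image('OLD')});
        END;
    """)
```

Calling the helper once per table gives every audited table the same log format, at the cost of storing row images as opaque text rather than typed columns.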

Licensed under: CC-BY-SA with attribution