Question

I have a requirement to store all versions of an entity in a easily indexed way and was wondering if anyone has input on what system to use.

Without versioning the system is simply a relational database with a row per, for example, person. If the person's state changes that row is changed to reflect this. With versioning the entry should be updated in such a way so that we can always go back to a previous version. If I could use a temporal database this would be free and I would be able to ask 'what is the state of all people as of yesterday at 2pm living in Dublin and aged 30'. Unfortunately there doesn't seem to be any mature open source projects that can do temporal.

A really nasty way to do this is just to insert a new row per state change. This leads to duplication, as a person can have many fields but only one changing per update. It is also then quite slow to select the correct version for every person given a timestamp.

In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.

So I was wondering if anyone has come across something similar before and how they approached it?

Update As suggested by Aaron here's the query we currently use (in mysql). It's definitely slow on our table with >200k rows. (id = table key, person_id = id per person, duplicated if the person has many revisions)

select name from person p where p.id = (select max(id) from person where person_id = p.person_id and timestamp <= :timestamp)

Update It looks like the best way to do this is with a temporal db but given that there aren't any open source ones out there the next best method is to store a new row per update. The only problem is duplication of unchanged columns and a slow query.

Was it helpful?

Solution

There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".

The first approach uses a number to count how many instances you already have. The primary key is the object key plus the version number. The problem with this approach seems to be that you'll need a select max(version) to make a modification. In practice, this is rarely an issue since for all updates from the app, you must first load the current version of the person, modify it (and increment the version) and then insert the new row. So the real problem is that this design makes it hard to run updates in the database (for example, assign a property to many users).

The next approach uses links in the database. Instead of a composite key, you give each object a new key and you have a replacedBy field which contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL). Updates are a problem, though, since you must insert a new row and update an existing one.

To solve this, you can add a back pointer (previousVersion). This way, you can insert the new rows and then use the back pointer to update the previous version.

OTHER TIPS

Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf

I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top