Version Controlled Database with efficient use of diff

https://stackoverflow.com/questions/746595

09-09-2019

Question

I have a project involving a web voting system. The current values and related data are stored in several tables. Historical data will be an important aspect of this project, so I've also created audit tables to which the current data will be copied on a regular basis.

I find this strategy highly inefficient. Even if I only archive data once a day, every row gets copied each time, so the number of rows will become huge even when only 1 or 2 users make updates on a given day.

The next alternative I can think of is storing only the entries that have changed. This means building logic to automatically reconstruct a view of any given day. That gives fewer stored rows, but considerable complexity.
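To make the "only store changes" idea concrete, here is a minimal sketch of rebuilding the state for a given day by replaying a change log. The `CHANGES` data, the record keys, and the `state_as_of` helper are all made up for illustration; a real system would read these rows from an audit table.

```python
from datetime import date

# Hypothetical change log: one row per change, not one row per record per day.
# Each entry is (day the change took effect, record key, new value).
CHANGES = [
    (date(2009, 4, 1), "poll_1", 10),
    (date(2009, 4, 1), "poll_2", 4),
    (date(2009, 4, 3), "poll_1", 12),
    (date(2009, 4, 7), "poll_2", 5),
]


def state_as_of(changes, day):
    """Return {key: value} as it stood at the end of `day`."""
    state = {}
    # Replay changes in chronological order; later entries overwrite earlier ones.
    for changed_on, key, value in sorted(changes, key=lambda c: c[0]):
        if changed_on > day:
            break
        state[key] = value
    return state


print(state_as_of(CHANGES, date(2009, 4, 5)))
```

The replay loop is the "considerable complexity" in its simplest form: storage grows with the number of changes rather than the number of days, at the cost of doing this work on every historical read.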

My final idea is a bit less conventional. Since the historical data will be for reporting purposes, there's no need for web users to have quick access to it. I'm thinking that my DB could hold no historical data at all and represent only the current state. Then, daily, the entire DB could be loaded into objects (the number of users and the amount of data are relatively low) and serialized to something like XML or JSON. These files could be diffed against the previous day's file and stored. In fact, SVN could do this for me. When I want the data for a given past day, the system has to retrieve the version for that day and deserialize it into objects. This is obviously a costly operation, but performance is not much of a concern here. I'm considering using LINQ for this, which I think would simplify things. The serialization procedure would have to be consistent in its ordering and formatting for the diff to work well.
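As a rough illustration of why the serialization has to be organized, the sketch below writes records in a stable order with sorted keys, so that a line-based diff only shows real changes. It uses Python's standard `json` and `difflib` modules rather than LINQ or SVN; the record shape (`id`, `votes`) and the helper names are assumptions.

```python
import difflib
import json


def snapshot(records):
    """Serialize records to JSON text with a stable ordering."""
    # Sort records and keys so unchanged data produces identical text every day.
    ordered = sorted(records, key=lambda r: r["id"])
    return json.dumps(ordered, indent=2, sort_keys=True)


def daily_diff(previous, current):
    """Return a unified diff between two snapshot strings."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(lines)


# The input order differs between days, but the snapshots are still comparable.
yesterday = snapshot([{"id": 2, "votes": 4}, {"id": 1, "votes": 10}])
today = snapshot([{"id": 1, "votes": 12}, {"id": 2, "votes": 4}])
print(daily_diff(yesterday, today))
```

Putting one field per line (the `indent=2` here) keeps each change confined to a single diff line, which is also what a tool like SVN would store efficiently.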

Which approach would you take?

Thanks

Solution

All you've told us about your system is that it involves votes. As long as you store a timestamp for when each vote was cast, you should be able to generate a report describing the vote tally at any point in time... no?

For example, say I have a system that tallies favorite features (eyes, smile, butt, ...). If I want to know how many votes there were for a particular feature as of a particular date, then I would simply tally all the votes for that feature with a timestamp less than or equal to that date.
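A minimal sketch of that tally, using an in-memory SQLite database so it runs on its own. The `votes` table, its column names, and the `tally_as_of` helper are assumptions for illustration; timestamps are stored as ISO-8601 text, which compares correctly as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per vote, never updated or deleted.
conn.execute("CREATE TABLE votes (feature TEXT NOT NULL, cast_at TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO votes (feature, cast_at) VALUES (?, ?)",
    [
        ("eyes", "2009-04-01 09:15:00"),
        ("smile", "2009-04-01 10:00:00"),
        ("eyes", "2009-04-02 18:30:00"),
        ("eyes", "2009-04-05 08:00:00"),
    ],
)


def tally_as_of(conn, feature, as_of):
    """Count votes for `feature` cast at or before `as_of` (ISO-8601 text)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM votes WHERE feature = ? AND cast_at <= ?",
        (feature, as_of),
    ).fetchone()
    return row[0]


print(tally_as_of(conn, "eyes", "2009-04-02 23:59:59"))
```

Because votes are only ever appended, no separate audit table is needed: the history is the data, and any past tally is a filtered count.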

If you want to have a history of other things, then you would follow a similar approach.

As far as I know, this is how it is usually done.

OTHER TIPS

If you're basically wondering how revisions of data are stored in relational databases, then I would look into how wikis do it.

Wikis are all about keeping detailed revision history. They use simple relational databases for storage.

Consider Wikipedia's database schema.
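The general pattern can be sketched as a page table plus an append-only revision table. This is a simplified illustration, not Wikipedia's actual schema; the table names, columns, and helper functions below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL UNIQUE
    );
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        page_id INTEGER NOT NULL REFERENCES page(id),
        content TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    """
)


def save(conn, page_id, content, created_at):
    """Append a new revision; existing rows are never updated."""
    conn.execute(
        "INSERT INTO revision (page_id, content, created_at) VALUES (?, ?, ?)",
        (page_id, content, created_at),
    )


def content_as_of(conn, page_id, as_of):
    """Return the newest revision at or before `as_of`, or None."""
    row = conn.execute(
        "SELECT content FROM revision WHERE page_id = ? AND created_at <= ? "
        "ORDER BY created_at DESC, id DESC LIMIT 1",
        (page_id, as_of),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO page (id, title) VALUES (1, 'Poll rules')")
save(conn, 1, "One vote per user.", "2009-04-01 12:00:00")
save(conn, 1, "One vote per user per day.", "2009-04-03 12:00:00")
print(content_as_of(conn, 1, "2009-04-02 00:00:00"))
```

Each edit adds a row instead of overwriting one, so the current state is simply the latest revision and any earlier state is one query away.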

Have you considered using a real version control system rather than trying to shoehorn a database in its place? I myself am quite partial to git, but there are many options. They all have good support for differences between versions, and they tend to be well optimised for this kind of workload.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow