Question

I am using a MySQL database (with the Django ORM). I want to maintain database audits similar to those of StackOverflow, Quora, Wikipedia, etc. These websites keep revisions of the changes in the database so that changes made by users/admins can be reverted.

After going through the database design of StackOverflow and the Quora revisions, I found two ways of doing this:

StackOverflow

Create a duplicate table to keep a log of the changes made in the database. For each entry, record the changes, the timestamp, and the admin/user who made the change. Use these entries to compute diffs and revert to any point. SO keeps the history of revisions in a separate PostHistory table.

Quora

Instead of making a separate audit table for each table in the database, make a single table like this for audits:

  • id - revision id
  • scope_id - id of database table
  • scope_type - Question, Topic, User
  • item_id - row id of Question/topic/user in the database table
  • event - Edited, Added, Reverted, Removed
  • user_id - who triggered the event
  • timestamp
  • serialized_item_column - serialized data in JSON format

Then the serialized data can be used to calculate diffs and revert a particular entry.
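
A minimal sketch of this single-table approach, assuming SQLite (from the Python standard library) as a stand-in for MySQL. The table name `revision`, the helper names `record`, `load`, and `diff`, and the sample data are all hypothetical, and `scope_id` is left out for brevity. Each revision stores the full serialized item, so a diff is a comparison of two revisions and a revert is a new revision that copies an old one.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute("""
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- revision id
        scope_type TEXT NOT NULL,              -- e.g. Question, Topic, User
        item_id INTEGER NOT NULL,              -- row id in the audited table
        event TEXT NOT NULL,                   -- Added, Edited, Reverted, Removed
        user_id INTEGER NOT NULL,              -- who triggered the event
        timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        serialized_item TEXT NOT NULL          -- full item state as JSON
    )
""")

def record(scope_type, item_id, event, user_id, item):
    """Append one revision holding the complete item state; return its id."""
    cur = conn.execute(
        "INSERT INTO revision (scope_type, item_id, event, user_id, serialized_item) "
        "VALUES (?, ?, ?, ?, ?)",
        (scope_type, item_id, event, user_id, json.dumps(item, sort_keys=True)),
    )
    return cur.lastrowid

def load(revision_id):
    """Deserialize the item state stored in a revision."""
    row = conn.execute(
        "SELECT serialized_item FROM revision WHERE id = ?", (revision_id,)
    ).fetchone()
    return json.loads(row[0])

def diff(old, new):
    """Map each changed field to an (old value, new value) pair."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

r1 = record("Question", 42, "Added", 7, {"title": "How to audit?", "body": "v1"})
r2 = record("Question", 42, "Edited", 8, {"title": "How to audit?", "body": "v2"})
changes = diff(load(r1), load(r2))

# A revert is just another revision whose payload copies an earlier one.
r3 = record("Question", 42, "Reverted", 7, load(r1))
```

In a real application the revert would also write the restored values back to the audited row; that step is omitted here.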

In the context of a crowdsourcing platform like a wiki/SO, where multiple users/admins can make changes:

  • Which of the two database designs will be better?
  • If I use one duplicate table per table for revisions, i.e. one for the current value and one with all the previous revisions, is that a scalable approach for a website with millions of entries and many more revisions?

Solution

Stand on the shoulders of your predecessors.

If I use duplicate tables, is that a scalable way for a website with millions of entries and many more revisions?

"Duplicate tables"? Plan A: one table with all revisions. Plan B: 2 tables, one with the current value, one with all the previous revisions. Plan C (really bad): One table per revision.

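As an illustration of Plan B, here is a rough sketch, again assuming SQLite in place of MySQL. The tables `post` and `post_history`, the column `replaced_by`, and the function `update_post` are hypothetical names. The point is that archiving the old value and writing the new one happen in a single transaction, so the current table stays small while the history table grows.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
db.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL                 -- current value only
    );
    CREATE TABLE post_history (
        revision_id INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id INTEGER NOT NULL,
        body TEXT NOT NULL,                -- the value that was replaced
        replaced_by INTEGER NOT NULL,      -- user who made the edit
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")

def update_post(post_id, new_body, user_id):
    """Archive the current row, then overwrite it, in one transaction."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute(
            "INSERT INTO post_history (post_id, body, replaced_by) "
            "SELECT id, body, ? FROM post WHERE id = ?",
            (user_id, post_id),
        )
        db.execute("UPDATE post SET body = ? WHERE id = ?", (new_body, post_id))

db.execute("INSERT INTO post (id, body) VALUES (1, 'first')")
update_post(1, "second", 9)

current = db.execute("SELECT body FROM post WHERE id = 1").fetchone()[0]
history = db.execute("SELECT post_id, body, replaced_by FROM post_history").fetchall()
```

Under Plan A the same data would live in one table, with reads for the current value filtering for the latest revision per item.
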
Which of the two database designs will be better?

Once you have gone through the exercise of defining "better", you will be halfway to the answer.

How do I change the schema in JSON-based audits if the schema changes?

This sounds like a different dimension to the problem: tracking of schema changes. That, itself, is a very big task. JSON has no problem with adding/dropping extra "fields" as a schema changes. However, if a table is split, then JSON gets trickier. You would probably need special code to bridge the gap.
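
One possible shape for that "special code", offered only as a sketch: tag every serialized payload with a schema version and upgrade old payloads step by step when they are read. The `schema_version` key, the upgrade functions, and the example schema changes (an added `tags` column, then `body` moved under `content` to mimic a table split) are all assumptions made for illustration.

```python
import json

def _v1_to_v2(item):
    # Hypothetical change: a "tags" column was added; old rows get a default.
    item = dict(item)
    item.setdefault("tags", [])
    return item

def _v2_to_v3(item):
    # Hypothetical change: "body" moved to a separate table, so the payload
    # nests it under "content" to mirror the split.
    item = dict(item)
    item["content"] = {"body": item.pop("body", "")}
    return item

# Each entry upgrades a payload from that version to the next one.
UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3

def upgrade(serialized):
    """Return the item from a stored revision, upgraded to the current schema."""
    payload = json.loads(serialized)
    version = payload.get("schema_version", 1)  # untagged payloads count as v1
    item = payload["item"]
    while version < CURRENT_VERSION:
        item = UPGRADES[version](item)
        version += 1
    return item

old_revision = json.dumps(
    {"schema_version": 1, "item": {"title": "Q", "body": "text"}}
)
upgraded = upgrade(old_revision)
```

Adding or dropping a field needs only a small upgrade step; the table-split step is where the logic starts to carry real knowledge of both schemas.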

Bottom Line

Since you seem to be just starting in this endeavor, I suggest you flip a coin to decide which one to use.

But... Plan on revisiting the decision in 6 months. By then you will have enough experience with the code to have some feel for whether it will work for the direction you want to take it.

Sure, it will be painful (very painful) to switch after 6 months. But it will be 'impossible' after 12 months. You will probably decide to patch the chosen schema instead of switching to the other schema.

Or maybe you will spin off another company to use the other database schema. A prediction: You will find that the "grass is not greener" and the second company will fail.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange