Question

When designing a database to use MVCC (Multi-Version Concurrency Control), you create tables with either a boolean field like "IsLatest" or an integer "VersionId". You never perform updates; you only insert new records when things change.

MVCC gives you automatic auditing for applications that require a detailed history, and it also relieves pressure on the database with regard to update locks. The cons are that it makes your data size much bigger and slows down selects, because of the extra clause needed to get the latest version. It also makes foreign keys more complicated.
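A minimal sketch of the insert-only pattern described above, written in Python with the standard sqlite3 module so it runs without a SQL Server instance. The Customer table and the save_customer/latest_customer helpers are hypothetical. It uses the integer VersionId variant, because an "IsLatest" flag would normally require updating the previous row, which breaks the no-updates rule.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerId INTEGER NOT NULL,
        VersionId  INTEGER NOT NULL,
        Name       TEXT    NOT NULL,
        PRIMARY KEY (CustomerId, VersionId)
    )
""")

def save_customer(conn, customer_id, name):
    """Never UPDATE: every change inserts a new row with the next VersionId."""
    (next_version,) = conn.execute(
        "SELECT COALESCE(MAX(VersionId), 0) + 1 FROM Customer WHERE CustomerId = ?",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO Customer (CustomerId, VersionId, Name) VALUES (?, ?, ?)",
        (customer_id, next_version, name),
    )
    return next_version

def latest_customer(conn, customer_id):
    """The extra clause every read needs: pick the row with the highest VersionId."""
    return conn.execute(
        """SELECT CustomerId, VersionId, Name FROM Customer
           WHERE CustomerId = ?
           ORDER BY VersionId DESC LIMIT 1""",
        (customer_id,),
    ).fetchone()

save_customer(conn, 1, "Acme")
save_customer(conn, 1, "Acme Ltd")
print(latest_customer(conn, 1))  # (1, 2, 'Acme Ltd')
```

Computing MAX(VersionId) + 1 is not safe on its own under concurrent writers; in this sketch the composite primary key would reject a duplicate version, and the caller would have to retry.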

(Note that I'm not talking about the native MVCC support in RDBMSs, such as SQL Server's snapshot isolation level.)

This has been discussed in other posts here on Stack Overflow. [todo - links]

I am wondering: which of the prevalent entity/ORM frameworks (Linq to Sql, ADO.NET EF, Hibernate, etc.) can cleanly support this type of design? This is a major departure from the typical ActiveRecord design pattern, so I'm not sure whether most of the tools out there could help someone who decides to go this route with their data model. I'm particularly interested in how foreign keys would be handled, because I'm not even sure of the best way to model them to support MVCC.
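One common way to model foreign keys in this kind of schema (not taken from any of the answers here, and only one option among several) is to split each entity into a stable identity table, which foreign keys reference, and an insert-only version table. A minimal Python/sqlite3 sketch; all table names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- One row per entity, never versioned: a stable target for foreign keys.
    CREATE TABLE Author (AuthorId INTEGER PRIMARY KEY);
    -- Versioned attributes live in a separate insert-only table.
    CREATE TABLE AuthorVersion (
        AuthorId  INTEGER NOT NULL REFERENCES Author(AuthorId),
        VersionId INTEGER NOT NULL,
        Name      TEXT    NOT NULL,
        PRIMARY KEY (AuthorId, VersionId)
    );
    CREATE TABLE Book (BookId INTEGER PRIMARY KEY);
    CREATE TABLE BookVersion (
        BookId    INTEGER NOT NULL REFERENCES Book(BookId),
        VersionId INTEGER NOT NULL,
        Title     TEXT    NOT NULL,
        -- Points at the author entity, not at one version of it,
        -- so new author versions never invalidate existing book rows.
        AuthorId  INTEGER NOT NULL REFERENCES Author(AuthorId),
        PRIMARY KEY (BookId, VersionId)
    );
""")

conn.execute("INSERT INTO Author VALUES (1)")
conn.execute("INSERT INTO AuthorVersion VALUES (1, 1, 'J. Smith')")
conn.execute("INSERT INTO Book VALUES (10)")
conn.execute("INSERT INTO BookVersion VALUES (10, 1, 'Intro to SQL', 1)")
conn.execute("INSERT INTO AuthorVersion VALUES (1, 2, 'Jane Smith')")  # book row untouched

try:
    # Author 99 does not exist, so the database itself rejects this version.
    conn.execute("INSERT INTO BookVersion VALUES (10, 2, 'Intro to SQL, 2nd ed.', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Join on the entity id, then choose the latest author version at read time.
current = conn.execute("""
    SELECT bv.Title, av.Name
    FROM BookVersion AS bv
    JOIN AuthorVersion AS av ON av.AuthorId = bv.AuthorId
    WHERE bv.BookId = 10
      AND av.VersionId = (SELECT MAX(VersionId) FROM AuthorVersion
                          WHERE AuthorId = bv.AuthorId)
""").fetchone()
print(current)  # ('Intro to SQL', 'Jane Smith')
```

The cost of this design is one extra table per entity, but it lets the database enforce referential integrity instead of leaving it to application code.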


Solution

I might consider implementing the MVCC tier purely in the DB, using stored procs and views to handle my data operations. Then you could present a reasonable API to any ORM that was capable of mapping to and from stored procs, and you could let the DB deal with the data integrity issues (since it's pretty much built for that). If you went this way, you might want to look at a purer mapping solution like IBatis or IBatis.net.

OTHER TIPS

I designed a database similarly (only INSERTs — no UPDATEs, no DELETEs).

Almost all of my SELECT queries were against views of only the current rows for each table (highest revision number).

The views looked like this…

SELECT
    dbo.tblBook.BookId,
    dbo.tblBook.RevisionId,
    dbo.tblBook.Title,
    dbo.tblBook.AuthorId,
    dbo.tblBook.Price,
    dbo.tblBook.Deleted
FROM
    dbo.tblBook INNER JOIN
    (
        SELECT
            BookId,
            MAX(RevisionId) AS RevisionId
        FROM
            dbo.tblBook
        GROUP BY
            BookId
    ) AS CurrentBookRevision ON
    dbo.tblBook.BookId = CurrentBookRevision.BookId AND
    dbo.tblBook.RevisionId = CurrentBookRevision.RevisionId
WHERE
    dbo.tblBook.Deleted = 0
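A runnable approximation of the view above, using Python's sqlite3 module in place of SQL Server (uniqueidentifier and smallmoney become TEXT and NUMERIC, and the view name vwBook is hypothetical, since the original view is unnamed). It shows that filtering on Deleted after choosing the highest revision makes a deletion hide the whole book, rather than falling back to an earlier revision:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblBook (
        BookId     TEXT    NOT NULL,
        RevisionId INTEGER NOT NULL,
        Title      TEXT    NOT NULL,
        AuthorId   TEXT    NOT NULL,
        Price      NUMERIC NOT NULL,
        Deleted    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (BookId, RevisionId)
    );
    -- Same shape as the view above: join each row to its book's highest revision.
    CREATE VIEW vwBook AS
    SELECT b.BookId, b.RevisionId, b.Title, b.AuthorId, b.Price, b.Deleted
    FROM tblBook AS b
    INNER JOIN (
        SELECT BookId, MAX(RevisionId) AS RevisionId
        FROM tblBook
        GROUP BY BookId
    ) AS CurrentBookRevision
      ON b.BookId = CurrentBookRevision.BookId
     AND b.RevisionId = CurrentBookRevision.RevisionId
    WHERE b.Deleted = 0;
""")

rows = [
    ("b1", 1, "Draft title", "a1", 10, 0),
    ("b1", 2, "Final title", "a1", 12, 0),  # newer revision of b1 wins
    ("b2", 1, "Doomed book", "a1", 5, 0),
    ("b2", 3, "Doomed book", "a1", 5, 1),   # latest revision of b2 marks it deleted
]
conn.executemany("INSERT INTO tblBook VALUES (?, ?, ?, ?, ?, ?)", rows)

current = conn.execute(
    "SELECT BookId, RevisionId, Title FROM vwBook ORDER BY BookId").fetchall()
print(current)  # [('b1', 2, 'Final title')]
```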

And my inserts (and updates and deletes) were all handled by stored procedures (one per table).

The stored procedures looked like this…

ALTER procedure [dbo].[sp_Book_CreateUpdateDelete]
    @BookId      uniqueidentifier,
    @RevisionId  bigint,
    @Title       varchar(256),
    @AuthorId    uniqueidentifier,
    @Price       smallmoney,
    @Deleted     bit
as
    insert into tblBook
        (
            BookId,
            RevisionId,
            Title,
            AuthorId,
            Price,
            Deleted
        )
    values
        (
            @BookId,
            @RevisionId,
            @Title,
            @AuthorId,
            @Price,
            @Deleted
        )

Revision numbers were handled per-transaction in the Visual Basic code…

Shared Sub Save(ByVal UserId As Guid, ByVal Explanation As String, ByVal Commands As Collections.Generic.Queue(Of SqlCommand))
    Dim Connection As SqlConnection = New SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings("Connection").ConnectionString)
    Connection.Open()
    Dim Transaction As SqlTransaction = Connection.BeginTransaction
    Try
        Dim RevisionId As Integer = Nothing
        Dim RevisionCommand As SqlCommand = New SqlCommand("sp_Revision_Create", Connection)
        RevisionCommand.CommandType = CommandType.StoredProcedure
        RevisionCommand.Parameters.AddWithValue("@RevisionId", 0)
        RevisionCommand.Parameters(0).SqlDbType = SqlDbType.BigInt
        RevisionCommand.Parameters(0).Direction = ParameterDirection.Output
        RevisionCommand.Parameters.AddWithValue("@UserId", UserId)
        RevisionCommand.Parameters.AddWithValue("@Explanation", Explanation)
        RevisionCommand.Transaction = Transaction
        LogDatabaseActivity(RevisionCommand)
        If RevisionCommand.ExecuteNonQuery() = 1 Then 'rows inserted
            RevisionId = CInt(RevisionCommand.Parameters(0).Value) 'generated key
        Else
            Throw New Exception("Zero rows affected.")
        End If
        For Each Command As SqlCommand In Commands
            Command.Connection = Connection
            Command.Transaction = Transaction
            Command.CommandType = CommandType.StoredProcedure
            Command.Parameters.AddWithValue("@RevisionId", RevisionId)
            LogDatabaseActivity(Command)
            If Command.ExecuteNonQuery() < 1 Then 'rows inserted
                Throw New Exception("Zero rows affected.")
            End If
        Next
        Transaction.Commit()
    Catch ex As Exception
        Transaction.Rollback()
        Throw New Exception("Rolled back transaction", ex)
    Finally
        Connection.Close()
    End Try
End Sub
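A sketch of the same transaction logic in Python with sqlite3, under stated assumptions: the columns of tblRevision are guessed from the parameters passed to sp_Revision_Create, SQLite has no stored procedures so each queued command is a plain (sql, params) pair, and the convention that the revision id is always the first placeholder is invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRevision (
        RevisionId  INTEGER PRIMARY KEY AUTOINCREMENT,
        UserId      TEXT NOT NULL,
        Explanation TEXT NOT NULL
    );
    CREATE TABLE tblBook (
        BookId     TEXT    NOT NULL,
        RevisionId INTEGER NOT NULL,
        Title      TEXT    NOT NULL,
        Deleted    INTEGER NOT NULL,
        PRIMARY KEY (BookId, RevisionId)
    );
""")

def save(conn, user_id, explanation, commands):
    """Run queued (sql, params) inserts under one new revision, all or nothing."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            cur = conn.execute(
                "INSERT INTO tblRevision (UserId, Explanation) VALUES (?, ?)",
                (user_id, explanation),
            )
            revision_id = cur.lastrowid  # the generated key
            for sql, params in commands:
                # Assumed convention: the revision id is the first placeholder.
                if conn.execute(sql, (revision_id, *params)).rowcount < 1:
                    raise RuntimeError("Zero rows affected.")
        return revision_id
    except Exception as ex:
        raise RuntimeError("Rolled back transaction") from ex

insert_book = "INSERT INTO tblBook (RevisionId, BookId, Title, Deleted) VALUES (?, ?, ?, ?)"
first = save(conn, "user-1", "Add two books",
             [(insert_book, ("b1", "First", 0)), (insert_book, ("b2", "Second", 0))])

second_failed = False
try:
    # The duplicate (BookId, RevisionId) fails, so "b3" and its revision row roll back too.
    save(conn, "user-1", "Bad batch",
         [(insert_book, ("b3", "Third", 0)), (insert_book, ("b3", "Duplicate", 0))])
except RuntimeError:
    second_failed = True
```

As in the VB version, the revision row and every table insert share one transaction, so a revision never exists without its changes.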

I created a class for each table, each with constructors, instance properties and methods, create-update-delete commands, a bunch of finder functions, and IComparable sorting functions. It was a huge amount of code.

One-to-one DB table to VB object...

Public Class Book
    Implements iComparable

#Region " Constructors "

    Private _BookId As Guid
    Private _RevisionId As Integer
    Private _Title As String
    Private _AuthorId As Guid
    Private _Price As Decimal
    Private _Deleted As Boolean

    ...

    Sub New(ByVal BookRow As DataRow)
        Try
            _BookId = New Guid(BookRow("BookId").ToString)
            _RevisionId = CInt(BookRow("RevisionId"))
            _Title = CStr(BookRow("Title"))
            _AuthorId = New Guid(BookRow("AuthorId").ToString)
            _Price = CDec(BookRow("Price"))
            _Deleted = CBool(BookRow("Deleted"))
        Catch ex As Exception
            'TO DO: log exception
            Throw New Exception("DataRow does not contain valid Book data.", ex)
        End Try
    End Sub

#End Region

...

#Region " Create, Update & Delete "

    Function Save() As SqlCommand
        If _BookId = Guid.Empty Then
            _BookId = Guid.NewGuid()
        End If
        Dim Command As SqlCommand = New SqlCommand("sp_Book_CreateUpdateDelete")
        Command.Parameters.AddWithValue("@BookId", _BookId)
        Command.Parameters.AddWithValue("@Title", _Title)
        Command.Parameters.AddWithValue("@AuthorId", _AuthorId)
        Command.Parameters.AddWithValue("@Price", _Price)
        Command.Parameters.AddWithValue("@Deleted", _Deleted)
        Return Command
    End Function

    Shared Function Delete(ByVal BookId As Guid) As SqlCommand
        Dim Doomed As Book = FindByBookId(BookId)
        Doomed.Deleted = True
        Return Doomed.Save()
    End Function

    ...

#End Region

...

#Region " Finders "

    Shared Function FindByBookId(ByVal BookId As Guid, Optional ByVal TryDeleted As Boolean = False) As Book
        Dim Command As SqlCommand
        If TryDeleted Then
            Command = New SqlCommand("sp_Book_FindByBookIdTryDeleted")
        Else
            Command = New SqlCommand("sp_Book_FindByBookId")
        End If
        Command.Parameters.AddWithValue("@BookId", BookId)
        ' Run the finder once and reuse the result instead of querying twice
        Dim Results As DataTable = Database.Find(Command)
        If Results.Rows.Count > 0 Then
            Return New Book(Results.Rows(0))
        Else
            Return Nothing
        End If
    End Function
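The same one-class-per-table idea, sketched in Python with a dataclass. save_command mirrors Save() by returning an unexecuted insert whose revision id is supplied later by the caller's transaction, and delete_command mirrors Delete() by saving a copy marked deleted. The names and column order are hypothetical; from_row reads every column, including Deleted, so a load-then-save round trip does not silently un-delete a row.

```python
import sqlite3
import uuid
from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class Book:
    book_id: Optional[uuid.UUID]
    title: str
    price: float
    deleted: bool = False
    revision_id: Optional[int] = None

    def save_command(self):
        """Like Save(): build, but do not run, the insert for a new revision.

        The revision id is left as the first placeholder for the caller's
        transaction to fill in.
        """
        if self.book_id is None:
            self.book_id = uuid.uuid4()
        sql = ("INSERT INTO tblBook (RevisionId, BookId, Title, Price, Deleted) "
               "VALUES (?, ?, ?, ?, ?)")
        return sql, (str(self.book_id), self.title, self.price, int(self.deleted))

    @classmethod
    def from_row(cls, row):
        """Like New(DataRow): read every column, including Deleted."""
        book_id, revision_id, title, price, deleted = row
        return cls(uuid.UUID(book_id), title, price, bool(deleted), revision_id)


def delete_command(book):
    """Like Delete(): a delete is just a save of a copy marked deleted."""
    return replace(book, deleted=True).save_command()


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblBook (
    BookId TEXT NOT NULL, RevisionId INTEGER NOT NULL, Title TEXT NOT NULL,
    Price REAL NOT NULL, Deleted INTEGER NOT NULL,
    PRIMARY KEY (BookId, RevisionId))""")

book = Book(None, "Dune", 9.99)
sql, params = book.save_command()
conn.execute(sql, (1, *params))  # revision 1 creates the book

row = conn.execute(
    "SELECT BookId, RevisionId, Title, Price, Deleted FROM tblBook").fetchone()
loaded = Book.from_row(row)

sql, params = delete_command(loaded)
conn.execute(sql, (2, *params))  # revision 2 marks it deleted; revision 1 is kept
```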

Such a system preserves all past versions of each row, but can be a real pain to manage.

PROS:

  • Total history preserved
  • Fewer stored procedures

CONS:

  • Relies on application code, not the database, for data integrity
  • Huge amount of code to be written
  • No foreign keys managed within the database (goodbye automatic Linq-to-SQL-style object generation)
  • I still haven't come up with a good user interface for retrieving all those preserved past versions.
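On the last point, most of the history can be reached with two queries: the full revision list for one row, and the row as it looked at a given revision. A minimal Python/sqlite3 sketch against the schema from this answer; the tblRevision columns are assumptions based on the parameters of sp_Revision_Create, and the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRevision (RevisionId INTEGER PRIMARY KEY,
                              UserId TEXT, Explanation TEXT);
    CREATE TABLE tblBook (BookId TEXT, RevisionId INTEGER, Title TEXT,
                          Deleted INTEGER, PRIMARY KEY (BookId, RevisionId));
    INSERT INTO tblRevision VALUES (1, 'alice', 'Create'),
                                   (2, 'bob',   'Fix typo'),
                                   (3, 'alice', 'Unrelated edit');
    INSERT INTO tblBook VALUES ('b1', 1, 'Teh Book', 0),
                               ('b1', 2, 'The Book', 0),
                               ('b9', 3, 'Other',    0);
""")

def history(conn, book_id):
    """Every saved version of one book, oldest first, with who changed it and why."""
    return conn.execute("""
        SELECT b.RevisionId, r.UserId, r.Explanation, b.Title, b.Deleted
        FROM tblBook AS b
        JOIN tblRevision AS r ON r.RevisionId = b.RevisionId
        WHERE b.BookId = ?
        ORDER BY b.RevisionId""", (book_id,)).fetchall()

def as_of(conn, book_id, revision_id):
    """The book as it stood right after a given revision (None if not yet created).

    A returned row with Deleted = 1 means the book had been deleted by then.
    """
    return conn.execute("""
        SELECT RevisionId, Title, Deleted FROM tblBook
        WHERE BookId = ? AND RevisionId <= ?
        ORDER BY RevisionId DESC LIMIT 1""", (book_id, revision_id)).fetchone()

for entry in history(conn, "b1"):
    print(entry)
print(as_of(conn, "b1", 1))  # (1, 'Teh Book', 0)
```

Because revision ids are global and increase with each transaction, "as of revision N" gives a consistent snapshot across all tables, which a history UI could expose as a slider or a list of revisions.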

CONCLUSION:

  • I wouldn't go to such trouble on a new project without some easy-to-use out-of-the-box ORM solution.

I'm curious if the Microsoft Entity Framework can handle such database designs well.

Jeff and the rest of the Stack Overflow team must have had to deal with similar issues while developing Stack Overflow: past revisions of edited questions and answers are saved and retrievable.

I believe Jeff has stated that his team used Linq to SQL and MS SQL Server.

I wonder how they handled these issues.

To the best of my knowledge, ORM frameworks are going to want to generate the CRUD code for you, so they would have to be explicitly designed to implement an MVCC option; I don't know of any that do so out of the box.

From an entity framework standpoint, CSLA doesn't implement persistence for you at all; it just defines a "Data Adapter" interface that you use to implement whatever persistence you need. So you could set up code generation templates (CodeSmith, etc.) to auto-generate CRUD logic for your CSLA entities that fits an MVCC database architecture.

This approach would most likely work with any entity framework, not just CSLA, but it would be a particularly "clean" implementation in CSLA.
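As an illustration of what such generated code might look like, here is a small Python function standing in for a CodeSmith-style template (it is not CodeSmith or CSLA code). Given a table name, key column, and data columns, it emits an insert-only save statement and a current-rows view in the style shown earlier in this thread. It assumes every table carries RevisionId and Deleted columns, and it interpolates identifiers directly, so they should come from trusted schema metadata, never from user input.

```python
import sqlite3

def generate_mvcc_sql(table, key, columns):
    """Emit an insert-only save statement and a current-rows view for one table."""
    all_cols = [key, "RevisionId", *columns, "Deleted"]
    col_list = ", ".join(all_cols)
    placeholders = ", ".join("?" for _ in all_cols)
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    select_list = ", ".join(f"t.{c} AS {c}" for c in all_cols)
    view = (
        f"CREATE VIEW {table}_Current AS "
        f"SELECT {select_list} FROM {table} AS t "
        f"JOIN (SELECT {key}, MAX(RevisionId) AS RevisionId "
        f"FROM {table} GROUP BY {key}) AS cur "
        f"ON t.{key} = cur.{key} AND t.RevisionId = cur.RevisionId "
        f"WHERE t.Deleted = 0"
    )
    return insert, view

# Usage: generate SQL for a hypothetical tblAuthor table and try it out.
insert_sql, view_sql = generate_mvcc_sql("tblAuthor", "AuthorId", ["Name"])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblAuthor "
             "(AuthorId TEXT, RevisionId INTEGER, Name TEXT, Deleted INTEGER)")
conn.execute(view_sql)
conn.executemany(insert_sql, [("a1", 1, "Ann", 0), ("a1", 2, "Anne", 0)])
print(insert_sql)
print(conn.execute("SELECT AuthorId, Name FROM tblAuthor_Current").fetchall())
```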

Check out the Envers project: it works nicely with JPA/Hibernate applications and basically does this for you. It keeps track of the different versions of each entity in another table and gives you SVN-like possibilities ("Gimme the version of Person being used 2008-11-05...").

http://www.jboss.org/envers/

/Jens

I always figured you'd use a db trigger on update and delete to push those rows out into a TableName_Audit table.

That'd work with ORMs, give you your history and wouldn't decimate select performance on that table. Is that a good idea or am I missing something?
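A minimal sketch of that trigger-based audit in Python with sqlite3 (table and trigger names are hypothetical). SQLite exposes the pre-change row as OLD; SQL Server triggers read it from the deleted pseudo-table instead. The ORM keeps issuing ordinary UPDATE and DELETE statements against Employee, whose size does not grow with history:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmpId  INTEGER PRIMARY KEY,
        Name   TEXT    NOT NULL,
        Salary INTEGER NOT NULL
    );
    -- History table: the old image of each updated or deleted row.
    CREATE TABLE Employee_Audit (
        AuditId   INTEGER PRIMARY KEY AUTOINCREMENT,
        EmpId     INTEGER NOT NULL,
        Name      TEXT    NOT NULL,
        Salary    INTEGER NOT NULL,
        Operation TEXT    NOT NULL,
        ChangedAt TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER Employee_AuditUpdate AFTER UPDATE ON Employee
    BEGIN
        INSERT INTO Employee_Audit (EmpId, Name, Salary, Operation)
        VALUES (OLD.EmpId, OLD.Name, OLD.Salary, 'UPDATE');
    END;
    CREATE TRIGGER Employee_AuditDelete AFTER DELETE ON Employee
    BEGIN
        INSERT INTO Employee_Audit (EmpId, Name, Salary, Operation)
        VALUES (OLD.EmpId, OLD.Name, OLD.Salary, 'DELETE');
    END;
""")

# Ordinary CRUD, as an ORM would issue it; the triggers record the history.
conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 50000)")
conn.execute("UPDATE Employee SET Salary = 55000 WHERE EmpId = 1")
conn.execute("DELETE FROM Employee WHERE EmpId = 1")

audit = conn.execute(
    "SELECT EmpId, Salary, Operation FROM Employee_Audit ORDER BY AuditId").fetchall()
print(audit)  # [(1, 50000, 'UPDATE'), (1, 55000, 'DELETE')]
```

The trade-off is that the current row and its history live in different tables, so an "as of" query has to look in both.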

What we do is just use a normal ORM (Hibernate) and handle the MVCC with views plus INSTEAD OF triggers.

So there is a v_emp view, which looks just like a normal table, and you can insert into and update it without problems. When you do, though, the triggers take care of actually inserting the correct data into the base table.
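The views plus INSTEAD OF triggers approach can be sketched in SQLite, which supports INSTEAD OF triggers on views. The base table emp_history and its columns are hypothetical; only the view name v_emp comes from the answer. Inserts, updates, and deletes against v_emp all become inserts of a new version, with deletion recorded as a flag:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_history (
        emp_id  INTEGER NOT NULL,
        version INTEGER NOT NULL,
        name    TEXT    NOT NULL,
        salary  INTEGER NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (emp_id, version)
    );
    -- What the ORM maps to: looks like a plain table of current rows.
    CREATE VIEW v_emp AS
    SELECT h.emp_id, h.name, h.salary
    FROM emp_history AS h
    WHERE h.version = (SELECT MAX(version) FROM emp_history WHERE emp_id = h.emp_id)
      AND h.deleted = 0;

    CREATE TRIGGER v_emp_insert INSTEAD OF INSERT ON v_emp
    BEGIN
        INSERT INTO emp_history (emp_id, version, name, salary)
        VALUES (NEW.emp_id,
                (SELECT COALESCE(MAX(version), 0) + 1 FROM emp_history
                 WHERE emp_id = NEW.emp_id),
                NEW.name, NEW.salary);
    END;

    -- Keys are treated as immutable: an update writes a new version of OLD.emp_id.
    CREATE TRIGGER v_emp_update INSTEAD OF UPDATE ON v_emp
    BEGIN
        INSERT INTO emp_history (emp_id, version, name, salary)
        VALUES (OLD.emp_id,
                (SELECT MAX(version) + 1 FROM emp_history WHERE emp_id = OLD.emp_id),
                NEW.name, NEW.salary);
    END;

    CREATE TRIGGER v_emp_delete INSTEAD OF DELETE ON v_emp
    BEGIN
        INSERT INTO emp_history (emp_id, version, name, salary, deleted)
        VALUES (OLD.emp_id,
                (SELECT MAX(version) + 1 FROM emp_history WHERE emp_id = OLD.emp_id),
                OLD.name, OLD.salary, 1);
    END;
""")

# Plain DML against the view, as an ORM would issue it.
conn.execute("INSERT INTO v_emp (emp_id, name, salary) VALUES (1, 'Ann', 50000)")
conn.execute("UPDATE v_emp SET salary = 55000 WHERE emp_id = 1")
conn.execute("INSERT INTO v_emp (emp_id, name, salary) VALUES (2, 'Bob', 40000)")
conn.execute("DELETE FROM v_emp WHERE emp_id = 2")

print(conn.execute("SELECT * FROM v_emp").fetchall())  # [(1, 'Ann', 55000)]
```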

Note: I hate this method :) I'd go with a stored procedure API, as suggested by Tim.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow