Versioning: Is this technology used in DBMSs other than spatial ones?

https://dba.stackexchange.com/questions/168411

06-10-2020

Question

ESRI's spatial database management systems, called geodatabases, use a technology called versioning.

A version represents a snapshot in time of the entire geodatabase and contains all the datasets in the geodatabase.

Versions are not separate copies of the geodatabase. Instead, versions and the transactions that take place within them are tracked in system tables. This isolates a user's work across multiple edit sessions, allowing users to edit without locking features in the production version or immediately impacting other users and without having to make copies of the data.

Source: http://help.arcgis.com/en/geodatabase/10.0/sdk/arcsde/concepts/versioning/basicprinciples/state.htm

When you register a dataset (a feature class, feature dataset, or table) as versioned, two delta tables are created: the A (or adds) table, which records insertions and updates, and the D (or deletes) table, which stores deletions. Each time you update or delete a record in the dataset, rows are added to one or both of these tables. A versioned dataset, therefore, consists of the original table (referred to as the base or business table) plus any changes in the delta tables. The geodatabase keeps track of which version you were connected to when you made the edits that populated the delta tables. When you query or display a dataset in a version, ArcGIS assembles the relevant rows from the original table and the delta tables to present a seamless view of the data for that version.
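Here is a minimal sketch of how a version can be assembled from a base table plus A and D delta tables, using Python's built-in sqlite3 module. The table names (parcels, parcels_a, parcels_d) and the data are hypothetical. The sketch models a single edited version and leaves out the state ids that ESRI's real delta tables (such as A47430/D47430 in the view definition quoted further down) carry. It records an update as a delete of the old row plus an add of the new one, which fits the "one or both of these tables" wording above.

```python
import sqlite3


def build_demo():
    """Hypothetical base table plus A (adds) and D (deletes) delta tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parcels   (objectid INTEGER PRIMARY KEY, owner TEXT); -- base table
        CREATE TABLE parcels_a (objectid INTEGER, owner TEXT);             -- adds
        CREATE TABLE parcels_d (objectid INTEGER);                         -- deletes

        INSERT INTO parcels VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');

        -- One edit session: update parcel 2, delete parcel 3, insert parcel 4.
        -- The update is stored as a delete of the old row plus an add of the new one.
        INSERT INTO parcels_d VALUES (2), (3);
        INSERT INTO parcels_a VALUES (2, 'Bob Jr.'), (4, 'Dave');
    """)
    return conn


# The version is assembled on the fly: base rows minus deletes, plus adds.
VERSION_QUERY = """
    SELECT objectid, owner FROM parcels
    WHERE objectid NOT IN (SELECT objectid FROM parcels_d)
    UNION ALL
    SELECT objectid, owner FROM parcels_a
    ORDER BY objectid
"""


def version_rows(conn):
    return conn.execute(VERSION_QUERY).fetchall()


def base_rows(conn):
    return conn.execute("SELECT objectid, owner FROM parcels ORDER BY objectid").fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(base_rows(conn))     # the untouched base (business) table
    print(version_rows(conn))  # base + delta tables, as seen in the edited version
```

The base table is never modified. The edited version exists only as the result of the query.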

To be honest, I find the documentation to be rather vague; it doesn't tell me much about how the technology actually works, or what part of traditional database theory it is based on.

I don't imagine many DBA SE community members have experience with ESRI's versioning technology, so I won't ask something like 'how does it work?'.

Instead, I'm wondering, are there any technologies in the non-spatial database world that are similar to ESRI versioning?

Solution

Oracle has implemented MVCC (multiversion concurrency control) almost from its inception. Its main purpose is to reduce contention during updates by making it possible for readers and writers not to block each other. But that is not the same as the multi-versioning offered by ESRI. Traditional database transactions are short: they last sub-second, a few seconds, maybe minutes, or even hours for some batch processes.
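To make the contrast concrete, here is a toy in-memory sketch of the short-transaction MVCC idea in Python. Every write creates a new version stamped with a commit timestamp. Each transaction reads the newest version committed before it started, so writers take no locks that would block readers. The class and method names are hypothetical, and write-write conflict detection and cleanup of old versions are left out. This is also not how Oracle implements it: Oracle reconstructs older row images from undo data.

```python
class MVCCStore:
    """Toy multiversion store: writes append new versions instead of
    overwriting, and every transaction reads from a committed snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical timestamp of the latest commit

    def begin(self):
        return Transaction(self, snapshot_ts=self._clock)

    def _install(self, writes):
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))

    def _read(self, key, snapshot_ts):
        # Newest version committed at or before the reader's snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # private until commit

    def read(self, key):
        if key in self.writes:  # a transaction sees its own changes
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def write(self, key, value):
        # No lock is taken, so concurrent readers are never blocked.
        self.writes[key] = value

    def commit(self):
        self.store._install(self.writes)
        self.writes = {}


if __name__ == "__main__":
    store = MVCCStore()
    setup = store.begin()
    setup.write("x", "v1")
    setup.commit()

    reader = store.begin()      # snapshot taken before the update
    writer = store.begin()
    writer.write("x", "v2")
    print(reader.read("x"))     # uncommitted write is invisible
    writer.commit()
    print(reader.read("x"))     # the reader keeps its snapshot
    print(store.begin().read("x"))
```

The transaction here lives only as long as the Python objects. Nothing lets it survive for weeks, which is exactly the gap that long-transaction mechanisms fill.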

The versioning that ESRI offers is for long transactions, which can last days, weeks, or months. A database could in theory do this using short transactions: just do not commit, and hope your session stays up for days or months. That is clearly not something anyone can guarantee.

So Oracle has implemented its own long-transaction/multi-versioning mechanism, called Oracle Workspace Manager (OWM). See http://docs.oracle.com/database/122/ADWSM/ for details.

In OWM, users work in workspaces (what ESRI calls a version). Workspaces are collaborative: multiple users can work in the same workspace. Updates done in a workspace are only visible to users in that workspace. Changes in a workspace are applied to the parent workspace via a merge operation. A child workspace can start seeing the changes made in its parent via a refresh operation. OWM includes all the mechanics for conflict detection and resolution, long-term locking, change detection, history, "what-if" scenarios, and so on.

As with ESRI's versions, OWM's workspaces form a hierarchy of nested workspaces, the top one being the LIVE (or "as-built") workspace.
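The workspace operations described above can be modelled with a small hypothetical Python class. These are private changes, merge into the parent, refresh from the parent, conflict detection, and a tree rooted at LIVE. This is a sketch under simplifying assumptions, not OWM's API or storage model: each workspace keeps a full copy of its parent's data as a snapshot, deletes are not modelled, and a conflict is simply reported rather than resolved. OWM's real operations are PL/SQL procedures in its DBMS_WM package, documented at the link above.

```python
class ConflictError(Exception):
    """Raised when a row changed both in a workspace and in its parent."""


class Workspace:
    """Hypothetical workspace node: a private change set on top of a snapshot
    of the parent workspace. The root (parent=None) plays the role of LIVE."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.changes = {}  # rows edited in this workspace only
        # Copy of the parent's data as of creation (or the last refresh).
        self.snapshot = parent.view() if parent is not None else {}

    def view(self):
        """Everything a user in this workspace sees."""
        return {**self.snapshot, **self.changes}

    def read(self, key):
        return self.view().get(key)

    def write(self, key, value):
        self.changes[key] = value  # invisible to the parent and to siblings

    def _conflicts(self):
        # A row conflicts if we changed it and the parent also changed it
        # since our snapshot was taken.
        current = self.parent.view()
        return sorted(k for k in self.changes
                      if current.get(k) != self.snapshot.get(k))

    def refresh(self):
        """Start seeing changes made in the parent since the last snapshot."""
        conflicts = self._conflicts()
        if conflicts:
            raise ConflictError(f"{self.name}: {conflicts}")
        self.snapshot = self.parent.view()

    def merge(self):
        """Apply this workspace's changes to its parent."""
        conflicts = self._conflicts()
        if conflicts:
            raise ConflictError(f"{self.name}: {conflicts}")
        for key, value in self.changes.items():
            self.parent.write(key, value)
        self.changes.clear()
        self.snapshot = self.parent.view()


if __name__ == "__main__":
    live = Workspace("LIVE")
    live.write("parcel_1", "Alice")
    live.write("parcel_2", "Bob")

    design = Workspace("DESIGN", parent=live)
    design.write("parcel_1", "Alice & Co")
    live.write("parcel_2", "Bob Jr.")   # edit made directly in LIVE
    design.refresh()                    # no overlap, so DESIGN now sees it
    design.merge()                      # DESIGN's edit reaches LIVE
    print(live.view())
```

Because a child only talks to its own parent, a grandchild workspace merges into its parent first, and that workspace merges into LIVE in turn.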

OWM is fully transparent to applications: they interact with tables and views as usual. Referential integrity is maintained within the context of each workspace.

Note that OWM is not specifically related to GIS: the multi-versioned tables can be of any kind, not just "spatial" tables. However, it is used by a number of GIS tools that need a safe long-transaction mechanism; Autodesk Map, Bentley Maps, and Geomedia Transaction Manager are examples. A number of customers also use it outside of any GIS.

Conceptually it is very much like ESRI's approach: the fundamental operations are the same (though the terminology may differ). The implementation is different, however: where ESRI keeps changes in separate tables (the "A" and "D" tables), Oracle Workspace Manager keeps them all in the main table(s), and views make this transparent to applications.
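Here is a rough sketch of the single-table approach, again with Python's sqlite3. Every row version sits in one table tagged with the workspace that wrote it, and views give each workspace what looks like an ordinary table. The table, view, and column names are hypothetical, and only one child workspace is shown. OWM's actual bookkeeping columns and view definitions differ and are described in its documentation.

```python
import sqlite3

# Hypothetical schema: one table holds every row version, tagged with the
# workspace that wrote it; views hide the bookkeeping columns.
SCHEMA = """
CREATE TABLE parcels_all (
    objectid  INTEGER NOT NULL,
    owner     TEXT,
    workspace TEXT    NOT NULL,            -- 'LIVE' or a child workspace
    deleted   INTEGER NOT NULL DEFAULT 0,  -- 1 = row deleted in that workspace
    PRIMARY KEY (objectid, workspace)
);

-- What applications in LIVE query as if it were the ordinary table.
CREATE VIEW parcels AS
    SELECT objectid, owner FROM parcels_all
    WHERE workspace = 'LIVE' AND deleted = 0;

-- A child workspace: its own row versions win, LIVE fills in the rest.
CREATE VIEW parcels_design AS
    SELECT objectid, owner FROM parcels_all
    WHERE workspace = 'DESIGN' AND deleted = 0
    UNION ALL
    SELECT l.objectid, l.owner FROM parcels_all AS l
    WHERE l.workspace = 'LIVE' AND l.deleted = 0
      AND NOT EXISTS (SELECT 1 FROM parcels_all AS c
                      WHERE c.objectid = l.objectid AND c.workspace = 'DESIGN');
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO parcels_all (objectid, owner, workspace, deleted) VALUES (?, ?, ?, ?)",
        [
            (1, "Alice", "LIVE", 0), (2, "Bob", "LIVE", 0), (3, "Carol", "LIVE", 0),
            (2, "Bob Jr.", "DESIGN", 0),  # updated in DESIGN
            (3, None, "DESIGN", 1),       # deleted in DESIGN
            (4, "Dave", "DESIGN", 0),     # inserted in DESIGN
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(conn.execute("SELECT * FROM parcels ORDER BY objectid").fetchall())
    print(conn.execute("SELECT * FROM parcels_design ORDER BY objectid").fetchall())
```

Compared with the separate A/D tables, nothing outside the one table records the edits. The workspace column and the views do all the separation.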

So yes, multi-versioning / long transactions are definitely used in databases in contexts other than spatial databases.

OTHER TIPS

What you are describing seems really similar to MVCC (https://en.wikipedia.org/wiki/Multiversion_concurrency_control), where different clients can have a different "view" or "version" of the same database, and edits done within a transaction cannot be seen by other sessions until the editing session publishes them by executing "COMMIT".
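The visibility rule can be observed in a real engine with Python's sqlite3 module and two connections to the same file. SQLite is not a full MVCC engine because it allows only one writer at a time. In WAL mode, however, a reader keeps seeing the last committed data while another connection has an open write transaction. The file and table names are arbitrary.

```python
import os
import sqlite3
import tempfile


def visibility_demo():
    """Return the row count a second session sees before and after COMMIT."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        # WAL mode: readers work from a snapshot of committed data.
        writer.execute("PRAGMA journal_mode=WAL").fetchall()
        writer.execute("CREATE TABLE edits (id INTEGER PRIMARY KEY, note TEXT)")
        writer.commit()

        reader = sqlite3.connect(path)
        try:
            # Python's sqlite3 opens a transaction implicitly before the INSERT.
            writer.execute("INSERT INTO edits (note) VALUES ('uncommitted edit')")
            before = reader.execute("SELECT count(*) FROM edits").fetchall()[0][0]
            writer.commit()
            after = reader.execute("SELECT count(*) FROM edits").fetchall()[0][0]
        finally:
            reader.close()
            writer.close()
    return before, after


if __name__ == "__main__":
    print(visibility_demo())  # (count before COMMIT, count after COMMIT)
```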

That concept is not exclusive to spatial databases (the Wikipedia article cites a 1981 paper as one of the first descriptions of it: https://en.wikipedia.org/wiki/Multiversion_concurrency_control#cite_note-3), but the concept has to be older than that. And while I can see some special needs and implementations for the extra dimensions, the essentials would be the same.

The implementation, of course, can vary widely from engine to engine. Git, even though it is not a database, has to implement concurrency control in its own way, and that will be completely different from how PostgreSQL does it. If we then go to distributed databases, we arrive at the wild west of keeping consistency across heterogeneous systems over the network. You want an implementation that performs well for the most common cases, and every optimization implies a trade-off for operations considered less important: one that is fast for small transactions may be too expensive, or even unbearable, for large edits. Also, different database engines interpret things like consistency levels in their own way.

The good news is that, with the rise of open-source database systems, you no longer need to rely on large manuals to understand what is happening internally: you can read the source code yourself and see how these mechanisms are written. For example, for InnoDB:

For interest's sake, and in response to @jynus' answer:

There is a component of versioning called versioned views. Versioned views were previously called multiversion views, which is consistent with @jynus' answer about multiversion concurrency control.

A versioned view incorporates a database view, stored procedures, triggers, and functions to allow you to read or edit versioned data in a geodatabase table or feature class using Structured Query Language (SQL). When you query a versioned view, you can see the data in the base (business) table and the edits that are stored in the delta tables. The triggers used by the versioned views update the delta tables when you edit the versioned view using SQL.
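The trigger mechanism can be imitated with SQLite INSTEAD OF triggers, which let plain INSERT, UPDATE, and DELETE statements against a view be redirected into delta tables. This Python sketch is a hypothetical single-version simplification. ESRI's versioned views are generated by ArcGIS and also track states and lineage. The names and trigger logic here are invented for illustration, and they assume objectid values are supplied explicitly and never changed.

```python
import sqlite3

# Hypothetical names: "hydrants" is the business table, the _adds/_deletes
# tables play the A and D tables, and "hydrants_vw" is the versioned view.
SCHEMA = """
CREATE TABLE hydrants         (objectid INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE hydrants_adds    (objectid INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE hydrants_deletes (objectid INTEGER PRIMARY KEY);

CREATE VIEW hydrants_vw AS
    SELECT objectid, status FROM hydrants
    WHERE objectid NOT IN (SELECT objectid FROM hydrants_deletes)
    UNION ALL
    SELECT objectid, status FROM hydrants_adds;

-- INSTEAD OF triggers redirect every edit on the view into the delta tables;
-- the business table itself is never modified.
CREATE TRIGGER hydrants_vw_ins INSTEAD OF INSERT ON hydrants_vw
BEGIN
    INSERT INTO hydrants_adds (objectid, status) VALUES (NEW.objectid, NEW.status);
END;

CREATE TRIGGER hydrants_vw_upd INSTEAD OF UPDATE ON hydrants_vw
BEGIN
    INSERT OR IGNORE INTO hydrants_deletes (objectid) VALUES (OLD.objectid);
    DELETE FROM hydrants_adds WHERE objectid = OLD.objectid;
    INSERT OR REPLACE INTO hydrants_adds (objectid, status) VALUES (NEW.objectid, NEW.status);
END;

CREATE TRIGGER hydrants_vw_del INSTEAD OF DELETE ON hydrants_vw
BEGIN
    INSERT OR IGNORE INTO hydrants_deletes (objectid) VALUES (OLD.objectid);
    DELETE FROM hydrants_adds WHERE objectid = OLD.objectid;
END;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hydrants VALUES (?, ?)", [(1, "ok"), (2, "ok"), (3, "ok")])
    conn.commit()
    # Plain SQL against the view; the triggers route each edit to a delta table.
    conn.execute("UPDATE hydrants_vw SET status = 'leaking' WHERE objectid = 2")
    conn.execute("DELETE FROM hydrants_vw WHERE objectid = 3")
    conn.execute("INSERT INTO hydrants_vw (objectid, status) VALUES (4, 'ok')")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT * FROM hydrants_vw ORDER BY objectid").fetchall())
    print(conn.execute("SELECT * FROM hydrants ORDER BY objectid").fetchall())
```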

Here is what the definition of a versioned view looks like:

-- Part 1: base (business) table rows (state 0) not deleted in the current version's lineage
SELECT
    b.OBJECTID
    ,b.A_TEST_FIELD
    ,0 SDE_STATE_ID 
FROM 
    USER1.A_TEST_TABLE b
    ,(SELECT 
         SDE_DELETES_ROW_ID
         ,SDE_STATE_ID 
      FROM 
         USER1.D47430 
      WHERE 
         SDE_STATE_ID = 0 
         AND SDE.version_util.in_current_lineage (DELETED_AT) > 0
       ) d 
WHERE 
    b.OBJECTID = d.SDE_DELETES_ROW_ID(+) 
    AND d.SDE_STATE_ID IS NULL  
    AND SDE.version_util.get_lineage_list > 0 

UNION ALL 

-- Part 2: adds (A) table rows created in the current lineage and not deleted in it
SELECT
    a.OBJECTID
    ,a.A_TEST_FIELD
    ,a.SDE_STATE_ID 
FROM 
    USER1.A47430 a
    ,(SELECT 
         SDE_DELETES_ROW_ID
         ,SDE_STATE_ID 
      FROM 
         USER1.D47430 
      WHERE 
         SDE.version_util.in_current_lineage (DELETED_AT) > 0
      ) d
WHERE 
    a.OBJECTID = d.SDE_DELETES_ROW_ID(+) 
    AND a.SDE_STATE_ID = d.SDE_STATE_ID(+) 
    AND SDE.version_util.in_current_lineage (a.SDE_STATE_ID) > 0 
    AND d.SDE_STATE_ID IS NULL
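Read as a whole, the view keeps two kinds of rows. The first is base-table rows (state 0) with no matching delete. The second is A-table rows whose SDE_STATE_ID is in the current lineage and that have no matching delete. In both cases a delete only counts if its DELETED_AT state is also in the lineage. The Python sketch below mimics that filter on plain lists, assuming states form a tree in which a version's lineage is its current state plus all of its ancestors. The state tree, the sample rows, and the function names are hypothetical. SDE.version_util.in_current_lineage is ESRI's function, and this is not its implementation.

```python
from dataclasses import dataclass

# Hypothetical state tree: state -> parent state (0 is the base table).
STATE_PARENT = {0: None, 1: 0, 2: 1, 3: 0}


def lineage(state, parent=STATE_PARENT):
    """The given state plus all of its ancestors, back to state 0."""
    states = set()
    while state is not None:
        states.add(state)
        state = parent[state]
    return states


@dataclass(frozen=True)
class Row:
    objectid: int
    value: str
    state_id: int    # state that created this row image (0 = base table)


@dataclass(frozen=True)
class Delete:
    objectid: int
    state_id: int    # state of the row image that was deleted
    deleted_at: int  # state in which the delete happened


def visible_rows(rows, deletes, current_state):
    lin = lineage(current_state)
    # Only deletes made somewhere in this lineage hide a row image.
    hidden = {(d.objectid, d.state_id) for d in deletes if d.deleted_at in lin}
    return sorted(
        (r.objectid, r.value)
        for r in rows
        if r.state_id in lin and (r.objectid, r.state_id) not in hidden
    )


SAMPLE_ROWS = [
    Row(1, "base", 0),
    Row(2, "base", 0),
    Row(2, "edited in state 1", 1),  # new image of row 2 from an update in state 1
    Row(3, "added in state 2", 2),
    Row(4, "added in state 3", 3),
]
SAMPLE_DELETES = [
    Delete(2, 0, 1),  # the same update hides the base image of row 2
    Delete(1, 0, 3),  # row 1 deleted in state 3 only
]

if __name__ == "__main__":
    for state in (0, 2, 3):
        print(state, visible_rows(SAMPLE_ROWS, SAMPLE_DELETES, state))
```

Because state 3 branches from state 0, a version sitting on state 3 never sees the edits made in states 1 and 2, and vice versa.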

Licensed under: CC-BY-SA with attribution
