Is it safe to rely on transactions in Firebird 2.5 Embedded DB in case of power outage?

https://dba.stackexchange.com/questions/278782

09-03-2021

Question

How safe is it to assume that the following list of actions will never be reflected in the DB if a power outage occurs somewhere in the middle of line #2, before the transaction is committed?

#1 begin transaction
#2 delete record in table A that cascade deletes another record in table B
#3 update another record in table C
#4 commit transaction

Solution

That is usually safe. Firebird uses an MVCC (Multi-Version Concurrency Control) architecture. Updates (and deletes) are written as a back version and a new version. The new version overwrites the previous version, and the back version is a delta to recreate the previous record version from the new record version. Deletes are written as a 'stub' record version that marks the record as deleted.

Each version is marked with the transaction that originally created the record version. If that transaction is not committed, that version of the record will not be visible. Firebird will follow the chain of back version pointers of the record to find the version that is visible to the transaction and reconstruct that version.

After a crash or other abrupt termination, Firebird will at some point detect the transaction is not active, and the transaction will be marked as rolled back. At that time, or at some later point during garbage collection, Firebird will rewrite the record versions to make the latest committed version the 'new' version, and eliminate unnecessary back versions.
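
To make the version chain idea concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: the names (`RecordVersion`, `visible_version`, `tip`) are hypothetical, full records are stored instead of deltas, and snapshot and isolation rules are ignored. It is not Firebird's actual on-disk format or visibility algorithm.

```python
from dataclasses import dataclass
from typing import Optional

ACTIVE, COMMITTED, ROLLED_BACK = "active", "committed", "rolled_back"


@dataclass
class RecordVersion:
    txn_id: int                              # transaction that created this version
    data: Optional[str]                      # None models a deletion stub
    back: Optional["RecordVersion"] = None   # pointer to the previous version


def visible_version(newest, tip, reader_txn):
    """Walk the back-version chain until a version the reader may see."""
    version = newest
    while version is not None:
        # Toy rule: a reader sees its own versions and committed versions.
        if version.txn_id == reader_txn or tip[version.txn_id] == COMMITTED:
            return version
        version = version.back
    return None


# Transaction 1 committed the original record; 2 and 3 are in flight.
tip = {1: COMMITTED, 2: ACTIVE, 3: ACTIVE}
original = RecordVersion(txn_id=1, data="row in table A")

# Transaction 2 deletes the record, then the power fails before commit.
stub = RecordVersion(txn_id=2, data=None, back=original)

# Before recovery, reader 3 skips the uncommitted deletion stub.
before = visible_version(stub, tip, reader_txn=3)

# After restart, transaction 2 is eventually marked as rolled back.
tip[2] = ROLLED_BACK
after = visible_version(stub, tip, reader_txn=3)

print(before.data, after.data)
```

In both cases the reader ends up at the version written by transaction 1, which is why the uncommitted delete never becomes visible.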

The fact that you have a cascading delete in your example isn't really relevant, as long as the consistency of individual records is OK.

How can a transaction go from Active to Rolled Back if it exits abnormally?

This can happen in one of two ways.

When a transaction starts, it takes out a lock on its own transaction id. If a transaction (B) attempts to update or delete a record and finds that the most recent version of the record was created by a transaction (A) whose TIP state is ACTIVE, transaction B tries to get a conflicting lock on A's transaction id. A live transaction maintains an exclusive lock on its own id, and the lock manager can probe a lock to see if the owner is still alive. If the lock is granted, then B knows that A died and changes A's TIP state from Active to Rolled Back.

When a transaction starts, it checks to see if it can get an exclusive lock on the database - if it can no other transactions are active. Every active transaction has a shared lock on the database. If it gets an exclusive lock, it converts all Active TIP entries to Rolled Back.
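
The two paths quoted above can be sketched with a toy lock manager. Everything here is an assumption made for illustration: `ToyLockManager`, `resolve_state` and `mark_dead_if_exclusive` are hypothetical names, and a plain set stands in for Firebird's real lock manager.

```python
ACTIVE, COMMITTED, ROLLED_BACK = "active", "committed", "rolled_back"


class ToyLockManager:
    """Tracks which transaction ids are exclusively locked by a live owner."""

    def __init__(self):
        self._live_owners = set()

    def acquire_own_id(self, txn_id):
        # A starting transaction takes an exclusive lock on its own id.
        self._live_owners.add(txn_id)

    def owner_died(self, txn_id):
        # Models abnormal exit: the lock disappears, the TIP is not updated.
        self._live_owners.discard(txn_id)

    def try_conflicting_lock(self, txn_id):
        # Granted only if no live owner still holds the exclusive lock.
        return txn_id not in self._live_owners

    def try_exclusive_database_lock(self):
        # Granted only if no transaction is alive at all.
        return not self._live_owners


def resolve_state(txn_id, tip, locks):
    """Path 1: probe the lock of a transaction that the TIP says is Active."""
    if tip[txn_id] == ACTIVE and locks.try_conflicting_lock(txn_id):
        tip[txn_id] = ROLLED_BACK
    return tip[txn_id]


def mark_dead_if_exclusive(tip, locks):
    """Path 2: with exclusive access, convert every Active entry."""
    converted = []
    if locks.try_exclusive_database_lock():
        for txn_id, state in tip.items():
            if state == ACTIVE:
                tip[txn_id] = ROLLED_BACK
                converted.append(txn_id)
    return converted


locks = ToyLockManager()
tip = {10: ACTIVE, 11: ACTIVE}
locks.acquire_own_id(10)
locks.acquire_own_id(11)

locks.owner_died(10)                      # transaction 10 exits abnormally
state_10 = resolve_state(10, tip, locks)  # lock granted, so 10 is dead
state_11 = resolve_state(11, tip, locks)  # lock refused, 11 is still alive

locks.owner_died(11)                      # later, 11 dies as well
converted = mark_dead_if_exclusive(tip, locks)
```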

(from: Firebird for the Database Expert: Episode 4 - OAT, OIT, & Sweep)

To prevent data loss, Firebird employs what is called a "careful write" strategy to ensure on-disk consistency, by ensuring data is written in an order that maintains the dependencies between data pages (and versions).

The new record is written in the same place as the old record, while the back version is written to reconstruct the old record. There are basically two scenarios:

1. The new version and its back version fit on the same data page.

   Both new and back version are written to the in-memory image of the page, and the page is written to disk.

   In this situation, the only real 'failure' mode would be that the disk write itself is only partially done when the power fails. This problem can be addressed by using disk controllers with a backup battery unit (BBU) or similar solution.

2. The back version doesn't fit on the same page, and has to go on another data page.

   The back version and new version are written to the in-memory image of their respective pages. As the new version depends on the back version (it points to the back version), the page with the back version is written to disk first, then the page with the new version is written to disk.

   If the process terminates after the back version is written, the back version is orphaned, but the previous version of the record is still intact. It also has the same failure mode as the previous one.

So, in case of a power outage, or other type of termination of the Firebird process, this means that you possibly have orphaned pages (pages allocated but not yet registered, or pages released but not yet marked as available), or orphaned records (a back version that is written, but not yet correctly linked in a version chain of records). These orphaned pages or back versions waste space, but do not affect consistency. It might be necessary to use gfix to fix and reclaim space wasted that way.
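
The second scenario (back version on a different page) can be illustrated with a small simulation. This is a sketch only: the "disk" is a Python dict, the page layout and the `careful_write_update` function are hypothetical, and a crash is modelled as simply stopping after a given number of page writes.

```python
def fresh_disk():
    # Page 1 holds the committed record; it has no back version yet.
    return {1: {"record": {"value": "old", "txn": 1, "back": None}}}


def careful_write_update(disk, crash_after_writes=None):
    """Update the record on page 1, placing its back version on page 2.

    `crash_after_writes` simulates a power failure after that many page
    writes have reached the disk (None means no failure).
    """
    old = disk[1]["record"]

    # In-memory images of both pages, prepared before anything is written.
    back_page = {"back_version": old}
    new_page = {"record": {"value": "new", "txn": 2, "back": 2}}

    # Dependency order: the page that is pointed to goes to disk first.
    ordered_writes = [(2, back_page), (1, new_page)]
    for count, (page_no, image) in enumerate(ordered_writes):
        if crash_after_writes is not None and count >= crash_after_writes:
            return disk  # power failed; the remaining writes never happen
        disk[page_no] = image
    return disk


# Power fails after the back version page is written, before the new version.
crashed = careful_write_update(fresh_disk(), crash_after_writes=1)

# No failure: both pages reach the disk in dependency order.
completed = careful_write_update(fresh_disk())

print(crashed[1]["record"]["value"])    # record on page 1 after the crash
print(completed[1]["record"]["value"])  # record on page 1 after a full write
```

After the simulated crash, page 2 contains an orphaned back version that nothing points to, while page 1 still holds the old record untouched. That matches the "wasted space, but no inconsistency" outcome described above.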

That said, apart from a failure in the middle of a disk write without a BBU, I believe there are other scenarios where a power outage leaves the database in a state where it first needs to be repaired using gfix, but I haven't encountered those scenarios myself and I can't really think of one right now.

This all assumes that you have synchronous writes (forced writes) enabled on your Firebird database (the default). If you disable synchronous writes, writing pages to disk is left to your OS, and this might result in data loss because pages might be written in a different order. For example, if the new version has been written to disk but the back version has not at the time of the outage, then you have essentially lost the committed version of the row, orphaned the previous back versions, and left the back version pointer of the new version pointing to garbage.
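
To see why the write order matters, the following toy check tries both possible orders for the two pages of a single update and tests, after every individual write, whether a crash at that moment would leave a dangling back pointer. The page layout and the names `PENDING`, `consistent` and `safe_at_every_crash_point` are assumptions for illustration, not Firebird internals.

```python
from itertools import permutations

# Pending page writes for one update: page 2 holds the back version,
# page 1 holds the new version whose back pointer refers to page 2.
PENDING = {1: {"back_pointer": 2}, 2: {"back_pointer": None}}


def consistent(disk):
    """True if every back pointer on disk refers to a page that is on disk."""
    return all(
        page["back_pointer"] is None or page["back_pointer"] in disk
        for page in disk.values()
    )


def safe_at_every_crash_point(order):
    # Before the update, page 1 holds the old version with no back pointer.
    disk = {1: {"back_pointer": None}}
    for page_no in order:
        disk[page_no] = PENDING[page_no]
        if not consistent(disk):
            return False  # a crash right now would leave a dangling pointer
    return True


# Maps each write order, e.g. (2, 1), to whether it is safe at every step.
results = {order: safe_at_every_crash_point(order)
           for order in permutations(PENDING)}
print(results)
```

Only the order that writes the back version page first stays consistent at every step. Without forced writes the OS is free to pick the other order.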