Question

Many projects that use multiple transactional data sources need some form of distributed transaction across those sources to get consistent views of the data. What are the most common primitives that transactional data sources provide so they can participate in a heterogeneous transactional system?

If you need a specific example: say I have a transactional file system that I can snapshot, and a transactional database with a write-ahead log and checkpointing, and they run on different machines. How can I ensure I get a consistent view of both by snapshotting/checkpointing in a coordinated fashion? Is hacking together some form of two-phase commit outside the data sources the normal way to do this, or do the data sources themselves typically provide APIs that make two-phase commit easier to implement correctly?
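Concretely, here is a rough sketch (in Python, with entirely made-up prepare/commit hooks on each source) of the kind of coordinator I mean by "hacking together two-phase commit":

```python
# Rough sketch of a hand-rolled two-phase commit coordinator.
# The Participant interface and its methods are hypothetical stand-ins
# for whatever primitives the file system and database actually expose.

class Participant:
    def prepare(self) -> bool:
        """Get ready to commit (flush, snapshot, write a prepare record).
        Return True only if a later commit() is guaranteed to succeed."""
        raise NotImplementedError

    def commit(self) -> None:
        raise NotImplementedError

    def rollback(self) -> None:
        raise NotImplementedError


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: ask every participant to prepare (vote).
    prepared = []
    for p in participants:
        try:
            if not p.prepare():
                raise RuntimeError("participant voted no")
            prepared.append(p)
        except Exception:
            # Any failure in phase 1: roll back everyone already prepared.
            for q in prepared:
                q.rollback()
            return False

    # Phase 2: everyone promised to commit, so commit them all.
    # A real coordinator must durably log its decision before this point
    # and retry commits after a crash, or participants are left in doubt.
    for p in participants:
        p.commit()
    return True
```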


Solution

So, the term for what I was asking about is apparently XA, for "eXtended Architecture": a standard for transactions that span multiple data sources.

According to the Wikipedia article, the XA standard, which many data stores implement, uses two-phase commit. In practice it seems to be implemented mainly by databases; Oracle, SQL Server, DB2, and PostgreSQL support it, and possibly others. File systems don't generally expose this primitive to user-level applications, although some may.
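PostgreSQL, for example, exposes the database side of this through prepared transactions (PREPARE TRANSACTION / COMMIT PREPARED), which the Python DB-API two-phase-commit extension in psycopg2 wraps. A minimal sketch, assuming psycopg2, a placeholder connection string, a hypothetical accounts table, and a server configured with max_prepared_transactions > 0:

```python
import psycopg2

# Placeholder connection details; the server must allow prepared
# transactions (max_prepared_transactions > 0).
conn = psycopg2.connect("dbname=app user=app host=db.example.com")

# A global transaction id shared with the coordinator (values are made up).
xid = conn.xid(1, "order-1234", "postgres-branch")

conn.tpc_begin(xid)
cur = conn.cursor()
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))

# Phase 1: PREPARE TRANSACTION -- the work is now durable and survives a
# crash, but is neither committed nor visible to other transactions yet.
conn.tpc_prepare()

# ... the coordinator collects "prepared" votes from the other sources ...

# Phase 2: COMMIT PREPARED (or conn.tpc_rollback() if any vote was "no").
conn.tpc_commit()
```

After a crash, conn.tpc_recover() lists transactions that were left prepared, which is exactly the recovery piece you would otherwise have to reinvent yourself.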

You could also use a different commit protocol (three-phase commit, or a consensus-based approach such as Paxos), but in practice few systems expose those, so two-phase commit is your best bet.

Licensed under: CC-BY-SA with attribution