Question

I've learned a decent bit about database integrity, and know I should be using transactions if I "require multiple statements be performed as a unit to keep the data in a consistent state" (from "Database development mistakes made by application developers", point 16 of the chosen answer).

Wikipedia uses the example:

  1. Debit $100 to Groceries Expense Account
  2. Credit $100 to Checking Account

If I try to credit a non-existent account ID, and I'm using constraints properly, an exception will be thrown and I can catch it and roll back. If there is a power outage, the two changes are still guaranteed to be applied atomically.


However, if I understand properly, transactions by themselves won't help me in all cases: (example with PHP and MySQL)

  1. MySQL: Start transaction
  2. MySQL: Select data from a table
  3. PHP: Compute state with the selected data
    • PHP: If the state is valid, insert data
    • PHP: Otherwise, don't insert data
  4. MySQL: Commit transaction

A transaction alone doesn't help here, because both queries can execute atomically without anything failing (it's PHP that decides there's an error, not some SQL constraint).

Secondly, and I just tested, transactions are committed synchronously, but can be started asynchronously. If I start a transaction, and add a 10 second delay, I can start the slow script, and start and commit another transaction in that time, demonstrating concurrent transactions. Two instances can select the same data, before seeing the other's modifications. Only the modifications are guaranteed to be atomic.


So what can I do? I suppose locking a table works, but is that good practice? Some conditions can be described with SQL in a single statement, but more complex ones can't.

Solution

This is a good question. Shows that you've been thinking about it a bit.

The problem you are describing exists because the database is not aware of your data dependencies. To the database, your code selects some data and writes some data. It doesn't know you are only writing that data based on the data selected. In general, you need to tell the database about your data dependencies. This is done differently in each database.

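You mentioned MySQL. InnoDB supports SELECT ... FOR UPDATE, a locking read that takes exclusive locks on the rows it reads. Other transactions that try to update those rows, or read them with their own locking read, block until your transaction commits or rolls back. Plain SELECTs are normally not blocked, because InnoDB serves them from a consistent snapshot (the exact behavior depends on the transaction isolation level). So the second transaction in your example cannot get past its own SELECT ... FOR UPDATE until the first one commits, provided both are locking the same resources. Exactly which resources get locked is up to the database.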

Let's look at an example. To lock the rows, you would first create a transaction and query the database with something like:

select * from tableA where value > 50 for update

This select locks the matching rows, so any other transaction that requests an incompatible lock on them has to wait. Then you can do the processing in PHP. Once you are ready, you can insert rows into another table:

insert into tableB values ('some value')

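At this point, before you commit, the rows you selected stay locked, and so does the row you inserted. No other client can update, delete, or lock-read those rows until your transaction ends. Plain SELECTs from other clients are not blocked in InnoDB: they see the last committed version of the rows you selected, and they won't see your uncommitted insert unless they read uncommitted. To make this work in your example, you just need to make sure all your select statements in step 2 use SELECT ... FOR UPDATE, so that a second instance of the script waits there instead of reading the same data.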
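If you want to see the "second client has to wait" behavior without a MySQL server, here is a rough, self-contained sketch in Python using the standard-library sqlite3 module. Note the substitution: SQLite has no SELECT ... FOR UPDATE, so the sketch uses BEGIN IMMEDIATE, which takes a database-wide write lock rather than row locks. The bookings table, the limit rule, and the function names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def open_conn(path):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0.1: fail fast instead of waiting the default 5 seconds.
    return sqlite3.connect(path, timeout=0.1, isolation_level=None)


def check_then_insert(conn, limit):
    """Insert a row only if fewer than `limit` rows exist (hypothetical rule)."""
    # Take the write lock BEFORE reading, so the check and the insert
    # cannot interleave with another writer. This stands in for
    # SELECT ... FOR UPDATE, which SQLite does not have.
    conn.execute("BEGIN IMMEDIATE")
    try:
        count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
        inserted = count < limit
        if inserted:
            conn.execute("INSERT INTO bookings (note) VALUES ('some value')")
        conn.execute("COMMIT")
        return inserted
    except Exception:
        conn.execute("ROLLBACK")
        raise


def lock_demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    a, b = open_conn(path), open_conn(path)
    a.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, note TEXT)")

    a.execute("BEGIN IMMEDIATE")  # client A holds the write lock
    try:
        b.execute("BEGIN IMMEDIATE")  # client B tries to take it too
        b_blocked = False
    except sqlite3.OperationalError:
        b_blocked = True  # "database is locked"
    a.execute("INSERT INTO bookings (note) VALUES ('from A')")
    a.execute("COMMIT")  # lock released

    # B runs its check only now, so it sees A's row and does not insert.
    b_inserted = check_then_insert(b, limit=1)
    total = b.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
    a.close()
    b.close()
    return b_blocked, b_inserted, total
```

Because of the short timeout, the second connection gets an error here instead of waiting. Against InnoDB the second SELECT ... FOR UPDATE would simply block until the first transaction commits (or the lock wait times out), but the effect is the same: the second client's check runs only after the first client's changes are visible.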

The other way to do this is to state your assumption in the update statement itself. When you issue the update, you also specify what you think the current data is. If the database updates the expected number of rows, you can be sure that nothing else has changed your data in the meantime. If it doesn't, you know that someone else has changed your data, and you should handle that case (for example, by re-reading and retrying). This is optimistic concurrency: you guess that probably no one will update your data, so you go ahead with your change, and afterwards you check whether someone actually did.

The query would be like:

select value from tableA where id = '1'

then later:

update tableA set value = 'new value' where id = '1' and value = 'old value'
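Here is a small sketch of that check in Python, using the standard-library sqlite3 module so it runs without a server. The items table and the function names are invented for the example; the same pattern should carry over to PHP and MySQL by checking the affected-row count after the update.

```python
import sqlite3


def update_if_unchanged(conn, row_id, old_value, new_value):
    """Write new_value only if the row still holds old_value.

    Returns True when exactly one row was updated, False when someone
    else changed the row after we read it.
    """
    cur = conn.execute(
        "UPDATE items SET value = ? WHERE id = ? AND value = ?",
        (new_value, row_id, old_value),
    )
    conn.commit()
    return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO items VALUES (1, 'old value')")
    conn.commit()

    # Two clients read the same starting value before either one writes.
    seen_by_a = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]
    seen_by_b = seen_by_a

    first = update_if_unchanged(conn, 1, seen_by_a, "value from A")
    # B's expectation is stale now, so its update matches zero rows.
    second = update_if_unchanged(conn, 1, seen_by_b, "value from B")

    final = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]
    return first, second, final
```

The first write wins and the second one is told it lost, so it can re-read the row and decide what to do instead of silently overwriting A's change.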

Other databases give you other options built on these two basic ideas. For example, in the optimistic model, you can verify a timestamp (or an auto-incrementing version number) instead of the actual values.
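A version-number variant might look roughly like this. Again this is only a sketch with sqlite3 from Python's standard library; the items table, its version column, and the function names are assumptions made for the example.

```python
import sqlite3


def save_with_version(conn, row_id, expected_version, new_value):
    """Update the row only if its version is still the one we read."""
    cur = conn.execute(
        "UPDATE items SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, row_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


def version_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, value TEXT, "
        "version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO items (id, value) VALUES (1, 'a')")
    conn.commit()

    # Read the value together with its version.
    _, version = conn.execute(
        "SELECT value, version FROM items WHERE id = 1"
    ).fetchone()

    ok_first = save_with_version(conn, 1, version, "b")   # bumps version to 1
    ok_second = save_with_version(conn, 1, version, "c")  # still expects 0

    row = conn.execute("SELECT value, version FROM items WHERE id = 1").fetchone()
    return ok_first, ok_second, row
```

Comparing a version column instead of the values themselves also catches the case where someone changed the row and then changed it back, and it avoids comparing large columns in the WHERE clause.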

Licensed under: CC-BY-SA with attribution