What transactions diminish data integrity?

Question

This is a good question. Shows that you've been thinking about it a bit.

The problem you are describing exists because the database is not aware of your data dependencies. To the database, your code selects some data and writes some data. It doesn't know you are only writing that data based on the data selected. In general, you need to tell the database about your data dependencies. This is done differently in each database.

You mentioned MySQL. InnoDB supports SELECT ... FOR UPDATE. This takes an exclusive lock on the rows the query reads, so other transactions cannot modify those rows or take conflicting locks on them (how plain reads behave depends on the transaction isolation level). The second transaction in your example will therefore wait until the first one commits, provided both try to lock the same rows. Exactly which rows get locked is up to the database: InnoDB locks the index records it scans, so without a suitable index it can lock more rows than the query returns.

Let's look at an example. To lock the rows, you would first start a transaction and then query the database with something like:

select * from tableA where value > 50 for update

This select locks the matching rows, so any other transaction that requests an incompatible lock on them will block until you finish. Then you can do the processing in PHP. Once you are ready, you can insert rows into another table:

insert into tableB values ('some value')

At this point, before you commit, all of the selected rows are still locked. No other client can update, delete, or lock them until your transaction ends. Other clients can still read them with a plain SELECT: in InnoDB that is a non-locking consistent read, which returns the last committed version (or your uncommitted changes, if the client uses READ UNCOMMITTED). Any client that reads them with SELECT ... FOR UPDATE, however, will wait for you. To make this work in your example, you just need to make sure all your select statements in 2 are using select for update.
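The lock-then-write flow above can be sketched in application code. This is only an illustration, written in Python rather than PHP, and it rests on several assumptions: `conn` is a PEP 249 (DB-API) connection to MySQL with autocommit off, the driver uses `%s` placeholders, tableB has a single column, and the function name `copy_with_row_locks` and the `transform` callback are hypothetical.

```python
def copy_with_row_locks(conn, threshold, transform):
    """Lock matching rows in tableA, then insert derived rows into tableB.

    Assumes conn is a DB-API connection with autocommit off, so the first
    statement implicitly opens a transaction.
    """
    cur = conn.cursor()
    try:
        # FOR UPDATE takes row locks that are held until commit or rollback.
        cur.execute(
            "SELECT * FROM tableA WHERE value > %s FOR UPDATE", (threshold,)
        )
        rows = cur.fetchall()
        for row in rows:
            # Application-side processing happens while the locks are held.
            cur.execute("INSERT INTO tableB VALUES (%s)", (transform(row),))
        conn.commit()  # makes the inserts visible and releases the locks
        return len(rows)
    except Exception:
        conn.rollback()  # undoes the inserts and releases the locks
        raise
    finally:
        cur.close()
```

Keeping the work between the select and the commit short matters here, because every other transaction that wants those rows waits for the whole duration.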

The other way to do this is to tell the database in the update statement itself. When you issue the update, you also specify what you think the current data should be. If the database updates the expected number of rows, you can be sure that nothing else has changed your data in the meantime. If it doesn't, you know that someone else has changed your data, and you should handle that case (for example, by re-reading the row and retrying). This is optimistic concurrency: you guess that probably no one else will update your data, so you go ahead with your change and check afterwards whether someone actually did.

The queries would look like:

select value from table where id = '1'

then later:

update table set value = 'new value' where id = '1' and value = 'old value'
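Checking the number of updated rows could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it runs without a server; the table name `items` and the function names are made up for the example, and with MySQL you would use your driver's placeholder style instead of `?`.

```python
import sqlite3


def update_if_unchanged(conn, item_id, old_value, new_value):
    """Write new_value only if the row still holds old_value."""
    cur = conn.execute(
        "UPDATE items SET value = ? WHERE id = ? AND value = ?",
        (new_value, item_id, old_value),
    )
    conn.commit()
    # One updated row means nobody changed the value since we read it.
    return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO items VALUES ('1', 'old value')")
    conn.commit()
    # The first writer's guess about the old value is right, so it wins.
    first = update_if_unchanged(conn, "1", "old value", "new value")
    # The second writer still believes the old value and is rejected.
    second = update_if_unchanged(conn, "1", "old value", "other value")
    current = conn.execute("SELECT value FROM items WHERE id = '1'").fetchone()[0]
    return first, second, current
```

Here `demo()` returns `(True, False, 'new value')`. One caveat for MySQL: by default it reports the rows actually changed rather than the rows matched, so an update that sets a column to the value it already has counts as zero rows.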

Other databases offer variations on these two basic ideas. For example, in the optimistic model, you can check a timestamp (or an auto-incrementing version number) column instead of comparing the actual values.
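As a rough sketch of the version-number variant, again using Python's sqlite3 so it runs standalone: the `items` table, its `version` column, and the function names are assumptions for illustration, not something from your schema.

```python
import sqlite3


def save_if_version_matches(conn, item_id, new_value, expected_version):
    """Update the row and bump its version, but only if the version is unchanged."""
    cur = conn.execute(
        "UPDATE items SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, item_id, expected_version),
    )
    conn.commit()
    # Zero rows updated means another writer bumped the version first.
    return cur.rowcount == 1


def version_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id TEXT PRIMARY KEY, value TEXT, "
        "version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO items VALUES ('1', 'old value', 1)")
    conn.commit()
    # Both writers read version 1; only the first write succeeds.
    ok = save_if_version_matches(conn, "1", "new value", 1)
    stale = save_if_version_matches(conn, "1", "other value", 1)
    row = conn.execute("SELECT value, version FROM items WHERE id = '1'").fetchone()
    return ok, stale, row
```

Comparing a single version column avoids listing every column's old value in the WHERE clause, and it still detects a change when another writer sets the data back to the value you originally read.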