Question

TL;DR - MySQL doesn't let you lock a table and use a transaction at the same time. Is there any way around this?

I have a MySQL table I am using to cache some data from a (slow) external system. The data is used to display web pages (written in PHP). Every once in a while, when the cached data is deemed too old, one of the web connections should trigger an update of the cached data.

There are three issues I have to deal with:

  • Other clients will try to read the cache data while I am updating it
  • Multiple clients may decide the cache data is too old and try to update it at the same time
  • The PHP instance doing the work may be terminated unexpectedly at any time, and the data should not be corrupted

I can solve the first and last issues by using a transaction, so clients will be able to read the old data until the transaction is committed, when they will immediately see the new data. Any problems will simply cause the transaction to be rolled back.

I can solve the second problem by locking the tables, so that only one process gets a chance to perform the update. By the time any other processes get the lock they will realise they have been beaten to the punch and don't need to update anything.

This means I need to both lock the table and start a transaction. According to the MySQL manual, this is not possible. Starting a transaction releases the locks, and locking a table commits any active transaction.

Is there a way around this, or is there another way entirely to achieve my goal?

Solution

If it were me, I'd use MySQL's advisory locking functions (GET_LOCK() and RELEASE_LOCK()) to implement a mutex for updating the cache, and a transaction for read isolation. e.g.

begin_transaction(); // although reading a single row doesn't really require this
$cached = runquery("SELECT * FROM cache WHERE `key` = $id");
end_transaction();

if (is_expired($cached)) {
   $cached = refresh_data($cached, $id);
}
...

function refresh_data($cached, $id)
{
   $lockname = some_deterministic_transform($id);

   // Non-blocking attempt: if another process already holds the lock,
   // skip the refresh and keep serving the stale data.
   if (1 == runquery("SELECT GET_LOCK('$lockname', 0)")) {
      $cached = fetch_source_data($id);
      begin_transaction();
      write_data($cached, $id);
      end_transaction();
      runquery("SELECT RELEASE_LOCK('$lockname')");
   }

   return $cached;
}

(BTW: bad things may happen if you try this with persistent connections, since GET_LOCK() locks belong to the database connection and can leak between requests that share one.)
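To make that a bit more concrete, here is a minimal sketch of the same pattern using PDO. The `cache` table layout (id, data, updated_at), the REPLACE INTO write and the fetch_source_data() helper are assumptions for illustration, not part of the answer above.

<?php
// Sketch only: assumes a PDO connection (with ERRMODE_EXCEPTION) and a
// cache table (id, data, updated_at).
function refresh_cache(PDO $pdo, string $id): void
{
    $lockname = 'cache_refresh_' . $id;

    // Non-blocking attempt to take the advisory lock; if another request
    // holds it, just keep serving the stale row.
    $got = $pdo->query("SELECT GET_LOCK(" . $pdo->quote($lockname) . ", 0)")->fetchColumn();
    if ((int) $got !== 1) {
        return;
    }

    try {
        $fresh = fetch_source_data($id);   // hypothetical call to the slow external system

        $pdo->beginTransaction();
        $stmt = $pdo->prepare("REPLACE INTO cache (id, data, updated_at) VALUES (?, ?, NOW())");
        $stmt->execute([$id, $fresh]);
        $pdo->commit();
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        throw $e;
    } finally {
        // Always release the advisory lock so later requests can retry.
        $pdo->query("SELECT RELEASE_LOCK(" . $pdo->quote($lockname) . ")");
    }
}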

OTHER TIPS

"This means I need to both lock the table and start a transaction"

This is how you can do it:

SET autocommit=0;
LOCK TABLES t1 WRITE, t2 READ, ...;
... do something with tables t1 and t2 here ...
COMMIT;
UNLOCK TABLES;

For more info, see the MySQL manual's section on LOCK TABLES and its interaction with transactions.
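Applied to the cache-refresh scenario, that sequence might look roughly like this from PHP; the connection details and the `cache` table name are placeholders.

<?php
// Rough sketch of the LOCK TABLES + transaction sequence from PHP.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$pdo->exec("SET autocommit=0");
$pdo->exec("LOCK TABLES cache WRITE");
try {
    // ... update the rows in `cache` here ...
    $pdo->exec("COMMIT");            // commit before releasing the table lock
} catch (Throwable $e) {
    $pdo->exec("ROLLBACK");
    throw $e;
} finally {
    $pdo->exec("UNLOCK TABLES");     // always release the table lock
}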

I'd suggest to solve the issue by removing the contention altogether.

Add a timestamp column to your cached data.

When you need to update the cached data:

  • Just add new cached data to your table using the current timestamp
  • Remove cached data older than, let's say, 24 hours.

When you need to serve the cached data:

  • Sort by timestamp (DESC) and return the newest cached data

At any given time your clients will retrieve records which are never deleted by any other process. Moreover, you don't care if a client gets cached data belonging to different writes (i.e. with different timestamps).
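A minimal sketch of that approach, assuming a cache table with key_id, data and created_at columns and an existing PDO connection (the names are illustrative only):

<?php
// Sketch only: assumes a PDO connection and a cache table (key_id, data, created_at).

// Writer: append a fresh copy instead of overwriting, then prune old copies.
function write_cache(PDO $pdo, string $keyId, string $data): void
{
    $pdo->prepare("INSERT INTO cache (key_id, data, created_at) VALUES (?, ?, NOW())")
        ->execute([$keyId, $data]);

    // Drop copies older than 24 hours; current readers already have their row.
    $pdo->prepare("DELETE FROM cache WHERE key_id = ? AND created_at < NOW() - INTERVAL 24 HOUR")
        ->execute([$keyId]);
}

// Reader: always serve the newest copy.
function read_cache(PDO $pdo, string $keyId): ?string
{
    $stmt = $pdo->prepare("SELECT data FROM cache WHERE key_id = ? ORDER BY created_at DESC LIMIT 1");
    $stmt->execute([$keyId]);
    $data = $stmt->fetchColumn();
    return $data === false ? null : $data;
}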

The second problem may be solved without involving the database at all. Have a lock file for the cache update procedure so that other clients know that someone is already on it. This may not catch each and every corner case, but is it that big of a deal if two clients are updating the cache at the same time? After all, they are doing the updates in transactions, so the cache will still be consistent.

You may even implement the lock yourself by having the last cache update time stored in a table. When a client wants to update the cache, have it lock that table, check the last update time and then update the field.

I.e., implement your own locking mechanism to prevent multiple clients from updating the cache. Transactions will take care of the rest.
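For example, a non-blocking lock-file mutex in PHP could look like the sketch below; the file path and the $refresh callback are arbitrary placeholders, and the callback is whatever routine actually rebuilds the cache inside its own database transaction.

<?php
// Sketch of a lock-file mutex around the cache refresh.
function refresh_with_lockfile(callable $refresh): bool
{
    $fp = fopen('/tmp/cache-refresh.lock', 'c');   // 'c' creates the file if needed
    if ($fp === false) {
        return false;
    }

    // Non-blocking exclusive lock: if another process is already refreshing, skip.
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        fclose($fp);
        return false;
    }

    try {
        $refresh();
        return true;
    } finally {
        flock($fp, LOCK_UN);   // release the lock so the next stale hit can refresh
        fclose($fp);
    }
}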

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow