MySQL insert race condition

https://stackoverflow.com/questions/264807

06-07-2019

Question

How do you stop race conditions in MySQL? The problem at hand is caused by a simple algorithm:

1. select a row from the table
2. if it doesn't exist, insert it

And then you either get a duplicate row or, if you prevent that via unique/primary keys, an error.
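To make the race concrete, the sketch below replays the problematic interleaving step by step. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration only), with a hypothetical items table and hypothetical helper names. Both "threads" run the SELECT before either runs the INSERT, so each concludes the row is missing:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; note: no unique key on id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")


def row_exists(item_id):
    # Step 1: select the row from the table.
    cur = conn.execute("SELECT 1 FROM items WHERE id = ?", (item_id,))
    return cur.fetchone() is not None


def insert_row(item_id, name):
    # Step 2: insert it.
    conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name))


# Replay the bad interleaving: both checks run before either insert.
a_sees_row = row_exists(42)  # thread A: row not there
b_sees_row = row_exists(42)  # thread B: row not there either
if not a_sees_row:
    insert_row(42, "from A")
if not b_sees_row:
    insert_row(42, "from B")

count = conn.execute("SELECT COUNT(*) FROM items WHERE id = 42").fetchone()[0]
print(count)  # 2: a duplicate row
```

Without a unique or primary key the table ends up with two rows for id 42; with one, the second INSERT fails instead, which is exactly the dilemma described above.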

Now, normally I'd think transactions help here, but because the row doesn't exist yet, transactions don't actually help (or am I missing something?).

LOCK TABLES sounds like overkill, especially if the table is updated multiple times per second.

The only other solution I can think of is GET_LOCK() for every different id, but isn't there a better way? Wouldn't that have scalability issues as well? Also, doing this for every table feels a bit unnatural, since this seems like a very common problem in high-concurrency databases.

Solution

What you want is LOCK TABLES.

Or, if that seems excessive, how about INSERT IGNORE with a check that the row was actually inserted?

If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead.
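A minimal sketch of "insert, then check whether it really happened", again using SQLite through Python's sqlite3 module as a stand-in for MySQL. SQLite spells the statement INSERT OR IGNORE rather than INSERT IGNORE; the table and the insert_if_absent helper are hypothetical. The cursor's rowcount tells you whether this particular call created the row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key is what makes a repeated insert fail (and hence be ignored).
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def insert_if_absent(item_id, name):
    """Return True if this call inserted the row, False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (id, name) VALUES (?, ?)", (item_id, name)
    )
    # rowcount is 1 when the row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1


first = insert_if_absent(42, "from A")
second = insert_if_absent(42, "from B")
print(first, second)  # True False
```

In MySQL the corresponding check would be the affected-rows count reported for the INSERT IGNORE (for example via ROW_COUNT()): 1 if the row was inserted, 0 if it was ignored.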

OTHER TIPS

It seems to me you should have a unique index on your id column, so that a repeated insert triggers an error instead of being blindly accepted again.

That can be done by defining the id as a primary key or by adding a separate unique index on it.
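A short sketch of the unique-index approach, using SQLite via Python's sqlite3 module as a stand-in for MySQL (the table and index names are hypothetical). Once the index exists, a second insert of the same id raises an error instead of adding a duplicate row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
# A unique index on id; declaring id as PRIMARY KEY would have the same effect.
conn.execute("CREATE UNIQUE INDEX items_id_unique ON items (id)")

conn.execute("INSERT INTO items (id, name) VALUES (42, 'from A')")
try:
    # The repeated insert is now rejected instead of silently accepted.
    conn.execute("INSERT INTO items (id, name) VALUES (42, 'from B')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 1
```

In MySQL the rejected insert surfaces as error 1062 (ER_DUP_ENTRY).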

I think the first question you need to ask is why you have many threads doing the exact SAME work. Why would they have to insert the exact same row?

Once that is answered, I think simply ignoring the errors will be the most performant solution, but measure both approaches (GET_LOCK vs. ignoring errors) and see for yourself.

There is no other way that I know of. Why do you want to avoid errors? You still have to code for the case when another type of error occurs.
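One way to "ignore the errors" without hiding unrelated failures is to swallow only duplicate-key errors and re-raise everything else. This sketch uses SQLite via Python's sqlite3 as a stand-in and, as an assumption, recognizes a duplicate by SQLite's "UNIQUE constraint failed" message; against MySQL you would test for error code 1062 instead. The helper name is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_ignoring_duplicates(item_id, name):
    """Insert a row. Return False on a duplicate key; re-raise any other error."""
    try:
        conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name))
        return True
    except sqlite3.IntegrityError as exc:
        # Assumption: SQLite words duplicate-key failures as
        # "UNIQUE constraint failed: ..."; other integrity errors
        # (e.g. NOT NULL) are not duplicates and must not be swallowed.
        if str(exc).startswith("UNIQUE constraint failed"):
            return False
        raise


inserted = insert_ignoring_duplicates(1, "a")   # True
duplicate = insert_ignoring_duplicates(1, "b")  # False: duplicate ignored
try:
    insert_ignoring_duplicates(2, None)  # NOT NULL violation: not a duplicate
    other_error_raised = False
except sqlite3.IntegrityError:
    other_error_raised = True
```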

As staticsan says, transactions do help, but since they are usually implicit, if two inserts are run by different threads, each will be inside its own implicit transaction and see a consistent view of the database.

Locking the entire table is indeed overkill. To get the effect that you want, you need something that the literature calls "predicate locks". No one has ever seen those except on the paper that academic studies are printed on. The next best thing is locks on the "access paths" to the data (in some DBMSs: "page locks").

Some non-SQL systems let you do both (1) the select and (2) the insert in one single statement, which more or less means that the race conditions arising from your OS suspending your thread right between (1) and (2) are entirely eliminated.

Nevertheless, in the absence of predicate locks, such systems still need to resort to some kind of locking scheme, and the finer the "granularity" (or "scope") of the locks they take, the better for concurrency.

(And to conclude: some DBMSs, especially the ones you don't have to pay for, do indeed offer no finer lock granularity than "the entire table".)

On a technical level, a transaction will help here because other threads won't see the new row until you commit the transaction.

But in practice that doesn't solve the problem; it only moves it. Your application now needs to check whether the commit fails and decide what to do. I would normally have it roll back what it did and restart the transaction, because now the row will be visible. This is how transaction-based programming is supposed to work.
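The roll-back-and-retry pattern described above might look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, get_or_create and max_attempts are hypothetical names, and a duplicate is assumed to surface as an IntegrityError on the INSERT. SQLite locks far more coarsely than InnoDB, so in this single-connection demo the retry branch never fires; it is there to show the control flow:

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def get_or_create(item_id, name, max_attempts=3):
    """Return the (id, name) row for item_id, inserting it if it is missing."""
    for _ in range(max_attempts):
        conn.execute("BEGIN")
        try:
            row = conn.execute(
                "SELECT id, name FROM items WHERE id = ?", (item_id,)
            ).fetchone()
            if row is None:
                conn.execute(
                    "INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name)
                )
                row = (item_id, name)
            conn.execute("COMMIT")
            return row
        except sqlite3.IntegrityError:
            # Another session inserted the row after our SELECT: roll back
            # and start over; the next SELECT should now see that row.
            conn.execute("ROLLBACK")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    raise RuntimeError("gave up after repeated duplicate-key conflicts")


first = get_or_create(42, "from A")   # inserts the row
second = get_or_create(42, "from B")  # finds the existing row
```

The retry is bounded so that a persistent failure surfaces as an exception rather than an endless loop.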

I ran into the same problem and searched the Net for a moment :)

Finally I came up with a solution similar to the way filesystem objects are created in shared (temporary) directories to open temporary files securely:

$exists = $success = false;
do {
    $exists = check();            // select a row in the table
    if (!$exists) {
        // create_record() returns true on success, otherwise an error code
        $success = create_record();
        if ($success === true) {
            $exists = true;
        } elseif ($success !== ERROR_DUP_ROW) {
            log_error("failed to create row, and not because of DUP_ROW!");
            break;
        } else {
            // probably another process has already created the record,
            // so loop and check again whether it exists
        }
    }
} while (!$exists);

Don't be afraid of the busy loop: normally it executes only once or twice.

You prevent duplicate rows very simply by putting unique indexes on your tables. That has nothing to do with LOCKS or TRANSACTIONS.

Do you care if an insert fails because it's a duplicate? Do you need to be notified if it fails? Or is all that matters that the row was inserted, no matter by whom or how many duplicate inserts failed?

If you don't care, then all you need is INSERT IGNORE. There is no need to think about transactions or table locks at all.

InnoDB has row-level locking automatically, but that applies only to updates and deletes. You are right that it does not apply to inserts. You can't lock what doesn't yet exist!

You can explicitly LOCK the entire table. But if your purpose is to prevent duplicates, then you are doing it wrong. Again, use a unique index.

If there is a set of changes to be made and you want an all-or-nothing result (or even a set of all-or-nothing results within a larger all-or-nothing result), then use transactions and savepoints. Then use ROLLBACK or ROLLBACK TO SAVEPOINT *savepoint_name* to undo changes, including deletes, updates and inserts.
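A small illustration of a savepoint inside a transaction, using SQLite through Python's sqlite3 module as a stand-in (SQLite accepts the same SAVEPOINT and ROLLBACK TO SAVEPOINT statements; the table and savepoint names are hypothetical). Rolling back to the savepoint undoes the later insert and update but keeps the earlier insert:

```python
import sqlite3

# isolation_level=None: explicit BEGIN / COMMIT so the savepoint is ours.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO items VALUES (1, 'kept')")
conn.execute("SAVEPOINT before_second")
conn.execute("INSERT INTO items VALUES (2, 'undone')")
conn.execute("UPDATE items SET name = 'changed' WHERE id = 1")
# Undo only the work done since the savepoint; the first insert survives.
conn.execute("ROLLBACK TO SAVEPOINT before_second")
conn.execute("COMMIT")

rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 'kept')]
```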

LOCK TABLES is not a replacement for transactions, but it is your only option with MyISAM tables, which do not support transactions. You can also use it with InnoDB tables if row-level locking isn't enough. See this page for more information on using transactions with LOCK TABLES statements.

I have a similar issue. I have a table that under most circumstances should have a unique ticket_id value, but there are some cases where I will have duplicates; not the best design, but it is what it is.

User A checks to see if the ticket is reserved; it isn't.
User B checks to see if the ticket is reserved; it isn't.
User B inserts a 'reserved' record into the table for that ticket.
User A inserts a 'reserved' record into the table for that ticket.
User B checks for a duplicate: yes. Is my record the older one? Yes, so keep it.
User A checks for a duplicate: yes. Is my record the older one? No, so delete it.

User B has reserved the ticket, User A reports back that the ticket has been taken by someone else.

The key in my case is that you need a tie-breaker; here it's the auto-increment id on the row.
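A sketch of that tie-breaker, using SQLite via Python's sqlite3 module as a stand-in for MySQL (table, column, and helper names are hypothetical). It assumes each user's duplicate check runs only after both inserts are visible, as in the steps above, and that the record with the lowest auto-increment id wins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ticket_id is deliberately NOT unique; the auto-increment id breaks ties.
conn.execute(
    "CREATE TABLE reservations ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " ticket_id INTEGER NOT NULL,"
    " username TEXT NOT NULL)"
)


def insert_reservation(ticket_id, username):
    """Insert a 'reserved' record and return its auto-increment id."""
    cur = conn.execute(
        "INSERT INTO reservations (ticket_id, username) VALUES (?, ?)",
        (ticket_id, username),
    )
    return cur.lastrowid


def keep_if_oldest(ticket_id, my_id):
    """Keep our record only if it has the lowest id for this ticket."""
    (winner_id,) = conn.execute(
        "SELECT MIN(id) FROM reservations WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    if winner_id == my_id:
        return True
    conn.execute("DELETE FROM reservations WHERE id = ?", (my_id,))
    return False


# Replay the steps above: B inserts first, then A; then each checks.
b_id = insert_reservation(7, "B")
a_id = insert_reservation(7, "A")
b_won = keep_if_oldest(7, b_id)  # True: B's record is older, keep it
a_won = keep_if_oldest(7, a_id)  # False: A's record is newer, delete it

holders = conn.execute(
    "SELECT username FROM reservations WHERE ticket_id = 7"
).fetchall()
print(holders)  # [('B',)]
```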

Licensed under: CC-BY-SA with attribution
