Question

I have identical Python scripts that I need to run on multiple servers, all targeting the same table on a DB server. The script takes 5-20 seconds to run and must run every 5 minutes.

Server1 --->  -------------
              |  DB Table |
Server2 --->  -------------

The script looks at a single table that looks like this:

Type | many other fields | DirtyBit  |  Owner
 --------------------------------------------
  X  | ...               | UnUsed    |   NULL
  X  | ...               | UnUsed    |   NULL
  X  | ...               | UnUsed    |   NULL
  Y  | ...               | UnUsed    |   NULL
  Y  | ...               | UnUsed    |   NULL

The script does the following:

  1. Grab all records of type X (in a transaction) where DirtyBit is UnUsed and Owner is NULL.

  2. Update those records, setting DirtyBit to InUse and Owner to Server1.

  3. Perform some operations on the data in Python.

  4. Update all the records according to the operations in step 3. Set DirtyBit back to UnUsed and Owner back to NULL.

Because the script is running on multiple servers, the DirtyBit/Owner combination works to ensure the scripts aren't stepping on each other. Also, note that each row in the table is independent of all the others.
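
In SQL terms, steps 1 and 2 look roughly like this (test stands in for the real table name, and the id list in the update is whatever step 1 returned):

begin transaction;

select * from test where Type = 'X' and DirtyBit = 'UnUsed' and Owner is null;

update test set DirtyBit = 'InUse', Owner = 'Server1' where id in (...);

commit;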

Question: is this a sensible approach to getting the scripts to run concurrently? Is there any way the database can handle this for me (maybe by changing the transaction isolation level)? Ideally, if the scripts happen to run at the same time, I want this:

  1. Script on Server 1 starts running.

  2. Script on Server 2 starts running, notices that 1 is running, and thus decides it doesn't need to run.

  3. Script on Server 1 finishes, updates all the data.

Solution

Developing solutions that rely on concurrent access and modification of the same data is always a delicate business. Such solutions are prone to errors that occur very rarely and are hard to track down.

In your case, what you actually want is to serialize access to your table, not only the updates: allow only one thread (transaction) at a time to fetch the rows it needs (where DirtyBit is UnUsed and Owner is NULL) and mark them as in use. I'm quite sure your current solution doesn't work properly. Why? Consider this scenario:

  1. transaction 1 begins
  2. transaction 2 begins
  3. transaction 1 reads the data from the table
  4. transaction 2 reads the data from the table - shared locks allow concurrent reads, so it sees the same rows as transaction 1
  5. transaction 1 updates the table
  6. transaction 2 tries to update the table, but is blocked by transaction 1, which holds exclusive locks on those rows
  7. transaction 1 commits
  8. transaction 2 is unblocked, performs its update, and commits

As a result, both transactions read the same rows, and the script on both servers will operate on them. You can easily reproduce this scenario by hand against the database.
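
For example, in two separate sessions (a sketch assuming SQL Server's default READ COMMITTED isolation level, with test and the columns as above):

-- Session 1
begin transaction;
select * from test where DirtyBit = 'UnUsed' and Owner is null;

-- Session 2, before session 1 commits: returns the very same rows
begin transaction;
select * from test where DirtyBit = 'UnUsed' and Owner is null;

-- Session 1
update test set DirtyBit = 'InUse', Owner = 'Server1' where id in (...);

-- Session 2: blocks here, waiting on session 1's exclusive locks
update test set DirtyBit = 'InUse', Owner = 'Server2' where id in (...);

-- Session 1
commit;

-- Session 2: now unblocked, its update succeeds
commit;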

You can avoid this by explicitly acquiring an exclusive table lock. That would look like this:

begin transaction;

select * from test with (TABLOCKX) where DirtyBit = 'UnUsed' and Owner is null;

update test set DirtyBit = 'InUse', Owner = 'Server1' where id in (...);

commit;

Here, the TABLOCKX hint causes the other transactions to wait until this transaction commits or rolls back - they will not even be able to read the data in the meantime. Does this solve your problem?

But... if you can avoid concurrency in this specific case, I'd recommend that you do so (for the reasons in the first paragraph of this answer).

OTHER TIPS

I wouldn't take the approach you've used here. Home-grown solutions like this tend to be brittle.

This looks like a good problem for a scheduled job, with concurrency controlled via sp_getapplock.
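
A minimal sketch of that pattern, assuming SQL Server (the resource name ProcessDirtyRows and the zero timeout are illustrative choices):

declare @result int;

begin transaction;

-- Try to acquire an application lock; don't wait if another server holds it.
exec @result = sp_getapplock
    @Resource = 'ProcessDirtyRows',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 0;  -- return immediately instead of queueing

if @result >= 0
begin
    -- Lock acquired: select, process, and update the rows here.
    -- ...
    commit;  -- committing releases the application lock
end
else
begin
    -- Another server already holds the lock; skip this run.
    rollback;
end

With @LockTimeout = 0, a second server's run returns immediately and simply skips its cycle, which is exactly the "notices that 1 is running and decides it doesn't need to run" behaviour described in the question.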

Licensed under: CC-BY-SA with attribution