Question

I've been working on this for a few days now, and I've found several solutions but none of them incredibly simple or lightweight. The problem is basically this: We have a cluster of 10 machines, each of which is running the same software on a multithreaded ESB platform. I can deal with concurrency issues between threads on the same machine fairly easily, but what about concurrency on the same data on different machines?

Essentially the software receives requests to feed a customer's data from one business to another via web services. However, the customer may or may not exist yet on the other system. If it does not, we create it via a web service method. So it requires a sort of test-and-set, but I need a semaphore of some sort to lock out the other machines from causing race conditions. I've had situations before where a remote customer was created twice for a single local customer, which isn't really desirable.

Solutions I've toyed with conceptually are:

  1. Using our fault-tolerant shared file system to create "lock" files which will be checked for by each machine depending on the customer

  2. Using a special table in our database, and locking the whole table in order to do a "test-and-set" for a lock record.

  3. Using Terracotta, open source server software that assists in scaling but uses a hub-and-spoke model.

  4. Using EHCache for synchronous replication of my in-memory "locks."

I can't imagine that I'm the only person who's ever had this kind of problem. How did you solve it? Did you cook something up in-house or do you have a favorite 3rd-party product?

Solution

You might want to consider using Hazelcast distributed locks. Super light and easy.

java.util.concurrent.locks.Lock lock = Hazelcast.getLock("mymonitor");
lock.lock();
try {
    // do your stuff
} finally {
    lock.unlock();
}

Hazelcast - Distributed Queue, Map, Set, List, Lock

OTHER TIPS

We use Terracotta, so I would like to vote for that.

I've been following Hazelcast and it looks like another promising technology, but I can't vote for it since I've not used it, and knowing that it uses a P2P-based system at its heart, I really would not trust it for large scaling needs.

But I have also heard of ZooKeeper, which came out of Yahoo and is moving under the Hadoop umbrella. If you're adventurous about trying out new technology, this one really has lots of promise since it's very lean and mean, focusing on just coordination. I like the vision and promise, though it might be too green still.

Terracotta is closer to a "tiered" model - all client applications talk to a Terracotta Server Array (and more importantly for scale they don't talk to one another). The Terracotta Server Array is capable of being clustered for both scale and availability (mirrored, for availability, and striped, for scale).

In any case, as you probably know, Terracotta gives you the ability to express concurrency across the cluster the same way you do in a single JVM, by using POJO synchronized/wait/notify or by using any of the java.util.concurrent primitives such as ReentrantReadWriteLock, CyclicBarrier, AtomicLong, FutureTask and so on.

There are a lot of simple recipes demonstrating the use of these primitives in the Terracotta Cookbook.

As an example, I will post the ReentrantReadWriteLock example (note there is no "Terracotta" version of the lock - you just use normal Java ReentrantReadWriteLock)

import java.util.concurrent.locks.*;

public class Main
{
    public static final Main instance = new Main();
    private int counter = 0;
    private ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true);

    public void read()
    {
        while (true) {
            rwl.readLock().lock();
            try {
                System.out.println("Counter is " + counter);
            } finally {
                rwl.readLock().unlock();
            }
            try { Thread.sleep(1000); } catch (InterruptedException ie) { }
        }
    }

    public void write()
    {
        while (true) {
            rwl.writeLock().lock();
            try {
                counter++;
                System.out.println("Incrementing counter.  Counter is " + counter);
            } finally {
                rwl.writeLock().unlock();
            }
            try { Thread.sleep(3000); } catch (InterruptedException ie) { }
        }
    }

    public static void main(String[] args)
    {
        if (args.length > 0)  {
            // args --> Writer
            instance.write();
        } else {
            // no args --> Reader
            instance.read();
        }
    }
}

I recommend using Redisson. It implements over 30 distributed data structures and services, including java.util.concurrent.locks.Lock. Usage example:

Config config = new Config();
config.addAddress("some.server.com:8291");
Redisson redisson = Redisson.create(config);

Lock lock = redisson.getLock("anyLock");
lock.lock();
try {
    // ...
} finally {
    lock.unlock();
}

redisson.shutdown();

I was going to advise using memcached as a very fast, distributed RAM storage for keeping locks; but it seems that EHCache is a similar but more Java-centric project.

Either one is the way to go, as long as you're sure to use atomic updates (memcached supports them, don't know about EHCache). It's by far the most scalable solution.

As a related datapoint, Google uses 'Chubby', a fast, RAM-based distributed lock service, as the root of several systems, among them BigTable.

I have done a lot of work with Coherence, which allowed several approaches to implementing a distributed lock. The naive approach was to request to lock the same logical object on all participating nodes. In Coherence terms this was locking a key on a Replicated Cache. This approach doesn't scale that well because the network traffic increases linearly as you add nodes. A smarter way was to use a Distributed Cache, where each node in the cluster is naturally responsible for a portion of the key space, so locking a key in such a cache always involved communication with at most one node. You could roll your own approach based on this idea, or better still, get Coherence. It really is the scalability toolkit of your dreams.
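
The partition-per-key idea generalizes beyond Coherence. Here is a minimal single-JVM sketch of it (the class and method names are my own, not Coherence API): every key hashes to exactly one lock partition, so acquiring a lock only ever touches that partition's owner.

```java
import java.util.concurrent.locks.ReentrantLock;

// Single-JVM sketch of the partitioned approach described above: each key
// maps deterministically to one partition, so locking a key involves only
// that partition's owner. In Coherence the partitions live on different
// nodes; here they are just slots in an array. Names are illustrative.
public class PartitionedLocks {

    private final ReentrantLock[] partitions;

    public PartitionedLocks(int partitionCount) {
        partitions = new ReentrantLock[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitions[i] = new ReentrantLock();
        }
    }

    // The same key always resolves to the same lock instance.
    public ReentrantLock lockFor(Object key) {
        return partitions[Math.floorMod(key.hashCode(), partitions.length)];
    }
}
```

Two callers contending on one customer key serialize on the same lock, while unrelated customers proceed in parallel on other partitions, which is exactly why the traffic stays constant as you add nodes.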

I would add that any half decent multi-node network based locking mechanism would have to be reasonably sophisticated to act correctly in the event of any network failure.

Not sure if I understand the entire context, but it sounds like you have a single database backing this? Why not make use of the database's locking: if creating the customer is a single INSERT, then that statement alone can serve as the lock, since the database will reject a second INSERT that violates one of your constraints (e.g. a constraint that the customer name is unique).

If the "inserting of a customer" operation is not atomic and is a batch of statements, then I would introduce (or use) an initial INSERT that creates a simple basic record identifying your customer (with the necessary uniqueness constraints) and then do all the other inserts/updates in the same transaction. Again the database will take care of consistency, and any concurrent modifications will result in one of them failing.
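
To make that shape concrete, here is a hedged JDBC sketch of the single-INSERT test-and-set. The table and column names are invented for illustration; SQLState class "23" is the standard code family for integrity-constraint violations, which is how the losing node learns the customer already exists.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hedged sketch: table/column names (customer, customer_name) are
// assumptions, not from the original answer. The unique constraint on
// customer_name is what turns the INSERT into an atomic test-and-set.
public class CreateIfAbsent {

    // SQLState class "23" means an integrity-constraint violation,
    // i.e. another node already inserted this customer.
    static boolean isDuplicateKey(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    // Returns true if this call created the customer,
    // false if a concurrent caller won the race.
    static boolean createCustomer(Connection con, String name) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO customer (customer_name) VALUES (?)")) {
            ps.setString(1, name);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            if (isDuplicateKey(e)) {
                return false; // lost the race; customer already exists
            }
            throw e; // some other database error
        }
    }
}
```

The node that gets `false` back simply looks up the existing record instead of creating the remote customer a second time.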

I made a simple RMI service with two methods: lock and release. Both methods take a key (my data model used UUIDs as primary keys, so that was also the locking key).

RMI is a good solution for this because it's centralized. You can't do this with EJBs (especially in a cluster, as you don't know on which machine your call will land). Plus, it's easy.

It worked for me.
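
A minimal sketch of the server-side state such a two-method service needs (the RMI plumbing, Remote interface, and registry setup are omitted, and the class name is mine, not from the original answer):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the centralized lock/release service described above. In the
// real setup this object would sit behind an RMI Remote interface on one
// designated host; putIfAbsent provides the atomic test-and-set per key.
public class KeyLockService {

    private final ConcurrentHashMap<UUID, Boolean> held = new ConcurrentHashMap<>();

    // Returns true only for the single caller that wins the lock on this key.
    public boolean lock(UUID key) {
        return held.putIfAbsent(key, Boolean.TRUE) == null;
    }

    public void release(UUID key) {
        held.remove(key);
    }
}
```

A production version would also record who owns each lock and expire locks whose holders die; otherwise a crashed client leaks its keys forever.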

If you can set up your load balancing so that requests for a single customer always get mapped to the same server then you can handle this via local synchronization. For example, take your customer ID mod 10 to find which of the 10 nodes to use.

Even if you don't want to do this in the general case your nodes could proxy to each other for this specific type of request.

Assuming your users are uniform enough (i.e. if you have a ton of them) that you don't expect hot spots to pop up where one node gets overloaded, this should still scale pretty well.
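
The routing rule itself is one line. A sketch, assuming a numeric customer ID (the helper name is mine):

```java
// Sketch of the affinity routing suggested above: every request for a given
// customer lands on the same node, so plain local synchronization on that
// node is enough. floorMod keeps the result non-negative even if an ID is
// negative, which plain % would not.
public class NodeRouter {

    public static int nodeFor(long customerId, int nodeCount) {
        return (int) Math.floorMod(customerId, (long) nodeCount);
    }
}
```

The load balancer (or the proxying node) computes `nodeFor(id, 10)` and forwards the request there; correctness then falls out of ordinary single-JVM locking.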

You might also consider Cacheonix for distributed locks. Unlike anything else mentioned here, Cacheonix supports ReadWrite locks with lock escalation from read to write when needed:

ReadWriteLock rwLock = Cacheonix.getInstance().getCluster().getReadWriteLock();
Lock lock = rwLock.getWriteLock();
lock.lock();
try {
    // ...
} finally {
    lock.unlock();
}

Full disclosure: I am a Cacheonix developer.

Since you are already connecting to a database, before adding another piece of infrastructure take a look at JdbcSemaphore; it is simple to use:

JdbcSemaphore semaphore = new JdbcSemaphore(ds, semName, maxReservations);
boolean acq = semaphore.acquire(1, TimeUnit.MINUTES);
if (acq) {
    // do stuff
    semaphore.release();
} else {
    throw new TimeoutException();
}

It is part of spf4j library.

Back in the day, we'd use a specific "lock server" on the network to handle this. Bleh.

Your database server might have resources specifically for doing this kind of thing. MS-SQL Server has application locks usable through the sp_getapplock/sp_releaseapplock procedures.

We have been developing an open source, distributed synchronization framework; currently DistributedReentrantLock and DistributedReentrantReadWriteLock have been implemented, but they are still in a testing and refactoring phase. In our architecture lock keys are divided into buckets and each node is responsible for a certain number of buckets. So effectively, for a successful lock request there is only one network request. We are also using the AbstractQueuedSynchronizer class as local lock state, so all failed lock requests are handled locally; this drastically reduces network traffic. We are using JGroups (http://jgroups.org) for group communication and Hessian for serialization.

For details, please check out http://code.google.com/p/vitrit/.

Please send me your valuable feedback.

Kamran

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow