Question

My application makes Web Service requests; there is a max rate of requests the provider will handle, so I need to throttle them down.

When the app ran on a single server, I used to do it at the application level: an object that keeps track of how many requests have been made so far, and waits if the current request makes it exceeds the maximum allowed load.

Now, we're migrating from a single server to a cluster, so there are two copies of the application running.

  • I can't keep checking for the max load at the application code, because the two nodes combined might exceed the allowed load.
  • I can't simply reduce the load on each server, because if the other node is idle, the first node can send out more requests.

This is a JavaEE 5 environment. What is the best way to throttle the requests the application sends out ?

Was it helpful?

Solution

Since you are already in a Java EE environment, you can create an MDB that handles all requests to the webservice based on a JMS queue. The instances of the application can simply post their requests to the queue and the MDB will recieve them and call the webservice.

The queue can actually be configured with the appropriate number of sessions that will limit the concurrent access to you webservice, thus your throttling is handled via the queue config.

The results can be returned via another queue (or even a queue per application instance).

OTHER TIPS

Many ways of doing this: you might have a "Coordination Agent" which is responsible of handing "tokens" to the servers. Each "token" represents a permission to perform a task etc. Each application needs to request "tokens" in order to place calls.

Once an application depletes its tokens, it must ask for some more before proceeding to hit the Web Service again.

Of course, this all gets complicated when there are requirements with regards to the timing of each calls each application makes because of concurrency towards the Web Service.

You could rely on RabbitMQ as Messaging framework: Java bindings are available.

I recommend using beanstalkd to periodically pump a collection of requests (jobs) into a tube (queue), each with an appropriate delay. Any number of "worker" threads or processes will wait for the next request to be available, and if a worker finishes early it can pick up the next request. The down side is that there isn't any explicit load balancing between workers, but I have found that distribution of requests out of the queue has been well balanced.

The N nodes need to communicate. There are various strategies:

  • broadcast: each node will broadcast to everybody else that it's macking a call, and all other nodes will take that into account. Nodes are equal and maintain individial global count (each node know about every other node's call).
  • master node: one node is special, its the master and all other nodes ask permission from the master before making a call. The master is the only one that know the global count.
  • dedicated master: same as master, but the 'master' doesn't do calls on itslef, is just a service that keep track of calls.

Depending on how high do you anticipate to scale later, one or the other strategy may be best. For 2 nodes the simplest one is broadcast, but as the number of nodes increases the problems start to mount (you'll be spending more time broadcasting and responding to broadcats than actually doing WS requests).

How the nodes communicate, is up to you. You can open a TCP pipe, you can broadcats UDP, you can do a fully fledged WS for this purpose alone, you can use a file share protocol. Whatever you do, you are now no longer inside a process so all the fallacies of distributed computing apply.

This is an interesting problem, and the difficulty of the solution depends to a degree on how strict you want to be on the throttling.

My usual solution to this is JBossCache, partly because it comes packaged with JBoss AppServer, but also because it handles the task rather well. You can use it as a kind of distributed hashmap, recording the usage statistics at various degrees of granularity. Updates to it can be done asynchronously, so it doesn't slow things down.

JBossCache is usually used for heavy-duty distributed caching, but I rather like it for these lighter-weight jobs too. It's pure java, and requires no mucking about with the JVM (unlike Terracotta).

Hystrix was designed for pretty much the exact scenario you're describing. You can define a thread pool size for each service so you have a set maximum number of concurrent requests, and it queues up requests when the pool is full. You can also define a timeout for each service and when a service starts exceeding its timeout, Hystrix will reject further requests to that service for a short period of time in order to give the service a chance to get back on its feet. There's also real time monitoring of the entire cluster through Turbine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top