Question

I have one worker role that throws data into around 10 queues that need to be processed. There is a lot of data - probably around 10-100 messages a second that gets queued up in various queues.

The queues hold different data and process them separately. There is a single queue in particular that is very active.

The way I have it set up now, I have a separate worker role that spawns 10 different threads, each of which executes a method containing a while(true){get message from queue and process it} loop. Whenever data in a queue gets backed up, we simply launch more of these processes to speed up processing. Also, since one queue is more active, I launch several threads pointing at the same method to process data from that queue.

However, I am seeing high CPU utilization of the deployment. Almost at or near 100% constantly.

I am wondering if this is because of thread starvation? Or because access to the queue is RESTful and the threads end up blocking each other while making connections, slowing things down? Or is it because I use:

while(true)
{
   var message = get message from queue;
   if(message != null)
   {
       //process message
   }
}

And that gets executed too fast?

Every processing of the message also saves it to the Azure Table Storage or the DB - so it might be the process of saving this data that is eating up the CPU.

In effect, it's been really hard to debug the high CPU load. So, my question is: are there general architecture changes that I can make that will help alleviate + prevent any possible issue that there might be? (e.g. instead of using while(true) using a different type of polling - although I'd imagine it's the same in the end for that example).

Maybe simply spawning new threads using new Thread() is not the best way to go.


Solution

I would suggest putting a sleep statement in your loop... not only is that tight loop probably hogging CPU resources, but you also pay for storage transactions. Every ten thousand times you check the queue, it costs a penny. That's a small cost, but it could add up over time to be significant.
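As a rough sketch of what that might look like: the loop below only sleeps when the queue came back empty, so a backlog is still drained at full speed. The names `QueuePoller`, `tryGetMessage`, `process` and `idleDelay` are made up for illustration; `tryGetMessage` stands in for whatever call fetches a message from your queue and is assumed to return null when the queue is empty.

```csharp
using System;
using System.Threading;

public static class QueuePoller
{
    // One iteration: process a message if there is one, otherwise sleep.
    // Returns true if a message was processed.
    // tryGetMessage is a hypothetical stand-in for the real queue call.
    public static bool PollOnce(Func<string> tryGetMessage, Action<string> process, TimeSpan idleDelay)
    {
        string message = tryGetMessage();
        if (message != null)
        {
            process(message);
            return true;
        }

        // Queue was empty: yield the CPU instead of spinning.
        Thread.Sleep(idleDelay);
        return false;
    }

    // Replacement for the bare while(true) loop; stops when the token is cancelled.
    public static void Run(Func<string> tryGetMessage, Action<string> process,
                           TimeSpan idleDelay, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            PollOnce(tryGetMessage, process, idleDelay);
        }
    }
}
```

The right value for `idleDelay` depends on how much latency you can tolerate; even a fraction of a second should be enough to stop an idle loop from pinning a core.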

I've also often used code like this:

while(true)
{
    var msg = q1.GetMessage();
    if (msg != null)
    {
        ...
    }
    msg = q2.GetMessage();
    if (msg != null)
    {
        ...
    }
}

In other words, poll the queues serially instead of parallelly (that should totally be a word). That way you're only actually doing one thing at a time (useful if your tasks are CPU-intensive), but you're still checking all the queues in each loop.

OTHER TIPS

Had the same problem with CPU. It could be caused by an inefficient local implementation of Azure Queues.

In the end I added an exponential sleep policy (for an implementation, check out the Lokad.CQRS for Azure project): queues are polled frequently, but if none of them has any messages, we gradually increase the sleep interval until it reaches some upper bound. As soon as a message is discovered, we drop the interval immediately.

This way, overall, the system does not waste storage transactions (or local dev CPU) but stays extremely responsive if multiple messages come in a row.
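To make the idea concrete, here is a minimal sketch of such a policy. It is not the Lokad.CQRS code, just one way the behaviour described above could be written. `BackoffPoller`, `NextDelay`, `tryGetMessage`, and the 100 ms / 30 s bounds are all assumptions chosen for illustration.

```csharp
using System;
using System.Threading;

public static class BackoffPoller
{
    // Next sleep interval: reset to min when a message was found,
    // otherwise double the current interval, capped at max.
    public static TimeSpan NextDelay(TimeSpan current, bool foundMessage, TimeSpan min, TimeSpan max)
    {
        if (foundMessage) return min;
        TimeSpan doubled = TimeSpan.FromTicks(current.Ticks * 2);
        return doubled > max ? max : doubled;
    }

    // tryGetMessage is a hypothetical stand-in for the real queue call;
    // it is assumed to return null when the queue is empty.
    public static void Run(Func<string> tryGetMessage, Action<string> process, CancellationToken token)
    {
        TimeSpan min = TimeSpan.FromMilliseconds(100); // assumed lower bound
        TimeSpan max = TimeSpan.FromSeconds(30);       // assumed upper bound
        TimeSpan delay = min;

        while (!token.IsCancellationRequested)
        {
            string message = tryGetMessage();
            if (message != null)
            {
                process(message);
                delay = min;   // work arrived: poll again right away
                continue;
            }

            // Nothing to do: wait (waking early on cancellation), then back off.
            token.WaitHandle.WaitOne(delay);
            delay = NextDelay(delay, false, min, max);
        }
    }
}
```

With these bounds an idle queue is polled after 100 ms, 200 ms, 400 ms and so on, settling at one poll every 30 seconds, while a burst of messages is processed back to back with no sleep in between.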

Check out the Scaling Down Azure Roles video by Brian Hitney. The basic approach is to spawn some number of threads, each with a "worker" that monitors a given queue and acts appropriately. In particular this keeps one queue from blocking the others....
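I don't have the code from the video, but the general shape of "one worker thread per queue" could be sketched as follows. `PerQueueWorkers`, `sources` and `idleDelay` are hypothetical names; each entry in `sources` stands in for a call that fetches a message from one particular queue and returns null when that queue is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PerQueueWorkers
{
    // Starts one background thread per source. Each thread polls only its
    // own source, so a slow or busy queue cannot hold up the others.
    public static List<Thread> Start(IEnumerable<Func<string>> sources, Action<string> process,
                                     TimeSpan idleDelay, CancellationToken token)
    {
        var threads = new List<Thread>();
        foreach (Func<string> source in sources)
        {
            Func<string> tryGetMessage = source; // local copy captured by the lambda
            var thread = new Thread(() =>
            {
                while (!token.IsCancellationRequested)
                {
                    string message = tryGetMessage();
                    if (message != null)
                    {
                        process(message);
                    }
                    else
                    {
                        // Idle: wait instead of spinning, waking early on cancellation.
                        token.WaitHandle.WaitOne(idleDelay);
                    }
                }
            });
            thread.IsBackground = true;
            thread.Start();
            threads.Add(thread);
        }
        return threads;
    }
}
```

Note that `process` is called from several threads at once here, so it has to be thread-safe. For a queue that is much busier than the rest, the same source can simply be passed in more than once.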

I think your problem comes from the loop implementation. The polling must be slowed down by something like a sleep(). Otherwise, nothing prevents the loop from consuming 100% of a CPU core (which is in fact the normal behavior for a tight loop).

There is a great MSDN article that covers all of this

MSDN - Best Practices for Maximizing Scalability and Cost Effectiveness of Queue-Based Messaging Solutions on Windows Azure

It talks about adding threads and instances when there is work to do, and backing off when there isn't, so you're not continuously and needlessly polling queues from multiple threads and instances, racking up transaction costs and turning a CPU into a heater with constant 100% CPU utilisation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow