Question

I'm in the early phases of designing an Azure-based application. One of the things that attracts me to Azure is its scalability, given how variable I expect demand to be. As such, I'm trying to keep things loosely coupled so I can add instances when I need to.

The recommendations I've seen for architecting an application for Azure include keeping web role logic to a minimum and doing processing in worker roles, using queues to communicate and some sort of back-end store such as SQL Azure or Azure Tables. This seems like a good idea to me, as I can scale out either or both parts of the application without any issue. However, I'm curious whether there are any best practices (or whether anyone has experience) for deciding when it's best to have the web role talk directly to the data store vs. sending data via the queue.

I'm thinking of the case where I have a simple insert to do from the web role. While I could set this up as a message, send it on the queue, and have a worker role pick it up and do the insert, it seems like a lot of double-handling. However, I also appreciate that this may be better in the long run, in case the web role gets overwhelmed or more complex logic ends up being required for the insert.

I realize this might be a case where the answer is "it depends entirely on the situation, check your perf metrics", but if anyone has any thoughts I'd be very appreciative!


Solution

I would say something like an insert doesn't require a worker role. You'd have to do an insert into the queue anyway, so you wouldn't be saving anything in the web role. The best approach would be to isolate your inserts (and all data access) in a separate class (or classes) within your web role. This decouples the rest of your web role's code from the specific data storage system you're using, which makes changing the data store later much easier. If your inserts end up needing more processing, you can add the queue and worker role when they're needed. Even then, I would still do the insert into table storage directly and relegate the computation or other business logic to a worker role. That worker role can then process messages from the queue, or simply query table storage for new (unprocessed) records.
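A minimal sketch of isolating data access behind its own class, so a simple insert goes straight from the web role to the store with no queue involved. Everything here is hypothetical: `Order`, `OrderStore`, `InMemoryOrderStore` and `create_order` are illustrative names, and the in-memory dictionary stands in for a real table-storage or SQL Azure implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Tuple


@dataclass
class Order:
    # Hypothetical entity; key names mirror table storage's partition/row keys.
    partition_key: str
    row_key: str
    description: str


class OrderStore(Protocol):
    """All persistence the web role needs, behind one interface."""

    def insert(self, order: Order) -> None: ...

    def get(self, partition_key: str, row_key: str) -> Optional[Order]: ...


class InMemoryOrderStore:
    """Stand-in for a real table-storage or SQL Azure implementation."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Order] = {}

    def insert(self, order: Order) -> None:
        key = (order.partition_key, order.row_key)
        if key in self._rows:
            raise ValueError(f"entity {key} already exists")
        self._rows[key] = order

    def get(self, partition_key: str, row_key: str) -> Optional[Order]:
        return self._rows.get((partition_key, row_key))


def create_order(store: OrderStore, customer: str, order_id: str, text: str) -> Order:
    """Web-role handler: a simple insert goes straight to the store."""
    order = Order(partition_key=customer, row_key=order_id, description=text)
    store.insert(order)
    return order
```

Switching data stores then means constructing a different class that satisfies `OrderStore`; `create_order` and the rest of the web role stay unchanged. If a worker role is added later, it can pick up the follow-on processing without the insert itself moving off the web role.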

I find using the queue to communicate with a worker role most effective when there are calculations or other processing that need to be done on the data. The example I've used most is actually one of the samples in the Azure SDK, which shows how to generate thumbnail images. My web role inserts the uploaded image into Azure blob storage and the related description and other fields into Azure table storage. It also places a message on the queue that tells the worker role there is a new image that needs thumbnails generated. I actually generate a few different sizes of each image for use in different parts of the site. The worker role just generates those thumbnails and doesn't need to send any kind of notification back to the web role. Any place that uses the images has logic to fall back to the original upload or another placeholder when the thumbnails aren't yet available.
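A rough in-process sketch of that thumbnail flow, assuming plain dictionaries for blob and table storage and a `deque` for the queue. The class and method names, the blob paths, the two sizes, and `fake_resize` (which stands in for real image resizing) are all hypothetical. A real Azure queue also has visibility timeouts and redelivery, which a `deque` ignores.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Hypothetical sizes; the answer only says "a few different sizes".
THUMBNAIL_SIZES: Tuple[int, ...] = (64, 256)


def fake_resize(data: bytes, size: int) -> bytes:
    # Placeholder for real image resizing with an imaging library.
    return f"{size}px:".encode() + data[:size]


class ThumbnailPipeline:
    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}   # stands in for blob storage
        self.table: Dict[str, str] = {}     # stands in for table storage (id -> description)
        self.queue: Deque[str] = deque()    # stands in for the Azure queue

    def web_upload(self, image_id: str, data: bytes, description: str) -> None:
        """Web role: store the original and its metadata, then enqueue a small message."""
        self.blobs[f"originals/{image_id}"] = data
        self.table[image_id] = description
        self.queue.append(image_id)  # the message carries only the id, not the image

    def worker_process_one(self) -> bool:
        """Worker role: generate every thumbnail size for one queued image."""
        if not self.queue:
            return False
        image_id = self.queue.popleft()
        original = self.blobs[f"originals/{image_id}"]
        for size in THUMBNAIL_SIZES:
            self.blobs[f"thumbs/{size}/{image_id}"] = fake_resize(original, size)
        return True  # no notification goes back to the web role

    def image_url(self, image_id: str, size: int) -> str:
        """UI: prefer the thumbnail, fall back to the original until it exists."""
        thumb = f"thumbs/{size}/{image_id}"
        return thumb if thumb in self.blobs else f"originals/{image_id}"
```

Until the worker has run, `image_url("cat", 64)` returns the original's path; afterwards it returns the thumbnail's. Returning a fixed "processing" placeholder instead would only change the fallback branch of `image_url`.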

The same process could skip the queue altogether and instead query blob storage to find which images still require processing. I haven't yet decided whether I prefer using the queue or simply polling the data for records that need the worker role's processing. I suppose the queue is more efficient, but it also adds an extra layer of complexity and an extra potential point of failure.
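For comparison, a sketch of the queue-less polling alternative. It assumes each image record carries a hypothetical `thumbnails_done` flag that the worker can filter on. `ImageRecord`, `ImageTable` and `poll_once` are illustrative names, and the list comprehension stands in for a real storage query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ImageRecord:
    image_id: str
    thumbnails_done: bool = False  # hypothetical "processed" flag stored with the record


@dataclass
class ImageTable:
    rows: Dict[str, ImageRecord] = field(default_factory=dict)

    def insert(self, image_id: str) -> None:
        self.rows[image_id] = ImageRecord(image_id)

    def unprocessed(self, limit: int) -> List[ImageRecord]:
        # In real storage this would be a query filtered on the flag.
        pending = [r for r in self.rows.values() if not r.thumbnails_done]
        return pending[:limit]


def poll_once(table: ImageTable, make_thumbnails: Callable[[str], None],
              batch_size: int = 10) -> int:
    """One polling pass by the worker: no queue, just look for unfinished records."""
    batch = table.unprocessed(batch_size)
    for record in batch:
        make_thumbnails(record.image_id)
        record.thumbnails_done = True  # mark done only after the work succeeded
    return len(batch)
```

A worker would call `poll_once` in a loop and sleep when it returns 0. That loop illustrates the trade-off described above: every idle pass still costs a query, which a queue avoids, but there is no separate queue to provision or fail.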

Edit in response to the comment: when I posted this answer I said to just use the full resolution image in the UI if the thumbnail isn't available. Now I am working on a site that just uses a default thumbnail image that says "processing" until the generated thumbnail is available. The choice is yours and really depends on the requirements of your app's UI.

You could also use SignalR or a bit of AJAX to notify the user's browser when a new thumbnail is available, without waiting for a new page load.

Seeing a placeholder thumbnail while the image is processed by a worker role is a much better user experience than waiting for the page to load while the thumbnail is being generated.

OTHER TIPS

Here's my metaphor; do what you will with it.

Imagine you're entering a nightclub that borders on a dodgy area but is alright once you're inside.

The management employ some big meaty bouncers on the door to sort out the riff-raff. If you're an idiot, you're not getting in. Extend the metaphor as much as you like here.

If you're OK, then they let you in the door, and you join, yes, The Queue to pay at the Box Office to enter the actual club.

Depending on whether the football's on or something, you might want some more bouncers on the door, but this can be independent of the Box Office staff. On a busy night, you might open another window to get the money in quicker, but what you're probably not going to do is get the bouncers to handle cash. They've got other things to do with their hands.

So:

  • Bouncers - web roles. Handle the incoming traffic, repel invalid requests and add the valid requests to:
  • The Queue - the Queue!
  • Box Office - worker roles, performing a different role to the web role

So, there's no reason why your web roles can't do the box office role, but in the long run it's probably best that they don't.
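The metaphor can be modelled in a single process as a rough sketch, with threads standing in for worker-role instances and Python's `queue.Queue` standing in for the Azure queue. The function names and the "amount" validation rule are hypothetical. The point it shows is that the door (validation) and the number of box-office windows (workers) are sized independently.

```python
import queue
import threading
from typing import List


def bouncer(request: dict, work_queue: queue.Queue) -> bool:
    """Web role: turn away invalid requests at the door, queue the valid ones."""
    amount = request.get("amount")
    if not isinstance(amount, int) or amount <= 0:
        return False
    work_queue.put(request)
    return True


def box_office(work_queue: queue.Queue, takings: List[int], lock: threading.Lock) -> None:
    """Worker role: take requests off the queue and do the actual processing."""
    while True:
        request = work_queue.get()
        if request is None:  # sentinel: this window is closing
            return
        with lock:
            takings.append(request["amount"])


def run_night(requests: List[dict], windows: int) -> List[int]:
    """The number of box-office windows is chosen independently of the door."""
    work_queue: queue.Queue = queue.Queue()
    takings: List[int] = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=box_office, args=(work_queue, takings, lock))
        for _ in range(windows)
    ]
    for w in workers:
        w.start()
    for r in requests:
        bouncer(r, work_queue)
    for _ in workers:
        work_queue.put(None)  # one sentinel per worker, queued after all real work
    for w in workers:
        w.join()
    return takings
```

Because the sentinels are enqueued after every valid request, each worker only stops once all real work has been taken off the queue. In Azure, the equivalent knobs are the instance counts of the web and worker roles.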

That's my metaphor.

Toby

Using distributed queues (Azure's, Amazon's, or anyone else's) is surprisingly subtle. I have posted a blog entry covering common subtleties of Azure Queues. Bottom line: I suggest carefully abstracting your infrastructure logic (supporting the queue) away from your business logic (the content and processing of the queue).
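A sketch of that separation, assuming at-least-once delivery. Azure queue messages become invisible for a visibility timeout when received, must be deleted explicitly after processing, and carry a dequeue count. Here `InMemoryQueue` is a hypothetical stand-in for that behaviour, and `QueueProcessor`, `max_attempts` and `parse_and_store` are illustrative names. The processor owns retries, poison-message parking and deletion, while the business function only ever sees the message body.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Message:
    body: str
    dequeue_count: int = 0


class InMemoryQueue:
    """Hypothetical stand-in for a cloud queue with at-least-once delivery."""

    def __init__(self) -> None:
        self._items: Deque[Message] = deque()

    def put(self, body: str) -> None:
        self._items.append(Message(body))

    def receive(self) -> Optional[Message]:
        if not self._items:
            return None
        msg = self._items.popleft()
        msg.dequeue_count += 1
        return msg

    def release(self, msg: Message) -> None:
        # Models the visibility timeout expiring: the message becomes visible again.
        self._items.append(msg)


class QueueProcessor:
    """Infrastructure only: retries, poison messages, deletion. Never inspects content."""

    def __init__(self, q: InMemoryQueue, handler: Callable[[str], None],
                 max_attempts: int = 3) -> None:
        self.q = q
        self.handler = handler
        self.max_attempts = max_attempts
        self.poison: List[str] = []

    def run_until_empty(self) -> None:
        while True:
            msg = self.q.receive()
            if msg is None:
                return
            try:
                self.handler(msg.body)  # business logic sees only the payload
            except Exception:
                if msg.dequeue_count >= self.max_attempts:
                    self.poison.append(msg.body)  # park it rather than retry forever
                else:
                    self.q.release(msg)
            # On success the message is simply not released, i.e. it is deleted.


def parse_and_store(body: str, ledger: List[int]) -> None:
    """Business logic: knows the message format, knows nothing about the queue."""
    ledger.append(int(body))  # raises ValueError on malformed content
```

Because a message can be delivered more than once (for example, if a worker crashes after doing the work but before deleting the message), the business handler should ideally be idempotent. Keeping it free of queue concerns makes that property easier to check.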

Licensed under: CC-BY-SA with attribution