Question

When deploying a web role to the Windows Azure cloud, what is the default behavior in terms of load balancing? Is there any?

The reason I ask is that Traffic Manager lets you specify load balancing, failover, and round robin. If I do not enable it, how does Azure work behind the curtain?

The default recommendation with regard to the SLA is always at least two instances. Are both instances serving requests, or only one? In other words, is the default behavior failover?

Thanks in advance for any clarifications regarding this matter as I have been unsuccessful finding it on Google.


Solution

The default should be round robin, but it is not always 100% guaranteed.

One thing is certain: it is not failover load balancing. The idea is that all your instances are equally loaded, but that cannot be 100% guaranteed all the time.

UPDATE

Nothing in this world is 100% guaranteed :) Even the SLA for compute instances is 99.95% and not 100%. Traffic Manager has nothing to do with multi-instance deployments. Traffic Manager only comes into play when you have deployments across geographic regions.

I've been using, exploring, tweaking, developing for, and porting to Windows Azure since its first public CTP back in 2008. I can't remember where I got all the information from, but the compute load balancer should be using round robin or a similar algorithm (and definitely not failover) to spread the load across your instances. Moreover, it is "stickyless", if I may say so, which means there is no guarantee that a request from one user will hit the same instance on the next call.
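To make the "round robin, no stickiness" idea concrete, here is a minimal sketch in Python. It is only a toy model and not the actual Azure load balancer: the instance names `web_0` and `web_1` and the `round_robin` helper are made up for illustration, and real traffic will not be split this evenly.

```python
from collections import Counter
from itertools import cycle


def round_robin(instances, request_count):
    """Hand each request to the next instance in turn (illustrative model only)."""
    rotation = cycle(instances)
    return [next(rotation) for _ in range(request_count)]


# Two hypothetical role instances, six consecutive requests from the same user.
assignments = round_robin(["web_0", "web_1"], 6)
load = Counter(assignments)

print(assignments)  # consecutive requests alternate between the two instances
print(load)         # both instances end up with the same number of requests
```

Because consecutive requests land on different instances, any per-user state kept in one instance's memory would not be visible to the next request. That is the practical consequence of the balancer being "stickyless".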

Some resources on Windows Azure (older and newer):

http://www.davidchappell.com/writing/white_papers/introducing_windows_azure_v1-chappell.pdf

http://blogs.msdn.com/b/avkashchauhan/archive/2011/11/12/windows-azure-load-balancer-timeout-details.aspx

Also worth mentioning: with the latest release there is also an SLA for single-instance roles: http://www.windowsazure.com/en-us/support/legal/sla/

"Additionally, we will monitor all of your individual role instances and guarantee that 99.9% of the time we will detect when a role instance’s process is not running and initiate corrective action."

Other tips

@astaykov pretty much covered it. I want to expand on this because of the comment about Traffic Manager and 100% SLA.

I've never heard of a hosting provider offering a 100% SLA. That would mean nothing ever goes wrong: software crash, OS update, OS crash, hardware crash, network interruption, power interruption, DNS interruption... Something at some point will render a server (or VM) unavailable for a period of time.

Windows Azure has a Service Level Agreement (SLA) for Cloud Services, Storage, SQL Database, SQL Reporting, Service Bus, Access Control, Caching, and CDN (see all SLA details here). For this question, the Cloud Services SLA is relevant, providing 99.95% availability.
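To put 99.95% in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes a 30-day month as the measurement window, which is my simplification; the SLA document itself defines the real measurement period and exclusions. Under that assumption the figure works out to roughly 21.6 minutes of permitted downtime per month.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Minutes of downtime permitted in a window of `days` days (assumed window)."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


downtime = allowed_downtime_minutes(99.95)
print(f"{downtime:.1f} minutes per 30-day month")
```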

Occasionally, a given role instance will be unavailable. You can pretty much assure yourself of this. There are upgrades to OS images (both for the Guest OS and the Host OS), hardware failures, etc. This issue is not specific to Windows Azure; any cloud or hosting offering will have these types of outages.

To improve availability, multiple instances should be deployed. The instances are then split up across fault domains, meaning they're located on different hardware and different racks, isolated so that if something like a network segment or power connection fails (imagine a server rack's network panel shorting out), only a subset of instances is affected. The load balancer will continue to distribute traffic to the healthy instances (albeit at reduced capacity until replacement instances come online).
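Here is a small Python sketch of that behavior, under simplifying assumptions: four hypothetical instances spread over two fault domains, a whole domain failing at once, and a plain round-robin rotation over whatever is left. The names and the `healthy_instances` helper are invented for illustration and do not reflect how Azure tracks instance health internally.

```python
from itertools import cycle


def healthy_instances(instances, failed_domain):
    """Keep only the instances that are not in the failed fault domain."""
    return [name for name, domain in instances if domain != failed_domain]


# (instance name, fault domain) pairs; placement across two domains is assumed.
instances = [("web_0", 0), ("web_1", 1), ("web_2", 0), ("web_3", 1)]

# Fault domain 0 goes down, e.g. its rack loses power.
survivors = healthy_instances(instances, failed_domain=0)

# Traffic keeps flowing, but only to the surviving half of the deployment.
rotation = cycle(survivors)
served_by = [next(rotation) for _ in range(4)]
print(survivors, served_by)
```

With a single instance there would be no survivors in this scenario, which is the reasoning behind the recommendation of at least two instances.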

On to Traffic Manager: This is a way to distribute traffic across geographical areas, either for failover or performance. In the former case, you'll have services running in a separate data center, which gives a good "high availability" story for your app (imagine the primary data center going offline for some reason). In the latter case, you can offer better performance to customers when you have a worldwide presence.
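The difference between the two modes can be sketched as a simple selection rule. This is a toy model and not the Traffic Manager API: the `Endpoint` class, the `pick_endpoint` function, the policy strings, and the latency numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool
    latency_ms: float  # hypothetical latency as seen from the caller's region


def pick_endpoint(endpoints, policy):
    """Choose a data center for a request under a simplified policy model."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    if policy == "failover":
        # List order is treated as priority: first healthy endpoint wins.
        return candidates[0]
    if policy == "performance":
        # Lowest latency among the healthy endpoints wins.
        return min(candidates, key=lambda e: e.latency_ms)
    raise ValueError(f"unknown policy: {policy}")


endpoints = [
    Endpoint("primary", healthy=False, latency_ms=20.0),
    Endpoint("secondary_a", healthy=True, latency_ms=90.0),
    Endpoint("secondary_b", healthy=True, latency_ms=45.0),
]

failover_choice = pick_endpoint(endpoints, "failover")
performance_choice = pick_endpoint(endpoints, "performance")
print(failover_choice.name, performance_choice.name)
```

Either way, the choice is between whole deployments in different data centers. Spreading requests across the instances inside one deployment remains the job of the regular load balancer.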

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow