Windows Azure VM (IaaS) unexpected restarts [closed]

Question 1

Is it normal to have so many restarts?

Yes, this can happen in a given month. To get real availability, you need to run SQL Server in a high-availability configuration.

Yes, it does cost an arm and a leg. ;(

What is your experience with such an environment on Azure?

Some months are really good and some are bad; it depends on your cluster and which datacenter you are in. Microsoft has a mixed range of hardware across its datacenters. That does not mean some datacenters are running on old laptops, but in my experience the newer datacenters tend to have better kit and therefore fewer restarts. For reference, we use US East.

What can I do to minimise this downtime?

High availability with a witness is the only way to get real availability on a VM, and yes, it costs an arm and a leg.

Other serious options:

Cache. Use a local cache on the machine and Azure Cache, and try to minimize your calls to the database. This might make your app less chatty and let you step back in SQL Azure, and it might buy enough time for the failover to recover.
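To make the caching idea concrete, here is a minimal cache-aside sketch in Python. It is only an illustration: `TTLCache`, `get_or_load` and the `loader` callback are hypothetical names, the cache is a plain in-process dictionary rather than Azure Cache, and it assumes your database call raises `ConnectionError` when the VM is restarting (real drivers raise their own exception types).

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper with a stale fallback (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no database call at all
        try:
            value = loader()  # assumed to raise ConnectionError when the DB is down
        except ConnectionError:
            if entry is not None:
                return entry[1]  # database unreachable: serve the stale copy
            raise  # nothing cached yet, so the caller has to handle the outage
        self._entries[key] = (now, value)
        return value
```

Serving a stale value during an outage is a design choice that only suits data that can tolerate being slightly out of date.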

Queues. Queues would help your application recover and let you show the user a "we are working on it" message.
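A rough sketch of the queue idea, in Python. Everything here is hypothetical: `submit_order`, `drain` and `write_to_db` are made-up names, the standard-library `queue.Queue` stands in for a durable queue such as an Azure storage queue, and the database call is assumed to raise `ConnectionError` while the VM is down.

```python
import queue
from typing import Callable, Dict

# In-memory stand-in for a durable queue; a real one must survive restarts.
pending_writes = queue.Queue()


def submit_order(order: Dict, write_to_db: Callable[[Dict], None]) -> str:
    """Try the database first; on a connection error, park the work in the queue."""
    try:
        write_to_db(order)
        return "saved"
    except ConnectionError:
        pending_writes.put(order)
        return "We are working on it"  # message shown to the user


def drain(write_to_db: Callable[[Dict], None]) -> int:
    """Replay queued work once the database is back; returns items written."""
    written = 0
    while True:
        try:
            order = pending_writes.get_nowait()
        except queue.Empty:
            return written
        try:
            write_to_db(order)
        except ConnectionError:
            pending_writes.put(order)  # still down: keep it for the next attempt
            return written
        written += 1
```

A background worker would call `drain` periodically, so queued work is written once the VM comes back.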

Use SQL Azure as failover. Sync data from on-premises to SQL Azure using SQL Azure Data Sync (not sure this works with Express), and write code in your app to catch the connection error and fail over.
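The "catch the connection error and fail over" part could look roughly like this in Python. This is a sketch under assumptions: `connect_with_failover` is a hypothetical helper, each connector is a zero-argument function you write around your own driver, and failures are assumed to surface as `ConnectionError` (a real driver will raise its own exception type, which you would catch instead).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(connectors: Sequence[Callable[[], T]]) -> T:
    """Try each connector in order and return the first connection that opens."""
    last_error = None
    for connect in connectors:
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next target
    raise ConnectionError("all database targets failed") from last_error
```

You would pass the connector for the SQL Server VM first and the SQL Azure one second, so the app only falls back when the primary is unreachable. Note that this does not handle syncing data back to the primary afterwards.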

Look at using other parts of Azure for parts of your app to reduce the number of calls coming into SQL Server. For example, can you move some data to table storage?

Hope this gives you some ideas.

Question 2

Windows Azure Infrastructure Services (IaaS) has only been in General Availability (GA, or production) about 3 weeks, since April 16 (see announcement here). Prior to GA, there was no SLA and you would have seen more frequent OS restarts as various patches were still being applied to the Host OS. Are you saying that this pattern has continued at the same velocity since April 16?

Now that IaaS is GA, I wouldn't expect 4 restarts in a week. That said: there are several reasons you'd see a restart:

Host hardware failure (this takes down all Guest OSs running on that host)
Host software update (only if it requires a restart of the Host OS). Host OS reboots shouldn't be happening at the frequency you're seeing.
Guest OS issues. Here's where things depart from PaaS (web/worker role Cloud Services). In IaaS, there's no Guest OS maintenance done by Azure; this is all in your hands. It's possible to get reboots if installing Windows Updates automatically. Possibly you could be running into an application-level issue causing the box to become unresponsive for a long period of time, resulting in the Azure fabric controller rebooting your box as it thinks it's unhealthy. And... your app could be somehow crashing the box.

If you've ruled out application error and are sure the VMs are in good health at the time they're rebooting, you may need to open a support ticket with Microsoft to help diagnose the issue further.