Question

Two of the six worker role instances in my cloud service stopped by themselves, apparently for no reason. In my logs I discovered that a third instance had gone down one hour earlier (to the minute), stayed down for 25 minutes, and then started again. What could have caused this to happen?

The cloud service consists of four different worker role projects; two of them have two instances each. They all work with Service Bus and with either Azure Table storage, SQL Database, or both. The instances that stopped were all from different projects. They had worked fine before, and this particular version had been running for six days without any problems. I checked the Windows Azure Service Dashboard and everything looks fine there. The instances were not recycling the way they do when there is an unhandled error.

I have since uploaded a new version (I was going to anyway; it is almost the same) and set each role to two instances. Nothing has stopped so far.

In the management portal I don't see any way someone could stop individual instances without stopping the whole service (all instances). Does anyone have an idea how this could happen?


Solution

When you say that they stopped, what exactly do you mean? Were the instances gone? Were they in the Busy state? What was the exact message in the portal? How did you recover them?

I suspect that by "stopped" you mean the VMs were rebooted unexpectedly and then came back online shortly afterwards. If that is the scenario, check out http://blogs.msdn.com/b/kwill/archive/2012/09/19/role-instance-restarts-due-to-os-upgrades.aspx for information about OS updates in Azure and what you would see in your roles while one is in progress. An OS update walks your instances one update domain at a time, which would also explain why different instances went down at different times rather than all at once.
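If you want to confirm this from your own logs going forward, one option is to record graceful shutdowns from inside the role. Below is a minimal sketch using the standard Microsoft.WindowsAzure.ServiceRuntime events (the WorkerRole class name is illustrative, and I'm assuming Trace is wired up to Azure Diagnostics in your configuration). The idea: an orchestrated shutdown such as a host OS update raises the Stopping event and calls OnStop before the reboot, while a crash or hardware failure skips both.

    using System;
    using System.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Raised when the instance is about to be taken down gracefully,
            // e.g. for a host OS update. A crash or hard failure skips this.
            RoleEnvironment.Stopping += (sender, e) =>
                Trace.TraceInformation(
                    "{0}: Stopping event raised at {1:u}",
                    RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow);

            return base.OnStart();
        }

        public override void OnStop()
        {
            // Also called only on graceful shutdown; you get a limited
            // window here (a few minutes at most) before the reboot.
            Trace.TraceInformation(
                "{0}: OnStop called at {1:u}",
                RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow);
            base.OnStop();
        }
    }

If those entries show up in your logs at the time of the restart, it was an orchestrated shutdown (OS update, portal action, etc.) rather than a failure.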

OTHER TIPS

Have you enabled the autoscaling options for your deployment in the Azure Management Portal? Are you sure it wasn't just Azure spinning down idle instances to reduce your resource consumption? One way to rule this out is sketched below.
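To rule this out (or to catch anyone changing the instance count through the portal or management API), you can log topology changes from inside the roles. A minimal sketch along the same lines, using the standard ServiceRuntime Changed event (the TopologyWatcher helper name is mine):

    using System;
    using System.Diagnostics;
    using System.Linq;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public static class TopologyWatcher
    {
        // Call once from OnStart of each role.
        public static void Start()
        {
            RoleEnvironment.Changed += (sender, e) =>
            {
                // Topology changes fire when instances are added or removed,
                // which is what a scaling action would look like from inside.
                foreach (var change in e.Changes.OfType<RoleEnvironmentTopologyChange>())
                {
                    Trace.TraceInformation(
                        "Topology changed for role '{0}' at {1:u}",
                        change.RoleName, DateTime.UtcNow);
                }
            };
        }
    }

If these entries line up with the times your instances disappeared, something was deliberately scaling the deployment rather than the instances failing.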

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow