Question

I'm writing a Worker Role for Windows Azure. The worker receives messages from a queue and spins up a thread for every message. The threads call an external API, which involves a lot of waiting: for example, I call "Create An Instance", the API returns 202 Accepted, and then I have to poll the API until I get a "Completed" answer. Sometimes the wait can be 5 minutes. The whole process can take from 10 to 30 minutes, with about 10 calls to the API.

30 minutes is a long time in the life of an Azure Worker Role, and a restart, redeploy, or crash can happen during that time. My process is also not idempotent: I can't create the same resource twice without causing a problem.

What I would like to do is store the state of the thread somewhere on every critical call to the API. Then, if the thread crashes, another worker role can pick up the message from the queue and resume from the point where the previous one was interrupted.

One of the ideas was to report thread status and persist that somewhere. Something like this pseudocode:

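using System;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

// Queue, QueueMessage, ThreadReporter and ExternalApi are placeholders, not real types.
public class WorkerRole : RoleEntryPoint
{
    public override void Run()
    {
        while (true)
        {
            var message = Queue.GetMessage();
            var messageProcessor = new MessageProcessor(message);
            // One thread per message; pass the method itself rather than invoking it here.
            var thread = new Thread(messageProcessor.Process);
            thread.Start();
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }
}


public class MessageProcessor
{
    private readonly QueueMessage message;

    public MessageProcessor(QueueMessage message)
    {
        this.message = message;
    }

    public void Process()
    {
        // Skip the API call if an earlier run already finished this step.
        if (!ThreadReporter.IsComplete(message, "Step1"))
        {
            ExternalApi.StartStep1();
        }
        ThreadReporter.ReportCompletion(message, "Step1");

        if (!ThreadReporter.IsComplete(message, "Step2"))
        {
            ExternalApi.StartStep2();
        }
        ThreadReporter.ReportCompletion(message, "Step2");
    }
}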

ThreadReporter would save a flag in a DB saying that Step1 is complete, or check whether the flag for Step1 is already set for that particular message (work request).

I can feel that this approach would have a lot of problems and the code would be horrible. But I struggle to think of a better way to do that.

I've seen that Jon Skeet was saving some kind of memory dump of an application and, after a restart, resuming from the same place. Can a thread's state be serialised and saved in a DB?

Also I've heard that Workflow Foundation can do that as well. I've never worked with WF and have no clue about it. Any hints about WF?

So, the question is: what is the best way to implement a process (which is essentially a workflow) that can resume from the last completed point after a crash?


Solution

This is exactly the kind of problem that long-running persisted workflows were designed to solve.

Having a thread run and pause (presumably with loops around Thread.Sleep) is not ideal, as you said.

It might be time for some re-architecting. Your suggestion of persisting the current state after each step to a DB would work all right, but if you have the bandwidth, I'd definitely look into long-running workflows.

http://msdn.microsoft.com/en-us/library/ff432975.aspx
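
For comparison, the simpler option of persisting state after each step could be sketched roughly as below. This is only an illustration under assumptions: ICheckpointStore, InMemoryCheckpointStore and ResumableProcess are hypothetical names that do not come from the question or from any library, and the in-memory store merely stands in for a real database table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the durable store (a DB table, for example).
public interface ICheckpointStore
{
    bool IsComplete(string workId, string step);
    void MarkComplete(string workId, string step);
}

// In-memory stand-in, for illustration only: it does NOT survive a restart.
public sealed class InMemoryCheckpointStore : ICheckpointStore
{
    private readonly HashSet<string> completed = new HashSet<string>();
    private readonly object gate = new object();

    public bool IsComplete(string workId, string step)
    {
        lock (gate) { return completed.Contains(workId + "/" + step); }
    }

    public void MarkComplete(string workId, string step)
    {
        lock (gate) { completed.Add(workId + "/" + step); }
    }
}

// Runs named steps, skipping any step already recorded as complete.
public sealed class ResumableProcess
{
    private readonly ICheckpointStore store;
    private readonly string workId;

    public ResumableProcess(ICheckpointStore store, string workId)
    {
        this.store = store;
        this.workId = workId;
    }

    public void RunStep(string step, Action action)
    {
        if (store.IsComplete(workId, step))
        {
            return; // finished in an earlier run; do not repeat it
        }
        action();
        // A crash between action() and MarkComplete() would repeat the step on resume,
        // so a non-idempotent call still needs its own guard (e.g. a stored operation ID).
        store.MarkComplete(workId, step);
    }
}
```

With this shape, the processor from the question would call something like process.RunStep("Step1", () => ExternalApi.StartStep1()) for each step. The remaining weak spot is the window between the API call and the checkpoint write, which is the kind of bookkeeping a persisted workflow is meant to take off your hands.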

Licensed under: CC-BY-SA with attribution