Question

I am designing a system in which programs run in Nominal/Redundant mode: one on one machine, one on another. Should the Nominal program fail (a Failover event), the Redundant program should take over and assume operations as the new Nominal process. This should be transparent to the user.

My question is: should a Failover occur only because of a hardware failure, or are software errors enough of a cause to trigger one?

More generally, is there an industry standard for deciding what should cause a Failover, or is that up to the system architect/designer?

Solution

From the cluster's point of view, the kind of error makes no difference. The point is that you cannot rely on a failing node to send any "I am failing" event.

The cluster (in your case, the "Redundant" node) simply notices that a node has stopped sending heartbeats (or has stopped responding to pings). The "Redundant" node then makes itself "master" and starts processing incoming requests. That's all, I think.
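As a rough illustration of that idea, here is a minimal Python sketch of the check the "Redundant" side could run. The class name `StandbyMonitor`, the `timeout_s` parameter, and the injectable `clock` are hypothetical names chosen for this example, not part of any particular cluster product. The sketch also assumes that a missed heartbeat always means the Nominal node is down.

```python
import time
from typing import Callable


class StandbyMonitor:
    """Runs on the Redundant node and watches heartbeats from the Nominal node."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s        # silence longer than this triggers failover
        self.clock = clock                # injectable so the logic can be tested
        self.role = "redundant"
        self.last_heartbeat = clock()

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat (or ping reply) arrives from the Nominal node.
        self.last_heartbeat = self.clock()

    def check(self) -> str:
        # The monitor only sees silence. It cannot tell whether the cause was
        # a hardware fault or a software error, and it does not need to.
        silent_for = self.clock() - self.last_heartbeat
        if self.role == "redundant" and silent_for > self.timeout_s:
            self.role = "nominal"         # take over as the new Nominal process
        return self.role
```

In a real deployment `check()` would be called periodically, and `on_heartbeat()` would be wired to whatever transport carries the heartbeats. Because the decision is based purely on the timeout, a crashed process and a dead machine would both lead to the same takeover.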

Licensed under: CC-BY-SA with attribution