Assume your distributed system has four nodes, A, B, C and D, and that the current leader is A. An election occurs only if one of the nodes B, C or D notices that the coordinator A is not responding. The failure of A is inferred from message timeouts or from the coordinator failing to initiate a handshake. Unlike in your algorithm, in the standard bully algorithm elections are held only when the coordinator fails or when a new node with a higher process id is introduced.
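The election that follows such a detection can be sketched as a small single-process simulation. This is only an illustration, not anyone's actual implementation: the function name `bully_election`, the numeric ids (A=4, B=3, C=2, D=1, chosen so that the leader A has the highest id) and the `alive` set standing in for "answers before the timeout" are all assumptions made for the example.

```python
def bully_election(initiator, node_ids, alive):
    """Return the coordinator chosen when `initiator` starts an election.

    node_ids: dict mapping node name -> process id (hypothetical values).
    alive:    set of node names that currently answer messages.
    """
    if initiator not in alive:
        raise ValueError("a failed node cannot start an election")
    current = initiator
    while True:
        # "Send ELECTION" to every node with a higher id and keep those that answer.
        higher_alive = [n for n in node_ids
                        if node_ids[n] > node_ids[current] and n in alive]
        if not higher_alive:
            # Nobody with a higher id answered: current announces itself as coordinator.
            return current
        # In the real protocol every responder starts its own election;
        # following just one of them reaches the same final coordinator.
        current = min(higher_alive, key=node_ids.get)


# Assumed ids: the leader A holds the highest one.
ids = {"A": 4, "B": 3, "C": 2, "D": 1}

# D notices that A no longer responds and starts an election.
new_leader = bully_election("D", ids, alive={"B", "C", "D"})
```

With A down, the highest-id node that still responds (B under the assumed ids) ends up as coordinator, regardless of which node noticed the failure first.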
Bully Algorithm - Detecting Failure
13-04-2022
Question
Descriptions of the bully algorithm usually do not cover the actual detection of a failure.
I have a working implementation of the bully algorithm that uses the elections themselves to detect failures, rather than have failures trigger elections.
In short, elections in my implementation are performed on a scheduled basis, rather than upon a failure detection.
Clearly this means network traffic is generated, but it seems like a simple solution to something that otherwise might become complicated (e.g. having a separate failure detection mechanism, which will have its own network traffic).
Can anyone see a problem with this?
Solution
Other suggestions
Usually, the leader election is started when a member suspects that there is no leader anymore, i.e. after a (local) timeout. Frequently, a local timeout alone is not sufficient; it is combined with an expected action of the leader that fails to occur.
Applying this scheme, there is no need for periodic re-elections or for a special failure detection mechanism.
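A minimal sketch of such a local timeout, assuming the "expected action" is a periodic heartbeat from the leader (the answer does not say which action is meant). The class name `LeaderMonitor`, its methods and the injectable `clock` parameter are hypothetical and chosen only for this example.

```python
import time


class LeaderMonitor:
    """Tracks when the leader was last heard from on one member node."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # how long to wait before suspecting the leader
        self.clock = clock              # injectable so the timeout can be tested
        self.last_heartbeat = clock()   # treat start-up as the first sign of life

    def on_heartbeat(self):
        # Call this whenever the expected message from the leader arrives.
        self.last_heartbeat = self.clock()

    def leader_suspected(self):
        # True once the leader has been silent for longer than the timeout.
        return self.clock() - self.last_heartbeat > self.timeout_s
```

A member would check `leader_suspected()` in its main loop and start an election only when it returns `True`, so no election traffic is generated while the leader keeps sending heartbeats.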