I'm not sure if this will answer your question, but if you look at the source code for ClusterSingletonManager, you can see the chain of events that leads to this scenario. This class uses Akka's Finite State Machine (FSM) support, and the behavior you are seeing is kicked off by a state transition from Start -> BecomingLeader. First, look at the Start state:
    when(Start) {
      case Event(StartLeaderChangedBuffer, _) ⇒
        leaderChangedBuffer = context.actorOf(Props[LeaderChangedBuffer].withDispatcher(context.props.dispatcher))
        getNextLeaderChanged()
        stay
      case Event(InitialLeaderState(leaderOption, memberCount), _) ⇒
        leaderChangedReceived = true
        if (leaderOption == selfAddressOption && memberCount == 1)
          // alone, leader immediately
          gotoLeader(None)
        else if (leaderOption == selfAddressOption)
          goto(BecomingLeader) using BecomingLeaderData(None)
        else
          goto(NonLeader) using NonLeaderData(leaderOption)
    }
The part to look at here is:
    else if (leaderOption == selfAddressOption)
      goto(BecomingLeader) using BecomingLeaderData(None)
To me, it looks like this piece is saying "If I'm the leader (and not alone in the cluster), transition from Start to BecomingLeader with None as the previousLeader option."
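To make the three branches easier to see side by side, here is a stripped-down sketch of that decision as a pure function. This is my own model, not the Akka source: the names StartDecision, Target, and nextState are made up for illustration, and addresses are plain strings.

```scala
// Simplified model of the Start-state decision; NOT the Akka source.
// StartDecision, Target and nextState are hypothetical names.
object StartDecision {
  sealed trait Target
  case object Leader extends Target          // corresponds to gotoLeader(None)
  case object BecomingLeader extends Target  // corresponds to goto(BecomingLeader) using BecomingLeaderData(None)
  case object NonLeader extends Target       // corresponds to goto(NonLeader) using NonLeaderData(leaderOption)

  def nextState(leader: Option[String], self: Option[String], memberCount: Int): Target =
    if (leader == self && memberCount == 1) Leader  // alone, leader immediately
    else if (leader == self) BecomingLeader         // leader, but other members exist
    else NonLeader
}
```

So a node that is the leader while other members exist always lands in BecomingLeader, and in that branch the previous leader is hard-coded to None.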
Then, if you look at the BecomingLeader state:
    when(BecomingLeader) {
      ...
      case Event(HandOverRetry(count), BecomingLeaderData(previousLeaderOption)) ⇒
        if (count <= maxHandOverRetries) {
          logInfo("Retry [{}], sending HandOverToMe to [{}]", count, previousLeaderOption)
          previousLeaderOption foreach { peer(_) ! HandOverToMe }
          setTimer(HandOverRetryTimer, HandOverRetry(count + 1), retryInterval, repeat = false)
        } else if (previousLeaderOption forall removed.contains) {
          // can't send HandOverToMe, previousLeader unknown for new node (or restart)
          // previous leader might be down or removed, so no TakeOverFromMe message is received
          logInfo("Timeout in BecomingLeader. Previous leader unknown, removed and no TakeOver request.")
          gotoLeader(None)
        } else
          throw new ClusterSingletonManagerIsStuck(
            s"Becoming singleton leader was stuck because previous leader [${previousLeaderOption}] is unresponsive")
    }
This is the block that keeps repeating the message you are seeing in the log. It looks like it's attempting to get a previous leader to hand over responsibility without knowing who the previous leader was, because the state transition passed in None as the previous leader. The million dollar question is: "If it doesn't know who the previous leader is, why keep attempting handoffs that will never succeed?"
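One thing that can be checked from the snippet alone is how the retry branch behaves when previousLeaderOption is None. In Scala, foreach on None does nothing and forall on None is true, so each retry logs the message and reschedules the timer but sends no HandOverToMe, and once count exceeds maxHandOverRetries the second branch matches and the node goes to gotoLeader(None). The sketch below is a simplified model of that branch, not the Akka code: RetryModel, Outcome, step, and messagesSent are hypothetical names, and I'm assuming removed can be treated as a plain set of addresses.

```scala
// Simplified model of the HandOverRetry branch; NOT the Akka source.
// RetryModel, Outcome, step and messagesSent are hypothetical names.
object RetryModel {
  sealed trait Outcome
  case object RetryAgain extends Outcome    // log, maybe send, reschedule timer
  case object BecomeLeader extends Outcome  // corresponds to gotoLeader(None)
  case object Stuck extends Outcome         // corresponds to ClusterSingletonManagerIsStuck

  def step(count: Int, maxRetries: Int,
           previousLeader: Option[String], removed: Set[String]): Outcome =
    if (count <= maxRetries) RetryAgain
    else if (previousLeader.forall(removed.contains)) BecomeLeader // true for None
    else Stuck

  // How many HandOverToMe messages one retry round would send.
  def messagesSent(previousLeader: Option[String]): Int = {
    var sent = 0
    previousLeader.foreach(_ => sent += 1) // no-op when previousLeader is None
    sent
  }
}
```

If this reading is right, the retries with None are effectively a waiting period of roughly maxHandOverRetries * retryInterval before the node takes over, rather than real handoff attempts. It doesn't explain why the wait is there, but it would explain why the log message repeats and then stops.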