Question

I set up SolrCloud a while ago, but it has never quite worked the way I thought it should. My concern is high availability. Maybe I misunderstood something, but it seemed to me that when one node goes down in SolrCloud, the other nodes should continue to work, right?

My system has only 2 nodes. The number of shards is 1, i.e. both nodes are kind of like "mirrors" or copies of each other. My intent was to achieve a system where if one node goes down, the other one keeps operating.

Ever since I installed the cloud, I have noticed that if either of the 2 nodes is shut down, I can't access the other node's web UI at all until the stopped node comes back up.

Why is this happening?

By the way, SolrCloud is version 4.4.0 1504776 - sarowe, and ZooKeeper also runs on the same 2 nodes. Is this happening because the ZooKeeper ensemble is spread across only 2 machines?

Solution

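I believe ZooKeeper is the problem. A ZooKeeper ensemble only works while a strict majority (a quorum) of its instances is up, which is why an ensemble of 2N+1 instances tolerates N down nodes. With only 2 instances, the majority is 2, so you are effectively running a 2*1+1=3 ensemble with one instance permanently missing: both remaining instances must be up and no failure is tolerated. As you have only 2 ZooKeeper instances, if either of them goes down, the ensemble loses its quorum and is down as well, and SolrCloud depends on ZooKeeper for its cluster state.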
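
The majority rule can be checked with a few lines of arithmetic. This is only a sketch of the quorum calculation, not ZooKeeper code; the function names `quorum_size` and `tolerated_failures` are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of instances that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances may be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the 2-instance setup from the question with larger ensembles.
    for n in (2, 3, 4, 5):
        print(f"{n} instances: quorum = {quorum_size(n)}, "
              f"tolerated failures = {tolerated_failures(n)}")
```

Note that 2 instances tolerate no failures, and 4 instances tolerate no more failures than 3, which is why odd ensemble sizes are normally used.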

To achieve high availability, deploy an independent ZooKeeper ensemble with at least 3 instances on 3 different machines, so that no single machine is a single point of failure (SPoF).
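
As a rough sketch, a `zoo.cfg` for such a 3-instance ensemble could look like the following. The hostnames `zk1`, `zk2`, `zk3` and the `dataDir` path are placeholders I made up, and the timing values are just common starting points, so adjust all of them to your environment.

```properties
# Basic time unit in milliseconds used for heartbeats and timeouts
tickTime=2000
# Ticks a follower may take to connect and sync to the leader at startup
initLimit=10
# Ticks a follower may lag behind the leader before it is dropped
syncLimit=5
# Placeholder path: where snapshots and the myid file are stored
dataDir=/var/lib/zookeeper
# Port that clients such as Solr connect to
clientPort=2181
# One line per ensemble member: server.<id>=<host>:<peer port>:<election port>
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each machine also needs a `myid` file in its `dataDir` containing its own id (1, 2 or 3). Both Solr nodes would then be started with all three instances in the `zkHost` setting, for example `-DzkHost=zk1:2181,zk2:2181,zk3:2181`, again assuming those placeholder hostnames.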

Licensed under: CC-BY-SA with attribution