Question

I have the following DSE cluster setup:

DC Cassandra

  • Cassandra node 1

DC Solr

  • Solr node 1
  • Solr node 2
  • Solr node 3
  • Solr node 4

The replication factor is 1 for each DC

My questions:

  1. To perform a search, I send a Solr SELECT query to a specific node. This introduces a single point of failure. If the node is down, the query fails. Is there a way to "query the cluster/DC" instead of querying a specific node?
  2. For the result set to be complete, I need to manually specify the other nodes via the 'shards' parameter. Is this the expected behavior, or have I misconfigured something? My expectation is that this should be automatic. I don't want to have to edit my app's source code every time I add a node to the cluster.
  3. Following up on questions #1 and #2: if any other node (aside from the node I send the Solr query to) is down, I usually get an error like "Unavailable shards for ranges..." or "Server connection refused at...". Again, this breaks fault tolerance. Is it possible to make the cluster return partial results (i.e. only data from the available nodes)?

Overall, my goals are:

  1. Make the app as fault-tolerant as possible - if any of the nodes are down, the app should still display partial results from the remaining nodes
  2. Make the underlying DSE topology transparent to the app. I should not need to edit the app's source code / config every time a node is added or removed

Solution

About your specific questions:

1) Falling back to another server when the requested one is unavailable is a form of client-side load balancing, and it is usually implemented in the client. We rely on standard Cassandra and Solr clients, so you have to build this on top of them.
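As a rough illustration of the client-side fallback described in 1), here is a minimal Python sketch that tries each Solr node in random order and returns the first successful response. The function names (`http_get`, `query_with_failover`), the host names, and the `ks.tbl` core name are hypothetical, and the URL layout assumes the usual `/solr/<keyspace>.<table>/select` endpoint; adapt them to your own setup.

```python
import random
import urllib.parse
import urllib.request


def http_get(url, timeout=5):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def query_with_failover(nodes, core, params, fetch=http_get):
    """Try each node in random order; return the first successful response.

    nodes  -- base URLs such as "http://solr1:8983" (hypothetical hosts)
    core   -- Solr core name, assumed here to be "<keyspace>.<table>"
    params -- dict of Solr query parameters, e.g. {"q": "*:*"}
    """
    query = urllib.parse.urlencode(params)
    last_error = None
    for node in random.sample(nodes, len(nodes)):
        url = f"{node}/solr/{core}/select?{query}"
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, timeouts, and refused connections are all OSErrors.
            last_error = exc
    raise ConnectionError(f"all {len(nodes)} nodes failed") from last_error
```

A call might look like `query_with_failover(["http://solr1:8983", "http://solr2:8983"], "ks.tbl", {"q": "*:*", "wt": "json"})`. Reading the node list from configuration rather than hard-coding it keeps topology changes out of the source code.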

2) No, you must not use the "shards" parameter: just send your query to any of the DSE Solr nodes, and it will be transparently distributed.

3) The "Unavailable shards" error happens because the distributed search query needs to cover all token ranges to provide a correct answer. The usual solution is to increase the replication factor so that the cluster can tolerate RF-1 node failures. We don't currently support partial results, but we may do so in future versions.
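To make the RF-1 rule in 3) concrete, here is a small Python sketch of a deliberately simplified model: one token range per node, with each range replicated on the next `rf` nodes around the ring. This is an illustrative assumption, not how DSE actually computes replica placement, and `unavailable_ranges` is a hypothetical helper.

```python
def unavailable_ranges(num_nodes, rf, down):
    """Return the ranges (identified by primary node index) with no live replica.

    Simplified model: range i is stored on nodes i, i+1, ..., i+rf-1 (mod num_nodes).
    """
    down = set(down)
    missing = []
    for primary in range(num_nodes):
        replicas = {(primary + i) % num_nodes for i in range(rf)}
        if replicas <= down:  # every replica of this range is down
            missing.append(primary)
    return missing


# Four Solr nodes, node 2 is down.
print(unavailable_ranges(4, 1, down={2}))  # RF=1: prints [2], one range is unreachable
print(unavailable_ranges(4, 2, down={2}))  # RF=2: prints [], all ranges still covered
```

With RF=1, as in the question, any single node failure leaves a range with no replica, which is what produces the "Unavailable shards" error. With RF=2, one failure is tolerated, but two adjacent failures in this model would again lose a range.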

Overall, DSE Solr is completely transparent and highly available, provided you set up a replication factor that accommodates the number of failures you want to tolerate.

Licensed under: CC-BY-SA with attribution