Question

With master-slave implementation of distributed Solr (prior to Solr 4.x) it was a straight design solution to have master which takes load for indexing, merging and optimizing index. Then the index gets copied to replicas while replicas meanwhile are always serving searches.

Could someone explain how this is done now with SolrCloud? Seems like SolrCloud sends indexing commands to each replica from leader. But how the search performance could be achieved then? Indexing and searching on each replica makes load on each node server (to index and run merge thread in background) and since my index is quite big it takes a lot of time usually to merge segments or simply optimize. Should I deliver that all now to merge policy and not worry at all? Does TieredMergePolicy provide both good search performance and low resource load (CPU, I/O) at the same time?

Was it helpful?

Solution

I'll try to answer part of your questions: SolrCloud indeed indexes on all nodes, and therefore it has a performance impact on replicas. This is done due to 'hot replication' model instead of 'cold replication' as you are used to. It comes to solve data integrity issues as well as real time search on a cluster. You get consistent data and faster data availability as a price of performance impact. Actually, you can always split data to shards (at a price of additional hardware), and have comparable performance. In either case, it's up to you to decide whether SolrCloud suits your needs. You can use Solr 4 without cloud model and manage it yourself as before.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top