Can one make Apache Solr index transactionally consistent with DB being indexed?

https://stackoverflow.com/questions/12966489

09-07-2021

Question

I am new to Solr. I am trying to build a server that stores structured data in a database and can be searched using Solr/Lucene. The server can be clustered into any number of identical nodes for high availability.

It seems that in the standard configuration Solr stores the index in files on the file system. This seems to introduce some problems with consistency and clustering.

How do I make the index transactionally consistent with the DB? Is there a way to do this? (e.g. some way to make commits to the DB coordinated with commits to the Solr index?)

Is there any way to store the index in the (relational) DB? This would solve the consistency problems and cluster problems, but I don't find a lot of literature on how to do this.

When configured as a cluster, does each cluster node need to maintain its own copy of the index? It is not clear whether multiple instances of Solr can update a single index or not.

Or do we give up, accept that the index is not guaranteed to be consistent, and rebuild it every day or so? What do people normally do about this?

Solution

Q> How do I make the index transactionally consistent with the DB?
A> You can't. You could probably invent another transaction layer on top, but it would take ages to develop and you still wouldn't reach 100% consistency. You could, for example, send the data to both the DB and Solr and commit only after both have received it, but the two commits would still not be atomic.
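To make the "not atomic" point concrete, here is a minimal sketch of that dual-write idea. It uses an in-memory SQLite database and a hypothetical `FakeIndex` class standing in for Solr (it is not a real Solr client); `dual_write` and `make_db` are likewise names invented for illustration.

```python
import sqlite3


class FakeIndex:
    """Hypothetical in-memory stand-in for a Solr core (not a real client)."""

    def __init__(self, fail_on_commit=False):
        self.pending = {}
        self.committed = {}
        self.fail_on_commit = fail_on_commit

    def add(self, doc_id, text):
        self.pending[doc_id] = text

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError("index commit failed")
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def make_db():
    """Create an in-memory database with a single docs table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    return db


def dual_write(db, index, doc_id, text):
    """Stage the change in both stores, then commit each in turn.

    Returns True only if both commits succeeded. The commits are two
    separate operations, so a failure between them leaves the stores
    out of sync.
    """
    try:
        db.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (doc_id, text))
        index.add(doc_id, text)
    except Exception:
        # Nothing is committed yet, so both sides can still be undone.
        db.rollback()
        index.rollback()
        return False
    db.commit()  # first commit: the row is now durable in the DB
    try:
        index.commit()  # second commit: can still fail on its own
    except RuntimeError:
        index.rollback()
        return False  # the DB has the row, the index does not
    return True
```

If the index commit fails after the DB commit has gone through, the row exists in the database but is not searchable, which is exactly the window that cannot be closed without a real distributed transaction.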

Q> Is there any way to store the index in the (relational) DB?
A> With Lucene 4.0, you probably can (by writing your own codec). But this won't solve your problem.

Q> When configured as a cluster, does each cluster node need to maintain its own copy of the index?
A> Yes.

Q> It is not clear whether multiple instances of Solr can update a single index or not.
A> Multiple Lucene/Solr instances can't write to the same index file(s). The most you can do is open multiple IndexSearchers on the same index, but Solr probably handles this for you anyway.

Q> do we give up and accept that the index is not guaranteed to be consistent?
A> Yes. I think you are too DB-centric. Think about Solr/Lucene the way you think about Google: I bet they don't roll out their entire index atomically throughout the world. If search results have minor inconsistencies depending on which server you hit (for a few seconds, of course), it's not a big deal.

Q> rebuild it every day or so? What do people normally do about this?
A> Lucene has near-real-time search, but at the basic level you just send index updates and commit as DB changes happen, then reopen the index reader to see those updates. Solr does all of this for you automatically.
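As a rough sketch of "send index updates as DB changes happen", the function below builds (but does not send) an HTTP request for Solr's JSON update handler. The base URL, the collection name `docs`, and the helper name `build_update_request` are assumptions for illustration. The handler path has varied across Solr versions (older releases used `/update/json`), so check it against your installation. `commitWithin` asks Solr to make the documents searchable within the given number of milliseconds.

```python
import json
import urllib.request


def build_update_request(base_url, collection, docs, commit_within_ms=1000):
    """Build a POST request that sends documents to a Solr update handler.

    base_url and collection are assumptions; adjust them to your setup.
    """
    url = f"{base_url}/{collection}/update?commitWithin={commit_within_ms}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: call this right after the DB transaction commits.
req = build_update_request(
    "http://localhost:8983/solr",
    "docs",
    [{"id": "42", "title": "Example row from the DB"}],
)
# urllib.request.urlopen(req) would actually send it to a running Solr.
```

Sending the update only after the DB commit succeeds keeps the database as the source of truth; a failed index update can then be retried or repaired by a later reindex.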

Other tips

I know this is a bit old, but it might help someone. You can try SolrCloud with Apache ZooKeeper.

Out of the box, Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. Called SolrCloud, these capabilities provide distributed indexing and search, supporting the following features with little configuration:

Central configuration for the entire cluster
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration.

ZooKeeper acts as the cluster coordinator for Solr, and the two work well together.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

http://zookeeper.apache.org/doc/trunk/zookeeperOver.html

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow