Question

I'm testing Solr as my full-text search engine over 1,000,000 documents. I also have user information related to the documents (each user is a document's creator), and I want to store user hits.

Is it necessary to have a database engine to store all the data, or is Solr stable and safe enough to rely on? Is there any risk of losing the data stored in Solr? (I know this can happen to the Solr index and that I can rebuild it, but what about the raw data?)

The only reason I want a second store is to have another backup/version of all my data (not for querying, etc.).


Solution

Amir,

  1. Solr is stable. If you are not convinced, have a look at the list of users here: http://wiki.apache.org/solr/PublicServers, which includes NASA, AT&T, etc.

  2. Solr's main goal is to serve as a search engine, helping us implement search, NLP algorithms, Big Data solutions, etc. Solr is not meant to be the main data store (although it might serve as one).

The reason for the ambiguous sentence above is that, unlike a relational database, Solr can store both the original data and the index, OR the INDEX ONLY, without the data itself.
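In schema terms, this distinction is the per-field `stored` flag. As a minimal sketch, assuming a Solr version with the Schema API and a managed schema (rather than a hand-edited schema.xml), the Python below builds two `add-field` commands: one field that is indexed and stored, and one that is indexed only. The field names `title` and `body` are hypothetical; `text_general` is a field type from Solr's default configset.

```python
import json


def field_definition(name: str, field_type: str, stored: bool) -> dict:
    """Build one 'add-field' command for Solr's Schema API.

    indexed=True makes the field searchable; stored=False keeps only the
    index, so the original value cannot be returned in query results.
    """
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": stored,
        }
    }


# Hypothetical fields: the title stays retrievable, the body is index-only.
commands = [
    field_definition("title", "text_general", stored=True),
    field_definition("body", "text_general", stored=False),
]

# Each command could be POSTed as JSON to
# http://localhost:8983/solr/<collection>/schema (placeholder host/collection).
for cmd in commands:
    print(json.dumps(cmd))
```

Only `body` would then be searchable but not returnable; a query would have to fetch its original text from another store.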

If you store only the index, by specifying stored="false" per field in Solr's schema.xml, you get a much smaller Solr data volume and better performance. However, when you query Solr you will get back only the document ID (or whichever fields you kept stored), and you will have to continue with your relational DB to fetch the rest. Of course, you can also store some of the document's fields and leave others unstored.
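The index-only workflow described above (Solr returns IDs, the relational DB supplies the full records) can be sketched in Python as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the relational store, the `documents` table and its columns are hypothetical, and the ID list is hard-coded in place of a real Solr response.

```python
import sqlite3


def fetch_documents(conn: sqlite3.Connection, solr_ids: list[str]) -> list[tuple]:
    """Load full records for the IDs an index-only Solr query returned,
    keeping Solr's relevance order."""
    if not solr_ids:
        return []
    placeholders = ",".join("?" for _ in solr_ids)
    rows = conn.execute(
        f"SELECT id, title, creator FROM documents WHERE id IN ({placeholders})",
        solr_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL "IN" does not preserve order, so re-sort by Solr's ranking and
    # skip IDs that Solr returned but the database no longer has.
    return [by_id[i] for i in solr_ids if i in by_id]


# In-memory database standing in for the relational store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [("d1", "First", "u1"), ("d2", "Second", "u2"), ("d3", "Third", "u1")],
)

# Pretend these IDs came back from Solr, ranked by relevance.
print(fetch_documents(conn, ["d3", "d1"]))
# prints [('d3', 'Third', 'u1'), ('d1', 'First', 'u1')]
```

Re-sorting in application code keeps the ranking that Solr computed, which is usually the point of querying Solr first.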

Of course, you should back up or replicate Solr to ensure disaster recovery, etc.
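Solr exposes backups over HTTP: the Collections API `BACKUP` action in SolrCloud mode, and the replication handler's `command=backup` for a standalone core. The sketch below only builds the request URLs, so it runs without a server. The host, collection, core, backup name and location are placeholder assumptions. The location must be a directory Solr is allowed to write to, and in SolrCloud it must be shared by all nodes.

```python
from urllib.parse import urlencode


def collection_backup_url(base_url: str, collection: str, backup_name: str, location: str) -> str:
    """Build a SolrCloud Collections API BACKUP request URL."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def core_backup_url(base_url: str, core: str, backup_name: str, location: str) -> str:
    """Build a replication-handler backup request URL for a standalone core."""
    params = {"command": "backup", "name": backup_name, "location": location}
    return f"{base_url.rstrip('/')}/{core}/replication?{urlencode(params)}"


# Placeholder values; sending the request (e.g. with urllib.request.urlopen)
# requires a running Solr instance.
print(collection_backup_url("http://localhost:8983/solr", "documents", "nightly", "/backups/solr"))
print(core_backup_url("http://localhost:8983/solr", "documents", "nightly", "/backups/solr"))
```

Scheduling one of these requests (for example nightly) gives you a Solr-side copy of the data. It is still worth keeping the raw source data elsewhere if Solr is not your system of record.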

Licensed under: CC-BY-SA with attribution