Question

I'm using Solr 4 (through the Spring Data Solr API to index and retrieve documents), and I have to decide which strategy to use for indexing my documents.

I'm hesitating between two strategies:

  1. Run a batch job periodically to reindex all documents

  2. Index a document only when it has changed

Which strategy is best? Maybe a mix, or something else? I have some ideas about the pros and cons of each, but I don't have much experience with Solr.


Solution

It depends on how long indexing all your documents takes and how soon you need changes to show up in the index.

We have several Solr cores. Some have fewer than 100K very small docs, and a full import via the Data Import Handler (with optimize=true) runs in under 1 minute. We can tolerate delays of up to 15 minutes for them, so we run a full import for these cores every 15 min.
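For illustration, a scheduled full import like this usually comes down to one HTTP request to the Data Import Handler. The sketch below only builds that URL. The host, the core name `small_core`, and the handler path `dataimport` are assumptions (the path is whatever your solrconfig.xml registers), and the helper `dih_url` is hypothetical, not part of any library.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, handler="dataimport", **params):
    """Build a Data Import Handler request URL for one core."""
    query = {"command": command}
    for key, value in params.items():
        # Solr expects lowercase true/false for boolean parameters.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url.rstrip('/')}/{core}/{handler}?{urlencode(query)}"


# Assumed host and core name; adjust to your own setup.
full_import = dih_url(
    "http://localhost:8983/solr", "small_core", "full-import",
    clean=True, optimize=True,
)
print(full_import)
```

A cron job or scheduler can then request that URL every 15 minutes, for example with `urllib.request.urlopen(full_import)`.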

At the other extreme are cores with several million docs, each fairly large, where a full index takes several hours to complete. For these cores we have a changelog table in MySQL that records only the docs that changed, and we incrementally index just those docs every few min.
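A rough sketch of the changelog approach, to make the idea concrete. This is not our actual code: the table and column names (`docs`, `changelog`, `change_id`, `doc_id`, `title`) are made up for the example, an in-memory SQLite database stands in for MySQL, and `send_batch` is a placeholder for whatever pushes documents to Solr (Spring Data Solr, SolrJ, plain HTTP). Deleted documents are not handled here.

```python
import sqlite3


def index_changes(conn, last_change_id, send_batch, batch_size=500):
    """Send docs changed since last_change_id; return the new high-water mark."""
    rows = conn.execute(
        "SELECT c.change_id, d.id, d.title "
        "FROM changelog c JOIN docs d ON d.id = c.doc_id "
        "WHERE c.change_id > ? ORDER BY c.change_id",
        (last_change_id,),
    ).fetchall()
    if not rows:
        return last_change_id
    # A doc may appear several times in the changelog; index it only once.
    docs = {doc_id: {"id": doc_id, "title": title} for _, doc_id, title in rows}
    pending = list(docs.values())
    for start in range(0, len(pending), batch_size):
        send_batch(pending[start:start + batch_size])
    return rows[-1][0]


# Demo with an in-memory database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE changelog (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT, doc_id TEXT);
    INSERT INTO docs VALUES ('a', 'Alpha'), ('b', 'Beta'), ('c', 'Gamma');
    INSERT INTO changelog (doc_id) VALUES ('a'), ('c'), ('a');
""")
sent = []  # collects batches instead of calling Solr
checkpoint = index_changes(conn, 0, sent.append)
```

Persisting the returned checkpoint between runs is what keeps each run small: the next run only looks at changelog rows written after it.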

Finally, there are cores in the middle, with about 500K docs of decent size, where we need atomic updates to certain fields every 5 to 10 min and full document updates for certain docs every few min as well. We run delta imports for these. A full index takes about 1.5 to 2 hours, so we run it nightly.
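For reference, an atomic update in Solr 4 is a normal update request in which each changed field carries a modifier such as `set` instead of a plain value. A minimal sketch of building that JSON body follows. The unique key name `id` and the fields `price` and `in_stock` are assumptions for the example, and `atomic_set` is a hypothetical helper.

```python
import json


def atomic_set(doc_id, **fields):
    """Build one atomic-update document that overwrites the given fields."""
    doc = {"id": doc_id}  # assumes the schema's unique key is "id"
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc


# Body to POST to the core's /update handler as application/json.
payload = json.dumps([atomic_set("doc-42", price=19.99, in_stock=True)])
print(payload)
```

Only the listed fields change, so the request stays small even for large docs. Check the Solr documentation for what atomic updates require of your schema and update log before relying on this.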

So the answer to your question really depends on what your requirements are.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow