Question

We are working on a Java EE project that handles a huge amount of data and has to provide full-text search (in Hungarian). So we started thinking about what kind of architecture could fulfill our requirements. My thoughts are the following:

Using ElasticSearch as the primary database is an antipattern, so it should be used only for indexing and searching.

MongoDB fits our expectations, so it seems to be a good choice as the database.

The problem is how to index MongoDB data with ElasticSearch. I created a POC with 13 million documents. I iterated through the documents, and in each iteration I saved the document into MongoDB (which gave me an ID for it), then put the document into ElasticSearch but stored only the Mongo ID there. Indexing was quite fast, averaging 4.8 ms per document.

When I search with ElasticSearch, it gives me back the matching document IDs, and I can load the documents from Mongo with the $in operator. This also seemed quite fast.
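One detail of this two-step lookup is worth sketching: the search returns IDs ranked by relevance, while a `$in` query is not guaranteed to return documents in the order of the ID array. The following is a minimal sketch, in plain Java with no driver calls, of putting the fetched documents back into rank order. `RankedLookup`, `Doc`, and `inRankOrder` are hypothetical names used only for illustration, and the lists in `main` stand in for the ElasticSearch hits and the MongoDB result.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rejoin documents fetched by ID with the ranking returned by the search.
final class RankedLookup {

    // Hypothetical minimal document type; a real project would use its own entity class.
    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }

        @Override
        public String toString() {
            return id + ":" + body;
        }
    }

    /**
     * Returns the fetched documents in the order given by rankedIds.
     * IDs with no matching document (for example, still present in the
     * index but missing from the database) are skipped.
     */
    static List<Doc> inRankOrder(List<String> rankedIds, Collection<Doc> fetched) {
        Map<String, Doc> byId = new HashMap<>();
        for (Doc d : fetched) {
            byId.put(d.id, d);
        }
        List<Doc> ordered = new ArrayList<>(rankedIds.size());
        for (String id : rankedIds) {
            Doc d = byId.get(id);
            if (d != null) {
                ordered.add(d);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Stand-in for the IDs returned by the search, best match first.
        List<String> rankedIds = List.of("c", "a", "x", "b");
        // Stand-in for the result of the $in query, in arbitrary order.
        List<Doc> fetched = List.of(new Doc("a", "alma"), new Doc("b", "barack"), new Doc("c", "cseresznye"));
        System.out.println(inRankOrder(rankedIds, fetched));
    }
}
```

Skipping unmatched IDs also hides the case where the index and the database have drifted apart, so it may be worth logging those IDs instead of dropping them silently.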

All of this suggests it can be a good approach, but is it really? I can't figure out when this architecture would slow down or what the bottleneck could be. Maybe synchronizing ElasticSearch with Mongo, but that could be run in a distributed environment (Hadoop).

So my question: is there a better way to synchronize MongoDB with ElasticSearch?

Solution 2

Update: we actually solved it in a custom way, and so far it seems OK. In response to the additional comments: the data is not public, but document sizes vary widely and the amount of data is constantly growing; no data will be archived (business requirement).

Our solution: we keep MongoDB and ES synchronized manually by carefully controlling data insertion, and we use the same ID in both. The biggest disadvantage is that we have to handle failures very carefully and track the insertion at each stage (e.g. what if a document is already in Mongo but ES failed to index it?).
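To illustrate the failure tracking described above, here is a minimal sketch and not our actual code. It writes to the primary store first, then to the index, and remembers the IDs whose indexing failed so they can be retried later. Everything in it is an assumption made for illustration: `DualWriter`, `SearchIndex`, `save`, and `retryPending` are hypothetical names, an in-memory map stands in for MongoDB, and the index client is reduced to a single method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of a dual write with failure tracking; all names here are hypothetical.
final class DualWriter {

    /** Stand-in for the search index client (e.g. a wrapper around an ElasticSearch call). */
    interface SearchIndex {
        void index(String id, String text) throws Exception;
    }

    private final Map<String, String> primaryStore = new HashMap<>(); // stands in for MongoDB
    private final Deque<String> pendingIndex = new ArrayDeque<>();    // IDs stored but not yet indexed
    private final SearchIndex index;

    DualWriter(SearchIndex index) {
        this.index = index;
    }

    /** Stores the document first, then indexes it under the same ID; true if both steps succeeded. */
    boolean save(String id, String text) {
        primaryStore.put(id, text);
        try {
            index.index(id, text);
            return true;
        } catch (Exception e) {
            pendingIndex.add(id); // the document is in the store but not searchable yet
            return false;
        }
    }

    /** Retries each pending ID once; returns how many are still pending afterwards. */
    int retryPending() {
        int attempts = pendingIndex.size();
        for (int i = 0; i < attempts; i++) {
            String id = pendingIndex.poll();
            try {
                index.index(id, primaryStore.get(id));
            } catch (Exception e) {
                pendingIndex.add(id); // still failing, keep it for the next round
            }
        }
        return pendingIndex.size();
    }

    public static void main(String[] args) {
        boolean[] indexDown = {true};
        DualWriter writer = new DualWriter((id, text) -> {
            if (indexDown[0]) {
                throw new Exception("index unavailable");
            }
        });
        System.out.println("saved and indexed: " + writer.save("1", "alma"));
        indexDown[0] = false;
        System.out.println("still pending after retry: " + writer.retryPending());
    }
}
```

In a real deployment the pending list would have to be durable (for example, a status field on the stored document or a separate collection), since an in-memory queue is lost on restart.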

Other tips

I had the same requirement and found these references that could help you.

Java + MongoDB + ElasticSearch = River plugin, which you can find at https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki

And if you are really going to have a huge amount of data to manage, please read this interesting experience report and its conclusion from Quarkslab: http://blog.quarkslab.com/mongodb-vs-elasticsearch-the-quest-of-the-holy-performances.html

Hope it helps.

Licensed under: CC-BY-SA with attribution