Using elasticsearch-river-mysql to stream data from MySQL database to Elasticsearch

Question

My advice would be to use the elasticsearch-jdbc-river instead, for several reasons.

One of them is that the elasticsearch-jdbc-river is more generic, in case you decide to switch to another RDBMS.

Another is that the jdbc-river is still maintained, whereas the other one hasn't been updated in two years, and Elasticsearch has evolved a lot since then.
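To make the "more generic" point concrete, here is a minimal sketch of the `_meta` document that registers a JDBC river. The field layout (`type`, `jdbc.url`, `jdbc.user`, `jdbc.password`, `jdbc.sql`, `jdbc.index`, `jdbc.type`) follows the elasticsearch-river-jdbc README as I understand it; please verify it against the version you install. The helper name, connection details, table and index names are all hypothetical. Switching RDBMS essentially means changing the JDBC URL (and adding the matching driver jar), while the rest of the definition stays the same.

```python
import json


def jdbc_river_meta(jdbc_url, user, password, sql, index, doc_type):
    """Build the _meta body for a JDBC river (hypothetical helper).

    Assumed layout: a top-level "type": "jdbc" plus a "jdbc" section
    holding the connection, the SQL query and the target index/type.
    """
    return {
        "type": "jdbc",
        "jdbc": {
            "url": jdbc_url,
            "user": user,
            "password": password,
            "sql": sql,
            "index": index,
            "type": doc_type,
        },
    }


# Hypothetical MySQL connection; for another RDBMS only the URL
# (e.g. "jdbc:postgresql://...") and the driver jar would change.
meta = jdbc_river_meta(
    jdbc_url="jdbc:mysql://localhost:3306/shop",
    user="es_reader",
    password="secret",
    sql="select * from orders",
    index="shop",
    doc_type="order",
)

# This JSON would be sent as the body of: PUT /_river/my_jdbc_river/_meta
print(json.dumps(meta, indent=2))
```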

1. From what I know, the data will be streamed from the MySQL database to the ES cluster, which will index it automatically. Is that correct? Are there any timeouts or limits I have to be aware of?

Yes, the data should be streamed automatically from MySQL to the Elasticsearch cluster without a timeout limitation, but the bottleneck will be your JVM heap size. I'm not sure how much heap you need to process the amount of data you have; you will need to test it.
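One reason memory rather than time is the limit: data is pushed into Elasticsearch in bulk requests, and only the batches in flight have to fit in memory. The sketch below is not the river's code; it only illustrates the idea by turning a row iterator into Elasticsearch bulk-API bodies (newline-delimited action and source lines, terminated by a newline) of a fixed size. The function name, batch size and sample rows are hypothetical, and the `_type` field assumes an Elasticsearch 1.x-era cluster.

```python
import json
from itertools import islice


def bulk_batches(rows, index, doc_type, batch_size=1000):
    """Yield bulk-API request bodies holding at most batch_size documents.

    Only one batch is materialized at a time, so memory use depends on
    the batch size, not on the size of the source table.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines = []
        for row in batch:
            # Action line followed by the document source line.
            action = {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(row))
        # The bulk API expects NDJSON that ends with a trailing newline.
        yield "\n".join(lines) + "\n"


# Hypothetical rows standing in for a MySQL result set.
rows = ({"id": i, "total": i * 10} for i in range(2500))
bodies = list(bulk_batches(rows, "shop", "order", batch_size=1000))
print(len(bodies))  # 1000 + 1000 + 500 documents -> 3 request bodies
```

Larger batches mean fewer requests but more heap per request, which is exactly the trade-off you will need to measure on your own data.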

2. How will the foreign key relations between the relational database tables be translated into ES? Will the table row containing the foreign key become an inner object of an ES document, or will some other relation between the ES documents be used?

Elasticsearch is schemaless, so you need to manage those relations inside Elasticsearch yourself. The river just streams the data into your cluster. You can define your mapping when you create your index and then use the river to stream the data into the ES cluster.
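As an illustration of "managing the relations yourself", here is a minimal sketch that denormalizes a foreign key into an inner object and defines a matching mapping up front. The `orders`/`customers` tables, field names and the `embed_customer` helper are hypothetical; the mapping uses Elasticsearch 1.x-era syntax (`string`, `not_analyzed`) and would be sent as the body of `PUT /shop` before the river starts feeding the index. Using `"nested"` instead of `"object"` is the alternative when you need to query inner objects independently.

```python
import json

# Hypothetical index-creation body: the customer row that MySQL references
# through a foreign key is stored as an inner object of each order.
ORDER_MAPPING = {
    "mappings": {
        "order": {
            "properties": {
                "total": {"type": "double"},
                "customer": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "email": {"type": "string", "index": "not_analyzed"},
                    },
                },
            }
        }
    }
}


def embed_customer(orders, customers):
    """Replace each order's customer_id foreign key with an inner customer object."""
    by_id = {c["id"]: c for c in customers}
    docs = []
    for order in orders:
        doc = {k: v for k, v in order.items() if k != "customer_id"}
        customer = by_id.get(order["customer_id"])
        if customer is not None:
            doc["customer"] = {"name": customer["name"], "email": customer["email"]}
        docs.append(doc)
    return docs


# Hypothetical rows as they might come out of two MySQL tables.
orders = [{"id": 1, "total": 42.0, "customer_id": 7}]
customers = [{"id": 7, "name": "Ada", "email": "ada@example.com"}]
docs = embed_customer(orders, customers)
print(json.dumps(docs[0]))
```

In practice you could also do the join in the river's SQL query itself; the point is that the shape of the document is your decision, not something the river infers from the foreign keys.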

3. Are there any disadvantages in using this river for the purpose mentioned above?

The river approach will be replaced by a cleaner way to stream this data, but it is the best solution you have for now.