Question

I want to use elasticsearch-river-mysql to continuously transfer data from a MySQL database to ElasticSearch. I'm a beginner with ES and rivers, so I hope you can help me out with my questions.

  1. From what I know, the data will be streamed from the MySQL database to the ES cluster which will index it automatically. Is that correct? Are there any timeouts or limits I have to be aware of?
  2. How will the foreign key relations between the relational database tables be translated into ES? Will the table row containing the foreign key become an inner object of an ES document, or will some other relation between the ES documents be used?
  3. Are there any disadvantages in using this river for the purpose mentioned above?
  4. What load will this put on ES? I assume that ES is powerful enough, but I was still wondering. Will searches on the ES cluster be affected in any way in this scenario?

Solution

My advice is to use the elasticsearch-jdbc-river instead, for several reasons.

One of them is that the elasticsearch-jdbc-river is more generic, which helps if you decide to switch RDBMS.

Another is that the jdbc-river is still maintained, whereas the other one has not been updated for two years, and Elasticsearch has evolved a lot since then.

1. From what I know, the data will be streamed from the MySQL database to the ES cluster which will index it automatically. Is that correct? Are there any timeouts or limits I have to be aware of?

The data should be streamed automatically from MySQL to the Elasticsearch cluster without any timeout limitation, but the bottleneck will be your JVM heap size. I can't say how much heap you will need for the amount of data you have; you need to test it.
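As a rough illustration of what setting up such a river involves, here is a minimal sketch in Python that only builds the JSON body of a river definition. The field names (`type`, `jdbc`, `url`, `user`, `password`, `sql`) are written from memory of the jdbc-river README and should be treated as assumptions to check against the version you install; the connection string, credentials and query are hypothetical.

```python
import json


def build_river_definition(jdbc_url, user, password, sql):
    """Return a JSON-serialisable body for a JDBC river definition.

    The key names are assumptions based on the jdbc-river README;
    verify them against the plugin version you actually use.
    """
    return {
        "type": "jdbc",
        "jdbc": {
            "url": jdbc_url,        # JDBC connection string of the source database
            "user": user,
            "password": password,
            "sql": sql,             # query whose result rows become documents
        },
    }


if __name__ == "__main__":
    # Hypothetical database, credentials and table, for illustration only.
    body = build_river_definition(
        "jdbc:mysql://localhost:3306/shop",
        "es_reader",
        "secret",
        "select * from orders",
    )
    print(json.dumps(body, indent=2))
```

The printed JSON is what you would send to the cluster to register the river; the exact endpoint depends on your Elasticsearch and plugin versions, so it is left out here.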

2. How will the foreign key relations between the relational database tables be translated into ES? Will the table row containing the foreign key become an inner object of an ES document, or will some other relation between the ES documents be used?

Elasticsearch is schemaless, so you need to manage the relations yourself inside Elasticsearch. The river just streams the data into your cluster. You can define your mapping when you create your index and then use the river to stream the data into the ES cluster.
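To make the idea concrete, here is a small sketch in Python of one way a foreign key relation could end up as inner objects: rows from a joined query are grouped by the parent key, and the child rows become a list inside the parent document. The table and column names (`order_id`, `customer`, `sku`, `quantity`) and the mapping are hypothetical, and this is not what the river does internally; it only shows the shape of document you might aim for when you define the mapping yourself.

```python
from collections import OrderedDict

# Hypothetical mapping for an "order" type whose line items are stored
# as nested inner objects instead of a separate, foreign-keyed table.
ORDER_MAPPING = {
    "mappings": {
        "order": {
            "properties": {
                "customer": {"type": "string"},
                "items": {
                    "type": "nested",
                    "properties": {
                        "sku": {"type": "string"},
                        "quantity": {"type": "integer"},
                    },
                },
            }
        }
    }
}


def fold_rows(rows):
    """Group joined (order, item) rows into one document per order.

    Each row is a dict as a JOIN between the two tables might return it.
    The result maps order_id to a document with an "items" list.
    """
    docs = OrderedDict()
    for row in rows:
        doc = docs.setdefault(
            row["order_id"], {"customer": row["customer"], "items": []}
        )
        doc["items"].append({"sku": row["sku"], "quantity": row["quantity"]})
    return docs


if __name__ == "__main__":
    joined = [
        {"order_id": 1, "customer": "alice", "sku": "A-1", "quantity": 2},
        {"order_id": 1, "customer": "alice", "sku": "B-7", "quantity": 1},
        {"order_id": 2, "customer": "bob", "sku": "A-1", "quantity": 5},
    ]
    for order_id, doc in fold_rows(joined).items():
        print(order_id, doc)
```

Whether nested objects, plain inner objects, or parent/child documents fit best depends on how you query the data, so treat the mapping above as one option among several.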

3. Are there any disadvantages in using this river for the purpose mentioned above?

Rivers will be replaced by another, cleaner way to stream this data, but this is the best solution you have for now.

Licensed under: CC-BY-SA with attribution