Question

I am loading data into a two-node Elasticsearch cluster (5 shards each) using Apache Flume. A single Flume agent uses an ExecSource (cat command), a file channel, and a custom-built sink based on the Elasticsearch Bulk and XContentBuilder Java APIs.

Flume collects events at a rate of 8000 events/sec (each event is 246 bytes), but indexing in Elasticsearch only reaches 3000 events/sec.
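To put those figures in perspective, here is a small back-of-the-envelope sketch in Java. It uses only the numbers stated above (8000 and 3000 events/sec, 246 bytes per event); the class and method names are made up for illustration.

```java
public class ThroughputGap {
    // Figures taken from the question.
    static final int EVENT_SIZE_BYTES = 246;
    static final int COLLECT_RATE = 8000; // events/sec collected by Flume
    static final int INDEX_RATE = 3000;   // events/sec indexed by Elasticsearch

    // Raw payload volume in bytes/sec for a given event rate.
    static long bytesPerSecond(int eventsPerSecond) {
        return (long) eventsPerSecond * EVENT_SIZE_BYTES;
    }

    // Events that accumulate in the channel every second.
    static int backlogPerSecond() {
        return COLLECT_RATE - INDEX_RATE;
    }

    // Share of the collection rate that the indexing side keeps up with.
    static double indexedPercent() {
        return 100.0 * INDEX_RATE / COLLECT_RATE;
    }

    public static void main(String[] args) {
        System.out.println("collected bytes/sec: " + bytesPerSecond(COLLECT_RATE));
        System.out.println("indexed bytes/sec:   " + bytesPerSecond(INDEX_RATE));
        System.out.println("backlog events/sec:  " + backlogPerSecond());
        System.out.println("indexed percent:     " + indexedPercent());
    }
}
```

So the collection side produces just under 2 MB/sec of raw payload, the channel backlog grows by 5000 events every second, and indexing keeps up with 37.5% of what is collected.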

How can I tune indexing performance of elasticsearch to get my throughput close to the rate of collection in flume?

Solution

I have written a script which you can download here. It's a shell script; however, I'm sure you can translate it to whatever platform you use. Many variables affect indexing performance, including hardware and system settings. There are quite a lot of resources out there on the topic.

Other tips

I would look at the way Logstash writes to Elasticsearch. Specifically, it sets the index refresh interval to 5s in order to speed up indexing. You may also want to test whether compression helps or hinders.
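The interval mentioned above most likely corresponds to the index setting `index.refresh_interval`, which can be changed on a live index through the `_settings` endpoint. Below is a rough sketch in plain Java (11+) of what that request could look like. The host `http://localhost:9200` and the index name `flume-events` are assumptions, not something taken from the question, and the class and method names are made up.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RefreshIntervalSketch {
    // JSON body for the index settings update.
    static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Builds PUT <baseUrl>/<index>/_settings; nothing is sent here.
    static HttpRequest buildRequest(String baseUrl, String index, String interval) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(interval)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed host and index name; adjust to your cluster.
        HttpRequest request = buildRequest("http://localhost:9200", "flume-events", "5s");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

A longer refresh interval means newly indexed documents take longer to become searchable, so it is worth measuring throughput before and after the change rather than assuming a gain.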

Otherwise, I would increase your cluster size.

Use a memory channel instead of a file channel; it can improve output speed several times over.
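As a rough illustration, switching the channel type in the Flume agent configuration might look like the fragment below. The agent, source, channel and sink names (`agent`, `execSrc`, `memCh`, `esSink`) and the capacity values are placeholders, not taken from the question. Keep in mind that a memory channel is not durable: events still in the channel are lost if the agent process dies.

```properties
# Hypothetical component names; replace with the ones from your own config.
agent.channels = memCh

# In-memory channel instead of the file channel.
agent.channels.memCh.type = memory
# Maximum number of events held in the channel (placeholder value).
agent.channels.memCh.capacity = 100000
# Maximum number of events per source/sink transaction (placeholder value).
agent.channels.memCh.transactionCapacity = 1000

# Wire the existing source and sink to the new channel.
agent.sources.execSrc.channels = memCh
agent.sinks.esSink.channel = memCh
```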

Licensed under: CC-BY-SA with attribution