Question

I am loading data into a two-node Elasticsearch cluster (5 shards each) using Apache Flume. A single Flume agent uses an ExecSource (a cat command), a file channel, and a custom-built sink based on the Elasticsearch Bulk and XContentBuilder Java APIs.

Flume collects events at a rate of 8000 events/sec (each event is 246 bytes), but the rate drops to 3000 events/sec during indexing in Elasticsearch.

How can I tune Elasticsearch's indexing performance to get my throughput close to Flume's collection rate?

Solution

I have written a script which you can download here. It's a shell script, but I'm sure you can translate it to whatever platform you use. There are many variables in indexing performance, including hardware and system settings, and there are quite a lot of resources out there on tuning them.

OTHER TIPS

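I would consider looking at the way Logstash writes to Elasticsearch. Specifically, its index template sets the index refresh interval (`index.refresh_interval`) to 5s instead of the default 1s, which speeds up indexing at the cost of new documents taking longer to become searchable. You may also want to test whether compression helps or hinders.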
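As a rough illustration of the refresh interval tip, here is a minimal sketch that builds (but does not send) the request for Elasticsearch's update index settings endpoint. The host `http://localhost:9200`, the index name `flume-events`, and the helper name are assumptions made for the example, not details from the question.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval="5s"):
    """Build a PUT /<index>/_settings request that changes index.refresh_interval.

    The request is only constructed here; nothing is sent over the network.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, used only for illustration.
req = build_refresh_interval_request("http://localhost:9200", "flume-events")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply the setting against a real cluster: urllib.request.urlopen(req)
```

The same setting can be set back to a shorter interval once the bulk load is finished, if near-real-time search matters more than indexing speed.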

Otherwise, I would increase your cluster size.

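Use a memory channel instead of a file channel; it can improve output speed several times over. The trade-off is that events still held in a memory channel are lost if the agent stops or crashes.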
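A Flume agent configuration fragment for this might look like the sketch below. The agent name `a1`, the channel name `c1`, and the capacity values are placeholders for illustration, not tuned recommendations; the source and sink must also be pointed at the same channel.

```properties
# Hypothetical agent "a1" with one channel "c1"; the numbers are illustrative only.
a1.channels = c1
a1.channels.c1.type = memory
# Maximum number of events held in the channel
a1.channels.c1.capacity = 100000
# Maximum number of events per source/sink transaction
a1.channels.c1.transactionCapacity = 1000
```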

Licensed under: CC-BY-SA with attribution