Question

I want to run Logstash -> Elasticsearch with high availability and cannot find an easy way to achieve it. Please review my approach and correct me where I am wrong:

Goal:

  • 5 machines, each running Elasticsearch, joined into a single cluster.
  • 5 machines, each running a Logstash server and streaming data into the Elasticsearch cluster.
  • N monitored machines, each running Lumberjack and streaming data to the Logstash servers.

Constraint:

  • It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.

Solution:

  • Lumberjack lets me specify a list of Logstash servers to forward data to. It randomly selects a target server and switches to another one if that server goes down. This works.
  • I can use the ZooKeeper discovery plugin to form the Elasticsearch cluster. This works.
  • With multicast, each Logstash server discovers and joins the Elasticsearch cluster. Without multicast, Logstash only lets me specify a single Elasticsearch host, which is not highly available: I want to output to the cluster, not to a single host that can go down.
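The Lumberjack behaviour from the first bullet (pick a random server, switch to another when it is down) is exactly what is missing on the Logstash -> Elasticsearch side. Here is a minimal Python sketch of that idea. It is not Lumberjack's own implementation; the injected `send` function and the `logstash-N` host names are hypothetical placeholders.

```python
import random
from typing import Callable, List, Optional, Sequence


def forward_with_failover(
    servers: Sequence[str],
    send: Callable[[str, bytes], None],
    payload: bytes,
    rng: Optional[random.Random] = None,
) -> str:
    """Send payload to a randomly chosen server, trying the others on failure.

    `send` is a hypothetical transport function that raises ConnectionError
    when the target is unreachable. Returns the server that accepted the payload.
    """
    if rng is None:
        rng = random.Random()
    candidates = list(servers)
    rng.shuffle(candidates)  # random order, so the first target is random
    errors: List[str] = []
    for server in candidates:
        try:
            send(server, payload)
            return server
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")  # remember why, then fail over
    raise ConnectionError("all servers unreachable: " + "; ".join(errors))


if __name__ == "__main__":
    down = {"logstash-1", "logstash-2"}  # hypothetical hosts that are offline

    def fake_send(server: str, payload: bytes) -> None:
        if server in down:
            raise ConnectionError("connection refused")

    servers = [f"logstash-{i}" for i in range(1, 6)]
    print(forward_with_failover(servers, fake_send, b"a log line"))
```

The optional `rng` argument allows passing a seeded `random.Random` so that the choice is reproducible in tests.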

Question:

  • Is it realistic to add a ZooKeeper discovery plugin to Logstash's embedded Elasticsearch? How?
  • Is there an easier (natural) solution for this problem?

Thanks!


Solution

You could run a separate (non-embedded) Elasticsearch instance inside each Logstash container, configured not to store data, and perhaps make these the master nodes:

node.data: false
node.master: true

You would then install the ZooKeeper discovery plugin on all Elasticsearch instances (data and non-data) so that they form one cluster.

Logstash then sends its output over HTTP to the local Elasticsearch node, which works out which of the 5 data nodes should actually index each document.
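To make "sends its output over HTTP to the local node" concrete, the sketch below builds a request for Elasticsearch's `_bulk` HTTP API, the kind of request an HTTP-based Logstash output sends. It assumes the local non-data node listens on the default HTTP port 9200 at `localhost`; the index name and `logs` type are only illustrative. Because the client only ever talks to its own local node, losing any single data node does not break the Logstash side.

```python
import json
import urllib.request
from typing import Iterable, Mapping, Optional


def build_bulk_body(
    index: str, docs: Iterable[Mapping], doc_type: Optional[str] = None
) -> str:
    """Build a _bulk request body: newline-delimited JSON, one action line
    before each document, terminated by a final newline."""
    lines = []
    for doc in docs:
        action = {"_index": index}
        if doc_type is not None:
            # 1.x-era clusters like the one in this question expect a _type.
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def post_bulk(body: str, node: str = "http://localhost:9200") -> dict:
    """POST the body to the local (non-data) node; Elasticsearch routes
    each document to the data node that holds the target shard."""
    req = urllib.request.Request(
        node + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_bulk_body(
        "logstash-2014.05.01",  # hypothetical daily index name
        [{"message": "first event"}, {"message": "second event"}],
        doc_type="logs",
    )
    print(body, end="")
    # post_bulk(body)  # uncomment when a node is listening on localhost:9200
```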

Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: "Logstash output to Elasticsearch on AWS EC2".

Licensed under: CC-BY-SA with attribution