Question

I've read a number of articles and forum posts on the placement of indexes/shards, but have not yet found a solution to my requirement.

Fundamentally, I want to use Logstash (plus Elasticsearch/Kibana) to build a globally distributed cluster. To reduce WAN traffic, I want to restrict the placement of primary and replica shards to the region they were created in, while still being able to query all data as a single dataset.

Example

Let's say I have two ES nodes in the UK (uknode1/uknode2) and two in the US (usnode1/usnode2). If Logstash sends some data to usnode1, I want it to place the replica on usnode2, and not send it across the WAN to the uknode* nodes.

I've tried playing around with index and routing allocation settings, but cannot stop the shards being distributed across all 4 nodes. It's slightly complicated by the fact that index names are dynamically built based on the "type", but that's another challenge for a later date. Even with one index, I can't work it out.
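For reference, this is the kind of thing I've been experimenting with: a sketch using shard allocation filtering, assuming a custom "region" node attribute (the attribute name and the index name are just my own illustrative choices).

On each US node's elasticsearch.yml (uknode* would use region: uk):

node.region: us

Then the filter has to be applied per index, which is awkward when Logstash creates index names dynamically:

curl -XPUT 'http://usnode1:9200/logstash-2015.01.01/_settings' -d '
{
  "index.routing.allocation.require.region": "us"
}'

This pins that one index entirely to the US nodes, but it doesn't give me one index with region-local copies on both sides, which is what I'm really after.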

I could split this into two separate clusters, but I want to be able to query all nodes as a single dataset (via Kibana), so I don't think that is a valid option at this stage, as Kibana can only query one cluster.

Is this even possible to achieve?

The reason I ask whether this is even possible is that I don't know what would happen if I wrote to an index called "myTest" on a UK node and to the same index on a US node. Ultimately it's the same index, and I'm not sure how ES would handle this.

So if anyone has any suggestions, or can simply confirm "not possible", that would be very helpful.


Solution

It's possible, but not recommended. Elasticsearch needs a reliable data connection between the nodes in a cluster to function, which is difficult to ensure for a geographically distributed cluster. A better solution would be to have two clusters, one in the UK and another in the US. If you need to search both of them at the same time, you can use a tribe node.

OTHER TIPS

Thanks. I looked into this a bit more and have the solution, which is indeed to use tribe nodes.

For anyone who isn't familiar with them, tribe nodes are a feature introduced in ES 1.0.0.

What you do is allocate a new ES node as a tribe node and configure it to connect to all your other clusters. When you run a query against it, it queries all of the clusters and returns a consolidated set of results.

So in my scenario, I have two distinct clusters, one in each region, something like this.

US Region

cluster.name: us-region

Two nodes in this region called usnode1 and usnode2

Both nodes are master/data nodes

UK Region

cluster.name: uk-region

Two nodes in this region called uknode1 and uknode2

Both nodes are master/data nodes
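As a sketch, each regional node's elasticsearch.yml then looks something like this (shown for usnode1; the uknode* config is identical apart from cluster.name: uk-region and the host names, and the unicast discovery list is my assumption, since any discovery mechanism would do):

cluster.name: us-region
node.name: usnode1
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["usnode1", "usnode2"]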

Then you create another ES node and add some configuration to make it a tribe node.

Edit elasticsearch.yml with something like this :

node.data: false
node.master: false
tribe.blocks.write: false
tribe.blocks.metadata: false
tribe.t1.cluster.name: us-region
tribe.t1.discovery.zen.ping.unicast.hosts: ["usnode1","usnode2"]
tribe.t2.cluster.name: uk-region
tribe.t2.discovery.zen.ping.unicast.hosts: ["uknode1","uknode2"]

You then point Kibana at the tribe node, and it works brilliantly: an excellent feature.
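To illustrate, any search sent to the tribe node fans out to both clusters, so something like this (the "tribenode" hostname and the index pattern are just placeholders for my setup) returns hits from both regions in a single response:

curl -XGET 'http://tribenode:9200/logstash-*/_search?pretty'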

Kibana dashboards still save, although I'm not sure yet how it picks which cluster to save them to. It seems to address my question, so with a bit more playing I think I'll have it sorted.

Licensed under: CC-BY-SA with attribution