Using CloudFront to expose the Elasticsearch REST API in read-only mode (GET/HEAD)

Question 1

I ended up writing my own plugin. Surprisingly, there was nothing quite like this around. No proxies, no Jetty, no Tomcat.

Just the original ES REST module and my RestFilter, which uses a minimum of reflection to obtain the remote address of each request.

Enjoy:

https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin

Question 2

Note that even a GET request can be harmful in Elasticsearch. A query that takes too many resources to compute can bring down your cluster, and facets are an easy way to write such a query.

I'd recommend writing a simple REST API that you place in front of ES, so you get much more control over what hits your search cluster. If that's not an option, you could consider running Nginx on your ES boxes as a local reverse proxy, which gives you the same control as CloudFront (and a whole lot more). Then you'd only have to open up Nginx to the world, instead of ES.
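To illustrate the kind of control such a front API gives you, here is a minimal sketch in Python. It is not taken from any existing project: the function name, the limits and the "title" field are all assumptions made up for the example. The idea is that the public API never forwards a raw query to ES. It accepts only a search term and paging values, and builds a bounded query body itself.

```python
# All names and limits below are hypothetical; adjust them to your own index.
MAX_PAGE_SIZE = 50      # assumed cap on hits returned per request
MAX_PAGE = 100          # assumed cap on how deep a client may page
MAX_TERM_LENGTH = 200   # assumed cap on the length of the search term
SEARCH_FIELD = "title"  # hypothetical field to search in


def build_search_body(term, page=0, page_size=10):
    """Turn untrusted user input into a bounded Elasticsearch query body."""
    if not isinstance(term, str) or not term.strip():
        raise ValueError("search term must be a non-empty string")
    if len(term) > MAX_TERM_LENGTH:
        raise ValueError("search term is too long")

    # Clamp paging values instead of trusting the caller.
    page = max(0, min(int(page), MAX_PAGE))
    page_size = max(1, min(int(page_size), MAX_PAGE_SIZE))

    # Only a plain match query is ever generated: no facets, no scripts.
    return {
        "query": {"match": {SEARCH_FIELD: term.strip()}},
        "from": page * page_size,
        "size": page_size,
    }
```

Your API would send the returned dictionary as the body of a _search request from inside your network, so clients can never submit an expensive query of their own.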

Question 3

A way to do this in AWS would be:

Set up an Application Load Balancer in front of your ES cluster. Create a TLS certificate for the ALB and serve HTTPS. Open the ES security group to the ALB.
Set up CloudFront and use the ALB as origin. Pass a custom header with a secret value (for WAF, see next point).
Set up WAF on your ALB to only allow requests that contain the custom header with the secret value. Now all requests have to go through CloudFront.
Set up a Lambda@Edge function on your CloudFront distribution to either remove the body from GET requests or deny such requests.
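
The last step could look roughly like the following sketch of a Lambda@Edge handler in Python, attached to the viewer request or origin request event. It takes the "deny" route rather than stripping the body, and it also rejects every method other than GET and HEAD. The handler and helper names are my own, and it assumes the trigger is configured to include the request body, since otherwise the function never sees a body at all.

```python
ALLOWED_METHODS = {"GET", "HEAD"}


def _forbidden(reason):
    """Build a CloudFront-generated 403 response (hypothetical helper)."""
    return {
        "status": "403",
        "statusDescription": "Forbidden",
        "headers": {
            "content-type": [{"key": "Content-Type", "value": "text/plain"}],
        },
        "body": reason,
    }


def handler(event, context):
    # CloudFront passes the request inside the first record of the event.
    request = event["Records"][0]["cf"]["request"]

    if request["method"] not in ALLOWED_METHODS:
        return _forbidden("Only GET and HEAD are allowed.")

    # The "body" key is only present when the trigger includes the body
    # (an assumption about how the distribution is configured).
    body = request.get("body") or {}
    if body.get("data"):
        return _forbidden("Request bodies are not allowed.")

    # Returning the request unchanged lets CloudFront continue processing it.
    return request
```

Returning a response dictionary instead of the request makes CloudFront answer directly, so a rejected request never reaches the ALB or ES.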

It’s quite some work, but there are advantages over the plugin, e.g.:

CloudFront comes with free network DDoS protection.
CloudFront gives your users lower latency to ES because of the fast CloudFront network and global PoPs.
It opens up many options to use CloudFront, WAF and Lambda@Edge to further protect your ES cluster.

I’m working on sample code in CDK to set all of this up. Will report back when that’s ready.