Question

http://testing:50070/webhdfs/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN

I am fetching the above image from Hadoop through WebHDFS, and I want the browser to cache it. Is there any mechanism for caching images served from Hadoop? Also, how can I hide the port number in this URL?


Solution

I'm not familiar with WebHDFS, but if it does not support caching, you have to put a caching layer between the client and the WebHDFS server.

What you need is a reverse proxy with caching enabled. There are several ways to set one up, but Apache mod_cache or Nginx reverse proxy caching will both do the job.

To hide the port from the URL, run the web server/proxy on port 80. Then define a proxy alias on the /proxy context that forwards requests to http://testing:50070/webhdfs, and enable caching. You can then request WebHDFS through the caching proxy at http://testing/proxy/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN
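To make the alias concrete, here is a minimal Python sketch of the URL mapping such a proxy would perform. It is not Apache or Nginx configuration; the names BACKEND, PROXY_PREFIX, and to_backend_url are made up for illustration, and the host and paths are taken from the URLs above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed values, copied from the URLs in this answer.
BACKEND = "http://testing:50070/webhdfs"  # internal WebHDFS base (port stays hidden)
PROXY_PREFIX = "/proxy"                   # public alias served on port 80


def to_backend_url(public_url: str) -> str:
    """Map a public /proxy URL to the internal WebHDFS URL (hypothetical helper)."""
    public = urlsplit(public_url)
    inside_alias = (
        public.path == PROXY_PREFIX
        or public.path.startswith(PROXY_PREFIX + "/")
    )
    if not inside_alias:
        raise ValueError("URL is not under the proxy alias: " + public_url)

    backend = urlsplit(BACKEND)
    # Drop the alias prefix and append the remainder to the backend path.
    path = backend.path + public.path[len(PROXY_PREFIX):]
    # Keep the query string (e.g. op=OPEN) unchanged; drop any fragment.
    return urlunsplit((backend.scheme, backend.netloc, path, public.query, ""))


if __name__ == "__main__":
    print(to_backend_url(
        "http://testing/proxy/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN"
    ))
```

The client only ever sees http://testing/proxy/...; the port 50070 appears only on the proxy's side of the connection.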

The communication will look like this (the times imply a cache lifetime of two hours, counted from the first request):

Client 1:00PM <> Proxy (no cache) <> Webhdfs (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg)
Client 2:00PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expires in 1h
Client 2:45PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expires in 15min
Client 4:00PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expired!! <> Webhdfs (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg)
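The timeline above can be replayed with a small Python sketch of an expiring cache. This is only an illustration of the idea, not how mod_cache or Nginx are implemented; TtlCache, replay, and the two-hour lifetime are assumptions chosen to match the times shown.

```python
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Toy expiring cache; times are minutes since midnight (hypothetical)."""

    def __init__(self, ttl_minutes: int) -> None:
        self.ttl = ttl_minutes
        # key -> (cached body, expiry time in minutes)
        self.entries: Dict[str, Tuple[bytes, int]] = {}

    def get(self, key: str, now: int,
            fetch: Callable[[str], bytes]) -> Tuple[bytes, str]:
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"       # served from the proxy cache
        body = fetch(key)                # missing or expired: ask the backend
        self.entries[key] = (body, now + self.ttl)
        return body, "MISS"


def replay() -> List[str]:
    """Replay the four requests from the timeline and return HIT/MISS per request."""
    backend_calls: List[str] = []

    def fake_webhdfs(path: str) -> bytes:
        # Stand-in for the real WebHDFS request.
        backend_calls.append(path)
        return b"image bytes"

    cache = TtlCache(ttl_minutes=120)    # assumed two-hour lifetime
    image = "asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg"
    times = [13 * 60, 14 * 60, 14 * 60 + 45, 16 * 60]  # 1:00, 2:00, 2:45, 4:00 PM
    return [cache.get(image, t, fake_webhdfs)[1] for t in times]


if __name__ == "__main__":
    print(replay())
```

Only the first and last requests reach the backend; the two in between are answered by the proxy alone.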

I didn't provide any configuration examples, but you can find many for both Apache and Nginx. Pick whichever you prefer.

OTHER TIPS

I know this is a late response, but Apache Knox is the REST API gateway for Hadoop access. One of its specific goals is to hide internal topology information from consumers.

Licensed under: CC-BY-SA with attribution