Question

I have a Ruby script that I want to use with Hive streaming. The script depends on an external gem, and because that gem is not installed on my data nodes, the script will not run.

I would prefer to add the gem temporarily, just for this job. Is there a way to include it in the distributed cache, perhaps as a zip (e.g. ADD FILE custom_gem.zip)?


Solution

The best way I have found to do this is to manually add the files of the gem to the distributed cache.

Here is an example of using the browser Ruby gem:

I download and unzip browser-master.zip from GitHub. Then I add the entire unzipped folder to the distributed cache:

ADD FILE /home/user/browser-master;
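If the folder is missing or named differently on the task side, the script fails with an opaque streaming error. A minimal sketch of an early check, assuming the folder lands next to the script in the task's working directory (the same assumption the load-path line in the script relies on). The helper name `check_shipped_folder` is hypothetical, and the entry file `browser.rb` is inferred from the fact that `require "browser"` loads `lib/browser.rb`.

```ruby
# Hypothetical helper: returns the absolute lib directory of a folder shipped
# with ADD FILE, or nil (with a note on STDERR) if its entry file is missing.
def check_shipped_folder(base_file, folder, entry)
  # Same resolution as File.expand_path("../browser-master/lib", __FILE__):
  # the folder is looked up in the directory that contains base_file.
  lib_dir = File.expand_path("../#{folder}/lib", base_file)
  entry_path = File.join(lib_dir, entry)
  return lib_dir if File.file?(entry_path)

  # Report on STDERR, not STDOUT: Hive parses STDOUT as result rows,
  # while STDERR ends up in the task logs.
  warn "#{entry_path} not found; was #{folder} shipped with ADD FILE?"
  nil
end

# Possible use at the top of the streaming script:
#   lib_dir = check_shipped_folder(__FILE__, "browser-master", "browser.rb")
#   exit 1 unless lib_dir
#   $LOAD_PATH.push(lib_dir)
#   require "browser"
```

Failing fast with a readable message in the task log is usually quicker to diagnose than a LoadError surfacing as a generic script failure.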

In the Ruby script that Hive runs, I then have to tell Ruby where to find the gem's files by adding its lib folder to the load path:

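# Resolve browser-master/lib relative to this script's directory and add it to Ruby's load path
$LOAD_PATH.push(File.expand_path("../browser-master/lib", __FILE__))
require "browser"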
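The rest of the script is the usual Hive streaming loop: by default each input row arrives on STDIN as one line of tab-separated columns, and each line the script prints to STDOUT is parsed back into an output row. A minimal sketch of that loop, with the per-row logic passed in as a block; `stream_rows` and `some_gem_result` are hypothetical names, and the usage comment assumes column 0 holds the value you want to pass to the gem.

```ruby
# Hypothetical helper: apply a per-row block to tab-separated rows read from
# input and write the results to output as tab-separated rows.
def stream_rows(input, output)
  input.each_line do |line|
    # The -1 limit keeps trailing empty columns instead of dropping them.
    cols = line.chomp.split("\t", -1)
    output.puts(yield(cols).join("\t"))
  end
  output.flush
end

# In the deployed script, after the load-path and require lines:
#   stream_rows($stdin, $stdout) { |cols| [cols[0], some_gem_result(cols[0])] }
# where some_gem_result stands in for whatever gem call you need.
```

Keeping the I/O loop separate from the gem-dependent logic means the loop can be exercised locally with StringIO before shipping anything to the cluster.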
Licensed under: CC-BY-SA with attribution