Question

I have an Oozie workflow that runs a Java action. Within the Java action, I need to describe a Hive table to get its schema. I do this by using a ProcessBuilder to execute a shell script containing the describe-table query.

My describeTable.sh:

hive -e 'describe <tableName>'

Once the Java code generates this script, I copy it to /tmp on the local FS and then execute it with ProcessBuilder as follows:

fs.copyToLocalFile(bashScriptPath, new Path("/tmp/describeTable.sh"));
ProcessBuilder builder = new ProcessBuilder("bash", "/tmp/describeTable.sh");

The script executes, but it fails to recognize hive as a command:

/tmp/describeTable.sh: line 1: hive: command not found

I've also tried /usr/bin/hive -e 'describe <tableName>', but that doesn't work either.

The Java program works fine when I execute it as a jar file on the local FS, but it fails when run as part of an Oozie workflow.

I'm not sure how to get this working, I would really appreciate some ideas.

EDIT

Adding the full code for the process builder:

fs.copyToLocalFile(bashScriptPath, new Path("/tmp/describeTable.sh"));
ProcessBuilder builder = new ProcessBuilder("bash", "/tmp/describeTable.sh");
builder.directory(new File(currentLocalDir));
ArrayList<String> columnList = new ArrayList<String>();
System.err.println("trying to run script");
try {
    final Process process = builder.start();
    BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));
    BufferedReader error1 = new BufferedReader(new InputStreamReader(process.getErrorStream()));
    String errorLine;
    System.err.println("error stream: ");
    while ((errorLine = error1.readLine()) != null) {
        System.err.println(errorLine);
    }
    String line;
    System.err.println("input stream");
    while ((line = br.readLine()) != null) {
        System.err.println("line from running script: " + line);
        String[] output = line.split("\t");
        columnList.add(output[0]);
    }
    error1.close();
    br.close();
    System.err.println("column list: " + columnList);
    return columnList;
} catch (IOException e) {
    throw new RuntimeException("failed to run describeTable.sh", e);
}
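As an aside, draining stderr to completion before touching stdout can deadlock if the child process fills its stdout pipe buffer while the parent is still blocked on stderr. Merging the two streams with `redirectErrorStream(true)` sidesteps this with a single reader. A minimal, self-contained sketch (the `RunScript`/`runScript` names are mine, illustrative only, and `echo` stands in for the real script):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class RunScript {
    // Runs a command with stderr merged into stdout, so one reader drains
    // both pipes and the child can never block on a full pipe buffer.
    static List<String> runScript(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // stderr lines interleave with stdout
        Process process = builder.start();
        List<String> lines = new ArrayList<String>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor(); // reap the child and obtain its exit status
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative only: echo stands in for describeTable.sh; the first
        // tab-separated field mimics grabbing the column name.
        List<String> out = runScript(List.of("echo", "col1\tstring"));
        System.out.println(out.get(0).split("\t")[0]); // prints "col1"
    }
}
```

Checking `process.waitFor()`'s return value would also surface the `hive: command not found` failure (exit code 127) instead of silently returning an empty column list.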

Solution

Oozie runs most, if not all, actions as MapReduce jobs, so the error message you're seeing is most likely because the Java action is being executed on one of your compute nodes, not on the machine from which you submitted your Oozie job, nor the machine where the Oozie server is running.

You can either ensure that Hive is installed on all compute nodes in your cluster, or use the Hive Java API in your Java action and add the Hive libraries (and all their dependencies) to your Oozie job's shared library path in HDFS.
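For the shared-library route, Oozie's action sharelib can place the Hive jars on the Java action's classpath without bundling them yourself. A sketch of the relevant job.properties lines (these two property names are standard Oozie configuration; whether your cluster's sharelib actually contains a `hive` bundle is an assumption you should verify with `oozie admin -shareliblist`):

```properties
# job.properties -- enable the system sharelib and pull the 'hive'
# bundle onto the classpath of java actions in this workflow
oozie.use.system.libpath=true
oozie.action.sharelib.for.java=hive
```

Alternatively, placing the jars under the workflow's own `lib/` directory in HDFS achieves the same effect on a per-workflow basis.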

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow