Question

I am running a Pig script through Oozie. The script uses a UDF.

The UDF gets its parameters like this:

public Float exec(Tuple input) throws IOException {

    if (input == null || input.size() == 0)
        return new Float(0);

    FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());

    // Field 0 is the document; field 1 is the path to the first model.
    String firstModelPath = input.get(1).toString();

    InputStream firstModel = fs.open(new Path(firstModelPath));
    ...

In the Oozie debug output, the incoming parameter looks correct:

  -param
  firstModel_firstscript=./en-sent.bin

In the script itself it looks like this:

%DEFAULT firstModel_firstscript 'somedefaultstuffthatisntused/firstmodel.bin';
...
myUDF(document, '$firstModel_firstscript', '$secondmodel_firstscript', '$lastmodel_firstscript') AS score;

I get the same result with

myUDF(document, '${firstModel_firstscript}', '${secondmodel_firstscript}', '${lastmodel_firstscript}') AS score;

STDERR reads:

ERROR 2078: Caught error from UDF: my.domain.udf.myUDF [File does not exist: /user/cloudera/firstmodel_firstscript

Note that this is not the path I passed in.

I'm at a loss here. I hope I have explained my situation clearly enough.

Regards


Solution 2

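I found that I was passing Hadoop settings in my script the wrong way.

I was using:

set xyz firstmodel_firstscript;

instead of

set xyz $firstmodel_firstscript;

Even though the values were already set via %default, the parameter still has to be referenced with a leading $. Without the $, Pig treats the name as literal text, which is why the error showed /user/cloudera/firstmodel_firstscript as the path.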
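
To make the difference concrete, here is a minimal Java sketch of textual parameter substitution. It is only a simplified model, not Pig's actual preprocessor: the class name ParamSubstitutionSketch, the substitute helper, and the regular expression are all made up for illustration. It assumes that only tokens written as $name or ${name} are replaced and that everything else is copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubstitutionSketch {

    // Matches ${name} or $name. Hypothetical, simplified pattern;
    // the real Pig preprocessor has more rules than this.
    private static final Pattern PARAM =
            Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    // Replaces every $name / ${name} in the line with its value from params.
    // Text without a leading $ is never looked up, so it stays literal.
    static String substitute(String line, Map<String, String> params) {
        Matcher m = PARAM.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = params.get(name);
            if (value == null) {
                // Choice made for this sketch: fail loudly on unknown names.
                throw new IllegalArgumentException("Undefined parameter: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("firstmodel_firstscript", "./en-sent.bin");

        // No $: the parameter name itself ends up as the value.
        System.out.println(substitute("set xyz firstmodel_firstscript;", params));

        // With $: the name is replaced by the value passed via -param.
        System.out.println(substitute("set xyz $firstmodel_firstscript;", params));
    }
}
```

In this sketch the first line comes out unchanged, while the second becomes set xyz ./en-sent.bin;. That mirrors what I saw: without the $, the UDF received the parameter's name and tried to open it as a file.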

OTHER TIPS

I think parameters are case sensitive. You pass firstModel_firstscript, but the Pig script uses firstmodel_firstscript. Hope that helps.

Also, try referencing your variables in Pig as follows:

${firstmodel_firstscript}

https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a3.2.3_Pig_Action

Licensed under: CC-BY-SA with attribution