Question

I have a Hadoop MapReduce job running as a step in an Oozie workflow. It is started by a java action whose main class implements org.apache.hadoop.util.Tool.

When the job is killed for some reason, I want to email a notification that contains the stack trace if an exception occurred during processing.

Currently I do it this way:

<action name="sendErrorNotifications">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>some-dl@company.com</to>
        <subject>Job execution failed ${wf:id()}</subject>
        <body>Job execution failed, error message: [${wf:errorMessage(wf:lastErrorNode())}]</body>
    </email>
    <ok to="fail" />
    <error to="fail" />
</action>

But all I receive is just:

Job execution failed, error message: [Job failed!]

This is not very useful :) and I have to go and check the logs on every node myself.

How can I get more specific messages? Should I catch my exceptions in the Tool and wrap them in something Oozie can catch, or use something other than ${wf:errorMessage...?

Thanks


Solution 2

I found a way to handle errors and access their cause by using Counters. That may not be what they are designed for, but it seems to be the only way out...

So I catch every Throwable in the mapper and the reducer like this:

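} catch (Throwable t) {
    // One counter per exception type, in the "Exceptions" group
    Counters.Counter counter = reporter.getCounter("Exceptions", t.getClass().getSimpleName());
    counter.increment(1);
    // Carry the failing key and the stack trace in the counter's display name
    counter.setDisplayName(t.getClass().getSimpleName() + "\n last failed key: " + key.toString() + "\n " + ExceptionUtils.getStackTrace(t));
    // Running total across all exception types
    reporter.incrCounter("Exceptions", "TOTAL_COUNT", 1);
    reporter.progress();
}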

These counters are easily accessible in the Tool via RunningJob after the job has finished. The "Exceptions" group contains a counter for each exception type, with all the needed information in its displayName field.

Please comment if you see any problems with this approach or know a better one.

OTHER TIPS

One suggestion is to catch the exception in your main method and export a property ('exceptionTrace', for example) whose value is the serialized exception. Combined with the capture-output flag on the java action, this lets you reference the value with the wf:actionData('myJavaAction')['exceptionTrace'] EL function.

http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a3.2.7_Java_Action
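
A minimal sketch of that export step, using only the JDK. It relies on the mechanism described in the linked Java action docs: with capture-output enabled, Oozie passes the path of a properties file in the oozie.action.output.properties system property, and the main class writes its key/value pairs there. The class name ExceptionTraceExporter and the maxChars parameter are made up for illustration. The truncation is there because Oozie caps the size of captured output (oozie.action.max.output.data, 2 KB by default as far as I know), so treat the exact limit as an assumption to check on your version.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper, not part of Oozie or Hadoop.
final class ExceptionTraceExporter {

    // System property Oozie sets for a java action that declares <capture-output/>
    static final String OUTPUT_PROPERTY = "oozie.action.output.properties";

    // Render the full stack trace of t as a string.
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Write the (possibly truncated) trace under the 'exceptionTrace' key.
    // Does nothing when the system property is absent, e.g. outside Oozie.
    static void export(Throwable t, int maxChars) throws IOException {
        String path = System.getProperty(OUTPUT_PROPERTY);
        if (path == null) {
            return;
        }
        String trace = stackTraceOf(t);
        if (trace.length() > maxChars) {
            trace = trace.substring(0, maxChars);
        }
        Properties props = new Properties();
        props.setProperty("exceptionTrace", trace);
        try (OutputStream out = new FileOutputStream(new File(path))) {
            props.store(out, null);
        }
    }
}
```

In main you would wrap the ToolRunner.run(...) call in a try/catch and call ExceptionTraceExporter.export(e, 1500) from the catch block. Whether Oozie still collects the properties when the action itself ends in error is worth verifying on your version; if it does not, one option is to let main return normally and branch on the property in a decision node.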

Licensed under: CC-BY-SA with attribution