Question

I am running a workflow using Oozie. It has a Java action that runs arbitrary (non-MapReduce) code. This Java code is meant to create files and folders on the local file system. I want to make sure that the Java action runs on a single node of the Hadoop cluster (mine has 7 nodes). I would prefer that this Java action always runs on the same machine, ideally the Hadoop master node. Is this possible? If there is a workaround, please share it.

Solution

A Java action is run just like a MapReduce job, so you don't have any control over which node in your cluster it will run on.

I'm not sure whether Oozie will honor this, but you can try setting the number of acceptable mapper failures to a high value (say 10), and then throw an exception in your Java action if the node it is executing on is not the one you want (using InetAddress to get the local machine name or IP).

Then, hopefully (though this is not guaranteed), your action will keep failing on the undesired nodes and eventually succeed on the node of your choice.
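The host check described above could look roughly like the sketch below. The class name, the `hadoop-master` default, and the helper methods are hypothetical names chosen for illustration; only `InetAddress` comes from the answer. It assumes the name returned by `InetAddress.getLocalHost().getHostName()` matches the name you pass in, which depends on how the nodes are configured (short name versus fully qualified name).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeGuardedAction {

    // Hypothetical host name of the node the action is allowed to run on.
    static final String DEFAULT_TARGET_HOST = "hadoop-master";

    // True when the local host name equals the target, ignoring case.
    static boolean isTargetHost(String localHost, String targetHost) {
        return localHost != null && localHost.equalsIgnoreCase(targetHost);
    }

    // Throws so that the task attempt fails when it runs on the wrong node.
    static void requireTargetHost(String localHost, String targetHost) {
        if (!isTargetHost(localHost, targetHost)) {
            throw new IllegalStateException(
                "Running on " + localHost + ", expected " + targetHost);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Optionally take the target host as the first argument.
        String target = args.length > 0 ? args[0] : DEFAULT_TARGET_HOST;
        String local = InetAddress.getLocalHost().getHostName();

        // Fail fast on any node other than the target.
        requireTargetHost(local, target);

        // Only reached on the target node: create the local files
        // and folders here.
        System.out.println("Running on target node " + local);
    }
}
```

Whether a failed attempt is actually retried on a different node is up to Oozie and Hadoop, so this only covers the "fail on the wrong node" half of the idea.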

It is somewhat of a hack, but it might work. Again, it depends on whether you can change the number of map task attempts allowed before the entire job fails (mapred.map.max.attempts, which is 4 by default).
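To get a rough feel for why the attempt limit matters, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every attempt lands on one of the 7 nodes uniformly at random and independently of earlier attempts. The real scheduler does not promise this, so treat the numbers as an idealization, not a prediction. The class and method names are hypothetical.

```java
public class RetryOdds {

    // Chance that at least one of `attempts` tries lands on the one
    // desired node, under the uniform and independent assumption.
    static double chanceOfHittingTarget(int nodes, int attempts) {
        double missOnce = (nodes - 1) / (double) nodes;
        return 1.0 - Math.pow(missOnce, attempts);
    }

    public static void main(String[] args) {
        int nodes = 7; // cluster size from the question

        // Default limit of 4 attempts versus the suggested 10.
        System.out.println("4 attempts:  " + chanceOfHittingTarget(nodes, 4));
        System.out.println("10 attempts: " + chanceOfHittingTarget(nodes, 10));
    }
}
```

Under that assumption the chance is roughly 46% with 4 attempts and roughly 79% with 10, so even a raised limit leaves a real chance that the job fails outright.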

OTHER TIPS

Any Oozie action (a simple Java action, a Pig action, a Hive action, and so on) in turn runs as a MapReduce job. This is a core property of the Oozie framework.

Oozie workflows are actions arranged in a control dependency DAG (Directed Acyclic Graph).

Ref: Oozie design Architecture

So I believe it is not possible to make sure the file is saved on a specific node (or nodes) of the cluster.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow