Question

I have a coordinator job in Oozie. It calls the workflow with a java action node.

If I submit this job only once, it works perfectly. However, if I submit it twice with the same start and end time but a different arg1 to the Main class, both job instances hang in the "RUNNING" state and the logs look like this:

>>> Invoking Main class now >>>

Heart beat
Heart beat
Heart beat
Heart beat
...

If I kill one of the jobs, then the other one starts running again.

The documentation states that it is possible to submit multiple instances of the same coordinator job with different parameters: http://archive.cloudera.com/cdh/3/oozie/CoordinatorFunctionalSpec.html#a6.3._Synchronous_Coordinator_Application_Definition

"concurrency: The maximum number of actions for this job that can be running at the same time. This value allows to materialize and submit multiple instances of the coordinator app, and allows operations to catchup on delayed processing. The default value is 1 ."

So what am I doing wrong? I even saw two instances of the workflow action from the same job in the "RUNNING" state, and they ran fine once the other job was killed.


Solution

OK, I found the issue. It was related to HBase concurrency and too few task slots in the cluster. Setting the following property in mapred-site.xml fixed it:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>50</value>
</property>

It was similar to this issue: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/v0BHtQ0hlBg
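
To see why the slot count matters, here is a rough sketch of the arithmetic. It is not Oozie or Hadoop API code. It assumes that each Oozie java action runs inside a launcher map task that holds one map slot, that the Main class submits further MapReduce work (for example an HBase job) and waits for it, and that the cluster has a single TaskTracker with the Hadoop 1.x default of 2 map slots. The class and method names are made up for illustration.

```java
// Sketch only: models MRv1 map slots as a plain counter.
// SlotDeadlockSketch, freeSlots and wouldStall are hypothetical names,
// not part of Oozie or Hadoop.
public class SlotDeadlockSketch {

    /** Map slots left for child tasks once every launcher holds one slot. */
    static int freeSlots(int totalMapSlots, int runningLaunchers) {
        return Math.max(0, totalMapSlots - runningLaunchers);
    }

    /** True when launchers hold every slot, so the work they wait on cannot start. */
    static boolean wouldStall(int totalMapSlots, int runningLaunchers) {
        return runningLaunchers > 0 && freeSlots(totalMapSlots, runningLaunchers) == 0;
    }

    public static void main(String[] args) {
        int defaultSlots = 2;  // assumed: one TaskTracker with the default slot count
        int raisedSlots = 50;  // value used in the answer above

        // One coordinator job: one launcher, one slot left for its child tasks.
        System.out.println("2 slots, 1 launcher stalls: " + wouldStall(defaultSlots, 1));
        // Two coordinator jobs: both slots held by launchers, nothing left.
        System.out.println("2 slots, 2 launchers stall: " + wouldStall(defaultSlots, 2));
        // After raising the limit there is room for launchers and child tasks.
        System.out.println("50 slots, 2 launchers stall: " + wouldStall(raisedSlots, 2));
    }
}
```

Under these assumptions, two launchers on two slots leave nothing for the jobs they are waiting on, which would match the endless "Heart beat" lines and the fact that killing one job lets the other proceed.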

Licensed under: CC-BY-SA with attribution