Question

We have a Hadoop service that hosts multiple applications. We need to process the data for each application by re-executing the same workflow, and all of these runs are scheduled at the same time of day. The issue is that while these jobs are running, it's hard to tell which application a given job is running for, and which ones failed or succeeded. Of course, I can open the job configuration to find out, but that takes time, since there are tens of applications running under that service.

Is there any option in Oozie to dynamically pass the name of the workflow (or part of it) when running the job, such as:

oozie job -run -config <filename> -name "<NameIWishToGive>"
OR
oozie job -run -config <filename> -nameSuffix "<MyApplicationNameUnderTheService>"

Also, we don't want to create a separate job folder for each application, as that would mean too much copy-and-paste.

Please suggest.


Solution 2

You will find the full set of Oozie command-line options in the Apache Oozie command-line documentation. I'm not sure exactly which one you are looking for, so I thought I'd just point you there. Hope this helps!

OTHER TIPS

It looks to me like you should be able to just use properties set in the job config.

I was able to get a dynamic name by doing the following.

Here's an example of my workflow.xml:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf-${environment}">
...
</workflow-app>

And in my job.properties I had:

...
environment=test
...

The name ended up being: "map-reduce-wf-test"
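Building on this, the property does not have to stay fixed in job.properties: the Oozie CLI's -D option sets or overrides a property for a single submission, so one workflow folder can be submitted once per application. The script below is a minimal sketch under assumptions: appName is a hypothetical property used in workflow.xml as name="map-reduce-wf-${appName}", job.properties sits in the current directory, the Oozie server URL comes from the OOZIE_URL environment variable, and the application names are made up.

```shell
#!/usr/bin/env bash
# Minimal sketch: submit the same workflow once per application, overriding
# the property that the workflow name is built from.
# Assumptions (hypothetical): workflow.xml uses name="map-reduce-wf-${appName}",
# job.properties is in the current directory, OOZIE_URL is exported.
set -euo pipefail

# Hypothetical application names; replace with the real ones.
APPS=(sales inventory billing)

# Print the commands by default; run with DRY_RUN=0 to actually submit.
DRY_RUN="${DRY_RUN:-1}"

submit_app() {
  local app="$1"
  # -D sets/overrides a property from the -config file for this submission only.
  local cmd=(oozie job -config job.properties -run "-DappName=${app}")
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for app in "${APPS[@]}"; do
  submit_app "$app"
done
```

By default it only prints the commands; running it with DRY_RUN=0 submits them, and each job should then appear under its own name (for example map-reduce-wf-sales) in the Oozie UI.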

I couldn't find anything in Oozie to do that. Here is a script that finds and replaces #{appName} and #{frequency} in *.xml files and then uploads all the files to HDFS. The values are taken from the properties file passed to the script as the third argument.

Gist - https://gist.github.com/epishkin/5952522

Example:

./upload.sh simple_reports namenode01 simple_reports/coordinator_script-1.properties

where 'simple_reports' is a folder with workflow.xml and coordinator.xml files.

workflow.xml:

<workflow-app name="#{appName}" xmlns="uri:oozie:workflow:0.3">
...
</workflow-app>

coordinator.xml:

<coordinator-app name="#{appName}-coord" xmlns="uri:oozie:coordinator:0.2"
                 frequency="#{frequency}"
                 start="${start}"
                 end="${end}"
                 timezone="America/New_York">
...
</coordinator-app>

coordinator_script-1.properties:

appName=multi_network
frequency=${coord:days(7)}
...
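The gist is not reproduced here, but the find/replace idea itself can be sketched in a few lines of bash. This is not the gist's code: render_templates and its output-directory argument are hypothetical, it reads only simple key=value lines (no continuations or escapes), and it writes rendered copies instead of editing files in place. Only #{...} placeholders are replaced, so Oozie's own ${...} expressions such as ${start} pass through unchanged.

```shell
#!/usr/bin/env bash
# Sketch of the find/replace step (hypothetical helper, not the gist's code):
# copy every *.xml file from a template directory into an output directory,
# replacing each #{key} with the value of "key" from a simple .properties file.
set -euo pipefail

render_templates() {
  local src_dir="$1" props="$2" out_dir="$3"
  local key="" value="" f content placeholder
  local -A vals=()

  # Load key=value pairs, skipping blank lines and # comments.
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    [[ -z "$key" || "$key" == \#* ]] && continue
    vals["$key"]="$value"
  done < "$props"

  mkdir -p "$out_dir"
  for f in "$src_dir"/*.xml; do
    [[ -e "$f" ]] || continue   # no .xml files: the glob stays literal
    content=$(<"$f")
    for key in "${!vals[@]}"; do
      placeholder="#{$key}"
      # Quoted pattern and replacement are matched and inserted literally.
      content=${content//"$placeholder"/"${vals[$key]}"}
    done
    printf '%s\n' "$content" > "$out_dir/${f##*/}"
  done
}
```

For example, render_templates simple_reports simple_reports/coordinator_script-1.properties rendered would write rendered/workflow.xml with name="multi_network"; uploading the folder (for instance with hdfs dfs -put) is left as a separate step.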

Hope this helps.

I recently faced this issue: all the tables use the same workflow, but the name of the Oozie application should reflect the name of the table it is processing.

Put a table-name parameter into the workflow name, then pass that parameter from job.properties; the Oozie application name will then follow the pattern dataload_tablename.

Licensed under: CC-BY-SA with attribution