Question

I don't understand what output-events are used for in Oozie. The Oozie docs state that "A coordinator action can produce one or more dataset(s) instances as output", but they don't give any practical details or examples. What does it mean to produce a dataset instance as output? Does it mean that Oozie will create a folder at the path given by the dataset's URI template? I don't really understand why I should use output events...

Thanks!

Solution

If you are talking about Oozie, output events are used to connect different coordinator jobs. Consider a large DAG of coordinator jobs in which some jobs take other jobs' output as their input: the datasets are the edges of that DAG.
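
As a rough sketch of the producing side of such an edge, a coordinator that declares an output event might look like the following. The coordinator name, dates, HDFS paths, the event name "out" and the property name "outputDir" are all hypothetical. As far as I understand it, Oozie does not create the folder itself here: it only resolves the dataset's URI template for the current instance, and the workflow is expected to write its data to that path.

```xml
<!-- Hypothetical producer: Coordinator A writes one instance of DS1 per day. -->
<coordinator-app name="coord-A" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- DS1 is a dataset definition; the URI template names one folder per day. -->
    <dataset name="DS1" frequency="${coord:days(1)}"
             initial-instance="2024-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds1/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <output-events>
    <!-- Declares that each action produces the current instance of DS1. -->
    <data-out name="out" dataset="DS1">
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode:8020/apps/workflow-a</app-path>
      <configuration>
        <!-- coord:dataOut resolves to the concrete URI of this instance;
             the workflow receives it as a property and writes there. -->
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('out')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

Used this way, the output event mainly saves you from computing the dated output path by hand inside the workflow.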

For example, suppose the Oozie configuration specifies that Coordinator A's output is DS1, Coordinator B's output is DS2, and Coordinator C's inputs are DS1 and DS2. Oozie then guarantees that the corresponding action in Coordinator C will not be executed before DS1 and DS2 are ready.
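
The consuming side of that example could be sketched roughly as follows. Names, dates and paths are hypothetical, and the dataset definitions would have to match the ones used by the producing coordinators. Each action of Coordinator C stays in a waiting state until the listed instances of DS1 and DS2 are available; by default Oozie treats an instance as available once a _SUCCESS done-flag file exists in its folder.

```xml
<!-- Hypothetical consumer: Coordinator C waits for DS1 and DS2 before running. -->
<coordinator-app name="coord-C" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Same definitions as in the coordinators that produce DS1 and DS2. -->
    <dataset name="DS1" frequency="${coord:days(1)}"
             initial-instance="2024-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds1/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
    <dataset name="DS2" frequency="${coord:days(1)}"
             initial-instance="2024-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds2/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action is not started until both instances are available. -->
    <data-in name="in1" dataset="DS1">
      <instance>${coord:current(0)}</instance>
    </data-in>
    <data-in name="in2" dataset="DS2">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode:8020/apps/workflow-c</app-path>
      <configuration>
        <!-- coord:dataIn resolves to the concrete URIs of the input instances. -->
        <property>
          <name>inputDirs</name>
          <value>${coord:dataIn('in1')},${coord:dataIn('in2')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```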

OTHER TIPS

There is at least one practical use for specifying <output-events> in your coordinator. When you re-run a coordinator for a range of dates (using the oozie job -rerun command), all the corresponding paths specified as <output-events> are deleted before the re-run, unless you pass the -nocleanup option.

Sometimes it is useful to remove all the outputs generated by a coordinator's instances, for example when you want to start another coordinator that has those paths as <input-events> and you want to make sure it processes the re-run data instead of the old data.

Licensed under: CC-BY-SA with attribution