Using Oozie to combine output file parts

Question

Is it possible to use Oozie to concatenate the output of a MapReduce job into a single file? Let's say I have the output:

part-r-00000
part-r-00001
part-r-00002

and I just want...

output.csv

I know I can pull them down as a single file with hadoop fs -getmerge, but I'm curious whether it's possible within a workflow application, keeping the result in HDFS.

Solution

Two simple options I can think of:

Amend the job that produced this output to use a single reducer, so it writes only one part file.
Run a map-reduce action over the existing output with an identity mapper, an identity reducer and a single reducer.
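Both options rely on the same mechanism: every map output record is routed to exactly one reducer, and each reducer writes one part-r-NNNNN file. The Python sketch below simulates that routing locally. It is loosely modelled on Hadoop's default HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; here zlib.crc32 stands in for the Java hashCode (an assumption made so results are stable across runs), and partition and simulate_job are hypothetical names, not Hadoop APIs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]


def partition(key: str, num_reducers: int) -> int:
    """Choose a reducer for a key, in the spirit of HashPartitioner.

    zlib.crc32 is a stand-in for the key's Java hashCode(); masking with
    0x7FFFFFFF mirrors the '& Integer.MAX_VALUE' step that keeps it positive.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def simulate_job(records: Iterable[Record], num_reducers: int) -> Dict[str, List[Record]]:
    """Route records to reducers and return one 'file' per reducer.

    Files are named like Hadoop's part-r-00000, part-r-00001, ... Every
    reducer gets a file, even an empty one, and each reducer's records are
    sorted by key, as the shuffle delivers them.
    """
    buckets: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    return {
        f"part-r-{r:05d}": sorted(buckets[r], key=lambda kv: kv[0])
        for r in range(num_reducers)
    }
```

With num_reducers set to 1, every key lands in partition 0, so only part-r-00000 is produced. One side effect worth keeping in mind: the single reducer emits its records sorted by key, not in the order of the original part files.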

OTHER TIPS

You can probably use Pig or Java to call

or maybe add it to your own fork of Oozie's fs-action.
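Whichever tool ends up doing the work, the merge itself amounts to concatenating the part files in name order, which is also what hadoop fs -getmerge produces locally. A minimal Python sketch, assuming a local directory stands in for the HDFS output path (a real Pig, Java or fs-action implementation would go through Hadoop's FileSystem API instead); merge_parts is a hypothetical helper, not part of Oozie or Hadoop.

```python
import shutil
import tempfile
from pathlib import Path


def merge_parts(output_dir: Path, target: Path, pattern: str = "part-*") -> int:
    """Concatenate the part files in output_dir into target.

    Files are taken in name order (part-r-00000, part-r-00001, ...), which is
    correct because Hadoop zero-pads the part numbers. Marker files such as
    _SUCCESS do not match the pattern and are skipped. Returns the number of
    files merged.
    """
    parts = sorted(p for p in output_dir.glob(pattern) if p.is_file())
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream in chunks so large part files are never held in memory.
                shutil.copyfileobj(src, out)
    return len(parts)


if __name__ == "__main__":
    # Demo on a throwaway directory shaped like a job's output directory.
    with tempfile.TemporaryDirectory() as tmp:
        job_out = Path(tmp)
        for i, rows in enumerate(["a,1\n", "b,2\n", "c,3\n"]):
            (job_out / f"part-r-{i:05d}").write_text(rows)
        (job_out / "_SUCCESS").write_text("")
        merged = job_out / "output.csv"
        print(merge_parts(job_out, merged), "files merged")
        print(merged.read_text(), end="")
```

Writing output.csv into the same directory is safe here only because its name does not match the part-* pattern; a target that did match would be picked up as an input.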

You could wrap that curl call in a shell or ssh action.

Licensed under: CC-BY-SA with attribution
