Two simple options I can think of:
- Amend the job that produced this output to use a single reducer
- Run a map-reduce action with identity mapper, identity reducer and single reducer
Question
Is it possible to use Oozie to concatenate the output of a MapReduce job into a single file? Let's say I have the output ...
part-r-00000
part-r-00001
part-r-00002
and I just want...
output.csv
I know I can pull them down as a single file with hadoop fs -getmerge, but I'm curious whether it's possible with a workflow application and HDFS.
Solution
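Two simple options I can think of:
- Amend the job that produced this output to use a single reducer
- Run a map-reduce action with identity mapper, identity reducer and single reducer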
OTHER TIPS
You can probably use Pig or Java to call
or maybe add it to your own fork of Oozie's fs-action.
Alternatively, use the WebHDFS concat operation: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Concat_Files
You could wrap that curl call in a shell or ssh action.
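For reference, here is a rough sketch of what that WebHDFS concat request looks like, written in Python rather than curl so the URL construction is easy to see. It only builds the request and does not send it. The NameNode address, user directory and the helper name build_concat_request are made up for illustration; the op=CONCAT and sources query parameters are the ones described on the WebHDFS page linked above. As far as I understand it, concat appends the source files onto the target file, so the first part file is used as the target here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_concat_request(namenode, target, sources):
    """Build (but do not send) a WebHDFS CONCAT request.

    namenode: base URL of the NameNode HTTP endpoint (hypothetical value below)
    target:   absolute HDFS path of the file that the sources are appended to
    sources:  list of absolute HDFS paths to append, in order
    """
    # WebHDFS expects a comma-separated list of source paths.
    params = {"op": "CONCAT", "sources": ",".join(sources)}
    # Keep "/" and "," unescaped so the URL stays readable.
    query = urlencode(params, safe="/,")
    url = f"{namenode}/webhdfs/v1{quote(target)}?{query}"
    # CONCAT is issued as an HTTP POST.
    return Request(url, method="POST")


if __name__ == "__main__":
    # Hypothetical host and paths, matching the part files from the question.
    req = build_concat_request(
        "http://namenode.example.com:50070",
        "/user/me/out/part-r-00000",
        ["/user/me/out/part-r-00001", "/user/me/out/part-r-00002"],
    )
    print(req.get_method(), req.full_url)
```

To actually run it you would pass the request to urllib.request.urlopen, or just use the equivalent curl -X POST call from a shell action. Note that the merged data ends up in part-r-00000 under these assumptions, so a separate rename step would still be needed to get output.csv.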