How can I chain jobs in Hadoop while being able to read the original input
Question
I want to chain 3 rounds of MapReduce and at the third one to be able to read the original input as well as the output of the second job. Is this at all possible?
Solution
You could set up the last job to use two mappers, one of which takes the original file as its input. This assumes you need to reduce both inputs (the input of the first job and the output of the second job) on some common key. The MultipleInputs class supports exactly this pattern: it lets you attach a different mapper (and, if needed, a different input format) to each input path of a single job, with all mapper outputs flowing into the same shuffle and reduce phase.
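A minimal driver sketch for the third job, assuming hypothetical mapper classes `OriginalInputMapper` and `SecondJobOutputMapper` (and a `JoinReducer`) that emit the same key type so one reducer can join the two inputs; the paths are placeholders:

```java
// Driver for the third round: joins the original input with job 2's output.
// OriginalInputMapper, SecondJobOutputMapper, JoinReducer, and all paths
// are hypothetical names standing in for your own classes and HDFS layout.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ThirdJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "third-round-join");
        job.setJarByClass(ThirdJobDriver.class);

        // One mapper reads the untouched original input...
        MultipleInputs.addInputPath(job, new Path("/data/original-input"),
                TextInputFormat.class, OriginalInputMapper.class);
        // ...while a second mapper reads the output of the second job.
        MultipleInputs.addInputPath(job, new Path("/data/job2-output"),
                TextInputFormat.class, SecondJobOutputMapper.class);

        job.setReducerClass(JoinReducer.class); // joins records on the common key
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/job3-output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Both mappers must emit the same key/value types, since their outputs are merged into a single shuffle; the reducer then sees values from both sources grouped under each common key (a reduce-side join).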
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow