Question

I have read the documentation so I know the difference.

My question, however, is whether there is any risk in using .submit() instead of .waitForCompletion() if I want to run several Hadoop jobs on a cluster in parallel.

I mostly use Elastic MapReduce.

When I tried doing so, I noticed that only the first job was being executed.

Solution

If your aim is to run jobs in parallel, then there is no risk in using job.submit(), which returns as soon as the job has been submitted. job.waitForCompletion() exists mainly because it returns only when the job has finished, and it returns the job's success or failure status, which can be used to decide whether further steps should run.
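
To make the difference concrete, here is a minimal sketch of the "submit everything, then poll" pattern. It is written against a small hypothetical interface, AsyncJob, that mirrors three methods of org.apache.hadoop.mapreduce.Job (submit(), isComplete(), isSuccessful()) so that it runs without a cluster. AsyncJob, FakeJob, runAll and the polling interval are assumptions made for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelSubmit {

    /** Hypothetical stand-in mirroring three methods of org.apache.hadoop.mapreduce.Job. */
    interface AsyncJob {
        void submit() throws Exception;          // returns right after submission
        boolean isComplete() throws Exception;   // non-blocking status check
        boolean isSuccessful() throws Exception; // meaningful once the job is complete
    }

    /**
     * Submits every job first, so none of them waits for another to finish,
     * then polls each one until it is complete.
     * Returns true only if all jobs succeeded.
     */
    static boolean runAll(List<? extends AsyncJob> jobs, long pollMillis) throws Exception {
        for (AsyncJob job : jobs) {
            job.submit();
        }
        boolean allSucceeded = true;
        for (AsyncJob job : jobs) {
            while (!job.isComplete()) {
                Thread.sleep(pollMillis);
            }
            allSucceeded &= job.isSuccessful();
        }
        return allSucceeded;
    }

    /** Fake job for the demo: reports completion after a fixed number of status polls. */
    static final class FakeJob implements AsyncJob {
        private final boolean succeeds;
        private int pollsLeft;
        private boolean submitted;

        FakeJob(int pollsUntilDone, boolean succeeds) {
            this.pollsLeft = pollsUntilDone;
            this.succeeds = succeeds;
        }

        @Override
        public void submit() {
            submitted = true;
        }

        @Override
        public boolean isComplete() {
            if (!submitted) {
                throw new IllegalStateException("job was never submitted");
            }
            if (pollsLeft > 0) {
                pollsLeft--;
            }
            return pollsLeft == 0;
        }

        @Override
        public boolean isSuccessful() {
            return succeeds;
        }
    }

    public static void main(String[] args) throws Exception {
        List<AsyncJob> jobs = new ArrayList<>();
        jobs.add(new FakeJob(3, true));
        jobs.add(new FakeJob(1, true));
        // Prints "all succeeded: true"
        System.out.println("all succeeded: " + runAll(jobs, 10));
    }
}
```

With real Job objects you would call the same three methods on each Job directly. Calling waitForCompletion() on the first job instead would block before the second job is even submitted, which is why it does not suit the parallel case.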

As for your seeing only the first job being executed: this is because, by default, Hadoop schedules jobs in FIFO order. You can certainly change this behaviour. Read more here.

Licensed under: CC-BY-SA with attribution