Question

I have two clusters, each running a different version of Hadoop. I am working on a POC where I need to understand how YARN provides the capability to run multiple applications simultaneously, which was not possible with the classic MapReduce framework.

Hadoop Classic: I have a wordcount.jar file and executed it on a single cluster (2 map slots & 2 reduce slots). I started two jobs in parallel; the one lucky enough to start first got both map slots, completed its tasks, and only then did the second job start. This is the expected behavior.

Hadoop YARN: Same wordcount.jar on a different cluster (4 cores, one per machine, so 4 machines in total). As YARN does not pre-assign map and reduce slots, any core can be used for a map or a reduce task. Here I also submitted two jobs in parallel. Expected behavior: both jobs should start with 2 mappers each, or whatever split the ResourceManager assigns, but at least both jobs should start.

Reality: One job starts with 3 mappers and 1 reducer. The second job waits until the first is completed.

Can someone please help me understand this behavior? Also, is the parallelism better demonstrated on a multi-node cluster?


Solution

I am not sure if this is the exact reason, but classic Hadoop and YARN use different schedulers. Classic Hadoop uses the JobQueueTaskScheduler (a FIFO scheduler) by default, while YARN uses the CapacityScheduler by default. With the CapacityScheduler's out-of-the-box setup (a single default queue, FIFO ordering within the queue), the first job can grab all available containers on a small cluster, so the second job's ApplicationMaster has to wait until resources are released, which matches what you observe.
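If you want both jobs to make progress at the same time, one option is to switch the ResourceManager to the FairScheduler. Below is a minimal sketch of a yarn-site.xml change, assuming a stock Hadoop 2.x installation; the property name and scheduler class are the standard ones, but verify them against your distribution before relying on this.

    <!-- yarn-site.xml: switch the ResourceManager from the default
         CapacityScheduler to the FairScheduler, so containers freed by
         the first job are shared with later jobs instead of going back
         to the job at the head of the FIFO queue. -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

Restart the ResourceManager after the change. Even then, on a 4-container cluster the second job may still appear to lag briefly, because each job's ApplicationMaster also occupies a container; the concurrent behavior is much easier to observe on a larger multi-node cluster.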

Licensed under: CC-BY-SA with attribution