Question

I have a 1-node Hadoop test setup with a MapReduce job that starts 96 mappers and 6 reducers. Before migrating to YARN this job ran steadily, if not fast. With YARN it started to hang every time, with most of the mappers stuck in the 'pending' state.

The job actually consists of 6 sub-jobs (16 mappers + 1 reducer each). This configuration mirrors the production process sequence. All of them run under a single JobControl. Is there any configuration I should check, or a best practice for cases with a small number of nodes and jobs that are relatively large compared to the cluster size?

Of course I'm not concerned about performance here, just about developers being able to run this job at all. In the worst case I could reduce the job's width by grouping sub-jobs, but I'd rather not: in production there is no reason to do so, and I'd like the test and production sequences to be identical.

When I migrated to YARN, the scheduler was changed to the FairScheduler, and currently it is the only option: I run Cloudera, and Cloudera strongly recommends not using anything but the Fair Scheduler. So switching to the FIFO scheduler is not an option.

Is there any alternative in my case, other than 'redesign the job'?


Solution

I eventually solved my problem by disabling the 'queue per user' logic (switching to a single queue) and limiting the number of running applications via the allocation file. According to http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html, this lets you configure almost anything you need on a per-queue basis.

Here are the actual steps:

  • yarn.scheduler.fair.user-as-default-queue was set to false.
  • In Cloudera Manager, the dynamic resource pool for the 'default' queue was changed so that the queue allows no more than 2 running applications. That's good enough for a 1-node test setup. In open-source Hadoop this corresponds to an edit to the allocation file.
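For the open-source case, the two steps above can be sketched as plain configuration. The queue name and the application cap are illustrative (taken from my setup); the FairScheduler documentation linked above describes the full allocation-file format:

```xml
<!-- yarn-site.xml: stop creating a queue per user;
     all applications land in the single root.default queue -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
```

```xml
<!-- fair-scheduler.xml (the allocation file, pointed to by
     yarn.scheduler.fair.allocation.file): cap concurrent
     applications in the default queue at 2 -->
<allocations>
  <queue name="default">
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```

With maxRunningApps set, the sub-jobs queue up instead of all grabbing containers at once, which is what let the 96-mapper job make progress on a single node.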

It now works as needed. I left everything else, including the default scheduling policy, untouched.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow