Question

I am managing a Hadoop cluster that is shared among a number of users. We frequently run jobs with extremely slow mappers. For example, we might have a 32 GB file of sentences (one sentence per line) that we want to parse with an NLP parser (which takes, say, 100 ms per sentence). If the block size is 128 MB, that is 256 mappers. This fills our rather small cluster (9 nodes times 12 mappers per node is 108 mappers), but each mapper takes a very long time (hours) to complete.

The problem is that if the cluster is empty and such a job is started, it uses all of the mappers on the cluster. Then, if anyone else wants to run a short job, it is blocked for hours. I know that newer versions of Hadoop support preemption in the Fair Scheduler (we are using the Capacity Scheduler), but the newer versions are also not stable (I'm anxiously awaiting the next release).

There used to be the option of specifying the number of mappers, but now JobConf is deprecated (strangely, it is not deprecated in 0.20.205). That option would alleviate the problem because, with more mappers, each map task would work on a smaller portion of the data and thus finish sooner.

Is there any way around this problem in 0.20.203? Do I need to subclass my InputFormat (in this case TextInputFormat)? If so, what exactly do I need to specify?


Solution

I believe that you should be able to increase the block size for these files: if you do that, then, naturally, your application will use far fewer mappers.
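For example, a copy of the input file can be written with a larger block size at create time. Below is a minimal sketch against the FileSystem API available in 0.20; the paths, the 512 MB figure, and the method name are purely illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Rewrite an existing HDFS file with a larger per-file block size so that it
// produces fewer input splits (and therefore fewer mappers).
public static void copyWithLargerBlocks(Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path("/data/sentences.txt");         // hypothetical input file
    Path dst = new Path("/data/sentences-512mb.txt");   // copy with 512 MB blocks

    long blockSize = 512L * 1024 * 1024;                // 512 MB instead of 128 MB
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    short replication = fs.getDefaultReplication();

    // create() accepts a per-file block size, so only this copy is affected
    FSDataOutputStream out = fs.create(dst, true, bufferSize, replication, blockSize);
    IOUtils.copyBytes(fs.open(src), out, conf, true);   // true = close both streams
}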

Remember also that there is the map.input.length parameter in the job configuration. Increasing it would increase the split size, so that you effectively end up with fewer mappers, each with a larger input.
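If the aim is larger splits without touching the block size, a knob I have seen used with the old mapred API is mapred.min.split.size; note that this is an assumption on my part rather than the parameter named above. Raising the minimum split size above the block size makes each map task read more than one block:

import org.apache.hadoop.mapred.JobConf;

// Hedged sketch (old mapred API): ask for ~512 MB splits even though the files
// use 128 MB blocks, which results in fewer, larger map tasks.
JobConf conf = new JobConf(MyParseJob.class);                 // MyParseJob is hypothetical
conf.setLong("mapred.min.split.size", 512L * 1024 * 1024);    // minimum split size in bytes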

OTHER TIPS

More mappers will not solve your problem if there is a lack of actual physical resources (i.e., machines in the cluster). I would try to pack the data into fewer input files to avoid random hard-drive seeks.

Edit: OK, if you want more mappers, then try to partition your data into several smaller files or decrease the block size.

Not exactly sure whether more mappers will solve your problem. JobConf#setNumMapTasks has no effect on the # of map tasks spawned per job; even the documentation says it is just a hint to the framework. The # of map tasks spawned equals the # of input splits for the job. Here are the different options to decrease the size of each InputSplit and thereby increase the # of InputSplits, and hence the # of map tasks.

  • Decrease the HDFS block size by changing dfs.blocksize. But this will increase the load on the NameNode, as it has to keep more file-to-block mappings, and the size of the block report also grows. Also, hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location changes the block size only for new files put into HDFS; old files remain as-is and have to be pulled out of HDFS and put back with the required block size.

  • Use NLineInputFormat to control the # of input lines fed to each map. For this, the job has to be changed to set mapred.line.input.format.linespermap, which defaults to 1 (see the sketch after this list).

  • From the 0.21 release, mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize are defined, but only with the new MR API. The InputSplit calculation is done on the client, so it cannot be enforced on the job clients.
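Here is a hedged sketch of the NLineInputFormat option with the old mapred API (as in 0.20.203). SentenceParseJob, SentenceMapper, the paths, and the figure of 10,000 lines per map are all illustrative placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SentenceParseJob.class);      // hypothetical driver class

    // One split per N lines instead of one split per HDFS block.
    conf.setInputFormat(NLineInputFormat.class);
    // ~10,000 sentences per mapper; at ~100 ms per sentence a task runs for
    // roughly 15-20 minutes instead of hours, so slots free up regularly.
    conf.setInt("mapred.line.input.format.linespermap", 10000);

    FileInputFormat.setInputPaths(conf, new Path("/data/sentences.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("/data/parsed"));

    conf.setMapperClass(SentenceMapper.class);                // hypothetical Mapper implementation
    conf.setNumReduceTasks(0);                                // map-only job
    JobClient.runJob(conf);
}

With NLineInputFormat, the mapper still receives LongWritable offsets as keys and Text lines as values, so an existing TextInputFormat-based mapper usually does not need to change.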

The logic for calculating the InputSplit size is below.

protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
    // The split size is the block size, clamped to the [minSize, maxSize] range;
    // a maxSize below the block size therefore yields smaller splits and more mappers.
    return Math.max(minSize, Math.min(maxSize, blockSize));
}

You do not need to upgrade Hadoop to change schedulers. I have had success changing the default scheduler to the Fair Scheduler; just follow the instructions at http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html
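For reference, the core of that setup on 0.20 is roughly the following mapred-site.xml property (a hedged sketch; the fairscheduler contrib jar also has to be on the JobTracker's classpath, and the linked docs are the authoritative source):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>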
