Question

Say I have 200 input files and 20 nodes, and each node has 10 mapper slots. Will Hadoop always allocate the work evenly, such that each node will get 10 input files and simultaneously start 10 mappers? Is there a way to force this behavior?


Solution

The number of mappers is determined by the input -- specifically, by the input splits. So in your case, the 200 files could be fed to 200 mappers. But the real answer is a little more complicated. It depends on the following (see the split-size sketch after the list):

  • file size: if a file is bigger than the block size, then each block-sized chunk is sent to its own mapper

  • whether the files are splittable: gzip-compressed files, for example, cannot be split, so the entire file goes to a single mapper (even if the file is bigger than a block size)
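If you want more or fewer map tasks than the defaults produce, the usual knob is the split size. Here is a minimal driver sketch (not from the original answer) using the standard FileInputFormat split-size setters; the class name SplitTuningDriver, the commented-out mapper/reducer, and the 128 MB / 256 MB values are illustrative placeholders, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitTuningDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-tuning-example");
        job.setJarByClass(SplitTuningDriver.class);
        job.setInputFormatClass(TextInputFormat.class);

        // Set your own mapper/reducer here, e.g.:
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);

        // One map task is launched per input split. For splittable input,
        // raising the minimum split size yields fewer, larger splits;
        // lowering the maximum split size yields more, smaller splits.
        // Non-splittable files (e.g. gzip) still go to a single mapper each.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that this only influences how many splits (and hence map tasks) are created; which node each task actually runs on is decided by the scheduler.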
