Question

I'm running an Amazon Elastic MapReduce (EMR) job on 1 master node and 25 core nodes. Bootstrap actions are completed on the master node, but they hang on core nodes. ~5000 (of 5200) tasks constituting a map step are then reported to be "running," while the remaining tasks are "pending." Because the core nodes are hanging, however, nothing is actually being run; I can tell because no intermediate output is being written. After ~30 minutes, all previously "running" tasks are stamped "killed_unclean" and shifted to "pending." A few minutes later, bootstrap actions are completed on the core nodes, but none of the tasks then shift from "pending" to "running."

This problem does not arise when I run my job with 2 core nodes rather than 25; tasks are finished as expected. What could be wrong, and how can I fix it?


Solution

toth was right: I had set mapred.tasktracker.map.tasks.maximum too high, so the memory each core node needed to run that many concurrent map tasks was absurd. Amazon's default values are generally appropriate here.
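To see why an oversized value is a problem, here is a rough back-of-the-envelope sketch. It assumes each map slot can launch a child JVM whose heap is capped by the -Xmx value in mapred.child.java.opts, and it ignores per-JVM overhead beyond the heap. The function names and all figures (node memory, heap size, reserved memory) are hypothetical and are not taken from my cluster.

```python
def map_slot_memory_mb(map_slots, child_heap_mb):
    """Worst-case heap demand if every map slot on a node is busy at once."""
    return map_slots * child_heap_mb


def fits(node_memory_mb, map_slots, child_heap_mb, reserved_mb=0):
    """True if all map slots plus memory reserved for daemons fit on the node."""
    demand = map_slot_memory_mb(map_slots, child_heap_mb) + reserved_mb
    return demand <= node_memory_mb


def max_safe_map_slots(node_memory_mb, child_heap_mb, reserved_mb=0):
    """Largest slot count whose combined heap fits in the usable memory."""
    usable = node_memory_mb - reserved_mb
    return max(usable // child_heap_mb, 0)


if __name__ == "__main__":
    # Hypothetical node: 7680 MB RAM, 1024 MB reserved for the OS and daemons,
    # and a 1024 MB heap per map task.
    node_mb, heap_mb, reserved = 7680, 1024, 1024
    for slots in (6, 32):
        ok = fits(node_mb, slots, heap_mb, reserved)
        print(slots, "slots:", map_slot_memory_mb(slots, heap_mb), "MB, fits =", ok)
    print("max safe slots:", max_safe_map_slots(node_mb, heap_mb, reserved))
```

If the worst-case demand comes out well above the node's physical memory, the setting is too high for that instance type, and falling back to the default is the simplest fix.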

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow