Question

I am using Yelp's mrjob library to write some of my MapReduce programs, and I am running them on EMR. My program has reducer code that takes a long time to execute, and because of the default task timeout on EMR I am getting this error:

Task attempt_201301171501_0001_r_000000_0 failed to report status for 600 seconds.Killing!

I want a way to increase the timeout on EMR. I read the official mrjob documentation on this, but I was not able to work out the procedure. Can someone suggest a way to solve this issue?


Solution

I've dealt with a similar issue with EMR in the past. The property you are looking for is mapred.task.timeout, which is the number of milliseconds before a task is terminated if it neither reads an input, writes an output, nor updates its status string.
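(Side note: since a status update also resets that clock, a long-running reducer can often keep itself alive just by reporting progress. In mrjob you can do that with set_status(); a rough sketch, where the word-count logic and the every-10,000-values interval are placeholders of mine:)

from mrjob.job import MRJob

class MRSlowReducerJob(MRJob):

    def mapper(self, _, line):
        # Trivial word-count mapper, just so the sketch runs end to end.
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        total = 0
        for i, count in enumerate(counts):
            total += count  # stand-in for whatever slow per-value work you do
            if i % 10000 == 0:
                # Writes a "reporter:status:..." line to stderr, which Hadoop
                # streaming picks up and uses to reset the task timeout clock.
                self.set_status('reducing %r: %d values seen so far' % (word, i))
        yield word, total

if __name__ == '__main__':
    MRSlowReducerJob.run()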

With MRJob, you could add the following option:

--jobconf mapred.task.timeout=1800000
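If you would rather keep the setting in the job itself instead of passing the flag on every run, mrjob (at least in the versions I've used) also lets you declare jobconf values on the job class via its JOBCONF attribute; a minimal skeleton:

from mrjob.job import MRJob

class MRLongReducerJob(MRJob):
    # Same effect as --jobconf mapred.task.timeout=1800000 on the command line;
    # the value is in milliseconds (1800000 ms = 30 minutes).
    JOBCONF = {'mapred.task.timeout': '1800000'}

    # ... your mapper and reducer go here ...

if __name__ == '__main__':
    MRLongReducerJob.run()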

EDIT: It appears that some EMR AMIs do not support setting parameters like the timeout with jobconf at run time. Instead, you must use a bootstrap-time configuration like this:

--bootstrap-action="s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.task.timeout=1800000"
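For reference, a full invocation with that bootstrap action might look something like this (the paths are placeholders):

python job.py -r emr --bootstrap-action="s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.task.timeout=1800000" /path/to/input.txt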

I would still try the --jobconf option first and see if you can get it to work; otherwise, fall back to the bootstrap action.

To use either of these parameters, just create your job as a class extending MRJob; this class has a jobconf() method that picks up your --jobconf parameters, so you can pass them as regular options on the command line:

python job.py --num-ec2-instances 42 --python-archive t.tar.gz -r emr --jobconf mapred.task.timeout=1800000 /path/to/input.txt
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow