Question

I'm trying to use Hadoop on Amazon Elastic MapReduce, where I have thousands of map tasks to perform. I'm OK if a small percentage of the tasks fail; however, Amazon shuts down the whole job and I lose all of the results as soon as the first mapper fails. Is there a setting I can use to increase the number of failed map tasks that are allowed? Thanks.


Solution

Here's the answer for Hadoop:

Is there any property to define failed mapper threshold
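For reference, the property that answer points at is the allowed map-failure percentage. A minimal config.xml sketch, assuming the classic property name mapred.max.map.failures.percent (newer Hadoop releases spell it mapreduce.map.failures.maxpercent), could look like this, letting up to 5% of map tasks fail without killing the job:

<?xml version="1.0"?>
<configuration>
  <!-- Allow up to 5% of map tasks to fail before the whole job is marked as failed -->
  <property>
    <name>mapred.max.map.failures.percent</name>
    <value>5</value>
  </property>
</configuration>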

To use the setting described above in EMR, look at:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html#PredefinedbootstrapActions_ConfigureHadoop

Specifically, you create an XML file (config.xml in the example) with the setting that you want to change and apply the configure-hadoop bootstrap action:

./elastic-mapreduce --create \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-M,s3://myawsbucket/config.xml"
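If I'm reading the bootstrap-action flags right, -M tells configure-hadoop to merge the given XML file into the cluster's mapred-site.xml, so the config.xml sketched above just needs to be uploaded to your own bucket first (s3://myawsbucket is only a placeholder). A quick sketch of that upload, assuming the AWS CLI is installed:

# Upload the override file so the bootstrap action can fetch it at cluster startup
aws s3 cp config.xml s3://myawsbucket/config.xml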

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow