Question

Is it possible to set the number of map tasks running per node?
I'm using Hadoop Streaming to crawl data, and I need only one map task per node to avoid getting blocked.

Thanks,


Solution

Whether or not you use Streaming, the maximum number of map tasks per node is set with the mapreduce.tasktracker.map.tasks.maximum parameter (named mapred.tasktracker.map.tasks.maximum in Hadoop 1.x and earlier). The parameter has to be set in the mapred-site.xml file on each node, and the TaskTracker must be restarted to pick it up; the property has no effect when set on the client.
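As a minimal sketch, the entry belongs inside the `<configuration>` element of mapred-site.xml on every TaskTracker node. The hypothetical helper below just builds that fragment with Python's standard `xml.etree.ElementTree`, assuming an MRv1 (TaskTracker-based) cluster. In a real deployment you would merge the `<property>` into the existing file rather than overwrite it.

```python
import xml.etree.ElementTree as ET


def mapred_site_xml(max_maps: int) -> str:
    """Hypothetical helper: build a minimal mapred-site.xml body that
    caps the number of concurrent map tasks on a TaskTracker node."""
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    config = ET.Element("configuration")
    prop = ET.SubElement(config, "property")
    # Property name from the answer above (MRv1 TaskTracker setting).
    ET.SubElement(prop, "name").text = "mapreduce.tasktracker.map.tasks.maximum"
    ET.SubElement(prop, "value").text = str(max_maps)
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # One map slot per node, as the question asks for.
    print(mapred_site_xml(1))
```

Because the slot count is read by the TaskTracker, this caps every job on the node, not only the crawl job.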

Other suggestions

Have you tried playing with the following settings in your job.xml?

mapred.max.maps.per.node=1
mapred.max.reduces.per.node=1

These default to -1, meaning unlimited (apart, of course, from the limit imposed by the available slots).
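With Hadoop Streaming, job-level properties like these are typically passed as generic `-D key=value` options, which must come before the streaming-specific options such as `-input` and `-mapper`. The sketch below only assembles such a command line; the property names are taken from the answer above and may not be honored by every Hadoop release, while the jar path, input/output paths and mapper script are hypothetical placeholders.

```python
import shlex
from typing import Dict, List


def streaming_command(jar: str, props: Dict[str, str], input_path: str,
                      output_path: str, mapper: str, reducer: str) -> List[str]:
    """Hypothetical helper: build a Hadoop Streaming invocation as an argv list."""
    cmd = ["hadoop", "jar", jar]
    # Generic -D options go first, before any streaming-specific option.
    for key, value in props.items():
        cmd += ["-D", f"{key}={value}"]
    cmd += ["-input", input_path, "-output", output_path,
            "-mapper", mapper, "-reducer", reducer]
    return cmd


if __name__ == "__main__":
    cmd = streaming_command(
        "hadoop-streaming.jar",  # placeholder path to the streaming jar
        {"mapred.max.maps.per.node": "1", "mapred.max.reduces.per.node": "1"},
        "urls/", "crawl-out/", "crawl_mapper.py", "cat",
    )
    # Print a shell-safe command line that could be pasted into a terminal.
    print(shlex.join(cmd))
```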

Licensed under: CC-BY-SA with attribution