Question

I have a Hadoop cluster of three machines where one machine acts as both master and slave.

When I run the wordcount example, it runs map tasks on two machines - worker1 and worker2. But when I run my own code, it runs only on one machine - worker1. How can I make map tasks run on all machines?

Input Split Locations

/default-rack/master
/default-rack/worker1
/default-rack/worker2  

FIXED!!!

I added the following to my mapred-site.xml configuration and that fixed it:

<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
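
For reference, the same hint can also be set per job from the driver instead of cluster-wide. A minimal sketch using the old mapred API; the class name and paths are placeholders and the mapper/reducer setup is omitted:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MyJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJob.class);
            conf.setJobName("my-job");

            // Same hint as the mapred.map.tasks property above; Hadoop treats
            // it as a suggestion, the actual count still depends on the splits.
            conf.setNumMapTasks(100);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // conf.setMapperClass(...); conf.setReducerClass(...);
            // JobClient.runJob(conf);
        }
    }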

Solution

How big is your input? Hadoop divides a job's input into input splits, and if your file is too small, it will produce only one split.

Try a larger file, say around 1 GB in size, and see how many mappers you get then.
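
If I remember the old-API FileInputFormat logic correctly, the split size is roughly max(minSplitSize, min(totalSize / mapred.map.tasks, blockSize)), which is why a small file combined with a low map-task hint yields very few splits (and why raising mapred.map.tasks, as in your edit, produces more). A back-of-the-envelope sketch; all sizes below are made up, not taken from your cluster:

    // Rough sketch of the old-API split-size rule; illustrative numbers only.
    public class SplitSizeDemo {
        static long splits(long totalSize, int mapTasksHint,
                           long minSize, long blockSize) {
            long goalSize  = totalSize / Math.max(mapTasksHint, 1);
            long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
            return (totalSize + splitSize - 1) / splitSize;  // ceiling division
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;  // default dfs.block.size
            long minSize   = 1;                  // default mapred.min.split.size
            long smallFile = 10L * 1024 * 1024;  // a 10 MB input file

            // Few map tasks requested -> large goal size -> few splits.
            System.out.println(splits(smallFile, 2, minSize, blockSize));   // ~2
            // mapred.map.tasks=100 -> small goal size -> many more splits.
            System.out.println(splits(smallFile, 100, minSize, blockSize)); // ~100
        }
    }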


You can also check to make sure that every TaskTracker is reporting properly to the JobTracker. If there is a TaskTracker that is not properly connected, it will not get tasks:

   $ hadoop job -list-active-trackers

This command should output all 3 of your hosts.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow