Question

I have a question regarding Elastic MapReduce on Amazon Web Services. Has anyone been able to set the following configuration parameters:

mapreduce.map.java.opts / mapreduce.reduce.java.opts

The problem is that when I check the maximum heap size inside the JVMs of both the mappers and the reducers, it is not affected by setting these. I check the heap size by adding the following lines to my map/reduce code:

// Print the child JVM's maximum heap size in bytes
Runtime runtime = Runtime.getRuntime();
System.out.println(runtime.maxMemory());
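
For context, here is roughly where that check lives in my job; the class name and key/value types below are illustrative, not my actual code:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper that logs the child JVM's maximum heap at startup
public class HeapCheckMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) {
        Runtime runtime = Runtime.getRuntime();
        System.out.println("max heap (bytes): " + runtime.maxMemory());
    }
}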

I am setting them using the command line interface with the following parameters:

-bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapreduce.map.java.opts=-Xmx1000m,-m,mapreduce.reduce.java.opts=-Xmx3000m"
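
For completeness, the full launch command with the Ruby elastic-mapreduce CLI looks roughly like this (the job-flow name is made up):

elastic-mapreduce --create --name "heap-test" \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapreduce.map.java.opts=-Xmx1000m,-m,mapreduce.reduce.java.opts=-Xmx3000m"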

The Hadoop version on Amazon EMR is 1.0.3. (I checked the reference book by Tom White, which says these parameters should be supported starting with Hadoop 0.21.)

It is possible, though, to set the JVM options of the child processes via mapred.child.java.opts (the same value for both mapper and reducer), but this is very inconvenient for my algorithm, in which the reducer has to store a large hash map while the mapper doesn't store anything.
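
That is, the only bootstrap argument that does take effect sets one shared value for all child tasks, e.g.:

-bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapred.child.java.opts=-Xmx3000m"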

Possibly related: is it possible to get a warning when you set unsupported configuration parameters? When I set the variables above, they can be read back, but they are apparently not used (configuration.get(...) returns the values I set).
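
This is how I read them back (a sketch; the same works for the reduce-side property):

// Returns the value I set, even though the task JVM ignores it
String opts = context.getConfiguration().get("mapreduce.map.java.opts");
System.out.println("configured opts: " + opts);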

Solution

If you look in the hadoop-1.0.3/docs folder, you will find a file named mapred_tutorial.html. In the "Task Execution & Environment" section, the document tells you to use the following:

mapred.{map|reduce}.child.java.opts

The configuration names changed: mapreduce.map.java.opts and mapreduce.reduce.java.opts were introduced in Hadoop 0.21.0, but Hadoop 1.0.3 descends from the 0.20 branch and does not recognize them.
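
Applied to the bootstrap action from the question, that would be (untested, but using the same configure-hadoop syntax):

-bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapred.map.child.java.opts=-Xmx1000m,-m,mapred.reduce.child.java.opts=-Xmx3000m"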

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow