Setting mapred.child.java.opts in Hive script results in MR job getting 'killed' right away

StackOverflow https://stackoverflow.com/questions/22870565

27-06-2023

Question

I have had a few jobs fail due to OutOfMemory and "GC overhead limit exceeded" errors. To counter the former, I tried setting SET mapred.child.java.opts="-Xmx3G"; at the start of the Hive script**.

Basically, any time I add this option to the script, the MR jobs that get scheduled (for the first of several queries in the script) are 'killed' right away.

Any thoughts on how to rectify this? Are there any other params that need to be tinkered with in conjunction with max heap space (e.g. io.sort.mb)? Any help would be much appreciated.

FWIW, I am using hive-0.7.0 with hadoop-0.20.2. The default setting for max heap size in our cluster is 1200M.

TIA.

** - Some other alternatives that were tried (with comical effect but no discernible change in outcome):

  • SET mapred.child.java.opts="-Xmx3G";

  • SET mapred.child.java.opts="-server -Xmx3072M";

  • SET mapred.map.child.java.opts ="-server -Xmx3072M";

    SET mapred.reduce.child.java.opts ="-server -Xmx3072M";

  • SET mapred.child.java.opts="-Xmx2G";

Update 1: It is possible that this has nothing to do with setting the heap size. Tinkering with mapred.child.java.opts in any way causes the same outcome. For example, setting SET mapred.child.java.opts="-XX:+UseConcMarkSweepGC"; has the same result of MR jobs getting killed right away. Even setting it explicitly in the script to the 'cluster default' causes this.

Update 2: Added a pastebin of a grep of JobTracker logs here.


Solution

Figured this would end up being something trivial/inane, and in the end it was.

Setting mapred.child.java.opts like this:

SET mapred.child.java.opts="-Xmx4G -XX:+UseConcMarkSweepGC";

is not accepted. But this seems to go through fine:

SET mapred.child.java.opts=-Xmx4G -XX:+UseConcMarkSweepGC; (minus the double-quotes)
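For reference, a sketch of the working (quote-free) forms of the settings attempted in the question. The property names are the pre-YARN mapred.* ones used here with hadoop-0.20.2; on later Hadoop versions they are deprecated in favor of mapreduce.map.java.opts and mapreduce.reduce.java.opts:

```sql
-- Quote-free values are passed to the task JVM verbatim.
-- With quotes, the quote characters become part of the property value,
-- the task JVM fails to launch, and the job is killed right away.
SET mapred.child.java.opts=-Xmx4G -XX:+UseConcMarkSweepGC;

-- Map- and reduce-specific variants (note: no space before '='):
SET mapred.map.child.java.opts=-server -Xmx3072M;
SET mapred.reduce.child.java.opts=-server -Xmx3072M;
```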

sigh. Having better debug options/error messages would have been nice.

Other tips

Two other guards can restrict task memory usage. Both are designed for admins to enforce QoS, so if you're not one of the admins on the cluster, you may be unable to change them.

The first is the ulimit, which can be set directly in the node OS, or by setting mapred.child.ulimit.
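As an illustration (the value below is hypothetical), mapred.child.ulimit is a virtual-memory cap in kilobytes, and it has to comfortably exceed -Xmx plus JVM/native overhead, or tasks are killed at launch:

```sql
-- Hypothetical example: cap task virtual memory at ~5 GB (value in KB).
-- Must be larger than the heap (-Xmx) plus JVM and native overhead.
SET mapred.child.ulimit=5242880;
```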

The second is a pair of cluster-wide mapred.cluster.max.*.memory.mb properties that enforce memory usage by comparing job settings mapred.job.map.memory.mb and mapred.job.reduce.memory.mb against those cluster-wide limits.
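A sketch of how that interaction looks from the job side, with hypothetical values; the cluster-wide caps are set by admins in mapred-site.xml, and a job asking for more than them is rejected:

```sql
-- Hypothetical per-job memory requests, checked by the JobTracker against
-- the admin-set mapred.cluster.max.map.memory.mb and
-- mapred.cluster.max.reduce.memory.mb limits. If either request exceeds
-- its cluster-wide cap, the job is not accepted.
SET mapred.job.map.memory.mb=3072;
SET mapred.job.reduce.memory.mb=3072;
```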

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow