A confusion about datanode memory usage when several map tasks run in parallel on the same datanode

StackOverflow https://stackoverflow.com/questions/23171131

Question

As we all know, a map task has "io.sort.mb" to limit the memory that the sort phase uses.

1) My confusion is this: suppose the datanode has 3 GB of spare memory and I set "io.sort.mb" to 1024 MB. When several map tasks run in parallel on the same datanode and together need more than the 3 GB of spare memory on the node, what will happen?

A similar question:

2) "mapred.child.java.opts" set to -Xmx1024m gives the maximum memory that a child JVM can use. If several map tasks run in parallel on the datanode mentioned in 1), is it possible for these parallel map tasks to occupy the whole 3 GB of spare memory, and what will happen then?

A similar question on HBase:

3) Given that, on HBase, the BlockCache is set to 0.3*heap and the Memstore to 0.4*heap: in an extreme case there are 2 regions in a regionserver, and each of the 2 regions uses 0.6*heap, for 1.2*heap in total. Can this scenario happen?

If you know anything about the questions above (whether these cases can happen), I would welcome and appreciate your sharing.

Solution

The answer to at least the first two questions comes from paying attention to the configuration file in which these two properties are defined.

They are defined in mapred-default.xml and can optionally be overridden in mapred-site.xml.
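
As a rough illustration of that override, a mapred-site.xml fragment setting the two properties from the question might look like the following. The values are simply the ones the question mentions, not recommendations, and this is a sketch rather than a tested configuration.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: site-level overrides of defaults from mapred-default.xml -->
<configuration>
  <!-- Sort buffer size in MB (illustrative value taken from the question) -->
  <property>
    <name>io.sort.mb</name>
    <value>1024</value>
  </property>
  <!-- JVM options for child task processes (illustrative value taken from the question) -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>
```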

These configuration files govern an entire data node, so the configurations are not on a single job or task basis.

So, for example, io.sort.mb set to 1024 implies that 1024 MB of the data node will be allocated for the sorting phase of all the map tasks running on that node. It is shared between all map tasks.

I am not entirely sure about the third question, but my guess is that it works along similar lines.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow