Question

I have a couple of processes running on RHEL 6.3 that, for some reason, appear to be exceeding their thread stack sizes.

For example, the Java process is started with a thread stack size of -Xss256k, and the C++ process sets a thread stack size of 1 MB using pthread_attr_setstacksize() in the code itself.
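
For context, setting a per-thread stack size with pthread_attr_setstacksize() typically looks something like the minimal sketch below (illustrative only, not the actual code of the process in question; the 1 MB value simply mirrors the limit described above):

#include <pthread.h>

static void *worker(void *arg)
{
    (void)arg;
    /* ... thread work ... */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Request a 1 MB stack for the worker thread. */
    pthread_attr_setstacksize(&attr, 1024 * 1024);

    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}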

For some reason, however, these processes are not sticking to these limits, and I'm not sure why.

For example, when I run

pmap -x <pid> 

for the C++ and Java processes, I can see hundreds of 'anon' blocks for each (which I have confirmed correspond to the internal worker threads created by each of these processes), but these have an allocated size of 64 MB each, not the limits set above:

00007fa4fc000000 168 40 40 rw--- [ anon ] 
00007fa4fc02a000 65368 0 0 ----- [ anon ] 
00007fa500000000 168 40 40 rw--- [ anon ] 
00007fa50002a000 65368 0 0 ----- [ anon ] 
00007fa504000000 168 40 40 rw--- [ anon ] 
00007fa50402a000 65368 0 0 ----- [ anon ] 
00007fa508000000 168 40 40 rw--- [ anon ] 
00007fa50802a000 65368 0 0 ----- [ anon ] 
00007fa50c000000 168 40 40 rw--- [ anon ] 
00007fa50c02a000 65368 0 0 ----- [ anon ] 
00007fa510000000 168 40 40 rw--- [ anon ] 
00007fa51002a000 65368 0 0 ----- [ anon ] 
00007fa514000000 168 40 40 rw--- [ anon ] 
00007fa51402a000 65368 0 0 ----- [ anon ] 
00007fa518000000 168 40 40 rw--- [ anon ] 
...

But when I run the following against the same process with all the 64 MB 'anon' blocks

cat /proc/<pid>/limits | grep stack 

Max stack size 1048576 1048576 bytes 

it shows a maximum thread stack size of 1 MB, so I am a bit confused as to what is going on here. The script that calls these programs also sets 'ulimit -s 1024'.

It should be noted that this only seems to occur on very high-end machines (e.g. 48 GB RAM, 24 CPU cores); the issue does not appear on less powerful machines (e.g. 4 GB RAM, 2 CPU cores).

Any help understanding what is happening here would be much appreciated.

Solution

It turns out that the glibc in RHEL 6 (glibc 2.11) changed the malloc behaviour so that, where possible, each thread gets allocated its own memory pool (arena), and on a larger system you may see each of these grabbing up to 64 MB of virtual address space. On 64-bit the maximum number of arenas allowed is greater (by default it scales with the number of CPU cores), which is why the effect shows up on the high-end machine but not on the smaller one.
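
As a rough illustration of that behaviour (this is a made-up test program, not code from the processes in the question): in a pthread program where every thread calls malloc(), each allocating thread can be handed its own arena, and on a 64-bit multi-core box pmap -x will typically show one ~64 MB reserved 'anon' region per such thread, much like the output above.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 8

/* Each thread makes a small allocation; with glibc's per-thread
   arenas this can be enough for the thread to be handed its own
   arena, which reserves ~64 MB of virtual address space on 64-bit. */
static void *worker(void *arg)
{
    void *p = malloc(4096);
    (void)arg;
    sleep(60);   /* keep the thread alive so pmap can be run */
    free(p);
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    printf("pid %d: run 'pmap -x %d' from another shell\n",
           (int)getpid(), (int)getpid());

    for (i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}

(Compile with gcc -pthread; the thread count, allocation size and sleep are arbitrary. Capping MALLOC_ARENA_MAX, or preloading tcmalloc as described below, limits how many of these arenas are created.)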

The fix for this was to add

export LD_PRELOAD=/path/to/libtcmalloc.so 

in the script that starts the processes (so that tcmalloc's allocator is used rather than glibc 2.11's).

Some more information on this is available from:

Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage https://www.ibm.com/developerworks/mydeveloperworks/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

glibc bug malloc uses excessive memory for multi-threaded applications http://sourceware.org/bugzilla/show_bug.cgi?id=11261

Apache Hadoop fixed the problem by setting MALLOC_ARENA_MAX: https://issues.apache.org/jira/browse/HADOOP-7154

OTHER TIPS

The stack size reported in /proc/1234/limits is set with setrlimit(2) (perhaps by the PAM subsystem at login time).
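
That limit can also be read and changed programmatically with getrlimit(2)/setrlimit(2); a minimal sketch (not code from the question; the 1 MB value simply mirrors the 'ulimit -s 1024' mentioned above):

#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
    struct rlimit rl;

    /* Read the current stack limit (this is what
       /proc/<pid>/limits reports as "Max stack size"). */
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

    /* Lower the soft limit to 1 MB for this process and
       anything it subsequently forks or execs. */
    rl.rlim_cur = 1024 * 1024;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        perror("setrlimit");

    return 0;
}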

I have no real idea why the actual stack segments seem to be 64 MB each. Perhaps your big server uses huge pages (but your desktop doesn't).

You might call setrlimit (perhaps with the ulimit bash builtin, or the limit zsh builtin) in e.g. the script calling your program.

You can use ulimit -s <size_in_KB> to set the maximum stack size for processes started from that shell, and ulimit -s with no argument to see the current limit.

@rory With regard to your answer, the 64 MB blocks should be at heap addresses, but an address like 00007fa50c02a000 is a stack address, right?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow