Question

I'm trying to run the NAS-UPC benchmarks on a 32-node cluster.

It works fine when the problem size is small. When I move up to a larger problem size (CLASS D), I get this error (for the MG benchmark):

*** Caught a fatal signal: SIGBUS(7) on node 2/32
 p4_error: latest msg from perror: Bad file descriptor
*** Caught a signal: SIGPIPE(13) on node 0/32
    p4_error: latest msg from perror: Bad file descriptor
   p4_error: latest msg from perror: Bad file descriptor

*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 27/32
*** Caught a signal: SIGPIPE(13) on node 20/32
*** Caught a signal: SIGPIPE(13) on node 21/32
    p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 16/32
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit

Can anybody explain why this is happening? Has anyone seen this error before and fixed it?

EDIT: I figured out that it is a memory-related problem, but I am unable to allot the right amount of memory to the application at compile time.


Solution

I figured out that the problem was the benchmark needing more memory than I had allotted to it at compile time.

Other tips

Check the dmesg output: it may be an out-of-memory issue. Alternatively, one of the limits listed by ulimit -a may have been hit, e.g. the stack size (the default stack size is too small for some NAS tasks).
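To see the two limits most likely to matter here without reading the whole ulimit -a table, something like the following Python sketch may help. It uses the standard resource module; the helper names format_limit and describe_limits are made up for this example. It only reports the limits of the process it runs in, so it would need to be run on each node, under the same launcher as the benchmark, to be meaningful.

```python
import resource


def format_limit(value):
    """Render a byte limit roughly the way `ulimit` does: KiB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def describe_limits():
    """Return {name: (soft, hard)} for the stack and address-space limits."""
    limits = {
        "stack (ulimit -s)": resource.RLIMIT_STACK,
        "virtual memory (ulimit -v)": resource.RLIMIT_AS,
    }
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(rlimit))
        for name, rlimit in limits.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in describe_limits().items():
        print(f"{name}: soft={soft}, hard={hard}")
```

A small soft stack limit next to an "unlimited" hard limit would suggest that raising the soft limit before launching the job is worth trying.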

If dmesg on any of your machines shows lines like "Out of Memory: Killed process ###", your program required (and tried to use) more memory than the OS could give to the application. Several memory limits apply:

  1. ulimit -v: the user limit on virtual memory size. Check the other ulimit -a limits as well, although this does not seem to be your case.
  2. You cannot use more memory than the total of RAM plus all swap (check with the free command). And if your application uses more memory than the RAM size and starts swapping, performance will be bad (in most cases).
  3. There are architectural limits on the maximum memory a single process may have. On 32-bit nodes this limit can be 1 GB (a very rare case), 2, 3, or 4 GB. Even if a 32-bit system has more than 4 GB of memory, e.g. by using PAE, no single process can take more than 4 GB. A large part of the 4 GB virtual address space is also taken by the OS (from hundreds of MB up to gigabytes).
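As a rough way to check limits 2 and 3 on a node, the sketch below sums MemTotal and SwapTotal from /proc/meminfo (the same numbers the free command reports) and prints the pointer width of the running interpreter. The function names are invented for this example, and the pointer width describes the Python build rather than the kernel or the benchmark binary, so treat it only as a hint about the node's userland.

```python
import struct


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Values are in KiB for the fields that carry a "kB" suffix.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_plus_swap_kib(fields):
    """Upper bound on usable memory per node: total RAM plus total swap."""
    return fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)


def pointer_bits():
    """Pointer width of this interpreter build (32 or 64 on common systems)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        info = {}  # not a Linux node, or /proc is unavailable
    print(f"RAM + swap: {ram_plus_swap_kib(info) / (1024 * 1024):.1f} GiB")
    print(f"RAM only:   {info.get('MemTotal', 0) / (1024 * 1024):.1f} GiB")
    print(f"Interpreter pointer width: {pointer_bits()} bits")
```

If the per-process share of a CLASS D run is close to or above the "RAM only" figure, swapping (or the OOM killer) is to be expected.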
License: CC-BY-SA with attribution
Not affiliated with StackOverflow