Question

Update Again

I have tried to find a simple way to reproduce this, but have not been successful.

So far, I have tried various simple array allocations and manipulations, but they all raise a MemoryError rather than crashing with SIGKILL.

For example:

import numpy as np
x = np.asarray(range(999999999))

or:

x = np.empty([100,100,100,100,7])

just raise MemoryError, as they should.

I hope to have a simple way to recreate this at some point.
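A possible reason the tests above raise MemoryError while the real script is killed: under Linux's default heuristic overcommit mode, an obviously oversized single request is refused up front, but many moderate requests (the strace output further down shows repeated mmap calls of 67112960 bytes, a little over 64 MiB) are granted and only backed by physical pages when they are written to. A reproduction therefore probably needs to allocate in chunks and actually touch the memory. The sketch below assumes that explanation; the function name and chunk size are illustrative, and calling it with a large total on a machine without swap may itself end in "Killed".

```python
import numpy as np

CHUNK_MIB = 64  # roughly the size of the mmap calls in the strace output


def allocate_and_touch(total_mib, chunk_mib=CHUNK_MIB):
    """Allocate total_mib MiB as chunk_mib-sized NumPy arrays and write to all of it.

    np.ones fills every byte, so each chunk is backed by real pages instead of
    remaining an untouched (overcommitted) mapping. The chunks are returned so
    they stay referenced and are not freed.
    """
    n_bytes = chunk_mib * 1024 * 1024
    chunks = []
    for _ in range(total_mib // chunk_mib):
        chunks.append(np.ones(n_bytes, dtype=np.uint8))
    return chunks
```

Gradually raising total_mib past the instance's free RAM should show whether the failure mode switches from MemoryError to "Killed", though the outcome depends on the overcommit settings discussed in the answer.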

End Update

I have a Python script running NumPy/SciPy and some custom C extensions.

On my Ubuntu 14.04 VM under VirtualBox, it runs to completion just fine.

On an Amazon EC2 T2 micro instance, it terminates after running for a while, with the output:

Killed

When I run it under the Python debugger, the signal is not caught and the debugger exits as well.

Running under strace, I get:

munmap(0x7fa5b7fa6000, 67112960)        = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000    
gettimeofday({1406518336, 306209}, NULL) = 0    
gettimeofday({1406518336, 580022}, NULL) = 0    
+++ killed by SIGKILL +++

Running under gdb while trying to catch SIGKILL, I get:

[Thread 0x7fffe7148700 (LWP 28022) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.
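This is expected: SIGKILL cannot be caught, blocked, or ignored, so no handler in the script, in pdb, or in gdb gets a chance to run before the process disappears. Python refuses even to install a handler for it, which a quick check like the following illustrates (a sketch assuming Linux and Python 3; the helper name is hypothetical).

```python
import signal


def can_install_handler(signum):
    """Return True if a Python-level handler can be installed for signum."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGKILL (and SIGSTOP) with EINVAL.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # restore the original disposition
    return True


print("SIGUSR1 catchable:", can_install_handler(signal.SIGUSR1))
print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```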

Running Python's trace module (python -m trace --trace), I get:

defmatrix.py(292):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293):         ndim = self.ndim
defmatrix.py(294):         if (ndim == 2):
defmatrix.py(295):             return
defmatrix.py(336):         return out
 --- modulename: linalg, funcname: norm
linalg.py(2052):     x = asarray(x)
 --- modulename: numeric, funcname: asarray
numeric.py(460):     return array(a, dtype, copy=False, order=order)

I can't think of anything else at the moment to figure out what is going on.

I suspect it might be running out of memory (it is an AWS micro instance), but I can't figure out how to confirm or rule that out.

Is there another tool that might help pinpoint exactly where the program is stopping? Or am I running one of the above tools the wrong way for this problem?
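One low-tech way to confirm or rule out memory exhaustion is to log the process's own memory use at checkpoints and compare it with the instance's RAM. The sketch below uses the standard resource module; it assumes Linux, where ru_maxrss is reported in kilobytes (macOS reports bytes), and the helper names are illustrative.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB (Linux: ru_maxrss is in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_memory(label):
    """Print the peak RSS with a label and flush immediately.

    Flushing matters: a process killed by SIGKILL never flushes buffered
    output, so unflushed log lines would be lost.
    """
    sys.stdout.write("[mem] %s: peak RSS %.1f MiB\n" % (label, peak_rss_mib()))
    sys.stdout.flush()
```

Calling log_memory before and after the large steps (for example around the linalg.norm call at the end of the trace output) would show whether the peak approaches the instance's RAM just before the kill; watching free -m in a second terminal gives a similar picture from outside the process.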

Update

The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.

However, I am still very interested in a way to run the program so that it terminates with a message closer to "Not Enough Memory" rather than "Killed".

If anyone has any suggestions, they would be appreciated.

Solution

It sounds like you've run into the dreaded Linux OOM killer. When the system has completely run out of memory and the kernel absolutely needs to allocate more, it kills a process rather than letting the entire system crash.
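Because the OOM killer uses SIGKILL, the victim cannot print anything itself. A parent process can still notice, though: Python's subprocess module reports death by signal N as a negative return code -N. A minimal wrapper along these lines (the function name and message wording are hypothetical) can turn a bare "Killed" into a more suggestive message. It cannot prove the OOM killer was responsible, since any other source of SIGKILL looks the same.

```python
import signal
import subprocess
import sys


def run_and_explain(cmd):
    """Run cmd as a child process; if it dies from SIGKILL, print a hint. Returns the return code."""
    returncode = subprocess.call(cmd)
    if returncode == -signal.SIGKILL:
        sys.stderr.write(
            "Child was killed by SIGKILL; on a machine without swap this is "
            "often the kernel OOM killer (check the kernel log).\n"
        )
    return returncode


if __name__ == "__main__":
    # Usage: python wrapper.py python your_script.py  (file names are placeholders)
    code = run_and_explain(sys.argv[1:])
    # Shells report death by signal N as 128 + N; mimic that for negative codes.
    sys.exit(128 - code if code < 0 else code)
```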

Look in the syslog for confirmation of this. A line similar to:

kernel: [884145.344240] mysqld invoked oom-killer:

followed some time later by:

kernel: [884145.344399] Out of memory: Kill process 3318

should be present. (This example mentions mysqld; on your system the lines would name whichever process was involved.)
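Checking for these lines can be scripted. On Ubuntu, kernel messages end up in /var/log/syslog and /var/log/kern.log, and dmesg shows the current kernel ring buffer. The sketch below matches the two message forms quoted above, plus the "Killed process" wording used by newer kernels; the exact text varies between kernel versions, so treat the patterns as assumptions.

```python
import re

# "<name> invoked oom-killer" names the process that was allocating at the time.
INVOKED_RE = re.compile(r"(\S+) invoked oom-killer")
# Older kernels write "Kill process <pid>", newer ones "Killed process <pid>".
KILLED_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+)")


def find_oom_events(lines):
    """Return (kind, detail) tuples for OOM-related lines in an iterable of log lines."""
    events = []
    for line in lines:
        m = INVOKED_RE.search(line)
        if m:
            events.append(("invoked", m.group(1)))
            continue
        m = KILLED_RE.search(line)
        if m:
            events.append(("killed", int(m.group(1))))
    return events
```

For example, find_oom_events(open("/var/log/syslog")) lists the matches; reading that file may require root or membership in the adm group.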

You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

Then reboot (or run sudo sysctl -p to load the new settings without rebooting). The original, memory-hungry process should now fail to allocate memory and, hopefully, raise the proper exception.
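If a system-wide change is not acceptable, a per-process alternative (not part of the original answer) is to cap the process's address space with RLIMIT_AS. Once the cap is reached, allocations inside that process fail, so NumPy raises MemoryError instead of the kernel killing the process later. Caveat: RLIMIT_AS limits virtual address space, not resident memory, so the cap needs headroom for libraries and thread stacks. A sketch assuming Linux; the helper names and sizes are illustrative.

```python
import resource

import numpy as np


def limit_address_space(max_bytes):
    """Lower the soft RLIMIT_AS to max_bytes (capped at the hard limit).

    Returns the previous (soft, hard) pair so it can be restored later.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return soft, hard


def current_vm_size_bytes():
    """Current virtual memory size of this process, from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024  # value is reported in kB
    raise RuntimeError("VmSize not found in /proc/self/status")


def allocation_raises_memory_error(n_bytes, headroom_bytes=256 * 1024 ** 2):
    """Temporarily cap RLIMIT_AS just above current usage, then try to allocate n_bytes."""
    previous = limit_address_space(current_vm_size_bytes() + headroom_bytes)
    try:
        np.ones(n_bytes, dtype=np.uint8)
    except MemoryError:
        return True
    finally:
        resource.setrlimit(resource.RLIMIT_AS, previous)  # undo the cap
    return False
```

In a real script one would more likely call limit_address_space once at startup, with a value somewhat below the instance's RAM, and leave it in place.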

Setting vm.overcommit_memory = 2 means that Linux won't overcommit memory, so allocations fail if there isn't enough memory to back them. See this answer for details on the effect of overcommit_ratio: https://serverfault.com/a/510857
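To check which mode is active, and how much the kernel will let processes commit in mode 2, the values can be read from /proc. According to the kernel's overcommit-accounting documentation, the mode-2 limit is swap plus overcommit_ratio percent of physical RAM (this sketch ignores huge pages and the alternative vm.overcommit_kbytes setting); /proc/meminfo reports the result as CommitLimit and the current total as Committed_AS. The helper names are illustrative.

```python
def read_overcommit_settings():
    """Return (overcommit_memory, overcommit_ratio) as ints (Linux only)."""
    values = []
    for name in ("overcommit_memory", "overcommit_ratio"):
        with open("/proc/sys/vm/" + name) as f:
            values.append(int(f.read().strip()))
    return tuple(values)


def expected_commit_limit_kb(mem_total_kb, swap_total_kb, ratio_percent):
    """Mode-2 commit limit: swap plus ratio_percent of RAM (huge pages ignored)."""
    return swap_total_kb + mem_total_kb * ratio_percent // 100


def meminfo_kb(field):
    """Read one field (e.g. 'CommitLimit') from /proc/meminfo, in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)
```

With overcommit_ratio = 100 and the 4 GB swap file from the question, the limit works out to all of RAM plus 4 GB; comparing expected_commit_limit_kb(meminfo_kb("MemTotal"), meminfo_kb("SwapTotal"), ratio) with meminfo_kb("CommitLimit") is a quick sanity check after rebooting.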

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow