Question

I have a problem that I have seriously spent months on now!

Essentially, I am running code that needs to read from and save to HDF5 files. I am using h5py for this.

It's very hard to debug because the problem (whatever it is) only occurs in about 5% of runs (each run takes several hours), and when it does occur it crashes Python completely, so debugging from within Python itself is impossible. Simple logging doesn't pinpoint the exact crash either: it appears to be very random, crashing at different points within the code, or with a lag.

I tried using OllyDbg to figure out what's happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png

It seems to happen shortly after calling the Python-native PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that internal to Python (e.g. the stack?) and not related to the file (i.e. my code)?

Does anyone have an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden Python logs or something I don't know about? Thank you.
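On the question of "hidden Python logs": one option worth trying, assuming Python 3.3 or later, is the standard-library faulthandler module, which dumps the Python-level traceback when the interpreter receives a fatal signal such as a segmentation fault. The helper name enable_crash_traces and the log file name below are made up for illustration; this is a minimal sketch, not a guaranteed fix.

```python
import faulthandler


def enable_crash_traces(log_path="crash_trace.log"):
    """Turn on faulthandler and send its output to log_path.

    Hypothetical helper. The returned file object must stay open for
    as long as faulthandler is enabled, so keep a reference to it.
    """
    log_file = open(log_path, "a")
    # all_threads=True dumps the traceback of every running thread,
    # which helps when the crash happens outside the main thread.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_traces() once at program start should be enough. The same handler can also be switched on without code changes by running the script with python -X faulthandler, in which case the traceback goes to stderr.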


Solution 2

Thanks for all the answers. I ran two versions this time: one with a new Python install and my unchanged program, the other on my original computer/install, but with all HDF5 read/write procedures replaced by numpy read/write procedures.

The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were related to HDF5, or more specifically to h5py. It appears that other people have encountered similar issues with h5py. Given that any error in my application translates into potentially large financial losses, I decided to drop HDF5 completely in favor of other, stable solutions.
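The answer does not show which numpy routines were used. As a rough sketch of what such a replacement could look like, the following stores several named arrays in one .npz file with np.savez and reads them back with np.load. The helper names save_arrays and load_arrays are hypothetical.

```python
import numpy as np


def save_arrays(path, arrays):
    """Write a dict of named arrays to a single .npz file.

    Hypothetical helper. np.savez appends ".npz" to the file name
    if the path does not already end with it.
    """
    np.savez(path, **arrays)


def load_arrays(path):
    """Read every array from a .npz file into a plain dict."""
    # The context manager closes the underlying file once the
    # arrays have been copied into memory.
    with np.load(path) as data:
        return {name: data[name] for name in data.files}
```

The named arrays play roughly the role that datasets play in an HDF5 file, although .npz offers no groups, attributes, or partial reads.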

OTHER TIPS

PyObject_ClearWeakRefs is part of the Python interpreter itself. But if the crash only happens in a small number of runs, it could be hardware related. Things you could try:

  • Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
  • Reinstall Python, in case the installed version has somehow become corrupted.
  • Run a memory test program.

Use a try/except statement. It can be put into the program to stop it from crashing when erroneous data is entered.
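In Python the construct is spelled try/except. A minimal sketch of guarding against bad input might look like the following, where parse_record is a hypothetical function invented for illustration. Note that this only intercepts Python-level exceptions such as ValueError; it would not catch a native access violation that terminates the interpreter.

```python
import logging

logger = logging.getLogger(__name__)


def parse_record(raw):
    """Convert one input field to a float, or return None if invalid.

    Hypothetical example function, not taken from the question's code.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        # Log the full traceback and continue instead of letting the
        # exception propagate and end the run.
        logger.exception("Skipping invalid record: %r", raw)
        return None
```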

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow