Question

I'm reading a bunch of NetCDF files using the pupynere interface (on Linux). The following code results in an mmap error:

import numpy as np
import os, glob
from pupynere import NetCDFFile as nc
alts = []
vals = []
path='coll_mip'
filter='*.nc'
for infile in glob.glob(os.path.join(path, filter)):
        curData = nc(infile,'r')
        vals.append(curData.variables['O3.MIXING.RATIO'][:])
        alts.append(curData.variables['ALTITUDE'][:])
        curData.close()

Error:

$ python2.7 /mnt/grid/src/profile/contra.py
Traceback (most recent call last):
  File "/mnt/grid/src/profile/contra.py", line 15, in <module>
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 159, in __init__
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 386, in _read
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 446, in _read_var_array
mmap.error: [Errno 24] Too many open files

Interestingly, if I comment out one of the append commands (either will do!), it works. What am I doing wrong? I am closing the files, right? This is somehow related to the Python lists. I used a different, inefficient approach before (always copying each element) that worked.

PS: ulimit -n yields 1024; the program fails at file number 498.

Maybe related (but the solution doesn't work for me): NumPy and memmap: [Errno 24] Too many open files


Solution

My guess is that the mmap.mmap call in pupynere is holding the file descriptor open (or creating a new one). What if you do this:

vals.append(curData.variables['O3.MIXING.RATIO'][:].copy())
alts.append(curData.variables['ALTITUDE'][:].copy())
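
Copying forces the data into ordinary in-memory arrays, so nothing you keep around still references the memory map, and the descriptor can actually be released by close(). For reference, a sketch of the whole loop with copies (same names as in the question, except that filter is renamed to pattern to avoid shadowing the builtin):

import os, glob
from pupynere import NetCDFFile as nc

alts = []
vals = []
path = 'coll_mip'
pattern = '*.nc'

for infile in glob.glob(os.path.join(path, pattern)):
    curData = nc(infile, 'r')
    # .copy() detaches the data from the mmap backing the file
    vals.append(curData.variables['O3.MIXING.RATIO'][:].copy())
    alts.append(curData.variables['ALTITUDE'][:].copy())
    curData.close()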

OTHER TIPS

@corlettk: yeah, since it is Linux, strace is the way to go:

strace -e trace=file,desc,munmap python2.7 /mnt/grid/src/profile/contra.py

This will show exactly which file is opened when, and even the file descriptors.
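
If you'd rather do a quick sanity check from inside the script, on Linux you can also count the process's open descriptors directly via /proc (a rough sketch, not something strace-related):

import os

# Linux-specific: each entry is one open file descriptor of this process
# (the count includes the descriptor used for the listing itself)
num_fds = len(os.listdir('/proc/self/fd'))
print('open file descriptors: %d' % num_fds)

Printing this once per loop iteration should show whether the count keeps growing despite the close() calls.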

You can also use

ulimit -a

to see which limits are currently in effect.
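
The per-process limit that ulimit -n reports can also be read from within Python via the resource module; a sketch, in case you want the script to log it itself:

import resource

# the soft limit is what the process actually hits; the hard limit is the ceiling
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('file descriptor limit: soft=%d, hard=%d' % (soft, hard))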

Edit

gdb --args python2.7 /mnt/grid/src/profile/contra.py
(gdb) break dup
(gdb) run

If that results in too many breakpoints prior to the ones related to the mapped files, you might want to run it without breakpoints for a while, break it manually (Ctrl+C) and set the breakpoint during 'normal' operation; that is, if you have enough time for that :)

Once it breaks, inspect the call stack with

(gdb) bt

Hmmm... Maybe, just maybe, a with curData: block might fix it? Just a WILD guess.


EDIT: Does curData have a flush method, perchance? Have you tried calling that before close?


EDIT 2: Python 2.5's with statement (lifted straight from Understanding Python's "with" statement)

with open("x.txt") as f:
    data = f.read()
    # do something with data

... basically it ALWAYS closes the resource (much like C#'s using construct).
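
If pupynere's NetCDFFile does not implement the context-manager protocol itself (I haven't checked), contextlib.closing gives you the same always-closed guarantee for any object with a close() method. A sketch of the loop body from the question:

from contextlib import closing
from pupynere import NetCDFFile as nc

with closing(nc(infile, 'r')) as curData:
    vals.append(curData.variables['O3.MIXING.RATIO'][:])
    alts.append(curData.variables['ALTITUDE'][:])
# curData.close() is called here automatically, even if an exception was raised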

How expensive is the nc() call? If it is 'cheap enough' to run twice on every file, does this work?

for infile in glob.glob(os.path.join(path, filter)):
        curData = nc(infile,'r')
        vals.append(curData.variables['O3.MIXING.RATIO'][:])
        curData.close()
        curData = nc(infile,'r')
        alts.append(curData.variables['ALTITUDE'][:])
        curData.close()
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow