Question

I am seeing a lot of "too many open files" exceptions during the execution of my program. Typically they occur in the following form:

org.jboss.netty.channel.ChannelException: Failed to create a selector.

...
Caused by: java.io.IOException: Too many open files

However, those are not the only exceptions. I have observed similar ones (also caused by "too many open files"), but they are much less frequent.

Strangely enough, I have set the open files limit of the screen session (from which I launch my programs) to 1M:

root@s11:~/fabiim-cbench# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Moreover, the output of lsof -p shows no more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.

Question: What is wrong, and how can I dig deeper into this problem?

Extra: I am currently integrating Floodlight with bft-smart. In a nutshell, the Floodlight process is the one crashing with "too many open files" exceptions when executing a stress test launched by a benchmark program. This benchmark program maintains 64 TCP connections to the Floodlight process, which in turn should maintain at least 64 * 3 TCP connections to the bft-smart replicas. Both programs use Netty to manage these connections.


Solution

First thing to check: can you run ulimit from inside your Java process to make sure that the file limit is the same there? Code like this should work:

    import java.io.InputStream;

    // The child shell inherits the JVM's resource limits, so its
    // "ulimit -a" output shows the limits the Java process runs under.
    InputStream is = Runtime.getRuntime()
            .exec(new String[] {"bash", "-c", "ulimit -a"}).getInputStream();
    int c;
    while ((c = is.read()) != -1) {
        System.out.write(c);
    }
    System.out.flush();
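
Alternatively, on Linux you can skip the child shell and read /proc/self/limits, which lists the resource limits of the process that reads it. A minimal sketch, assuming a Linux /proc filesystem (the class name SelfLimits is just for illustration):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class SelfLimits {
        public static void main(String[] args) throws IOException {
            // /proc/self/limits lists the resource limits of the reading
            // process, so run this inside the JVM you care about.
            for (String line : Files.readAllLines(
                    Paths.get("/proc/self/limits"), StandardCharsets.UTF_8)) {
                System.out.println(line);
            }
        }
    }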

If the limit still shows 1 million, well, you're in for some hard debugging.

Here are a couple of things that I would look into if I had to debug this:

  1. Are you running out of TCP port numbers? What does netstat -an show when you hit this error?

  2. Use strace to find out exactly which system call, with which parameters, is failing with this error (e.g. attach with strace -f -p <pid>). EMFILE is errno 24; strace reports it by name.

  3. The “Too many open files” EMFILE error can actually be thrown by a number of different system calls for a number of different reasons:

    $ cd /usr/share/man/man2
    $ zgrep -A 2 EMFILE *
    accept.2.gz:.B EMFILE
    accept.2.gz:The per-process limit of open file descriptors has been reached.
    accept.2.gz:.TP
    accept.2.gz:--
    accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
    accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
    accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
    dup.2.gz:.B EMFILE
    dup.2.gz:The process already has the maximum number of file
    dup.2.gz:descriptors open and tried to open a new one.
    epoll_create.2.gz:.B EMFILE
    epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
    epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
    eventfd.2.gz:.B EMFILE
    eventfd.2.gz:The per-process limit on open file descriptors has been reached.
    eventfd.2.gz:.TP
    execve.2.gz:.B EMFILE
    execve.2.gz:The process has the maximum number of files open.
    execve.2.gz:.TP
    execve.2.gz:--
    execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
    execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
    execve.2.gz:.SH NOTES
    fcntl.2.gz:.B EMFILE
    fcntl.2.gz:For
    fcntl.2.gz:.BR F_DUPFD ,
    getrlimit.2.gz:.BR EMFILE .
    getrlimit.2.gz:(Historically, this limit was named
    getrlimit.2.gz:.B RLIMIT_OFILE
    inotify_init.2.gz:.B EMFILE
    inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
    inotify_init.2.gz:.TP
    mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
    mmap.2.gz:.SH AVAILABILITY
    mmap.2.gz:On POSIX systems on which
    mount.2.gz:.B EMFILE
    mount.2.gz:(In case no block device is required:)
    mount.2.gz:Table of dummy devices is full.
    open.2.gz:.B EMFILE
    open.2.gz:The process already has the maximum number of files open.
    open.2.gz:.TP
    pipe.2.gz:.B EMFILE
    pipe.2.gz:Too many file descriptors are in use by the process.
    pipe.2.gz:.TP
    shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
    shmop.2.gz:
    shmop.2.gz:In SVID 3 (or perhaps earlier)
    signalfd.2.gz:.B EMFILE
    signalfd.2.gz:The per-process limit of open file descriptors has been reached.
    signalfd.2.gz:.TP
    socket.2.gz:.B EMFILE
    socket.2.gz:Process file table overflow.
    socket.2.gz:.TP
    socketpair.2.gz:.B EMFILE
    socketpair.2.gz:Too many descriptors are in use by this process.
    socketpair.2.gz:.TP
    spu_create.2.gz:.B EMFILE
    spu_create.2.gz:The process has reached its maximum open files limit.
    spu_create.2.gz:.TP
    timerfd_create.2.gz:.B EMFILE
    timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
    timerfd_create.2.gz:.TP
    truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK.  SVr4 documents for
    truncate.2.gz:.\" .BR ftruncate ()
    truncate.2.gz:.\" an additional EAGAIN error condition.
    

    If you check out these manpages by hand, you may find something interesting. For example, epoll_create, the system call underlying NIO selectors on Linux, will return EMFILE “Too many open files” if:

    The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.

    Now that filename doesn’t actually exist on my system, but there are limits defined in files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you might be hitting, especially if you’re running multiple instances of the same test on the same machine. Figuring out whether that’s the case is a chore in itself; you could start by checking syslog for any messages, or by dumping those limits directly, as in the sketch below.
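
To dump those limits, here is a minimal sketch, assuming a Linux /proc filesystem; the exact set of limit files varies between kernel versions, and the class name EpollLimits is just for illustration:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    public class EpollLimits {
        public static void main(String[] args) throws IOException {
            // Dump whichever epoll/inotify limit files this kernel exposes.
            for (String dir : new String[] {"/proc/sys/fs/epoll",
                                            "/proc/sys/fs/inotify"}) {
                File[] files = new File(dir).listFiles();
                if (files == null) continue;  // directory absent on this kernel
                for (File f : files) {
                    String value = new String(Files.readAllBytes(f.toPath())).trim();
                    System.out.println(f + " = " + value);
                }
            }
        }
    }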
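
More generally, it can help to watch the descriptor count from inside the JVM while the benchmark runs, to see how close the process actually gets to its limit when the exceptions start. This sketch relies on com.sun.management.UnixOperatingSystemMXBean, which is specific to HotSpot-derived JVMs (the class name FdMonitor is illustrative):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdMonitor {
        public static void main(String[] args) throws InterruptedException {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (!(os instanceof UnixOperatingSystemMXBean)) {
                System.err.println("FD counts not exposed by this JVM/OS");
                return;
            }
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // Poll once a second; in a real test you would run this loop in a
            // daemon thread inside the Floodlight process itself.
            while (true) {
                System.out.printf("open fds: %d / max: %d%n",
                        unix.getOpenFileDescriptorCount(),
                        unix.getMaxFileDescriptorCount());
                Thread.sleep(1000);
            }
        }
    }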

Good luck!

Licensed under: CC-BY-SA with attribution