Question

I am seeing a lot of "Too many open files" exceptions while my program runs. They typically look like this:

org.jboss.netty.channel.ChannelException: Failed to create a selector.

...
Caused by: java.io.IOException: Too many open files

However, those are not the only exceptions. I have observed other ones (also caused by "too many open files"), but they are much less frequent.

Strangely enough, I have set the open-files limit of the screen session (from which I launch my programs) to 1M:

root@s11:~/fabiim-cbench# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Moreover, according to the output of lsof -p, I see no more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.

Question: What is wrong, and/or how can I dig deeper into this problem?

Extra: I am currently integrating Floodlight with bft-smart. In a nutshell, the Floodlight process is the one crashing with "too many open files" exceptions during a stress test launched by a benchmark program. The benchmark program maintains 64 TCP connections to the Floodlight process, which in turn should maintain at least 64 * 3 TCP connections to the bft-smart replicas. Both programs use netty to manage these connections.


Solution

First thing to check: can you run ulimit from inside your Java process, to make sure the file limit the JVM sees is the same as the one in your shell? Code like this should work:

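// Run "ulimit -a" in a child shell; the child inherits this JVM's resource limits.
Process p = Runtime.getRuntime().exec(new String[] {"bash", "-c", "ulimit -a"});
try (InputStream is = p.getInputStream()) {  // needs: import java.io.InputStream;
    int c;
    while ((c = is.read()) != -1) {
        System.out.write(c);
    }
}
System.out.flush();  // write(int) may leave bytes buffered until a newline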

If the limit still shows 1 million, well, you're in for some hard debugging.
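As a complement to shelling out to ulimit, the JVM can report its own descriptor usage directly. The sketch below assumes a HotSpot/OpenJDK-style JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdUsage and the helper fdCounts are made up for illustration. Logging these two numbers shortly before the failure would show whether the process is anywhere near its per-process limit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    /**
     * Returns {open, max} file descriptor counts for this JVM,
     * or null when the platform bean is not the Unix variant
     * (assumption: HotSpot/OpenJDK on a Unix-like OS provides it).
     */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("UnixOperatingSystemMXBean not available on this JVM");
        } else {
            System.out.println("open fds: " + counts[0] + ", max fds: " + counts[1]);
        }
    }
}
```

If the reported maximum is far below 1000000, the limit set in the screen session is not the one the JVM is running with.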

Here are a couple of things that I would look into if I had to debug this:

  1. Are you running out of TCP port numbers? What does netstat -an show when you hit this error?

  2. Use strace to find out exactly which system call, with which arguments, is failing. EMFILE is errno 24 on Linux, so the failing call shows up in the trace as returning -1 EMFILE (Too many open files).

  3. The "Too many open files" EMFILE error can actually be returned by a number of different system calls for a number of different reasons:

    $ cd /usr/share/man/man2
    $ zgrep -A 2 EMFILE *
    accept.2.gz:.B EMFILE
    accept.2.gz:The per-process limit of open file descriptors has been reached.
    accept.2.gz:.TP
    accept.2.gz:--
    accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
    accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
    accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
    dup.2.gz:.B EMFILE
    dup.2.gz:The process already has the maximum number of file
    dup.2.gz:descriptors open and tried to open a new one.
    epoll_create.2.gz:.B EMFILE
    epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
    epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
    eventfd.2.gz:.B EMFILE
    eventfd.2.gz:The per-process limit on open file descriptors has been reached.
    eventfd.2.gz:.TP
    execve.2.gz:.B EMFILE
    execve.2.gz:The process has the maximum number of files open.
    execve.2.gz:.TP
    execve.2.gz:--
    execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
    execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
    execve.2.gz:.SH NOTES
    fcntl.2.gz:.B EMFILE
    fcntl.2.gz:For
    fcntl.2.gz:.BR F_DUPFD ,
    getrlimit.2.gz:.BR EMFILE .
    getrlimit.2.gz:(Historically, this limit was named
    getrlimit.2.gz:.B RLIMIT_OFILE
    inotify_init.2.gz:.B EMFILE
    inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
    inotify_init.2.gz:.TP
    mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
    mmap.2.gz:.SH AVAILABILITY
    mmap.2.gz:On POSIX systems on which
    mount.2.gz:.B EMFILE
    mount.2.gz:(In case no block device is required:)
    mount.2.gz:Table of dummy devices is full.
    open.2.gz:.B EMFILE
    open.2.gz:The process already has the maximum number of files open.
    open.2.gz:.TP
    pipe.2.gz:.B EMFILE
    pipe.2.gz:Too many file descriptors are in use by the process.
    pipe.2.gz:.TP
    shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
    shmop.2.gz:
    shmop.2.gz:In SVID 3 (or perhaps earlier)
    signalfd.2.gz:.B EMFILE
    signalfd.2.gz:The per-process limit of open file descriptors has been reached.
    signalfd.2.gz:.TP
    socket.2.gz:.B EMFILE
    socket.2.gz:Process file table overflow.
    socket.2.gz:.TP
    socketpair.2.gz:.B EMFILE
    socketpair.2.gz:Too many descriptors are in use by this process.
    socketpair.2.gz:.TP
    spu_create.2.gz:.B EMFILE
    spu_create.2.gz:The process has reached its maximum open files limit.
    spu_create.2.gz:.TP
    timerfd_create.2.gz:.B EMFILE
    timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
    timerfd_create.2.gz:.TP
    truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK.  SVr4 documents for
    truncate.2.gz:.\" .BR ftruncate ()
    truncate.2.gz:.\" an additional EAGAIN error condition.
    

    If you check out all these manpages by hand, you may find something interesting. For example, I think it's interesting that epoll_create, the system call underlying NIO selectors on Linux, will return EMFILE "Too many open files" if

    The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.

    Now, that file doesn't actually exist on my system, but there are some limits defined in files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you might be hitting, especially if you're running multiple instances of the same test on the same machine. Figuring out whether that's the case is a chore in itself; you could start by checking syslog for any related messages.
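To see which of those per-user limits exist on a given machine and what they are set to, something like the following could be used. It is only a sketch: the class name KernelLimits and the helper readLimits are made up for illustration, the two directory paths are the ones mentioned above, and the set of files inside them varies by kernel version, so the code simply lists whatever is there.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class KernelLimits {
    /**
     * Reads every readable regular file in dir as "file name -> trimmed contents".
     * Returns an empty map if dir does not exist or is not a directory.
     */
    static Map<String, String> readLimits(Path dir) throws IOException {
        Map<String, String> limits = new TreeMap<>();
        if (!Files.isDirectory(dir)) {
            return limits;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && Files.isReadable(entry)) {
                    String value = new String(
                            Files.readAllBytes(entry), StandardCharsets.UTF_8).trim();
                    limits.put(entry.getFileName().toString(), value);
                }
            }
        }
        return limits;
    }

    public static void main(String[] args) throws IOException {
        // Directories named in the answer; contents depend on the kernel.
        for (String dir : new String[] {"/proc/sys/fs/epoll", "/proc/sys/fs/inotify"}) {
            System.out.println(dir + " -> " + readLimits(Paths.get(dir)));
        }
    }
}
```

Comparing these values against the number of selectors (and therefore epoll instances) that all processes of the same user create during the stress test would show whether one of these limits, rather than the per-process descriptor limit, is the one being hit.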

Good luck!

Licensed under: CC-BY-SA with attribution