Question

Recently I have been bitten by the FD_SET buffer overflow twice. The first time, we had too many sockets (1024+) to add to the fd_set. That was a test case; we have disabled it and added an assert to detect this situation.

Today we hit another related issue when running a test case 1000+ times. On each run, the test case somehow triggers the allocation of a socket and releases it before the test case finishes. This test case hits the FD_SET buffer overflow once we have run it 1000+ times.

We have found the root cause:

  1. On each pass, the allocated socket id increases by 1; an id is not reused for a long time. The operating system is Mac OS, and I think this is a reasonable design, because it prevents code from silently using an already released socket without any error.
  2. FD_SET simply sets a bit in the fd_set bit array, using the socket id as the index, so a large socket id overflows the array. I think fd_set is a bad design.
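The overflow described in item 2 can be guarded against with a range check before calling FD_SET. The following is only a minimal sketch: the helper name fd_set_checked and the choice of EBADF as the error code are assumptions made for illustration, not part of any standard API.

```c
#include <errno.h>
#include <sys/select.h>

/*
 * Hypothetical helper: add fd to the set only if it fits in the
 * fixed-size bit array. FD_SET itself does no bounds checking, so a
 * descriptor >= FD_SETSIZE would write past the end of the fd_set.
 * Returns 0 on success, -1 (with errno set) if fd is out of range.
 */
int fd_set_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;  /* assumption: any error convention would do */
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

Note that the limit applies to the numeric value of the descriptor, not to how many descriptors are in the set, which is why a single leaked-up socket id can trigger the overflow.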

We think 1000+ is a reasonable number. We don't think defining a macro to make fd_set huge is reasonable: it wastes memory, and it wastes CPU time while waiting.

We don't know how to resolve it. Any suggestions?

-------------Edit1----------------

It turns out there was a socket leak in another place, which violated the rule that a destructor should release all of its resources. That leak is what made the socket id keep increasing, so item #1 is not true: the operating system does reuse socket ids. But anyway, the discussion was helpful, FD_SET is a bad design, and we should be using poll().


Solution

This answer summarizes the solution found by the OP, and comments by rob mayoff and Joseph Quinsey.

If a program is not reusing a file descriptor (what you called a 'socket id'), it is not closing the file descriptor. Try running lsof on your test program when it has been running for a while. You will probably find many open sockets in the output. (However, the OP says lsof -g PID does not seem to work on a process being debugged. Note that -g selects by process group ID; lsof -p PID selects by process ID.)

Alternatively, try netstat -a -p --inet | grep process-name-or-pid (these options are the Linux netstat syntax).

On some systems, a simple close(fd) is sometimes not sufficient for a socket. If your socket file descriptors are constantly increasing, the answer to "close() is not closing socket properly" might help.

To avoid the problem with FD_SETSIZE, several writers (for example, in "Increasing limit of FD_SETSIZE and select") suggest using poll rather than select.
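As a rough illustration of that suggestion, here is a minimal sketch of waiting on one descriptor with poll. The helper name wait_readable is hypothetical. Because poll takes an array of struct pollfd instead of a bit array indexed by descriptor number, the numeric value of fd is not limited by FD_SETSIZE.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * report an event. With a single pollfd entry, poll() returns
 *    1  if fd is readable (or has hung up / reported an error),
 *    0  on timeout,
 *   -1  on failure (errno is set by poll).
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms);
}
```

A caller that needs to tell readable data apart from a hang-up would inspect revents (POLLIN, POLLHUP, POLLERR) instead of relying on the return value alone.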

Finally, the OP solved the issue:

It turned out there was a socket leak in another place, which violated the rule that a destructor should release all of its resources. This made the socket id increase. With the leak fixed, the operating system reuses the socket id.

But anyway, the discussion was helpful, FD_SET is a bad design, and we should be using poll().

Note that Unix-like systems hand out the smallest available file descriptor. For example, the man page for open(2) states:

The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
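A small check of this behaviour, as a sketch only: the function name lowest_fd_is_reused is hypothetical, and the code assumes /dev/null exists and that no other thread opens or closes descriptors while it runs.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical check: open a descriptor, close it, open again, and
 * compare the two numbers. Returns 1 if the second open() reused the
 * number released by close(), 0 if it did not, -1 if open() failed.
 * Assumes a single-threaded caller and that /dev/null exists.
 */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                 /* releases the lowest free slot */

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

If a test loop like the one in the question shows descriptor numbers climbing on every pass instead, something is holding the earlier descriptors open, which is consistent with the leak the OP eventually found.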

Licensed under: CC-BY-SA with attribution