Question

The situation:

  • Multiple processes share the same file descriptor table.
  • Each process listens to its own epoll instance.
  • All sockets and calls involving them are non-blocking.
  • Among those processes, only process A adds the listening socket to its epoll instance.
  • Process A knows the fds of the epolls of all other processes.
  • When a new socket (i.e. a new connection) arrives, process A adds it to an epoll instance of one of the other processes...

...like so:

int new_sfd;
while ((new_sfd = accept4(listening_fd, NULL, NULL, SOCK_NONBLOCK)) != -1) {
    if (epoll_ctl(other_epoll_fds[new_sfd % PROCESS_C], EPOLL_CTL_ADD, new_sfd,
                  &(struct epoll_event){
                      .data = {.fd = new_sfd},
                      .events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET
                  }) == -1) {
        perror("Failed to add a new socket to an epoll instance");
        close(new_sfd);
    }
}
if (errno != EAGAIN && errno != EWOULDBLOCK) {
    perror("Failed to accept one or more incoming connections");
}

This seems to work (i.e. no errors occur at this stage). When a connection comes in at process A, it gets added to the epoll of a process B, after which process B gets an event with the EPOLLIN flag set, as expected. Process B then obtains the fd of the new socket by reading the data.fd member of the received epoll_event structure, and attempts a recv() on said fd.
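For reference, process B's side as described might look roughly like the sketch below. B waits on its own epoll instance, takes the fd from data.fd, and then calls recv() until the socket is drained, because the registration uses EPOLLET. The helper names drain_socket() and handle_events_once() and the buffer sizes are hypothetical and not from the question. This code only works if data.fd names the same open socket in B's descriptor table as it does in A's.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read everything currently available on a socket that
   edge-triggered epoll reported as readable. Returns the number of bytes read,
   or -1 if recv() failed with a real error (e.g. ENOTSOCK). */
static ssize_t drain_socket(int fd, char *buf, size_t cap)
{
    ssize_t total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, cap, 0);
        if (n > 0) {
            total += n;               /* a real server would process buf here */
            continue;
        }
        if (n == 0)
            return total;             /* peer closed the connection */
        if (errno == EINTR)
            continue;                 /* interrupted by a signal; retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;             /* drained; wait for the next edge */
        perror("recv");
        return -1;
    }
}

/* Hypothetical worker step for process B: one epoll_wait() on B's own epoll
   instance, then drain every socket reported with EPOLLIN via data.fd. */
static ssize_t handle_events_once(int epfd, int timeout_ms)
{
    struct epoll_event events[16];
    char buf[4096];
    ssize_t total = 0;
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    if (n == -1) {
        perror("epoll_wait");
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (events[i].events & EPOLLIN) {
            ssize_t got = drain_socket(events[i].data.fd, buf, sizeof buf);
            if (got == -1)
                return -1;
            total += got;
        }
    }
    return total;
}
```

With edge-triggered registration, reading until EAGAIN matters: a socket that still holds unread data is not reported again until new data arrives.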

Here things go unexpectedly wrong. recv() returns -1 with errno set to ENOTSOCK ("Socket operation on non-socket").

What gives? By inserting a lot of debugging printf() statements everywhere, I've thoroughly verified that the fd value returned by accept4() in process A is in fact the same as the fd value that I pass as the first argument to recv() in process B (and again, all processes share the same file descriptor table), so I can't make any sense of this. Help?! D:


Solution

A lot of Googling later, and I've landed on the following page:

https://patchwork.kernel.org/patch/2356101/

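Apparently execve() undoes the sharing of the file descriptor table that I had set up with clone() and the CLONE_FILES flag: the process that calls execve() ends up with its own private copy of the table. Any socket that the listening process A accepts after that point therefore lands only in A's table. In the other processes, the same fd number refers either to some other file (hence ENOTSOCK) or to nothing at all, so the change is invisible to them.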
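To see the sharing that clone() with CLONE_FILES provides (the setup the question relied on), here is a small Linux-only sketch. A child created with CLONE_FILES opens /dev/null and reports the descriptor number through its exit status. The parent then uses fcntl(F_GETFD) to check whether that number is open in its own table. The function names and the 64 KiB child stack are assumptions for illustration. The sketch does not reproduce the execve() case. It only shows the behavior that execve() undoes.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* assumed to be plenty for this child */

/* Runs in the child: open a new descriptor and return its number, which
   clone() turns into the child's exit status (255 signals failure). */
static int open_and_report(void *arg)
{
    (void)arg;
    int fd = open("/dev/null", O_RDONLY);
    return (fd < 0 || fd > 254) ? 255 : fd;
}

/* Returns 1 if a descriptor opened by a child created with
   clone(CLONE_FILES) is still open in the parent after the child exits,
   0 if it is not, -1 on error. */
static int clone_files_shares_table(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows downwards on common Linux architectures (x86-64,
       arm64), so clone() is given the top of the allocation. */
    pid_t pid = clone(open_and_report, stack + CHILD_STACK_SIZE,
                      CLONE_FILES | SIGCHLD, NULL);
    int status = 0;
    int ok = pid != -1 && waitpid(pid, &status, 0) != -1 &&
             WIFEXITED(status) && WEXITSTATUS(status) != 255;
    free(stack);
    if (!ok)
        return -1;
    int fd = WEXITSTATUS(status);
    /* F_GETFD succeeds only if fd is open in this process's table. */
    if (fcntl(fd, F_GETFD) == -1)
        return 0;
    close(fd);
    return 1;
}
```

In a plain, single-threaded program, clone_files_shares_table() should return 1. If you drop CLONE_FILES from the flags, the child gets a private copy of the table, and the same check would be expected to return 0.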

I was just unlucky that this behavior of execve() is not yet documented in the current version of the man pages, hence the patch above by Kevin Easton (thanks, Kevin). Also, thanks to Devolus for pointing me in the right direction.

OTHER TIPS

If A and B really are two separate processes, then they have separate sets of file descriptors, so descriptor 6 in A is not necessarily the same as descriptor 6 in B. If the return value says the fd is not a socket, then it probably isn't, but your code doesn't show that. Verifying with printf() that you pass the same fd only confirms that the two descriptors have the same numeric value. It doesn't mean they refer to the same open file, and most likely they don't.
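A plain fork() shows the point about equal numbers, because it gives the child its own copy of the descriptor table. In this hypothetical sketch, the child creates a socket and reports its descriptor number via its exit status. The parent then opens /dev/null and gets the same number in its own table. Calling recv() on that number in the parent fails with ENOTSOCK, the same error as in the question. The helper name is made up. The sketch assumes a single-threaded parent, so that nothing else opens a descriptor in between.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's socket and the parent's /dev/null descriptor get
   the same number, yet recv() on that number fails with ENOTSOCK in the
   parent; 0 otherwise; -1 on setup error. */
static int same_number_not_same_file(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: the socket takes the lowest free slot in the child's
           private copy of the table. */
        int s = socket(AF_UNIX, SOCK_STREAM, 0);
        _exit((s < 0 || s > 254) ? 255 : s);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;
    int child_fd = WEXITSTATUS(status);
    /* Parent: nothing was opened here since fork(), so this takes the same
       lowest free slot, but in the parent's own table. */
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;
    char byte;
    ssize_t n = recv(child_fd, &byte, 1, 0);   /* not a socket in the parent */
    int err = errno;
    close(fd);
    return child_fd == fd && n == -1 && err == ENOTSOCK;
}
```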

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow