Question

I was working with a TCP server and something ultra weird came up.

When I connect with one client, everything goes fine.
When two or more clients connect, traffic is still fine.
But when any of the clients DISCONNECTS, the server jams right after the reaper call. It just sits there waiting.

I discovered this when I disconnected 1 client out of 2 and tried to reconnect. The reconnecting client does not display any error messages at all; it runs smoothly. Packets CAN be sent to the server, but they pile up inside it.

The server, on the other hand, hangs there, waiting for one specific client to disconnect. When that client disconnects, the server resumes working and executes all the requests that had piled up inside it.

Below is the barebone code I used for the server structure.
This code also demonstrates the problem stated above.
Can anyone please, please, please point out where the error was made?

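#include <arpa/inet.h>
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void    reaper(int sig)
{
    int status;

    while (waitpid(-1, &status, WNOHANG) >= 0)
        /* empty */;
}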


int     errexit(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    exit(1);
}




unsigned short  portbase = 0;   /* port base, for non-root servers      */

int     passivesock(const char *service, const char *transport, int qlen)

{
    struct servent  *pse;   /* pointer to service information entry */
    struct protoent *ppe;   /* pointer to protocol information entry*/
    struct sockaddr_in sin; /* an Internet endpoint address         */
    int     s, type;        /* socket descriptor and socket type    */

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;

/* Map service name to port number */
    if ( pse = getservbyname(service, transport) )
            sin.sin_port = htons(ntohs((unsigned short)pse->s_port)
                    + portbase);
    else if ((sin.sin_port=htons((unsigned short)atoi(service))) == 0)
            errexit("can't get \"%s\" service entry\n", service);

/* Map protocol name to protocol number */
    if ( (ppe = getprotobyname(transport)) == 0)
            errexit("can't get \"%s\" protocol entry\n", transport);

/* Use protocol to choose a socket type */
    if (strcmp(transport, "udp") == 0)
            type = SOCK_DGRAM;
    else
            type = SOCK_STREAM;

/* Allocate a socket */
    s = socket(PF_INET, type, ppe->p_proto);
    if (s < 0)
            errexit("can't create socket: %s\n", strerror(errno));

/* Bind the socket */
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            errexit("can't bind to %s port: %s\n", service,
                    strerror(errno));
    if (type == SOCK_STREAM && listen(s, qlen) < 0)
            errexit("can't listen on %s port: %s\n", service,
                    strerror(errno));
    return s;
}

int     passiveTCP(const char *service, int qlen)
{
    return passivesock(service, "tcp", qlen);
}




#define QLEN              32    /* maximum connection queue length      */
#define BUFSIZE         4096


int     TCPechod(int fd);

int main(int argc, char *argv[])
{
    char    *service;      /* service name or port number  */
    struct  sockaddr_in fsin;       /* the address of a client      */
    socklen_t       alen;           /* length of client's address   */
    int     msock;                  /* master server socket         */
    int     ssock;                  /* slave server socket          */

    if (argc !=2)
            errexit("usage: %s port\n", argv[0]);

    service = argv[1];

    msock = passiveTCP(service, QLEN);

    (void) signal(SIGCHLD, reaper);

    while (1) {
            alen = sizeof(fsin);
            ssock = accept(msock, (struct sockaddr *)&fsin, &alen);
            if (ssock < 0) {
                    if (errno == EINTR)
                            continue;
                    errexit("accept: %s\n", strerror(errno));
            }
            printf("Accept connection %d from %s:%d\n", ssock, inet_ntoa(fsin.sin_addr), (int)ntohs(fsin.sin_port));
            switch (fork()){
            case 0:
                (void) close(msock);
                TCPechod(ssock);
                close(ssock);
                exit(0);
            default:
                close(ssock);
                break;
            case -1:
                errexit("fork: %s\n", strerror(errno));
            }              
    }
}


int     TCPechod(int fd)
{
    char    buf[BUFSIZE];
    ssize_t cc;

    while ((cc = read(fd, buf, sizeof(buf))) != 0) {
            if (cc < 0)
                    errexit("echo read: %s\n", strerror(errno));
            if (write(fd, buf, cc) < 0)
                    errexit("echo write: %s\n", strerror(errno));
    }
    return 0;
}

Any heads-up would be greatly appreciated.
Thank you in advance.


Solution

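The problem is how you are calling waitpid: your loop only exits when an error occurs, because waitpid returns a value < 0 only on error. When you call waitpid with the WNOHANG flag, it returns 0 if there are still child processes but none of them has changed state yet (with WNOHANG alone, "changed state" means terminated; stopped or resumed children are reported only with WUNTRACED or WCONTINUED). So while another client's child process is still running, waitpid keeps returning 0, the loop inside the signal handler never finishes, and the server cannot get back to accept(). Only when that last child exits does waitpid reap it and then fail with ECHILD, which ends the loop. Try this correction: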

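void reaper(int sig)
{
  int status;
  pid_t pid;
  /* Keep looping only while waitpid() actually reaps a child (returns its PID). */
  /* Note: printf() is not async-signal-safe; use these messages for debugging only. */
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
    printf("Process PID: %d has finished with status: %d\n", (int)pid, status);
  if (0 == pid) printf("No more processes waiting\n"); /* children remain, none has exited yet */
  if (pid < 0) printf("An error occurred\n");          /* usually ECHILD: no children left */
}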
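
The three waitpid return values described above can be observed directly. The following is a minimal sketch, assuming a POSIX system; the helper name probe_waitpid and the pipe-based handshake (used so the child is guaranteed to still be alive for the first call) are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: records what waitpid(-1, &status, WNOHANG) returns
 *   r[0] while the only child is still running (expected: 0),
 *   r[1] after that child has exited           (expected: the child's PID),
 *   r[2] once no children are left             (expected: -1, errno ECHILD).
 * Returns the child's PID, or -1 if pipe() or fork() fails. */
pid_t probe_waitpid(pid_t r[3], int *err_after)
{
    int fds[2], status;
    struct timespec delay = {0, 1000000};   /* 1 ms between polls */

    if (pipe(fds) < 0)
        return -1;
    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: block in read() until the parent closes the pipe, then exit. */
        char c;
        close(fds[1]);
        ssize_t n = read(fds[0], &c, 1);
        (void)n;
        _exit(0);
    }
    close(fds[0]);

    /* The child is still blocked in read(), so nothing has terminated yet. */
    r[0] = waitpid(-1, &status, WNOHANG);

    /* EOF on the pipe lets the child exit; poll until it can be reaped. */
    close(fds[1]);
    while ((r[1] = waitpid(-1, &status, WNOHANG)) == 0)
        nanosleep(&delay, NULL);

    /* The only child is gone, so waitpid() now fails instead of returning 0. */
    errno = 0;
    r[2] = waitpid(-1, &status, WNOHANG);
    *err_after = errno;
    return child;
}
```

The first value is the one the question's `>= 0` loop keeps getting: as long as any other client's child is alive, waitpid returns 0, which satisfies `>= 0`, so the handler never returns.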

If you want to use wait() instead, the reaper function could look something like this:

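void reaper(int sig)
{
  int status;
  pid_t pid;
  /* wait() blocks the calling process until a child terminates.
     It reaps only one child per signal, and SIGCHLD is not queued, so
     children that exit at nearly the same time can be left as zombies. */
  pid = wait(&status);
  if (pid > 0) printf("Process PID: %d has finished with status: %d\n", (int)pid, status);
  if (pid < 0) printf("An error occurred\n");
}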

For additional information about wait(2) go to: http://linux.die.net/man/2/wait
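
A further refinement, not part of the original answer: printf() is not async-signal-safe and the semantics of signal() differ between systems, so a SIGCHLD handler like the ones above is often installed with sigaction() instead. This is a minimal sketch, assuming a POSIX system; install_reaper, spawn_and_count and the reaped counter are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Number of children reaped so far; only the handler modifies it. */
static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: reaps every child that has already exited, then returns.
   Uses only async-signal-safe calls (waitpid), so no printf here. */
static void reaper(int sig)
{
    int saved_errno = errno;   /* don't clobber errno seen by interrupted code */
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Installs reaper() for SIGCHLD. SA_RESTART restarts interrupted calls such
   as accept() where the system supports it; SA_NOCLDSTOP suppresses SIGCHLD
   for children that are merely stopped. Returns 0 on success, -1 on error. */
int install_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo only (hypothetical): forks n children that exit at once, then polls
   for up to about 2 s until the handler has reaped them all.
   Returns the number reaped, or -1 if fork() fails. */
int spawn_and_count(int n)
{
    struct timespec delay = {0, 10000000};   /* 10 ms */
    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);                         /* child: exit immediately */
    }
    for (int tries = 0; tries < 200 && reaped < n; tries++)
        nanosleep(&delay, NULL);
    return reaped;
}
```

In the question's main(), a call like install_reaper() would take the place of `(void) signal(SIGCHLD, reaper);`. Because the loop stops as soon as waitpid stops returning a PID, the handler returns right away even while other clients are still connected.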

OTHER TIPS

I ran into this problem as well.

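The "reaper" function with the >=0 test appears in examples all over the place, but it can turn into an endless loop: even after it has reaped the exited child, it keeps looping not until there are no more exited children, but until waitpid returns an error (-1). With WNOHANG that only happens once no child processes are left at all; while any child is still running, waitpid just keeps returning 0.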

There are Perl versions of this code out there that are usually "fixed" by using >0 instead of >=0, but you may prefer the logic shown in the solution above, where you explicitly test for each case of interest.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow