Question

I'm wondering what common programming situations or bugs might cause a server process of mine to leave a socket in CLOSE_WAIT without ever actually closing it.

I want to trigger this situation so that I can fix it. In a normal development environment I have not been able to reproduce it, but the same code running on a live server occasionally leaves sockets in this state, so that after many days we have hundreds of them.

Googling for close_wait suggests that this is actually a very common problem, even in mature and supposedly well-written services like nginx.


Solution

CLOSE_WAIT basically means that the remote end has shut down the connection (it has sent its FIN) but the local application has not yet called close() on the socket. This usually happens when you are not expecting to read any more data from the socket and thus aren't watching it for readability, so the application never sees the end-of-file that would prompt it to close.
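As a minimal sketch of that definition, the following uses two loopback sockets in one process. The function name `demo_close_wait` is made up for illustration. It shows that the only signal a plain read gives about the peer's close is `recv()` returning an empty bytes object, and that the socket leaves CLOSE_WAIT only once the application calls `close()` itself.

```python
import socket

def demo_close_wait():
    # Listening socket on an ephemeral loopback port (port 0 lets the OS pick).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    # The remote end closes: it sends a FIN and `conn` enters CLOSE_WAIT.
    peer.close()

    # recv() returning b"" is how the application learns about the FIN.
    data = conn.recv(4096)
    eof_seen = (data == b"")

    # Until this close() runs, the kernel keeps `conn` in CLOSE_WAIT.
    conn.close()
    listener.close()
    return eof_seen
```

If the `recv()` call is skipped, or its empty result is ignored, nothing else will close the socket on the application's behalf.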

For convenience, many applications always monitor a socket for readability so that they can detect a close.

A scenario to try out is this:

  1. Peer sends 2k of data and immediately closes the connection
  2. Your socket is then registered with epoll and gets a notification for readability
  3. Your application only reads 1k of data
  4. You stop monitoring the socket for readability, so the remaining 1k and the end-of-file behind it are never read
  5. (I'm not sure whether edge-triggered epoll will end up delivering the shutdown as a separate event.)
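The steps above can be approximated without epoll, using blocking loopback sockets. This is only a sketch: `make_pair` and `partial_read_then_drain` are hypothetical helper names, and the single `recv(1024)` stands in for the partial read in step 3. The point it illustrates is that the end-of-file is queued behind the unread data, so a server that stops after the first read never sees it.

```python
import socket

def make_pair():
    """Return (peer, conn): two connected loopback TCP sockets."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    listener.close()  # the accepted connection is unaffected
    return peer, conn

def partial_read_then_drain():
    peer, conn = make_pair()

    # Step 1: peer sends 2k and immediately closes.
    peer.sendall(b"x" * 2048)
    peer.close()

    # Step 3: the application reads at most 1k.
    first = conn.recv(1024)

    # A buggy server would stop here and leave `conn` in CLOSE_WAIT.
    # The fix is to keep reading until recv() returns b"", then close.
    rest = 0
    while True:
        chunk = conn.recv(1024)
        if not chunk:  # b"" means the peer's FIN has been reached
            break
        rest += len(chunk)
    conn.close()
    return len(first), rest
```

To reproduce the leak rather than the fix, return right after the first `recv()` without closing `conn`, and inspect the process with a tool such as `ss` or `netstat`.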

See also:

(from man epoll_ctl)

EPOLLRDHUP (since Linux 2.6.17) Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
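A minimal, Linux-only sketch of using that flag through Python's `select.epoll` wrapper follows. The function name `wait_for_peer_shutdown` and the 2-second timeout are assumptions for illustration. The socket is registered for `EPOLLRDHUP` alone, so the peer's shutdown is reported even though the application is not watching for readability.

```python
import select
import socket

def wait_for_peer_shutdown(timeout=2.0):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    ep = select.epoll()
    # Deliberately no EPOLLIN: only the peer-shutdown event is requested.
    ep.register(conn.fileno(), select.EPOLLRDHUP)

    peer.close()  # the peer sends its FIN

    events = ep.poll(timeout)
    saw_rdhup = any(
        fd == conn.fileno() and (mask & select.EPOLLRDHUP)
        for fd, mask in events
    )

    # React to the event by closing, which takes the socket out of CLOSE_WAIT.
    ep.unregister(conn.fileno())
    ep.close()
    conn.close()
    listener.close()
    return saw_rdhup
```

In a real server the same flag would be OR-ed into whatever event mask the socket already uses, and the handler would close the socket once any buffered data it still cares about has been read.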

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow