Getting disconnection notification using TCP Keep-Alive on write blocked socket

Question 1

If you pull the network connection before all the data is transmitted, then the connection is not idle and thus in some implementations the keepalive timer does not start. (Keep in mind that keepalive is NOT part of the TCP specification and as a result it is implemented inconsistently if at all.) In general, because of the combination of exponential backoff and large number of retries (tcp_retries2 defaults to 15) it can take up to 30 minutes for transmission retries to time out before the keepalive timer starts.

The workaround, if there is one, depends on the particular TCP implementation you are using. Some newer versions of Linux (kernel version 2.6.37 released 4 January, 2011) implement TCP_USER_TIMEOUT. More info here.

The usual recommendation is to implement communication timeouts at the application level rather than use TCP-based keepalive anyway. See, for example, HTTP Keep-Alive.

Question 2

A few points I'd like to touch on....

1) According to this document, here is what's necessary for using keepalive in Linux:

Linux has built-in support for keepalive. You need to enable TCP/IP networking in order to use it. You also need procfs support and sysctl support to be able to configure the kernel parameters at runtime.

The procedures involving keepalive use three user-driven variables:
tcp_keepalive_time 
> the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
tcp_keepalive_intvl 
> the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
tcp_keepalive_probes 
> the number of unacknowledged probes to send before considering the connection dead and notifying the application layer

Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.

Try to look at the current values for these variables in your current system to make sure they are correct or make sense. The bold highlight is mine and it seems you are doing that.

I assume the values for those variables are in milliseconds but not sure, you double-check.

tcp_keepalive_time

I would expect a value meaning something around 'ASAP after last data packet sent, send the first probe'

tcp_keepalive_intvl

I guess the value for this variable should be something lesser than the default time TCP takes to shut down the connection.

tcp_keepalive_probes

This could be the "magic value" that makes or breaks your application; if the number of unacknowledged probes is too high it could be the cause of epoll_wait() never exititing.

The document discusses the Linux implementation of TCP keepalive in Linux kernel releases (2.4.x, 2.6.x) as well as how to write TCP keepalive-enabled applications in the C language.

http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/

2) Make sure you are not specifying -1 in the timeout argument in epoll_wait() because it causes epoll_wait() to block indefinitely.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.) Specifying a timeout of -1 causes epoll_wait() to block indefinitely, while specifying a timeout equal to zero cause epoll_wait() to return immediately, even if no events are available.

From the manual page http://linux.die.net/man/2/epoll_wait

Question 3

Even though you already set keepalive option to your application socket, you can't detect in time the dead connection state of the socket, in case of your app keeps writing on the socket. That's because of tcp retransmission by the kernel tcp stack. tcp_retries1 and tcp_retries2 are kernel parameters for configuring tcp retransmission timeout. It's hard to predict precise time of retransmission timeout because it's calculated by RTT mechanism. You can see this computation in rfc793. (3.7. Data Communication)

https://www.rfc-editor.org/rfc/rfc793.txt

Each platforms have kernel configurations for tcp retransmission.

Linux : tcp_retries1, tcp_retries2 : (exist in /proc/sys/net/ipv4)

http://linux.die.net/man/7/tcp

HPUX : tcp_ip_notify_interval, tcp_ip_abort_interval

http://www.hpuxtips.es/?q=node/53

AIX : rto_low, rto_high, rto_length, rto_limit

http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf

You should set lower value for tcp_retries2 (default 15) if you want to early detect dead connection, but it's not precise time as I already said. In addition, currently you can't set those values only for single socket. Those are global kernel parameters. There was some trial to apply tcp retransmission socket option for single socket(http://patchwork.ozlabs.org/patch/55236/), but I don't think it was applied into kernel mainline. I can't find those options definition in system header files.

For reference, you can monitor your keepalive socket option through 'netstat --timers' like below. https://stackoverflow.com/questions/34914278

netstat -c --timer | grep "192.0.0.1:43245             192.0.68.1:49742"

tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (1.92/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (0.71/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (9.46/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (8.30/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (7.14/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (5.98/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (4.82/0/1)

In addition, when keepalive timeout ocurrs, you can meet different return events depending on platforms you use, so you must not decide dead connection status only by return events. For example, HP returns POLLERR event and AIX returns just POLLIN event when keepalive timeout occurs. You will meet ETIMEDOUT error in recv() call at that time.

In recent kernel version(since 2.6.37), you can use TCP_USER_TIMEOUT option will work well. This option can be used for single socket.