Since I'm a paying Apigee customer, I also opened a case.
Originally, they weren't sure whether there was a keep-alive function or a connection TTL that would force a connection to be dropped and re-established.
Here's what I got back:
- Set the following on your Router/Message Processor nodes: /proc/sys/net/ipv4/tcp_keepalive_time to 1800 seconds (30 minutes).
To do this: echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_time
BE AWARE: This change is not persisted across reboots, so you will also want to edit your /etc/sysctl.conf file and add the line net.ipv4.tcp_keepalive_time = 1800 there.
Then run the command:
sysctl -p
to make those values load from that file.
You can use the following to check that the value was updated:
sysctl net.ipv4.tcp_keepalive_time
- Restart your Message Processors.
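For anyone who would rather script the sysctl.conf edit than do it by hand, here is a minimal sketch. The helper name persist_sysctl is my own invention, not something from Apigee support. It replaces an existing entry for the key, or appends one if none exists, so running it twice does not leave duplicate lines.

```shell
#!/bin/sh
# Hypothetical helper (not from Apigee): set "KEY = VALUE" in a
# sysctl-style config file, replacing any existing entry for KEY.
persist_sysctl() {
    conf_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    # Escape the dots in the key so they match literally in the regex.
    escaped_key=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Keep every line except an existing (uncommented) entry for this key.
    grep -v "^[[:space:]]*${escaped_key}[[:space:]]*=" "$conf_file" > "$tmp_file" 2>/dev/null
    printf '%s = %s\n' "$key" "$value" >> "$tmp_file"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}

# Example (run as root on each Router/Message Processor node):
#   persist_sysctl /etc/sysctl.conf net.ipv4.tcp_keepalive_time 1800 && sysctl -p
```

The sysctl -p afterwards is still needed to load the value from the file, as described above.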
So the fix that was put in place is a keep-alive probe in the Hector client in the Message Processor.
The probe sends a keep-alive ping at the interval set in the tcp_keepalive_time OS setting. The reason for setting this to 30 minutes is that your firewall's idle timeout is 3600 seconds (60 minutes).
The keep-alive probes need to happen more often than the firewall's idle timeout so that the connection stays in the established state.
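As a quick sanity check on that rule, something like the following could be run on each node. This is only a sketch: the function name and the FIREWALL_IDLE_TIMEOUT variable are my own, and the 3600 is the idle timeout quoted above, so substitute your own firewall's value. It only compares tcp_keepalive_time and ignores the other keepalive settings.

```shell
#!/bin/sh
# Hypothetical check: succeeds (exit status 0) only when the keepalive
# time is strictly below the firewall's idle timeout.
keepalive_beats_timeout() {
    keepalive_secs=$1
    idle_timeout_secs=$2
    [ "$keepalive_secs" -lt "$idle_timeout_secs" ]
}

# Assumption: firewall idle timeout in seconds, as quoted in the reply.
FIREWALL_IDLE_TIMEOUT=3600
PROC_FILE=/proc/sys/net/ipv4/tcp_keepalive_time

# Only report when the kernel setting is readable (i.e. on Linux).
if [ -r "$PROC_FILE" ]; then
    current=$(cat "$PROC_FILE")
    if keepalive_beats_timeout "$current" "$FIREWALL_IDLE_TIMEOUT"; then
        echo "OK: keepalive ${current}s is below idle timeout ${FIREWALL_IDLE_TIMEOUT}s"
    else
        echo "WARNING: keepalive ${current}s is not below idle timeout ${FIREWALL_IDLE_TIMEOUT}s"
    fi
fi
```

With the Linux default of 7200 seconds this prints the warning, which matches the behaviour that prompted the case in the first place.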