Question

We have a client and a server application, currently being tested on the same Windows 7 64-bit machine. Both are written in C# and use P/Invoke to call into the Winsock2 libraries.

The application works fine overall, without any errors, and the latency for each "hop" over TCP/IP averages about 350 microseconds.

However, on occasion there are long delays of 40 to 50 ms before packets are received, and then they all suddenly arrive at once.

Efforts to diagnose so far:

  1. During these receive delays, the server continues to log that it is sending packets. It is set to send test packets every 1 ms, which it will do for 15 or 20 ms, and sometimes as much as 50 ms, before the client receives any of them.

  2. tcpdump was used to sniff packets on the loopback adapter, and it shows that during this lag period there is traffic from the server port (6488) to the client port (61743) as usual.

  3. The client calls the Winsock2 select() function in a loop, and logging via a counter just before the select() call shows that it is passing the correct file descriptor (a sketch of this polling loop follows the list). Of course, this works just fine before and after the delay.

  4. Further logging immediately after the select() call shows that the fd is not reported as readable, meaning a read on the socket would block. During periods of transmission without delays, however, the logging shows select() returning the socket's fd as expected, so a non-blocking read can be performed.
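
For illustration, here is a minimal sketch of the polling pattern from points 3 and 4. It is written against the managed Socket.Select() wrapper rather than the raw P/Invoke select() call our application actually uses, and the buffer size, timeout, and class/method names are illustrative assumptions only:

    using System;
    using System.Collections.Generic;
    using System.Net.Sockets;

    class ReceivePoller
    {
        // 'client' is assumed to be an already-connected TCP socket.
        static void ReceiveLoop(Socket client)
        {
            var buffer = new byte[4096];
            while (true)
            {
                // Socket.Select() trims the list in place, leaving only sockets that are readable.
                var checkRead = new List<Socket> { client };
                Socket.Select(checkRead, null, null, 1000); // timeout in microseconds (1 ms)

                if (checkRead.Count > 0)
                {
                    // The socket is readable, so this Receive() will not block.
                    int bytes = client.Receive(buffer);
                    Console.WriteLine("Received " + bytes + " bytes at " + DateTime.UtcNow.ToString("HH:mm:ss.ffffff"));
                }
                // else: select() timed out with nothing readable -- the condition
                // observed during the 40-50 ms stalls in point 4.
            }
        }
    }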

In short, the loopback adapter seems to hold these packets somewhere for a long while before finally delivering them to the receiving side.

Any further ideas or a solution?

One thought: it is often claimed that overlapped I/O works better on Windows, but that seems to matter only for scalability, i.e. when you need to listen on more than 64 sockets.

Could switching to overlapped I/O do the trick? We want to avoid that, as it would push out the project deadline and increase the budget; this should work just fine with select().
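
For reference, the managed counterpart of overlapped I/O would be something like the SocketAsyncEventArgs pattern sketched below (which maps onto overlapped Winsock I/O on Windows); this assumes a managed, already-connected Socket rather than our P/Invoke layer, and the buffer size and class names are illustrative assumptions. It is only a sketch of what the switch would involve, not something we have implemented:

    using System;
    using System.Net.Sockets;

    class OverlappedStyleReceiver
    {
        // Kick off the first asynchronous receive on an already-connected socket.
        static void StartReceive(Socket socket)
        {
            var args = new SocketAsyncEventArgs();
            args.SetBuffer(new byte[4096], 0, 4096);
            args.UserToken = socket;
            args.Completed += OnReceiveCompleted;
            PostReceive(socket, args);
        }

        static void PostReceive(Socket socket, SocketAsyncEventArgs args)
        {
            // ReceiveAsync() returns false when the operation completes synchronously,
            // in which case the Completed event is not raised and we handle it here.
            if (!socket.ReceiveAsync(args))
                OnReceiveCompleted(socket, args);
        }

        static void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
        {
            var socket = (Socket)args.UserToken;
            if (args.SocketError == SocketError.Success && args.BytesTransferred > 0)
            {
                Console.WriteLine("Received " + args.BytesTransferred + " bytes");
                PostReceive(socket, args); // post the next receive
            }
        }
    }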

Also, could the Windows process or thread that handles the loopback interface be getting context-switched out, or something similar, and if so, is there a way to configure it to avoid those delays?

Edit: The correct answer was to ensure that the Nagle algorithm was disabled. We thought it was disabled, but that is where the bug was found: in our in-house implementation of SetSocketOption() we used GetSocketOption() to verify the setting. It turns out you must set NoDelay prior to connecting or binding the socket, or else it silently has no effect.
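
For reference, here is a minimal sketch of that ordering using the managed Socket API rather than our in-house P/Invoke wrapper; the class and method names are made up for illustration, and 6488 is simply the server port seen in the tcpdump capture above:

    using System.Net;
    using System.Net.Sockets;

    class ClientSetup
    {
        static Socket ConnectWithNoDelay()
        {
            var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

            // Disable Nagle *before* Connect()/Bind(); as described above, setting it
            // after the connection was established silently had no effect in our case.
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);

            // Loopback because client and server run on the same machine; 6488 is the
            // server port observed in the tcpdump capture.
            socket.Connect(new IPEndPoint(IPAddress.Loopback, 6488));
            return socket;
        }
    }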

Many thanks to Fun Mun Pieng for the correct answer!!!

Solution

I suspect this may be due to the Nagle algorithm. The following code disables it:

socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);