I would suggest you leave the Nagle algorithm and buffering turned on, since their basic purpose is to collect small writes into full or larger packets (which improves performance a lot), but at the same time call FlushFileBuffers() on the socket once you are done sending for a while.
I assume here that your game has some sort of main loop, which processes stuff and then waits for some amount of time before going into the next round:
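    while (run_my_game)
    {
        process_game_events_and_send_data_over_network();
        // Sleep() takes an unsigned value, so never pass a negative remainder.
        Sleep(time_spent_processing < 20 ? 20 - time_spent_processing : 0);
    }
I would now suggest inserting FlushFileBuffers() before the Sleep() call:
    while (run_my_game)
    {
        process_game_events_and_send_data_over_network();
        FlushFileBuffers(my_socket);  // push out whatever is still buffered
        // Sleep() takes an unsigned value, so never pass a negative remainder.
        Sleep(time_spent_processing < 20 ? 20 - time_spent_processing : 0);
    }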
That way, your packets are delayed at most until the moment before your application goes to sleep to wait for the next round. You should get the performance benefit of Nagle's algorithm while keeping the delay to a minimum.
In case this doesn't work, it would be helpful if you posted a bit of (pseudo-)code that shows how your program actually works.
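Here is a rough sketch of such a round-based loop in portable C++, in case it helps. The remaining time is clamped at zero, because Sleep() takes an unsigned DWORD and a negative difference would wrap around into a very long sleep. The names `remaining_in_round` and `run_rounds`, and the `work` and `flush` callbacks, are made up for illustration; `flush` stands in for whatever flush call you use on your socket.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Time left in the current round. Never negative, so a round that overran
// its budget results in no sleep at all instead of an underflowed interval.
Millis remaining_in_round(Millis round_length, Millis spent)
{
    return std::max(Millis{0}, round_length - spent);
}

// Runs `rounds` iterations: do the work, flush once, sleep for the rest of
// the round. `work` and `flush` are hypothetical callbacks supplied by the
// caller. Returns how many times `flush` was called.
template <typename Work, typename Flush>
int run_rounds(int rounds, Millis round_length, Work work, Flush flush)
{
    assert(rounds >= 0);
    int flushes = 0;
    for (int i = 0; i < rounds; ++i) {
        const auto start = Clock::now();
        work();   // process game events and queue outgoing data
        flush();  // push buffered data out before going to sleep
        ++flushes;
        const auto spent =
            std::chrono::duration_cast<Millis>(Clock::now() - start);
        std::this_thread::sleep_for(remaining_in_round(round_length, spent));
    }
    return flushes;
}
```

You would call it as `run_rounds(n, Millis{20}, process_events, flush_socket)`, where both callables are your own functions.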
EDIT:
There were two more things that came to mind when I thought about your question again:
a) Delayed ACK packets should indeed NOT cause any lag, as they travel in the opposite direction of the data you are sending. At worst, they block the send queue. TCP will, however, sort this out after a few packets, provided the bandwidth of the connection and the memory limits permit it. So unless your machine has really little RAM (not enough to hold a bigger send queue), or you are really transmitting more data than your connection allows, delayed ACK packets are an optimisation and will actually improve performance.
b) You are using a dedicated thread for sending. I wonder why. AFAIK the socket API is thread-safe, so every producing thread could send its data all by itself. Unless your application requires such a queue, I would suggest also removing this dedicated sending thread, and with it the additional synchronisation overhead and delay it might cause.
I'm specifically mentioning the delay here, as the operating system might decide not to immediately schedule the send thread for execution again when it becomes unblocked on its queue. Typical re-scheduling delays are in the 10 ms range, but under load they can skyrocket to 50 ms or more. As a workaround, you could try fiddling with the scheduling priorities, but this will not reduce the delay imposed by the operating system itself.
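If you want to see that wake-up delay on your own machine, something along these lines should give a rough number. It is only a sketch using standard C++ threads rather than your actual queue. The function name `measure_wakeup_latency_us` is made up, and the 10 ms pause is just an assumption to give the waiting thread time to block.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns the time in microseconds between a producer signalling
// "data is queued" and the waiting (sender) thread running again.
long long measure_wakeup_latency_us()
{
    using Clock = std::chrono::steady_clock;

    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    Clock::time_point signalled;
    Clock::time_point woke;

    // Stand-in for the dedicated sending thread blocked on its queue.
    std::thread sender([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return ready; });
        woke = Clock::now();
    });

    // Assumption: 10 ms is enough for the sender thread to start and block.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    {
        std::lock_guard<std::mutex> lock(m);
        signalled = Clock::now();
        ready = true;
    }
    cv.notify_one();
    sender.join();

    assert(woke >= signalled);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::microseconds>(
               woke - signalled).count();
}
```

Run it a few hundred times, once on an idle machine and once under load, and look at the worst case rather than the average.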
Btw., you can easily benchmark TCP and your network by just having one thread on the client and one on the server that play ping/pong with some data.
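The client side of such a ping/pong test could be timed roughly like this. This is a sketch only: `send_ping` and `wait_for_pong` are hypothetical callables that you would implement as a blocking send() and recv() on your connected socket, and it assumes the server thread simply echoes every message back.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Average round-trip time in microseconds over `rounds` exchanges.
// `send_ping` and `wait_for_pong` are supplied by the caller and are
// expected to block until the message is sent or the reply has arrived.
template <typename SendPing, typename WaitForPong>
double average_round_trip_us(int rounds, SendPing send_ping,
                             WaitForPong wait_for_pong)
{
    using Clock = std::chrono::steady_clock;
    assert(rounds > 0);

    const auto start = Clock::now();
    for (int i = 0; i < rounds; ++i) {
        send_ping();      // client -> server
        wait_for_pong();  // server -> client (echo)
    }
    const std::chrono::duration<double, std::micro> total =
        Clock::now() - start;
    return total.count() / rounds;
}
```

Comparing the result with Nagle on and off, and with and without your sending thread in between, should show where the lag actually comes from.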