The new Windows API SetFileCompletionNotificationModes() with the flag FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is very useful for optimizing an I/O completion port loop, because you get fewer I/O completions for the same HANDLE.
But it also disrupts the entire I/O completion port loop, because you have to change a lot of things, so I thought it was better to open a new post about everything that has to change.
First of all, when you set the flag FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, you no longer receive an I/O completion for that HANDLE/SOCKET when an operation completes immediately with success. In practice you keep reading (or writing) synchronously until there is no more I/O to do, just like in Unix until you get EWOULDBLOCK.
When you receive ERROR_IO_PENDING again (so a new request is pending), it's just like getting EWOULDBLOCK in Unix, and the completion for that request is delivered through the port.
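To pin down the three outcomes described above, here is a small portable C sketch rather than real Win32 code. The names classify_overlapped_result, io_outcome and skip_on_success are hypothetical, and ERROR_IO_PENDING is redefined locally (with the value it has in winerror.h) only so that the sketch compiles without the Windows headers.

```c
#include <stdbool.h>

#ifndef ERROR_IO_PENDING
#define ERROR_IO_PENDING 997L /* same value as in <winerror.h> */
#endif

/* Hypothetical classification of what an overlapped call just did. */
typedef enum {
    IO_DONE_INLINE,   /* finished now, no completion packet will be queued */
    IO_PACKET_QUEUED, /* a completion packet will arrive through the port  */
    IO_FAILED         /* real error, no completion packet will be queued   */
} io_outcome;

/* call_succeeded: the I/O call reported success (TRUE for ReadFile(),
 *                 0 for WSARecv()).
 * last_error:     GetLastError()/WSAGetLastError() when it did not.
 * skip_on_success: FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is set on the handle. */
static io_outcome classify_overlapped_result(bool call_succeeded,
                                             long last_error,
                                             bool skip_on_success)
{
    if (call_succeeded)
        return skip_on_success ? IO_DONE_INLINE : IO_PACKET_QUEUED;
    if (last_error == ERROR_IO_PENDING)
        return IO_PACKET_QUEUED;
    return IO_FAILED;
}
```

Without the flag, an immediate success still produces a packet, which is why the flag changes the shape of the whole loop.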
That said, I had some difficulty adapting this behavior to my IOCP event loop. A normal IOCP event loop simply waits forever until there is an OVERLAPPED packet to process; the packet is processed by calling the right callback, which in turn decrements an atomic counter, and the loop goes back to waiting until the next packet arrives.
Now, if I set FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, when an OVERLAPPED packet is returned I process it by doing some I/O (with ReadFile() or WSARecv() or whatever). That I/O can become pending again (if I get ERROR_IO_PENDING), or it can complete immediately. In the former case I just have to wait for the next OVERLAPPED packet, but in the latter case what do I have to do?
If I keep doing I/O until I get ERROR_IO_PENDING, I can end up in an infinite loop: the call never returns ERROR_IO_PENDING (until the HANDLE/SOCKET's counterpart stops reading/writing), so the other OVERLAPPEDs wait indefinitely. Since I am testing this with a local named pipe whose other end writes or reads forever, that is exactly what happens.
So I thought of doing I/O only up to a certain amount X of bytes, just like a scheduler assigns time slices. If I get ERROR_IO_PENDING before reaching X, that's fine: the OVERLAPPED will come back through the IOCP event loop. But what if I don't get ERROR_IO_PENDING?
I tried putting the OVERLAPPED that hasn't finished its I/O in a list/queue for later processing, calling the I/O APIs on it later (always with at most X bytes) after the other waiting OVERLAPPEDs have been processed, and I set the GetQueuedCompletionStatus[Ex]() timeout to 0. So basically the loop processes the listed/queued OVERLAPPEDs that haven't finished their I/O and at the same time checks immediately for new OVERLAPPEDs, without going to sleep.
When the list/queue of unfinished OVERLAPPEDs becomes empty, I can set the GQCS[Ex] timeout to INFINITE again. And so on.
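The time-slice idea above can be sketched in portable C as follows. This is only a model under stated assumptions: run_slice, poll_timeout, io_step_fn and the fake_pipe stand-in are hypothetical names, the step callback takes the place of a ReadFile()/WSARecv() call, and WAIT_FOREVER just mirrors the value of INFINITE.

```c
#include <stddef.h>

#define WAIT_FOREVER 0xFFFFFFFFul /* same value as INFINITE on Windows */

typedef enum { STEP_INLINE, STEP_PENDING } step_result;
typedef enum { SLICE_WENT_PENDING, SLICE_BUDGET_SPENT } slice_result;

/* Hypothetical stand-in for one I/O call: reports the bytes transferred
 * inline, or STEP_PENDING where the real call would give ERROR_IO_PENDING. */
typedef step_result (*io_step_fn)(void *ctx, size_t *bytes_done);

/* Do inline I/O until the operation goes pending or the budget X is used. */
static slice_result run_slice(io_step_fn step, void *ctx, size_t budget)
{
    size_t used = 0;
    while (used < budget) {
        size_t n = 0;
        if (step(ctx, &n) == STEP_PENDING)
            return SLICE_WENT_PENDING;  /* the port will deliver a packet */
        used += (n > 0) ? n : 1;        /* charge at least 1 so we never spin */
    }
    return SLICE_BUDGET_SPENT;          /* caller puts the op in the ready list */
}

/* Timeout for the next GQCS[Ex] call: poll while the ready list has work. */
static unsigned long poll_timeout(size_t ready_count)
{
    return ready_count > 0 ? 0ul : WAIT_FOREVER;
}

/* Simulated endpoint used only to exercise the sketch. */
typedef struct { size_t buffered; size_t chunk; } fake_pipe;

static step_result fake_pipe_step(void *ctx, size_t *bytes_done)
{
    fake_pipe *p = (fake_pipe *)ctx;
    if (p->buffered == 0)
        return STEP_PENDING;            /* nothing left to read right now */
    *bytes_done = p->buffered < p->chunk ? p->buffered : p->chunk;
    p->buffered -= *bytes_done;
    return STEP_INLINE;
}
```

An operation that returns SLICE_BUDGET_SPENT goes into the list/queue and keeps the timeout at 0; one that returns SLICE_WENT_PENDING needs nothing more until its packet arrives.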
In theory it should work perfectly, but I have noticed a strange thing: GQCS[Ex] with the timeout set to 0 returns, again and again, the same OVERLAPPEDs that still aren't fully processed (the ones that are in the list/queue waiting for later processing).
Question 1: so, if I got it right, is the OVERLAPPED packet removed from the system only when all the data has been processed?
Let's say that this is OK: if I get the same OVERLAPPEDs again and again, I don't need to put them in the list/queue. I just process them like the other OVERLAPPEDs, and if I get ERROR_IO_PENDING, fine; otherwise I will process them again later.
But there is a flaw: when I call the callback to process an OVERLAPPED packet, I decrement the atomic counter of pending I/O operations. With FILE_SKIP_COMPLETION_PORT_ON_SUCCESS set, I don't know whether the callback has been called to process a real pending operation or just an OVERLAPPED waiting for more synchronous I/O.
Question 2: How can I get that information? Do I have to set more flags in the structure I derive from OVERLAPPED?
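To make the "more flags" idea concrete, here is one possible shape for such a flag, written as a portable C11 sketch and not tested against a real completion port. The type io_op_ctx and the op_* helpers are hypothetical; in real code an OVERLAPPED member would sit at the start of the structure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-operation context. On Windows an OVERLAPPED member would
 * come first, so the pointer returned by GetQueuedCompletionStatus() can be
 * cast back to this type; it is left out to keep the sketch portable. */
typedef struct {
    atomic_bool packet_expected; /* true while a completion packet is owed */
} io_op_ctx;

static void op_init(io_op_ctx *op)
{
    atomic_init(&op->packet_expected, false);
}

/* Call right before ReadFile()/WSARecv(): assume a packet will arrive. */
static void op_mark_issued(io_op_ctx *op)
{
    atomic_store(&op->packet_expected, true);
}

/* Call when the I/O call finished inline, so no packet will be queued. */
static void op_mark_inline(io_op_ctx *op)
{
    atomic_store(&op->packet_expected, false);
}

/* Call from the callback. Returns true only when the callback is handling
 * a real completion packet, and clears the flag so it is counted once. */
static bool op_consume_packet(io_op_ctx *op)
{
    return atomic_exchange(&op->packet_expected, false);
}
```

With this shape the callback would decrement the pending counter only when op_consume_packet() returns true. The flag is set before the call is issued so that a completion handled on another thread never sees it unset.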
Basically, I increment the atomic counter of pending operations before calling ReadFile() or WSARecv() or whatever, and when I see that the call returned anything other than ERROR_IO_PENDING or success, I decrement it again.
With FILE_SKIP_COMPLETION_PORT_ON_SUCCESS set, I have to decrement it again also when the I/O API completes with success, because it means I won't receive a completion.
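The accounting described in the last two paragraphs can be written down as a minimal C11 sketch. The names pending_ops, before_issue, after_issue and on_completion_packet are hypothetical, and ERROR_IO_PENDING is redefined locally (with its winerror.h value) only so the sketch builds without the Windows headers.

```c
#include <stdatomic.h>
#include <stdbool.h>

#ifndef ERROR_IO_PENDING
#define ERROR_IO_PENDING 997L /* same value as in <winerror.h> */
#endif

/* Number of operations for which a completion packet is still owed. */
static atomic_long pending_ops;

/* Increment BEFORE calling ReadFile()/WSARecv(). */
static void before_issue(void)
{
    atomic_fetch_add(&pending_ops, 1);
}

/* Right after the call: undo the increment when no packet will arrive. */
static void after_issue(bool call_succeeded, long last_error,
                        bool skip_on_success)
{
    bool packet_expected = call_succeeded
        ? !skip_on_success                  /* success: packet only without the flag */
        : (last_error == ERROR_IO_PENDING); /* failure: packet only if pending */
    if (!packet_expected)
        atomic_fetch_sub(&pending_ops, 1);
}

/* Called by the loop for each packet dequeued from the port. */
static void on_completion_packet(void)
{
    atomic_fetch_sub(&pending_ops, 1);
}
```

Because the increment happens before the call is issued, the counter never undercounts an operation that is already in flight, at the price of one extra atomic pair on every inline completion.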
It's a waste of time to increment and decrement an atomic counter when the I/O API will most likely complete immediately and synchronously. Can't I simply increment the atomic counter of pending operations only when I receive ERROR_IO_PENDING? I didn't do this before because I thought that if another thread completes my pending I/O before the calling thread can check for ERROR_IO_PENDING and increment the counter, the counter would get messed up.
Question 3: Is this a real concern? Or can I just skip that, and increment the atomic counter only when I get ERROR_IO_PENDING? It would simplify things very much.
Only a flag, and a lot of design to rethink.
What are your thoughts?