Question

The new Windows API SetFileCompletionNotificationModes() with the flag FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is very useful for optimizing an I/O completion port loop, because you get fewer I/O completions for the same HANDLE. But it also disrupts the entire I/O completion port loop, because you have to change a lot of things, so I thought it was better to open a new post about everything that needs to change.

First of all, when you set the flag FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, it means that you won't receive I/O completions anymore for that HANDLE/SOCKET until all of the bytes are read (or written), i.e. until there is no more I/O to do, just like on Unix when you get EWOULDBLOCK. When you receive ERROR_IO_PENDING again (so a new request is pending), it's just like getting EWOULDBLOCK on Unix.

That said, I encountered some difficulties adapting this behavior to my IOCP event loop: a normal IOCP event loop simply waits forever until there is an OVERLAPPED packet to process; the packet is processed by calling the correct callback, which in turn decrements an atomic counter, and the loop starts waiting again until the next packet arrives.

Now, if I set FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, when an OVERLAPPED packet is returned to be processed, I process it by doing some I/O (with ReadFile() or WSARecv() or whatever), and that I/O can become pending again (if I get ERROR_IO_PENDING) or not, if the I/O API completes immediately. In the former case I just have to wait for the next pending OVERLAPPED, but in the latter case what do I have to do?

If I try to do I/O until I get ERROR_IO_PENDING, it goes into an infinite loop: it will never return ERROR_IO_PENDING (until the HANDLE/SOCKET's counterpart stops reading/writing), so other OVERLAPPEDs will wait indefinitely. Since I am testing this with a local named pipe that writes or reads forever, it goes into an infinite loop.

So I thought of doing I/O up to a certain amount of X bytes, just like a scheduler assigns time slices, and if I get ERROR_IO_PENDING before X, that's fine, the OVERLAPPED will be queued again in the IOCP event loop. But what if I don't get ERROR_IO_PENDING?

I tried putting OVERLAPPEDs that haven't finished their I/O in a list/queue for later processing, calling the I/O APIs later (always with a max of X bytes) after processing the other waiting OVERLAPPEDs, and I set the GetQueuedCompletionStatus[Ex]() timeout to 0, so the loop processes the listed/queued OVERLAPPEDs that haven't finished their I/O while at the same time checking immediately for new OVERLAPPEDs, without going to sleep.

When the list/queue of unfinished OVERLAPPEDs becomes empty, I can set the GQCS[Ex] timeout back to INFINITE. And so on.

In theory it should work perfectly, but I have noticed a strange thing: GQCS[Ex] with the timeout set to 0 returns the same OVERLAPPEDs that haven't yet been fully processed (the ones sitting in the list/queue waiting for later processing) again and again.

Question 1: if I got it right, is the OVERLAPPED packet removed from the system only when all data has been processed?

Let's say that's OK, because if I get the same OVERLAPPEDs again and again, I don't need to put them in the list/queue; I can process them like any other OVERLAPPED, and if I get ERROR_IO_PENDING, fine, otherwise I will process them again later.

But there is a flaw: when I call the callback that processes OVERLAPPED packets, I decrement the atomic counter of pending I/O operations. With FILE_SKIP_COMPLETION_PORT_ON_SUCCESS set, I don't know whether the callback was called to process a really pending operation, or just an OVERLAPPED waiting for more synchronous I/O.

Question 2: How can I get that information? Do I have to set more flags in the structure I derive from OVERLAPPED?

Basically, I increment the atomic counter of pending operations before calling ReadFile() or WSARecv() or whatever, and when I see that the call returned anything other than ERROR_IO_PENDING or success, I decrement it again. With FILE_SKIP_COMPLETION_PORT_ON_SUCCESS set, I also have to decrement it when the I/O API completes successfully, because that means I won't receive a completion.

It's a waste of time to increment and decrement an atomic counter when your I/O API will likely complete immediately and synchronously. Can't I simply increment the atomic counter of pending operations only when I receive ERROR_IO_PENDING? I didn't do this before because I thought that if another thread completes my pending I/O and is scheduled before the calling thread can check whether the error is ERROR_IO_PENDING and increment the atomic counter of pending operations, the counter will get messed up.

Question 3: Is this a real concern? Or can I just skip that and increment the atomic counter only when I get ERROR_IO_PENDING? It would simplify things very much.

Only a flag, and a lot of design to rethink. What are your thoughts?

Solution

As Remy states in the comments: your understanding of what FILE_SKIP_COMPLETION_PORT_ON_SUCCESS does is wrong. ALL it does is allow you to process the completed operation 'in line' if the call that you made (say WSARecv()) returns 0.

So, assuming you have a handleCompletion() function that you would call once you retrieve a completion from the IOCP with GQCS(), you can simply call that function immediately after a successful WSARecv().

If you're using a per-operation counter to track when the final operation completes on a connection (I do this for lifetime management of the per-connection data that I associate as a completion key), then you still do this in exactly the same way and nothing changes...

You can't increment ONLY on ERROR_IO_PENDING because then you have a race condition between the operation completing and the increment occurring. You ALWAYS have to increment before the API which could cause the decrement (in the handler) because otherwise thread scheduling can screw you up. I don't really see how skipping the increment would "simplify" things at all...

Nothing else changes. Except...

  1. Of course you could now have recursive calls into your completion handler (depending on your design), which was not possible before. For example, you can now have a WSARecv() call complete with a return code of 0 (because data is available), and your completion-handling code can issue another WSARecv() which could also complete with a return code of 0, and then your completion-handling code would be called again, possibly recursively (depending on the design of your code).

  2. Individual busy connections can now prevent other connections from getting any processing time. If you have 3 concurrent connections whose peers are all sending data as fast as they can, faster than your server can process it, and you have, for example, 2 I/O threads calling GQCS(), then with FILE_SKIP_COMPLETION_PORT_ON_SUCCESS you may find that two of these connections monopolise the I/O threads (all WSARecv() calls return success, which results in inline processing of all inbound data). In this situation I tend to keep a counter of "max consecutive I/O operations per connection", and once that counter reaches a configurable limit I post the next inline completion to the IOCP and let it be retrieved by a call to GQCS(), as this gives other connections a chance.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow