Question

I'm using I/O Completion Ports on Windows. I have an object called 'Stream' that resembles and abstracts a HANDLE (so it can be a socket, a file, and so on).

When I call Stream::read() or Stream::write() (so, ReadFile()/WriteFile() in the case of files, and WSARecv()/WSASend() in the case of sockets), I allocate a new OVERLAPPED structure in order to make a pending I/O request that will be completed in the IOCP loop by some other thread.

Then, when the OVERLAPPED structure is completed by the IOCP loop, it is destroyed there. If Stream::read() or Stream::write() is then called again from the IOCP loop, it allocates a new OVERLAPPED structure, and so on indefinitely.

This works just fine. But now I want to improve it by caching OVERLAPPED objects: when my Stream object does a lot of reads or writes, it absolutely makes sense to cache the OVERLAPPED structures.

But now a problem arises: when I deallocate a Stream object, I must also deallocate the cached OVERLAPPED structures. But how can I know whether they have already been completed, or are still pending and will be completed later by one of the IOCP loops? An atomic reference count would solve this, but then I'd have to increment it on every read or write operation and decrement it on every IOCP completion (or on Stream deletion), which in a server means a huge number of atomic increments and decrements.

Will this impact the concurrency of multiple threads very negatively? This is the only concern stopping me from putting an atomic reference counter on each OVERLAPPED structure.

Are my concerns baseless?

I thought this was an important topic to point out, and that a question on SO, to gather other people's thoughts and methods for caching OVERLAPPED structures with IOCP, was worth it. I'd like to find a clever solution to this, without using atomic ref counters, if possible.


Solution

Assuming that you bundle a data buffer with the OVERLAPPED structure as a 'per operation' data object, pooling them to avoid excessive allocation/deallocation and heap fragmentation is a good idea.

If you only ever use this object for I/O operations then there's no need for a ref count: simply pull one from the pool, do your WSASend/WSARecv with it, and then release it back to the pool once you're done with it in the IOCP completion handler.

If, however, you want to get a bit more complicated and allow these buffers to be passed out to other code, then you may want to consider ref counting them if that makes things easier. I do this in my current framework: it lets me keep the networking code generic and pass data buffers from read completions out to customer code, which can do what it wants with them and release them back to the pool when done. This currently uses a ref count, but I'm moving away from that as a minor performance tweak. The ref count is still there, but in most situations it only ever goes from 0 to 1 and then back to 0, rather than being manipulated at various layers within my framework (this is done by passing ownership of the buffer out to the user code using a smart pointer).

In most situations I expect that a ref count is unlikely to be your most expensive operation (even on NUMA hardware in situations where your buffers are used from multiple nodes). More likely the locking involved in putting these things back into the pool will be your bottleneck; I've solved that one, so I'm moving on to the next-highest fruit ;)

You also talk about your 'per connection' object and caching your 'per operation' data locally there (which is what I do before pushing them back to the allocator). Whilst ref counts aren't strictly required for the 'per operation' data, the 'per connection' data needs, at least, an atomically modifiable 'num operations in progress' count so that you can tell when it's safe to free IT up. Again, due to my framework design, this has become a normal ref count for which customer code can hold refs as well as active I/O operations. I've yet to work out a way around the need for this counter in a general-purpose framework.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow