Assuming that you bundle a data buffer with the OVERLAPPED structure as a 'per operation' data object then pooling them to avoid excessive allocation/deallocation and heap fragmentation is a good idea.
If you only ever use this object for I/O operations then there's no need for a ref count. Simply pull one from the pool, do your WSASend/WSARecv with it, and then release it back to the pool in the IOCP completion handler once you're done with it.
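To make the simple case concrete, here is a minimal sketch of a pool with no ref counting. It is portable C++ rather than real Winsock code: `OverlappedStandIn`, `PerOperationData` and `BufferPool` are names I've made up for illustration, and the 4096 byte buffer size is an arbitrary assumption. On Windows you would use the real OVERLAPPED from `<winsock2.h>` instead of the stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Stand-in for the Win32 OVERLAPPED structure so the sketch compiles anywhere.
struct OverlappedStandIn { void* reserved[4]; };

// Hypothetical 'per operation' object: the OVERLAPPED bundled with its data buffer.
struct PerOperationData {
    OverlappedStandIn overlapped{};
    char data[4096];        // assumed buffer size
    std::size_t used = 0;   // bytes currently valid in 'data'
};

// Hypothetical pool: a mutex-protected free list, no ref counts.
class BufferPool {
public:
    // Reuse an idle buffer if there is one, otherwise allocate a new one.
    PerOperationData* acquire() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) {
                PerOperationData* op = free_.back().release();
                free_.pop_back();
                return op;
            }
        }
        return new PerOperationData();
    }

    // Called from the completion handler once the operation is finished with.
    void release(PerOperationData* op) {
        assert(op != nullptr);
        op->used = 0;
        std::lock_guard<std::mutex> lock(mutex_);
        free_.emplace_back(op);
    }

    // Number of buffers currently sitting idle in the pool.
    std::size_t idle() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<PerOperationData>> free_;  // idle buffers, freed with the pool
};
```

The intended use is `acquire()` before issuing the WSASend/WSARecv and `release()` in the completion handler. Because exactly one piece of code owns the buffer at any time, nothing needs counting.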
If, however, you want to get a bit more complicated and allow these buffers to be passed out to other code then you may want to consider ref counting them if that makes it easier. I do this in my current framework. It allows me to have generic code for the networking side of things and then pass data buffers from read completions out to customer code; they can do what they want with them and, when they're done, they release them back to the pool. This currently uses a ref count but I'm moving away from relying on it as a minor performance tweak. The ref count is still there, but in most situations it only ever goes from 0 -> 1 and then back to 0 again, rather than being manipulated at various layers within my framework (this is done by passing ownership of the buffer out to the user code using a smart pointer).
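As a rough illustration of the ref counted variant, the sketch below uses an intrusive count and a small smart pointer that returns the buffer to its pool when the last reference goes away. This is not my framework's actual code; `PooledBuffer`, `RefCountedPool` and `BufferRef` are hypothetical names, and the pool is assumed to outlive every buffer it hands out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

class RefCountedPool;

// Hypothetical pooled buffer with an intrusive reference count.
struct PooledBuffer {
    std::atomic<long> refs{0};
    RefCountedPool* owner = nullptr;  // pool to return to; must outlive the buffer
    char data[4096];                  // assumed buffer size
    std::size_t used = 0;
};

class RefCountedPool {
public:
    ~RefCountedPool() { for (PooledBuffer* b : free_) delete b; }

    // Hands out a buffer with refs == 0; wrap it in a BufferRef straight away.
    PooledBuffer* acquire() {
        PooledBuffer* b = nullptr;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) { b = free_.back(); free_.pop_back(); }
        }
        if (!b) b = new PooledBuffer();
        b->owner = this;
        return b;
    }

    void put_back(PooledBuffer* b) {
        b->used = 0;
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(b);
    }

    std::size_t idle() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<PooledBuffer*> free_;
};

// Smart pointer holding one reference. Moving it transfers ownership without
// touching the count; copying it takes an extra reference.
class BufferRef {
public:
    BufferRef() = default;
    explicit BufferRef(PooledBuffer* b) : buf_(b) {
        if (buf_) buf_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    BufferRef(const BufferRef& o) : BufferRef(o.buf_) {}
    BufferRef(BufferRef&& o) noexcept : buf_(std::exchange(o.buf_, nullptr)) {}
    BufferRef& operator=(BufferRef o) noexcept { std::swap(buf_, o.buf_); return *this; }
    ~BufferRef() { reset(); }

    // Drop our reference; the last one out returns the buffer to its pool.
    void reset() {
        if (buf_ && buf_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            buf_->owner->put_back(buf_);
        buf_ = nullptr;
    }

    PooledBuffer* get() const { return buf_; }
    PooledBuffer* operator->() const { return buf_; }

private:
    PooledBuffer* buf_ = nullptr;
};
```

If the networking layer wraps the buffer once in the read completion and then moves the `BufferRef` out to the user's callback, the count only goes 0 -> 1 -> 0. It only climbs higher if somebody copies the handle.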
In most situations I expect that a ref count is unlikely to be your most expensive operation (even on NUMA hardware in situations where your buffers are being used from multiple nodes). More likely the locking involved in putting these things back into a pool will be your bottleneck; I've solved that one so am moving on to the next higher fruit ;)
You also talk about your 'per connection' object and caching your 'per operation' data locally there (which is what I do before pushing them back to the allocator). Whilst ref counts aren't strictly required for the 'per operation' data, the 'per connection' data needs, at the very least, an atomically modifiable 'num operations in progress' count so that you can tell when it's safe to free it. Again, due to my framework design, this has become a normal ref count which customer code can hold refs on, as well as active I/O operations. I've yet to work out a way around the need for this counter in a general purpose framework.
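For the 'per connection' side, the counter might look something like the sketch below. It is one possible arrangement, not the only one: `PerConnectionData` and its member names are invented for the example, the count starts at 1 to represent the open connection itself (an assumption on my part, so that the object can't be freed between operations), and the `freed` pointer exists purely so that the example can observe the destruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical 'per connection' object that frees itself once the connection
// has been closed and every outstanding I/O operation has completed.
class PerConnectionData {
public:
    // 'freed' is only a hook so callers of this sketch can see the delete happen.
    explicit PerConnectionData(std::atomic<int>* freed) : freed_(freed) {}

    // Call before issuing each overlapped operation on this connection.
    void operation_started() { pending_.fetch_add(1, std::memory_order_relaxed); }

    // Call from the completion handler, success or failure.
    void operation_completed() { drop(); }

    // Call exactly once when the connection is closed; drops the initial count.
    void close() { drop(); }

private:
    // Private destructor: the object may only be destroyed via drop().
    ~PerConnectionData() { freed_->fetch_add(1); }

    void drop() {
        // fetch_sub returns the previous value, so 1 means we just reached zero.
        if (pending_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;
    }

    std::atomic<long> pending_{1};  // 1 for the open connection, plus 1 per operation in flight
    std::atomic<int>* freed_;
};
```

Whichever thread takes the count to zero does the delete, so it doesn't matter whether the close or the last completion happens first. Once you let user code call the equivalent of `operation_started()`/`operation_completed()` to keep the connection alive, this is just a normal ref count again.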