Question

I'm working on an IOCP server (overlapped I/O, 4 threads, CreateIoCompletionPort, GetQueuedCompletionStatus, WSASend, etc.). I also created an auto-reset event and put its handle in the OVERLAPPED structure for each asynchronous I/O operation.

The question is: how do I properly send a buffer to all connected sockets? Each socket is stored in a linked list of context-info structures.

I'm not sure whether the approach below is OK:

...
DWORD WINAPI WorkerThread() // 1 of 4 worker threads
{
    ...
    GetQueuedCompletionStatus(...);
    ...
    PPER_SOCKET_CONTEXT pTmp1, pTmp2;

    EnterCriticalSection(&g_CriticalSection);
    pTmp1 = g_pCtxtList; // head of the linked list of socket contexts
    while (pTmp1)
    {
        pTmp2 = pTmp1->pCtxtBack;

        // If there is any pending WSASend on this socket, wait for it to
        // complete, so we can post another WSASend using the same
        // OVERLAPPED structure and buffer.
        WaitForSingleObject(pTmp1->pIOContext->Overlapped.hEvent, INFINITE);

        WSASend(pTmp1->Socket, ..., &(pTmp1->pIOContext->Overlapped), NULL);

        pTmp1 = pTmp2;
    }
    LeaveCriticalSection(&g_CriticalSection);
    ...
}

And what happens if another thread also tries to do the same work at the same time?
Is it a good idea to use GQCS and wait functions in all threads?
Any clue about issuing WSASend to all clients in a multithreaded IOCP server would be appreciated.
Thanks.

Solution 2

As Martin says, this will perform terribly and will likely kill the performance of anything else that uses the list of sockets, since you lock it for the entire duration of your send to all connections. You don't say whether this is UDP or TCP, but if it's TCP, be aware that you are now handing control of your server's performance over to the clients: TCP flow control on a slow client connection may delay the write completion (see here), and I assume you're using the write completion to trigger the event?

I assume that your actual requirement is that you want to avoid copying the data on the server and allocating multiple buffers, one for each connection, either because of memory constraints or because you've profiled the memory copy and found that it's expensive.

The way I deal with this is to have a single reference-counted buffer and a 'buffer handle', which is just a slightly extended overlapped structure that references your single data buffer and provides the WSABUF that you need. You can then issue a 'fire and forget' write to each connection using a unique 'buffer handle', all of which refer to the single underlying buffer. Once all the writes complete, the ref count on the buffer drops to zero and it cleans up - and as Martin says, that clean-up is best achieved by putting the buffer into a pool for later reuse.
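For illustration, a minimal sketch of that scheme might look like the code below; the names (RefCountedBuffer, BufferHandle, PostSend) and the fixed buffer size are my own assumptions, not taken from the question's code.

#include <winsock2.h>
#include <windows.h>

// One shared payload, reference counted (hypothetical layout).
struct RefCountedBuffer
{
    volatile LONG refCount;   // number of sends still outstanding
    char          data[8192]; // the single copy of the payload
    DWORD         dataLen;
};

// One per pending send: a slightly extended OVERLAPPED ('buffer handle')
// that points back at the shared buffer and carries the WSABUF.
struct BufferHandle
{
    OVERLAPPED        ov;     // first member, so the completion can be mapped back
    WSABUF            wsaBuf; // points into pBuffer->data
    RefCountedBuffer *pBuffer;
};

// Fire-and-forget write of the shared buffer to one socket.
bool PostSend(SOCKET s, RefCountedBuffer *pBuffer)
{
    BufferHandle *pHandle = new BufferHandle{};
    pHandle->pBuffer    = pBuffer;
    pHandle->wsaBuf.buf = pBuffer->data;
    pHandle->wsaBuf.len = pBuffer->dataLen;

    InterlockedIncrement(&pBuffer->refCount);

    if (WSASend(s, &pHandle->wsaBuf, 1, NULL, 0, &pHandle->ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING)
    {
        // The send failed immediately: undo the reference and drop the handle.
        if (InterlockedDecrement(&pBuffer->refCount) == 0)
            delete pBuffer;   // or return it to a pool
        delete pHandle;
        return false;
    }
    return true;
}

The 'send to all clients' loop would then just walk the connection list and call PostSend for each socket, without waiting on anything. In practice you'd take an initial reference on the buffer before the loop and release it afterwards, so the count can't hit zero while you are still posting sends.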

Note: I'm not sure that I actually understand what you are trying to do (so I originally deleted my answer), let me know if I'm not following and I'll adjust...

OTHER TIPS

Not sure I understand some of that. IOCP typically does not use the hEvent field in the OVL struct. I/O completion is signaled by queueing a completion message to the 'completion port' (i.e. a queue). You seem to be using the hEvent field for some 'unusual' extra signaling to manage a single send data buffer and OVL block.
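For comparison, a typical IOCP worker loop recovers the per-I/O context from the OVERLAPPED pointer that GetQueuedCompletionStatus returns, rather than waiting on hEvent at all. A minimal sketch (the PER_IO_CONTEXT layout here is an assumption, not taken from the question):

#include <winsock2.h>
#include <windows.h>

// Assumed per-I/O context: OVERLAPPED first, so CONTAINING_RECORD can map a
// completion back to the structure that owns it.
struct PER_IO_CONTEXT
{
    OVERLAPPED Overlapped;
    WSABUF     wsaBuf;
    char       buffer[4096];
};

DWORD WINAPI WorkerThread(LPVOID lpParam)
{
    HANDLE hIOCP = (HANDLE)lpParam;

    for (;;)
    {
        DWORD        bytesTransferred = 0;
        ULONG_PTR    completionKey    = 0;    // typically the per-socket context
        LPOVERLAPPED pOverlapped      = NULL;

        BOOL ok = GetQueuedCompletionStatus(hIOCP, &bytesTransferred,
                                            &completionKey, &pOverlapped, INFINITE);
        if (pOverlapped == NULL)
            break;                            // port closed or a fatal error

        // Recover the per-I/O context; no event handle is involved.
        PER_IO_CONTEXT *pIoCtx =
            CONTAINING_RECORD(pOverlapped, PER_IO_CONTEXT, Overlapped);

        if (!ok || bytesTransferred == 0)
        {
            // Failed or graceful-close completion: clean up the socket/context here.
            continue;
        }

        // ... handle the completed send or receive using pIoCtx ...
    }
    return 0;
}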

Obviously, I don't have the whole story from your post, but it looks to me like you are making heavy work for yourself on the tx side, and that serialising the sends will strangle performance :)

Do you HAVE to use the same OVL/buffer object for successive sends? What I usually do is use a different OVL/buffer for each send and just queue it up immediately. The kernel will send the buffers in sequence and return a completion message for each one. There is no problem with multiple IOCP tx requests on a socket - that's what the OVL block is for: to link them together inside the kernel stack.
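A rough sketch of that per-send pattern, with each send owning its own OVERLAPPED and its own copy of the data (SendOp and QueueSend are illustrative names, not from the question):

#include <winsock2.h>
#include <windows.h>
#include <string.h>

// One object per outstanding send: its own OVERLAPPED and its own data copy.
struct SendOp
{
    OVERLAPPED ov;
    WSABUF     wsaBuf;
    char       data[4096];
};

// Queue a send immediately; the completion handler frees (or repools) the SendOp.
bool QueueSend(SOCKET s, const char *payload, DWORD len)
{
    SendOp *op = new SendOp{};
    if (len > sizeof(op->data))
    {
        delete op;
        return false;
    }
    memcpy(op->data, payload, len);
    op->wsaBuf.buf = op->data;
    op->wsaBuf.len = len;

    if (WSASend(s, &op->wsaBuf, 1, NULL, 0, &op->ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING)
    {
        delete op;
        return false;
    }
    return true; // successive queued sends on the same socket go out in order
}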

There is an issue with having multiple IOCP receive requests outstanding on a socket - it can happen that two pool threads get completion packets for the same socket at the same time, which can result in out-of-order processing. Fixing that issue 'properly' requires something like an incrementing sequence number in each rx buffer/OVL object issued, plus a critical section and buffer list in each socket object to 'save up' out-of-order buffers until all the earlier ones have been processed. I have a suspicion that many IOCP servers just dodge this issue by only having one rx IOCP request outstanding at a time (probably at the expense of performance).
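One way to sketch that sequencing idea (field and function names here are assumptions, purely for illustration): stamp each posted receive with a per-socket sequence number and 'save up' completed buffers until the next expected number arrives.

#include <winsock2.h>
#include <windows.h>
#include <map>

// Per-receive context, stamped with the sequence number it was posted under.
struct RecvOp
{
    OVERLAPPED ov;
    WSABUF     wsaBuf;
    char       data[4096];
    LONG       seq;          // order in which this receive was issued
};

// Per-socket reordering state; cs is assumed to be initialised with
// InitializeCriticalSection when the socket object is created.
struct SocketRxState
{
    CRITICAL_SECTION        cs;
    LONG                    nextToProcess = 0;  // next sequence number to hand to the app
    std::map<LONG, RecvOp*> pending;            // completed-but-early buffers
};

// Called from a pool thread when a receive completes; processes buffers strictly in order.
void OnRecvComplete(SocketRxState &rx, RecvOp *op)
{
    EnterCriticalSection(&rx.cs);
    rx.pending[op->seq] = op;

    // Drain every buffer that is now in sequence.
    std::map<LONG, RecvOp*>::iterator it;
    while ((it = rx.pending.find(rx.nextToProcess)) != rx.pending.end())
    {
        RecvOp *ready = it->second;
        rx.pending.erase(it);
        ++rx.nextToProcess;
        // ... process ready->data here, or queue it to another thread ...
        delete ready;   // or return it to the buffer pool
    }
    LeaveCriticalSection(&rx.cs);
}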

Getting through a lot of buffers in this way could be somewhat taxing if they are being continually constructed and destroyed, so I don't normally bother and just create a few thousand of them at startup and push them, (OK, pointers to them), onto a producer-consumer 'pool queue', popping them off when a tx or rx is required and pushing them back on again. In the case of tx, this would happen when a send completion message is picked up by one of the IOCP pool threads. In the case of rx, it would happen when a pool thread, (or some other thread that has had the object queued to it by a pool thread), has processed it and no longer needs it.
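A minimal version of that pool could be nothing more than a lock-protected stack of pre-allocated objects; the class below is a sketch under that assumption (names are mine, not from the answer):

#include <windows.h>
#include <vector>

// A trivial pool of pre-allocated buffer objects, pushed/popped under a lock.
template <typename T>
class BufferPool
{
public:
    explicit BufferPool(size_t count)
    {
        InitializeCriticalSection(&cs_);
        for (size_t i = 0; i < count; ++i)
            free_.push_back(new T{});
    }

    ~BufferPool()
    {
        for (size_t i = 0; i < free_.size(); ++i)
            delete free_[i];
        DeleteCriticalSection(&cs_);
    }

    // Pop a buffer when a tx or rx is required; allocates if the pool runs dry.
    T* Pop()
    {
        EnterCriticalSection(&cs_);
        T *p = NULL;
        if (!free_.empty()) { p = free_.back(); free_.pop_back(); }
        LeaveCriticalSection(&cs_);
        return p ? p : new T{};
    }

    // Push a buffer back once its completion has been fully processed.
    void Push(T *p)
    {
        EnterCriticalSection(&cs_);
        free_.push_back(p);
        LeaveCriticalSection(&cs_);
    }

private:
    CRITICAL_SECTION cs_;
    std::vector<T*>  free_;
};

At startup you would create, say, a BufferPool of a few thousand of your per-I/O objects and Pop/Push from the IOCP threads; a real implementation might use a lock-free or semaphore-based queue instead so a thread can block while the pool is empty.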

Ahh.. you want to send exactly the same content to the list of sockets - like a chat server type thingy.

OK. So how about one buffer and multiple OVL blocks? I have not tried it, but don't see why it would not work. In the single buffer object, keep an atomic reference count of how many overlapped send requests you have sent out in your 'send to all clients' loop. When you get the buffers back in the completion packets, decrement the refCount towards zero and delete/repool the buffer when you get down to 0.
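The completion side of that idea would look something like the snippet below, reusing the RefCountedBuffer/BufferHandle layout sketched in the earlier answer (again, illustrative names only):

#include <winsock2.h>
#include <windows.h>

// RefCountedBuffer / BufferHandle as sketched earlier: a shared payload with a
// volatile LONG refCount, and a per-send OVERLAPPED wrapper pointing at it.
void OnSendComplete(BufferHandle *pHandle)
{
    RefCountedBuffer *pBuffer = pHandle->pBuffer;

    // The last completion to come back frees (or repools) the shared payload.
    if (InterlockedDecrement(&pBuffer->refCount) == 0)
        delete pBuffer;   // or push it back onto the buffer pool

    delete pHandle;       // the per-send 'OVL block' is no longer needed
}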

I think that should work, (?).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow