Just handle all the communication asynchronously, on a single thread. That alone should cope with up to ~10k connections per second, as long as you don't perform anything slow on this thread: just push each incoming message onto the queue and yield back to the communication service.
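A minimal C++ sketch of that hand-off, assuming a mutex-plus-condition-variable queue (the answer doesn't prescribe a particular queue). `Message`, `WorkQueue` and `on_message_received` are hypothetical names. The point it illustrates is that the IO-side handler does nothing but push and return, and holds the lock only for the `push_back` itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical message type; in practice whatever your protocol decodes.
using Message = std::string;

// Minimal blocking queue: the single IO thread pushes, worker threads pop.
class WorkQueue {
public:
    // Called on the IO thread: the lock is held only for the push itself.
    void push(Message m) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push_back(std::move(m));
        }
        cv_.notify_one();  // wake one idle worker, outside the lock
    }

    // Called on worker threads: blocks until a message arrives or close() is called.
    std::optional<Message> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;  // closed and fully drained
        Message m = std::move(items_.front());
        items_.pop_front();
        return m;
    }

    // Lets workers finish the remaining items, then return nullopt.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Message> items_;
    bool closed_ = false;
};

// Hypothetical completion handler running on the IO thread:
// no parsing or computation here, just hand off and return.
void on_message_received(WorkQueue& queue, Message raw) {
    queue.push(std::move(raw));
}
```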
Next, start as many worker threads as can usefully do the CPU-intensive work: usually one per logical core, sometimes one per physical core, and, if you are saturating the communication throughput (unlikely), certainly consider #cores - 1.
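Sizing the pool could look like the following sketch. Standard C++ only exposes the logical core count, via `std::thread::hardware_concurrency()` (which may return 0 when the value is unknown); there is no portable physical-core query, so the physical-core variant would need a platform-specific API. `worker_count`, `default_worker_count` and the `reserve_io_core` flag are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: derive the worker count from the logical core count.
// reserve_io_core = true gives the "#cores - 1" sizing, leaving a core to the IO thread.
unsigned worker_count(unsigned logical_cores, bool reserve_io_core) {
    if (logical_cores == 0) logical_cores = 1;  // hardware_concurrency() returns 0 if unknown
    unsigned n = reserve_io_core ? logical_cores - 1 : logical_cores;
    return std::max(1u, n);  // never go below one worker
}

// Same, using the core count reported by the standard library.
unsigned default_worker_count(bool reserve_io_core) {
    return worker_count(std::thread::hardware_concurrency(), reserve_io_core);
}
```

On a machine with 8 logical cores, `worker_count(8, false)` gives 8 and `worker_count(8, true)` gives 7.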
If you anticipate that the IO side will be saturated and you cannot afford to block even on the mutex, use a lock-free queue. In that case, definitely size the pool at #cores - 1 workers: the workers will naturally spin in a tight loop waiting for messages on the queue, and unless you take precautions they will starve the IO thread of CPU time.
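A sketch of the lock-free variant, under these assumptions: the queue is Dmitry Vyukov's well-known bounded MPMC ring buffer (one possible choice, not one the answer names), the "precaution" is calling `std::this_thread::yield()` whenever a worker finds the queue empty, and the pool is sized at `hardware_concurrency() - 1`. `MpmcQueue`, `spinning_worker_count`, `worker_loop` and the `process` callback are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <utility>

// Bounded multi-producer/multi-consumer queue after Dmitry Vyukov's design.
// Capacity must be a power of two; T must be default-constructible.
template <typename T>
class MpmcQueue {
public:
    explicit MpmcQueue(std::size_t capacity)
        : mask_(capacity - 1), cells_(new Cell[capacity]) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
        for (std::size_t i = 0; i < capacity; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Never blocks: returns false when the queue is full.
    bool try_push(T value) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            std::intptr_t diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
            if (diff == 0) {
                // Slot is free for this position: try to claim it (on failure pos is reloaded).
                if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    cell.value = std::move(value);
                    cell.seq.store(pos + 1, std::memory_order_release);  // publish to consumers
                    return true;
                }
            } else if (diff < 0) {
                return false;  // full: slot still holds an unconsumed item
            } else {
                pos = tail_.load(std::memory_order_relaxed);  // another producer got ahead
            }
        }
    }

    // Never blocks: returns false when the queue is empty.
    bool try_pop(T& out) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            std::intptr_t diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos + 1);
            if (diff == 0) {
                if (head_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    out = std::move(cell.value);
                    cell.seq.store(pos + mask_ + 1, std::memory_order_release);  // free slot for next lap
                    return true;
                }
            } else if (diff < 0) {
                return false;  // empty: nothing published at this position yet
            } else {
                pos = head_.load(std::memory_order_relaxed);  // another consumer got ahead
            }
        }
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq{0};
        T value;
    };
    const std::size_t mask_;
    std::unique_ptr<Cell[]> cells_;
    alignas(64) std::atomic<std::size_t> tail_{0};  // producers' position, own cache line
    alignas(64) std::atomic<std::size_t> head_{0};  // consumers' position, own cache line
};

// Leave one core to the IO thread; hardware_concurrency() may report 0.
unsigned spinning_worker_count() {
    unsigned cores = std::thread::hardware_concurrency();
    return cores > 1 ? cores - 1 : 1;
}

// Spin on try_pop, yielding whenever the queue is empty so idle workers give the
// scheduler a chance to run the IO thread. Drains remaining items once stop is set.
template <typename T, typename F>
void worker_loop(MpmcQueue<T>& queue, const std::atomic<bool>& stop, F process) {
    T item;
    for (;;) {
        if (queue.try_pop(item)) {
            process(item);
        } else if (stop.load(std::memory_order_acquire)) {
            while (queue.try_pop(item)) process(item);  // final drain
            return;
        } else {
            std::this_thread::yield();
        }
    }
}
```

Yielding only lets the scheduler run other threads sooner; idle workers still burn CPU while they spin, which is why reserving a core for the IO thread remains worthwhile.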