Thread per connection vs Reactor pattern (with a thread pool)?

Question 1

Reactive applications certainly scale better when they are written correctly. This means:

Never blocking in a reactive thread:
- Any blocking will seriously degrade the performance of your server. You typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
- No mutexes, since these can block, so no shared mutable state. If you require shared state, you will have to wrap it with an actor or similar so that only one thread has access to the state.
All work in the reactive threads should be CPU bound:
- All IO has to be asynchronous or be performed in a different thread pool, with the results fed back into the reactor.
- This means using either futures or callbacks to process replies. This style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small:
- To maintain responsiveness of the server, all tasks in the reactor must be small (bounded by time).
- On an 8-core machine you cannot allow 8 long tasks to arrive at the same time, because no other work will start until they are complete.
- If a task could take a long time, it must be broken up (cooperative multitasking).
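
To make the last point concrete, here is a minimal sketch of cooperative multitasking on a single reactor thread, using only the standard library. The Reactor class and post_chunked_sum function are hypothetical names invented for this illustration, not part of any framework. A long summation is cut into slices, and each slice re-posts the next one, so other queued tasks get a turn in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal reactor: a FIFO of small tasks run on one thread.
class Reactor {
public:
    using Task = std::function<void()>;

    void post(Task task) { queue_.push_back(std::move(task)); }

    // Run queued tasks until none remain; returns how many tasks ran.
    std::size_t run() {
        std::size_t executed = 0;
        while (!queue_.empty()) {
            Task task = std::move(queue_.front());
            queue_.pop_front();
            task();  // a task may post follow-up tasks
            ++executed;
        }
        return executed;
    }

private:
    std::deque<Task> queue_;
};

// Sum `data` in slices of at most `chunk` elements. Each slice is one short
// task; if work remains, the task posts the next slice instead of looping on.
void post_chunked_sum(Reactor& reactor,
                      std::shared_ptr<const std::vector<int>> data,
                      std::size_t chunk,
                      std::function<void(long long)> on_done,
                      std::size_t begin = 0, long long partial = 0) {
    assert(chunk > 0);  // a zero-sized slice would never make progress
    reactor.post([&reactor, data, chunk, on_done, begin, partial]() mutable {
        const std::size_t end = std::min(begin + chunk, data->size());
        for (std::size_t i = begin; i < end; ++i) partial += (*data)[i];
        if (end == data->size()) {
            on_done(partial);  // finished: deliver the result via callback
        } else {
            post_chunked_sum(reactor, data, chunk, on_done, end, partial);
        }
    });
}
```

The slice size is the knob that bounds how long any one task holds the reactor thread. A real server would pick it by time budget rather than element count, which this sketch does not attempt.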

Tasks in reactive applications are scheduled by the application, not the operating system; that is why they can be faster and use less memory. When you write a reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.

I am a big fan of reactive architectures, but they come with costs. I am not sure I would write my first C++ application as reactive; I normally try to learn one thing at a time.

If you decide to use a reactive architecture, use a good framework that will help you design and structure your code, or you will end up with spaghetti. Things to look for are:

What is the unit of work?
How easy is it to add new work? Can it only come in from an external event (e.g. a network request)?
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?

I cannot recommend a C++ library for this; I now do my server development in Scala and Akka, which provide all of this along with an excellent composable futures library to keep the code clean.

Best of luck learning C++ and with whichever choice you make.

Question 2

Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.

http://www.kegel.com/c10k.html

The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost Thread plus Boost ASIO. You could also try the C++11 std::thread library and std::mutex (but Boost ASIO is better than mutexes in a lot of cases: just always call back to the same thread and you don't need protected regions). Stay away from std::future, because it's broken:
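
The "always call back to the same thread" idea can be sketched with the standard library alone. This is an illustration under assumptions, not Boost ASIO code: CallbackQueue and sum_squares_via_callbacks are hypothetical names, and the queue stands in for the role that posting a handler to an io_context plays in ASIO. Workers do their work on their own threads, but the shared total is only ever touched by callbacks running on the owning thread, so it needs no lock of its own. The only synchronized piece is the short hand-off into the queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical callback queue that is drained by exactly one thread.
class CallbackQueue {
public:
    // May be called from any thread.
    void post(std::function<void()> callback) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            callbacks_.push(std::move(callback));
        }
        ready_.notify_one();
    }

    // Block until `count` callbacks have run on the calling thread.
    void run_n(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            std::function<void()> callback;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !callbacks_.empty(); });
                callback = std::move(callbacks_.front());
                callbacks_.pop();
            }
            callback();  // runs outside the lock, on the owning thread
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> callbacks_;
};

// Each worker computes i * i and posts a callback; the calling thread is the
// only one that ever modifies `total`.
int sum_squares_via_callbacks(int workers) {
    assert(workers >= 0);
    CallbackQueue owner;
    int total = 0;  // no mutex: only touched by callbacks on this thread
    std::vector<std::thread> pool;
    for (int i = 1; i <= workers; ++i) {
        pool.emplace_back([&owner, &total, i] {
            const int result = i * i;  // stand-in for slow or blocking work
            owner.post([&total, result] { total += result; });
        });
    }
    owner.run_n(static_cast<std::size_t>(workers));
    for (auto& worker : pool) worker.join();
    return total;
}
```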

http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/

The optimal number of threads in the thread pool is one thread per CPU core. 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your threadpool threads might call blocking operations sometimes.
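
As a small sketch of that sizing rule: the function name pool_size and its extra_for_blocking parameter are made up for this example. std::thread::hardware_concurrency() is only a hint and is allowed to return 0 when the core count cannot be determined, so the sketch falls back to one thread in that case.

```cpp
#include <cassert>
#include <thread>

// One thread per core, plus an optional allowance for threads that may
// occasionally block. `extra_for_blocking` is a hypothetical tuning knob.
unsigned pool_size(unsigned extra_for_blocking = 0) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // the hint may be unavailable on some systems
    assert(cores + extra_for_blocking >= cores);  // no unsigned wrap-around
    return cores + extra_for_blocking;
}
```

On an 8-core machine where the hint is available, pool_size() would give 8 and pool_size(2) would give 10.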

Question 3

FWIW, Poco supports option 2 (ParallelReactor) since version 1.5.1

Question 4

I think that option 2 is the best one. As for tuning the pool size, I think the pool should be adaptive. It should be able to spawn more threads (up to some high hard limit) and remove excess threads in times of low activity.
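
One way to picture the adaptive part is as a pure sizing policy that the pool consults from time to time. The sketch below is only one possible policy, with hypothetical names (PoolLimits, target_threads): it asks for one thread per pending task, never fewer than a floor and never more than the hard limit. Actually starting and retiring threads is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bounds for an adaptive pool.
struct PoolLimits {
    std::size_t min_threads;  // kept alive even when idle
    std::size_t max_threads;  // hard upper limit
};

// Desired thread count for the current backlog, clamped to the limits.
std::size_t target_threads(std::size_t pending_tasks, PoolLimits limits) {
    assert(limits.min_threads <= limits.max_threads);
    return std::max(limits.min_threads,
                    std::min(pending_tasks, limits.max_threads));
}
```

A real pool would probably also wait for the backlog to stay low for a while before shrinking, to avoid creating and destroying threads on every small fluctuation.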

Question 5

as the analogy you linked to (and its comments) suggests, this is somewhat application dependent. now, what you are building here is a game server. let's analyze that.

game servers (generally) do a lot of I/O and relatively few calculations, so they are far from 100% CPU applications. on the other hand they also usually change values in some database (a "game world" model). all players create reads and writes to this database. which is exactly the intersection problem in the analogy.

so while you may gain some from handling the I/O in separate threads, you will also lose from having separate threads accessing the same database and waiting for its locks.

so either option 1 or option 2 is acceptable in your situation. for scalability reasons I would not recommend option 3.