Why are threads so expensive, that event-driven non-blocking IO is better on benchmarks

https://stackoverflow.com/questions/13106075

14-07-2021
|

Frage

I recently started learning about node.js, a javascript library on top of V8 known for its non-blocking IO and incredible speed.

To my understanding, node does not wait for IO to respond, but runs an event loop (similar to a game loop) that keeps checking unfinished operations and continues/completes them as soon as IO responds. Node performance was compared to Apache HTTPD with node being significantly faster while using less memory.

Now if you read about Apache, you learn it uses 1 thread per user, which supposedly slows it down significantly and this is where my question appears:

If you compare threads to what node does internally in its event loop, you start seeing the similarities: Both are abstractions of an unfinished process that waits for a resource to respond, both check if the operation made progress regularly and then don't occupy the CPU for a certain amount of time (at least I think a good blocking API sleeps for a few milliseconds before rechecking).

Now where is that striking, critical difference that makes threads so much worse?

Lösung

The difference here is context switching. Swapping threads by the OS requires:

saving the instruction pointer (done by the CPU)
saving the CPU register (may not be neccessary if the thread has made a blocking call, but is neccessary if it is preempted)
swapping the call stacks. Even if the stacks reside in the same virtual memory space, this is at least one write and some reads and even applies to micro-threads(fibers).
in case of swapping to a different process, swapping to kernel mode, updating the virtual memory table and returning to user mode.

In the case of an event queue:

the state is updated. This needs to happen in any case.
the event handler returns. Instead of swapping the call stacks, the current call stack is popped.
the event queue is checked for pending requests. Only if there is no pending request, the application waits. This can be done by sleeping repeatedly (as suggested by the OP) or (better) by making a blocking call to the event queue. If the event queue (e.g. a set of TCP sockets) is managed by the OS, then the OS is responsible for notifying the application on a new event (a socket can accept more data).

If the server is highly loaded, the only overhead of an event queue is a handler return, reading the queue and a handler call. In the threaded approach, there is additional overhead from swapping the threads.

Also, as mentioned by PST, the threaded approach introduces the need for locking. Locking itself is cheap, but waiting for a resource to be released by some other thread requires additional context switching as the waiting thread cannot continue. It is even possible for a thread to be swapped in to acquire a lock only to be swapped out a few clock cycles later because it needs to lock another resource as well. Compare how much is done by the OS (reading the tread queue and swapping call stacks, at the very least) to how much is done by the thread (returning from a call and making another call).

Andere Tipps

From one aspect, it does depend on the implementation of the threading particular to that language. In general, however, it is the creation of a thread that is the costly portion, not the running of a thread. As such, some languages (like .Net) keep a thread pool of threads just laying around, so you can grab one that is essentially already created, keeping costs down.

The problem with threads is also, from what a professor told me, is that every language has the equivalent of a Thread.Yield() function, but no one actually makes use of it; so, every thread you will encounter is extremely aggressive in scheduling, which sets up all sorts of wars between mutexs and sempaphores; some threads, due to the level of aggression used, never actually run, which is a problem in of itself.

The benefit to threads is that they offload functionality from other loops, like the GUI loop, at the cost of increased functionality. Events, to the best of my knowledge, still run in a single thread (unless specifically told to do otherwise).

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow