Question

My question is based on a few assumptions, so please correct me on any of them below:

  • I know that TCP has always been socket-based
  • In order for a server to maintain that socket, a thread has to block and wait for IO on the server side
  • A modern web server has to handle at least hundreds if not thousands of requests per second
  • Even if the work is spawned off as separate threads with callbacks to the socket thread, this would mean thousands of threads running at a time on the server

Current software engineering, especially reactive frameworks, discourages running lots of threads, so I'm wondering whether modern HTTP servers still run thousands of threads, or whether there is some kind of "reactive" way of handling that many clients at a time?


Solution

Think select()

In order for a server to maintain that socket, a thread has to block and wait for IO on the server side

That's not true. The operating system can, and does, manage the network connection behind the scenes, and exposes access to it through both synchronous and asynchronous interfaces. Userspace programs, which include web servers, never communicate directly with the network hardware. Instead, programs invoke system calls to ask the OS kernel to do it on their behalf.

A synchronous interface makes sense for a lot of applications: it is simple, easy to reason about and debug, and sufficient for many typical networked applications. In that scenario, the program asks the kernel for an update on the connection, and asks the kernel to suspend it until an update is available.
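As a minimal sketch of that blocking model (assuming `client_fd` is a hypothetical, already-connected TCP socket):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Synchronous model: the kernel suspends this thread inside recv()
 * until data arrives on the connection. */
ssize_t handle_one(int client_fd) {
    char buf[4096];
    ssize_t n = recv(client_fd, buf, sizeof buf, 0); /* blocks here */
    if (n > 0) {
        /* ... process n bytes of request data ... */
    }
    return n;
}
```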

Instead of doing that, the OS also provides a way to ask the kernel, without blocking, "has anything happened on my connection?" If you have multiple connections, the kernel will respond with one of your connections that has data on it, and you can act on that data. You can therefore write a loop that repeatedly asks the kernel "what have you got for me?" and acts on each answer in sequence, all in a single thread. More sophisticated systems replace that busy-wait polling with a special function that blocks on the whole collection of outstanding network activities and resumes when any of them has an update. See the select() function for an example of how such a system can be used.
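Here is a sketch of such a single-threaded select() loop, assuming `listen_fd` is a hypothetical, already-listening TCP socket:

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int listen_fd) {
    fd_set watched;
    FD_ZERO(&watched);
    FD_SET(listen_fd, &watched);
    int max_fd = listen_fd;

    for (;;) {
        fd_set ready = watched;          /* select() mutates its argument */
        /* One blocking call waits until *any* watched socket has an update. */
        if (select(max_fd + 1, &ready, NULL, NULL, NULL) < 0)
            break;

        for (int fd = 0; fd <= max_fd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == listen_fd) {       /* new connection: start watching it */
                int client = accept(listen_fd, NULL, NULL);
                if (client >= 0) {
                    FD_SET(client, &watched);
                    if (client > max_fd) max_fd = client;
                }
            } else {                     /* data (or EOF) on an existing client */
                char buf[4096];
                ssize_t n = recv(fd, buf, sizeof buf, 0);
                if (n <= 0) {            /* closed or error: stop watching */
                    close(fd);
                    FD_CLR(fd, &watched);
                } else {
                    /* ... handle n bytes of request data ... */
                }
            }
        }
    }
}
```

Note that select() tops out at FD_SETSIZE descriptors, which is one reason modern servers tend to use epoll or kqueue instead; the principle is the same.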

Abstractions can be built on top of that "has anything happened?" model, where the OS signals the process that something has happened; from there you can build async/await-style abstractions and other models, all running in a single thread of execution.
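As a sketch of how such an event-notification loop looks with Linux's epoll (the readiness API that event-driven servers and async runtimes are commonly built on), again assuming a hypothetical `listen_fd`:

```c
#include <sys/epoll.h>
#include <sys/socket.h>

void epoll_loop(int listen_fd) {
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event ready[64];
    for (;;) {
        /* One blocking call waits on all registered sockets at once. */
        int n = epoll_wait(ep, ready, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            if (fd == listen_fd) {       /* new client: register it too */
                int client = accept(listen_fd, NULL, NULL);
                if (client >= 0) {
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
                }
            } else {
                /* readiness event: run the callback, or resume the
                 * async/await coroutine parked on this connection */
            }
        }
    }
}
```

An async/await runtime is essentially this loop plus the bookkeeping that maps each ready descriptor back to the suspended task waiting on it.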

The operating system itself doesn't dedicate a thread to each connection either. Using hardware interrupts and other close-to-the-metal mechanisms, it reserves time to manage its networking abstractions, and it too handles each connection mostly one at a time. After all, the electrical signals traveling down the wire arrive one at a time (a major over-simplification).

Lastly, worth mentioning:

A modern web server has to handle at least hundreds if not thousands of requests per second

That's also not necessarily true. If you are writing Facebook for cats, sure. But if I'm writing an online pole-barn ordering system, or something similar (as probably 99% of software developers are), I'm just not going to see that kind of traffic.

OTHER TIPS

It works asynchronously, which means a thread does not have to sacrifice itself to wait for socket I/O.

The TCP socket is hard-wired into the network driver, which in turn responds to network card interrupts. Basically, "modern" HTTP servers are based on hardware interrupts. As soon as an interrupt occurs, a thread is dispatched to handle it.

Drivers should never block, so every API that blocks actually blocks at a higher level in the framework, due to various design choices (such as languages not supporting async interfaces or event-based programming).

Your fourth point isn't correct. A web server doesn't need thousands of threads to handle its connections.

A classic Apache server spawns a new thread for each connection, but such servers run into problems when many clients connect at once.
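A sketch of that thread-per-connection model, assuming a hypothetical `listen_fd` that is already listening:

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static void *serve(void *arg) {
    int client = (int)(intptr_t)arg;
    /* ... read the request, write the response ... */
    close(client);
    return NULL;
}

void accept_loop(int listen_fd) {
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0) continue;
        pthread_t t;                     /* one fresh thread per connection */
        pthread_create(&t, NULL, serve, (void *)(intptr_t)client);
        pthread_detach(t);
    }
}
```

With thousands of simultaneous clients, this spawns thousands of threads, which is exactly the scaling problem described above.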

But an nginx-style server works quite differently. It uses the producer-consumer pattern: it has a thread pool, and each thread blocks on a queue (let us call these threads workers). On a new connection, a separate thread puts only the socket information into the queue. A worker is woken up, takes the item from the queue, and handles the request. After that, the worker tries to take a new element from the queue.

So nginx always runs the same, fixed number of threads and never starts thousands of them.
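A compact sketch of that producer-consumer layout, with an acceptor thread as the producer, a fixed pool of workers as consumers, and a condition-variable-protected ring buffer as the queue (all names here are illustrative, not nginx's actual internals):

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define QUEUE_CAP 256
#define N_WORKERS 8

static int queue[QUEUE_CAP];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

static void enqueue(int fd) {                 /* producer side */
    pthread_mutex_lock(&lock);
    while (count == QUEUE_CAP)
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = fd;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

static int dequeue(void) {                    /* consumer side */
    pthread_mutex_lock(&lock);
    while (count == 0)                        /* worker blocks here */
        pthread_cond_wait(&not_empty, &lock);
    int fd = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return fd;
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int client = dequeue();               /* take a connection */
        /* ... handle the request ... */
        close(client);                        /* then go back for more */
    }
    return NULL;
}

void run_pool(int listen_fd) {
    for (int i = 0; i < N_WORKERS; i++) {     /* fixed thread count */
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_detach(t);
    }
    for (;;) {                                /* acceptor = producer */
        int client = accept(listen_fd, NULL, NULL);
        if (client >= 0)
            enqueue(client);
    }
}
```

However many clients connect, the thread count stays at N_WORKERS plus the acceptor; only the queue depth grows.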

Licensed under: CC-BY-SA with attribution