Question

Asynchronous programming seems to be getting quite popular these days. One of the most frequently cited advantages is the performance gain from removing operations that block threads. But I have also seen people say that the advantage is not that great, and that a properly configured thread pool can have the same effect as fully asynchronous code.

My question is: are there any real benchmarks that compare blocking vs. asynchronous code? It can be any language, but I think C# and Java would be most representative. I'm not sure how good such a benchmark would be with Node.js or similar.

Edit: My attempt at a general question combined with unclear terminology seems to have failed. By "asynchronous code" I mean what some answers described as event or callback programming. In the end, operations that would block a thread are instead delegated to some callback system so that threads can be utilized better.

And if I wanted to ask a specific question: are there any benchmarks that compare the throughput/latency gain of async/await server code in .NET? Or any other similar comparison?

Was it helpful?

Solution

Based on your comments, it seems that you're really interested in "non-blocking IO." This differs from my definition of "asynchronous programming," which is an approach to decomposing work exemplified by Erlang processes or Goroutines.

And, if that is your definition, then yes, there have been benchmarks. But, like all benchmarks, they shouldn't be accepted blindly. Instead, you need to think about what goes on behind the scenes.

  • A thread is the unit of OS scheduling. Platforms such as Erlang and Go build their own schedulers on top of the OS scheduler, allowing multiple units of execution to share the same thread. This is great, as long as your units of execution are lightweight, because it avoids the overheads associated with threads.* However, IO operations require a trip to the kernel, which means that you need a real thread to do them. And if you're implementing a sub-thread scheduler, you need to be smart about not scheduling sub-thread tasks on a thread that's blocked in a kernel operation.
  • All IO operations have the potential to block.** When you make a read or write request, the kernel looks to see if there is data available or (for write) room in a buffer. If not, the kernel suspends the thread until the operation can complete. This makes thread-per-connection servers really simple to implement, but worries people who think about thread overheads.
  • Operating systems provide a way to block on multiple IO channels simultaneously. The select call on POSIX is one of these: you provide it with a list of channels (file descriptors / sockets) that you care about, and it will tell you when one of them is ready to read or write (read is what most people care about). You still have to make a kernel call, and you'll still end up blocking a thread if nothing's available, but that's only one thread. This is how Node.js works: when data is available, the proper event handler is called (I don't know the internals of Node, but hope that they also verify that write buffers are available before calling write).
  • When you max CPU, you're done. It doesn't matter whether you use select or a thread-per-connection approach, you still need to spend CPU to do whatever your server is meant to do. With the thread-per-connection approach, you don't really pay attention to that: the scheduler will assign threads to cores, and you'll degrade gracefully. With select, you will need to hand connections off to threads when they're ready for processing, or you'll be limited by the performance of a single core (Node.js gets around that by letting you spawn multiple servers).
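The select-style multiplexing described above can be sketched with Java NIO's Selector. To keep the sketch self-contained, a Pipe stands in for a network connection and a second thread plays the remote peer; the class and method names are illustrative, not from any of the benchmarks discussed.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectDemo {
    // Blocks one thread on behalf of all registered channels, then reads
    // from whichever channel became ready -- the select() pattern in miniature.
    public static String waitForMessage() throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();                // stands in for a network connection
        pipe.source().configureBlocking(false); // selectable channels must be non-blocking
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Another thread plays the remote peer, sending data a little later.
        new Thread(() -> {
            try {
                Thread.sleep(100);
                pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) { throw new RuntimeException(e); }
        }).start();

        selector.select();                      // returns once some channel is ready
        String result = "";
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isReadable()) {
                ByteBuffer buf = ByteBuffer.allocate(64);
                ((Pipe.SourceChannel) key.channel()).read(buf);
                buf.flip();
                result = StandardCharsets.UTF_8.decode(buf).toString();
            }
        }
        selector.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(waitForMessage());
    }
}
```

With more channels registered, that single `select()` call is still the only place a thread blocks, which is exactly the one-thread-for-many-connections tradeoff described above.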

As I said, benchmarks shouldn't be accepted blindly; they're only valid as long as they model the real-world problem that you're trying to solve. The author of the linked benchmark works for (worked for?) Mailinator, which, if you haven't used it, is a poste restante service for ad hoc email addresses. Which means that it's going to be getting short-lived, high-activity connections from a relatively small number of clients. This is a perfect use case for thread-per-connection scheduling. As noted in the comments, a chat server (long-lived, low activity) might be different.

In my mind, the question of blocking versus non-blocking IO is rather boring: most real-world servers don't have that many concurrent connections. More interesting to me is the programming model: a worker-based model like Erlang or Go means that you can focus on your business logic, and not care how connections are being managed.

* These overheads include kernel scheduling structures, and perhaps most important, a multi-megabyte thread stack, most of which goes unused. While 2MB doesn't seem like much, it adds up quickly if you have 100k processes ... which most applications don't have.

** Not 100% true, but I don't want to get too deep into the weeds here.

Other tips

All async programming (in Node and other single-threaded async implementations) does is hide latencies: it lets you continue working while you wait for external resources, and it lets you request multiple external resources at the same time so that their latencies overlap. There should be no difference between async and threads if the use of threads is limited to resource waiting and you don't overly tax the system with threads.
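That latency overlap is easy to see in miniature. A hedged Java sketch (the class name, the 100 ms delays, and the `slowFetch` helper are all illustrative): two simulated external waits run on a two-thread pool, so the total elapsed time is roughly the maximum of the two waits, not their sum.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OverlapDemo {
    // Simulates a slow external resource (e.g. a database or HTTP call).
    static String slowFetch(String name, long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return name + "-result";
    }

    public static long overlappedFetches() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long start = System.nanoTime();
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> slowFetch("a", 100), pool);
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> slowFetch("b", 100), pool);
        a.join();
        b.join();                               // both waits overlap: total is about 100 ms, not 200 ms
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        return elapsedMs;
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + overlappedFetches());
    }
}
```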

There are some async implementations that run the async tasks on a thread pool; for those implementations I wouldn't expect any difference besides the difference in library overhead and the library's smartness in thread management.

http://www.ducons.com/blog/tests-and-thoughts-on-asynchronous-io-vs-multithreading

When people refer to the performance advantages of asynchronous code vs. blocking code, they mean within the context of a single thread. If you stop for 2 seconds and do nothing while a web page is being downloaded, that's obviously slower than continuing to do other things while you wait.

Yes, thread pools can accomplish the same effect. The difference is that asynchronous code is generally easier to write correctly compared to multithreaded code, and asynchronous code doesn't usually need a thread backing it, so it's lighter on resources.

For example, waiting on 100 network calls only requires 100 socket handles, instead of 100 socket handles and 100 threads. This means benchmarks like latency for one web page download will be about the same, but asynchronous code will usually have advantages in things like number of simultaneous sockets. Which kind of benchmark you use depends on your use case.
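The point that a pending wait need not hold a thread can be illustrated with `CompletableFuture`: none of the futures below has a thread backing it; a single "event source" completes them all, the way one selector thread can complete many pending socket reads. The class name and counts are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ManyWaitsDemo {
    public static int completeAll(int n) {
        // n pending operations, zero dedicated threads.
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) futures.add(new CompletableFuture<>());

        // A single "event source" completes every future, the way one
        // selector thread services many ready sockets.
        int k = 0;
        for (CompletableFuture<Integer> f : futures) f.complete(k++);

        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        int sum = 0;
        for (CompletableFuture<Integer> f : futures) sum += f.join();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(completeAll(100)); // sum of 0..99 = 4950
    }
}
```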

To benchmark asynchronous code, you measure its performance in the usual way and compare it to its synchronous equivalent.

Time for Synchronous call to return to caller: 900 ms
Time for asynchronous call to return to caller: 50 ms
Time saved: 850 ms.

Of course, you don't get that time savings for free; the resulting thread that is spun off from the asynchronous call must still spend 850 ms in the background completing its work, and if you spin up enough threads you will eventually run out of cores (on processor-bound tasks), so your mileage may vary.
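The return-to-caller numbers above can be reproduced in miniature; in this hedged sketch the 200 ms workload and the names are illustrative. The synchronous call keeps the caller waiting for the full duration, while the asynchronous call returns to the caller almost immediately, even though the work still takes the same time in the background.

```java
import java.util.concurrent.CompletableFuture;

public class ReturnTimeDemo {
    static void work() {                        // stands in for the long-running operation
        try { Thread.sleep(200); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    // Returns { synchronous return time, asynchronous return time } in ms.
    public static long[] compare() {
        long t0 = System.nanoTime();
        work();                                 // synchronous: caller waits the full duration
        long syncMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        CompletableFuture<Void> f = CompletableFuture.runAsync(ReturnTimeDemo::work);
        long asyncMs = (System.nanoTime() - t0) / 1_000_000; // back to the caller almost at once
        f.join();                               // the work still finishes in the background
        return new long[] { syncMs, asyncMs };
    }

    public static void main(String[] args) {
        long[] r = compare();
        System.out.println("sync: " + r[0] + " ms, async return: " + r[1] + " ms");
    }
}
```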

To find out how much improvement you will get overall, you can run a load test.

Actually, both approaches have their own set of disadvantages. It's a rather complicated subject, going way beyond just threads vs. events.

I recommend that you read the excellent "Concurrent Programming for Scalable Web Architectures", either the whole thing (which I recommend) or at least sections 4.2 Server Architectures and 4.3 Case of Threads vs. Events.

Regardless of how you define asynchronous programming, there's no global answer. Some programming problems are more efficiently solved with threads (or "asynchronous programming"), others will just be taking an unnecessary performance hit by spinning up a new thread (or sending a job off to a thread). When you have a particular problem with two solutions, you'll just need to benchmark the performance -- from start to finish -- of each prospective solution.

A few indicators that a job may benefit from a threaded/asynchronous solution:

  • There are multiple, potentially long-running calculations that can be performed independently of each other
  • There are "distant" data-fetches/puts (via HTTP, for instance) alongside other code that doesn't depend on the data
  • There are many medium-to-long data-fetches, interactions, or calculations that can be run independently of each other

Or more basically, if you reach a point in code wherein you need to doSomething() and you know that

  1. doSomething() will take a "long" time, and ...
  2. You can continue doing other urgent, important, and/or time-consuming things while you wait for doSomething(), then ...

... it's probably advantageous to run doSomething() asynchronously. (But you still have to benchmark, even if "informally", to know for sure!) Whether you're explicitly aware that doSomething() has its own thread or simply that you called doSomethingAsync() instead is up to you and/or your framework ...
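Under those two conditions, the shape in code is: start doSomething(), keep doing the independent work, and only rejoin when the result is needed. A hedged Java sketch, with all names and workloads illustrative:

```java
import java.util.concurrent.CompletableFuture;

public class DoSomethingDemo {
    static String doSomething() {               // the "long" task from the text
        try { Thread.sleep(150); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return "done";
    }

    static int otherUrgentWork() {              // work that doesn't depend on doSomething()
        int sum = 0;
        for (int i = 1; i <= 1000; i++) sum += i;
        return sum;
    }

    public static String runBoth() {
        CompletableFuture<String> pending = CompletableFuture.supplyAsync(DoSomethingDemo::doSomething);
        int other = otherUrgentWork();          // proceeds while doSomething() runs
        return pending.join() + ":" + other;    // rejoin only when the result is needed
    }

    public static void main(String[] args) {
        System.out.println(runBoth());          // prints "done:500500"
    }
}
```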

Licensed under: CC-BY-SA with attribution