Question

I have not found many resources about this: I was wondering if it's possible/a good idea to be able to write asynchronous code in a synchronous way.

For example, here is some JavaScript code which retrieves the number of users stored in a database (an asynchronous operation):

getNbOfUsers(function (nbOfUsers) { console.log(nbOfUsers) });

It would be nice to be able to write something like this:

const nbOfUsers = getNbOfUsers();
console.log(nbOfUsers);

The compiler would then automatically take care of waiting for the response before executing console.log: it would always wait for asynchronous operations to complete before their results are used anywhere else. We would need far fewer callbacks, promises, async/await, or whatever, and would never have to worry about whether the result of an operation is available immediately or not.

Errors would still be manageable (did nbOfUsers receive an integer or an error?) using try/catch, or with something like optionals, as in Swift.

Is it possible? It may be a terrible idea/a utopia... I don't know.

Solution

Async/await is exactly the automated management that you propose, albeit with two extra keywords. Why are those keywords important, aside from backwards compatibility?

  • Without explicit points where a coroutine may be suspended and resumed, we would need a type system to detect where an awaitable value must be awaited. Many programming languages do not have such a type system.

  • By making awaiting a value explicit, we can also pass awaitable values around as first-class objects: promises. This can be super useful when writing higher-order code (see the sketch after this list).

  • Async code has very deep effects on the execution model of a language, similar to the absence or presence of exceptions in the language. In particular, an async function can only be awaited by async functions. This affects all calling functions! But what if we change a function from non-async to async at the end of this dependency chain? This would be a backwards-incompatible change … unless all functions are async and every function call is awaited by default.

    And that is highly undesirable because it has very bad performance implications. You wouldn't be able to simply return cheap values. Every function call would become a lot more expensive.
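
As a rough illustration of the second point above (a minimal sketch only; getNbOfUsers and getNbOfOrders are hypothetical promise-returning functions, not anything from the question's code), promises can be stored in variables, passed to helpers, and combined before anything is awaited:

// Hypothetical promise-returning operations standing in for real async work.
const getNbOfUsers = () => Promise.resolve(42);
const getNbOfOrders = () => Promise.resolve(7);

// A higher-order helper that operates on awaitable values rather than plain numbers.
const sumOf = (promises) =>
    Promise.all(promises).then((numbers) => numbers.reduce((a, b) => a + b, 0));

const counts = [getNbOfUsers(), getNbOfOrders()]; // promises stored as ordinary values
sumOf(counts).then((total) => console.log(total)); // logs 49 once both resolve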

Async is great, but some kind of implicit async won't work in reality.

Pure functional languages like Haskell have a bit of an escape hatch because execution order is largely unspecified and unobservable. Or phrased differently: any specific order of operations must be explicitly encoded. That can be rather cumbersome for real-world programs, especially those I/O-heavy programs for which async code is a very good fit.

OTHER TIPS

What you are missing is the purpose of async operations: they allow you to make use of your waiting time!

If you turn an async operation, like requesting some resource from a server, into a synchronous operation by implicitly and immediately waiting for the reply, your thread cannot do anything else with the waiting time. If the server takes 10 milliseconds to respond, roughly 30 million CPU cycles go to waste. The latency of the response becomes the execution time of the request.

The only reason programmers invented async operations is to hide the latency of inherently long-running tasks behind other useful computations. If you can fill the waiting time with useful work, that's CPU time saved. If you can't, well, nothing's lost by the operation being async.
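
For example (a minimal sketch; fetchUserCount and fetchOrderCount are hypothetical stand-ins for real network calls), the waiting time of one request can be spent starting and awaiting another:

// Hypothetical async operations; in practice these would be network or database calls.
const fetchUserCount = () => new Promise((resolve) => setTimeout(() => resolve(42), 10));
const fetchOrderCount = () => new Promise((resolve) => setTimeout(() => resolve(7), 10));

async function report() {
    // Start both requests; the thread is free while they are in flight.
    const usersPromise = fetchUserCount();
    const ordersPromise = fetchOrderCount();

    // Other useful work could happen here during the ~10 ms of latency.

    console.log(`${await usersPromise} users, ${await ordersPromise} orders`);
}

report();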

So, I recommend embracing the async operations that your language provides. They are there to save you time.

Some do.

They're not mainstream (yet) because async is a relatively new feature, and we've only just now gotten a good feel for whether it's even a good feature and how to present it to programmers in a way that is friendly/usable/expressive/etc. Existing async features are largely bolted onto existing languages, which requires a somewhat different design approach.

That said, it's not clearly a good idea to do this everywhere. A common failing is making async calls in a loop, effectively serializing their execution; making asynchronous calls implicit may obscure that sort of error (as sketched below). Also, if you support implicit coercion from a Task<T> (or your language's equivalent) to T, that can add a bit of complexity/cost to your typechecker and error reporting when it's unclear which of the two the programmer really wanted.
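
A minimal sketch of that loop pitfall (fetchUser is a hypothetical async operation): awaiting inside the loop serializes the requests, while collecting the promises first lets them run concurrently.

// Hypothetical async operation standing in for a network or database call.
const fetchUser = (id) => new Promise((resolve) => setTimeout(() => resolve({ id }), 10));

async function serialized(ids) {
    const users = [];
    for (const id of ids) {
        users.push(await fetchUser(id)); // each iteration waits for the previous one
    }
    return users;
}

async function concurrent(ids) {
    return Promise.all(ids.map(fetchUser)); // all requests are in flight at once
}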

But those are not insurmountable problems. If you wanted to support that behavior you almost certainly could, though there would be trade-offs.

There are languages that do this. But there is actually not much of a need, since it can be easily accomplished with existing language features.

As long as you have some way of expressing asynchrony, you can implement Futures or Promises purely as a library feature; you don't need any special language features. And as long as you have some way of expressing Transparent Proxies, you can put the two features together and you have Transparent Futures.

For example, in Smalltalk and its descendants, an object can change its identity: it can literally "become" a different object (and in fact the method that does this is called Object>>become:).

Imagine a long-running computation that returns a Future<Int>. This Future<Int> has all the same methods as Int, except with different implementations. Future<Int>'s + method does not add another number and return the result; it returns a new Future<Int> which wraps the computation. And so on, and so forth. Methods that cannot sensibly be implemented by returning a Future<Int> will instead automatically await the result and then call self become: result, which will make the currently executing object (self, i.e. the Future<Int>) literally become the result object; from then on, the object reference that used to be a Future<Int> is an Int everywhere, completely transparently to the client.

No special asynchrony-related language features needed.
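
For a very loose JavaScript approximation of that idea (a sketch only; JavaScript has no equivalent of Object>>become:, so the final collapse to a plain value still needs an explicit await or then), a future can expose the same arithmetic interface as the value it wraps:

// FutureInt wraps a promise of a number but offers number-like methods.
class FutureInt {
    constructor(promise) { this.promise = promise; }

    // Arithmetic returns another FutureInt wrapping the chained computation.
    plus(other) {
        const otherPromise = other instanceof FutureInt ? other.promise : Promise.resolve(other);
        return new FutureInt(Promise.all([this.promise, otherPromise]).then(([a, b]) => a + b));
    }

    // Operations that need the concrete value have to wait for it.
    then(onFulfilled, onRejected) { return this.promise.then(onFulfilled, onRejected); }
}

const nbOfUsers = new FutureInt(Promise.resolve(21));
const doubled = nbOfUsers.plus(nbOfUsers); // still a FutureInt, nothing has been awaited
doubled.then((n) => console.log(n));       // logs 42 once the computation resolves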

They do (well, most of them). The feature you're looking for is called threads.

Threads have their own problems however:

  1. Because the code can be suspended at any point, you can't ever assume that things won't change "by themselves". When programming with threads, you waste a lot of time thinking about how your program should deal with things changing.

    Imagine a game server is processing a player's attack on another player. Something like this:

    if (playerInMeleeRange(attacker, victim)) {
        const damage = calculateAttackDamage(attacker, victim);
        if (victim.health <= damage) {
    
            // attacker gets whatever the victim was carrying as loot
            const loot = victim.getInventoryItems();
            attacker.addInventoryItems(loot);
            victim.removeInventoryItems(loot);
    
            victim.sendMessage(`${attacker} hits you with a ${attacker.currentWeapon} and you die!`);
            victim.setDead();
        } else {
            victim.health -= damage;
            victim.sendMessage(`${attacker} hits you with a ${attacker.currentWeapon}!`);
        }
        attacker.markAsKiller();
    }
    

    Three months later, a player discovers that if he gets killed and logs off precisely while attacker.addInventoryItems is running, victim.removeInventoryItems will fail: he keeps his items and the attacker also gets a copy of them. He does this several times, creating a million tonnes of gold out of thin air and crashing the game's economy.

    Alternatively, the attacker can log out while the game is sending a message to the victim, and he won't get a "murderer" tag above his head, so his next victim won't run away from him.

  2. Because the code can be suspended at any point, you need to use locks everywhere when manipulating data structures. I gave an example above that has obvious consequences in a game, but it can be more subtle. Consider adding an item to the start of a linked list:

    newItem.nextItem = list.firstItem;
    list.firstItem = newItem;
    

    This isn't a problem if you say that threads can only be suspended when they're doing I/O, and not at any point. But I'm sure you can imagine a situation where there's an I/O operation - such as logging:

    for (player = playerList.firstItem; player != null; player = player.nextPlayer) {
        debugLog(`${player.name} is online, they get a gold star`);
        // Oops! The player might've logged out while the log message was being written to
        // disk, and now this will throw an exception and the remaining players won't get
        // their gold stars.
        // Or the list might've been rearranged and some players might get two stars and
        // some might get none.
        player.addInventoryItem(InventoryItems.GoldStar);
    }
    
  3. Because the code can be suspended at any point, there could potentially be a lot of state to save. The system deals with this by giving each thread an entirely separate stack. But the stack is quite big, so you can't have more than about 2000 threads in a 32-bit program. Or you could reduce the stack size, at the risk of making it too small.

I find a lot of the answers here misleading, because while the question literally asks about asynchronous programming and not non-blocking I/O, I don't think we can discuss one without discussing the other in this particular case.

While asynchronous programming is inherently, well, asynchronous, its raison d'être is mostly to avoid blocking kernel threads. Node.js uses asynchrony via callbacks or Promises to allow blocking operations to be dispatched from an event loop, and Netty in Java uses asynchrony via callbacks or CompletableFutures to do something similar.

Non-blocking code does not require asynchrony, however. It depends on how much your programming language and runtime are willing to do for you.

Go, Erlang, and Haskell/GHC can handle this for you. You can write something like var response = http.get('example.com/test') and have it release a kernel thread behind the scenes while waiting for a response. This is done by goroutines, Erlang processes, or forkIO letting go of kernel threads when blocking, allowing the runtime to do other things while awaiting a response.

It's true that a language can't really handle asynchrony for you, but some abstractions let you go further than others, e.g. undelimited continuations or asymmetric coroutines. However, the primary cause of asynchronous code, blocking system calls, absolutely can be abstracted away from the developer.
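
One such abstraction, asymmetric coroutines, is available in JavaScript in a restricted form as generators. Below is a minimal sketch of the well-known generator-driver pattern (popularized by libraries like co before async/await existed): the generator body reads synchronously, while a small driver resumes it whenever a yielded promise settles. getNbOfUsers is a hypothetical stand-in for real async work.

// Drive a generator, resuming it with the resolved value of each yielded promise.
function run(genFn) {
    return new Promise((resolve, reject) => {
        const gen = genFn();
        const step = (value) => {
            const { value: yielded, done } = gen.next(value);
            if (done) return resolve(yielded);
            Promise.resolve(yielded).then(step, reject);
        };
        step();
    });
}

// Hypothetical async operation standing in for a database call.
const getNbOfUsers = () => Promise.resolve(42);

run(function* () {
    const nbOfUsers = yield getNbOfUsers(); // reads like synchronous code
    console.log(nbOfUsers);
});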

Node.js and Java support asynchronous non-blocking code, whereas Go and Erlang support synchronous non-blocking code. They're both valid approaches with different tradeoffs.

My rather subjective argument is that those arguing against runtimes managing non-blocking on behalf of the developer are like those arguing against garbage collection in the early noughties. Yes, it incurs a cost (in this case primarily more memory), but it makes development and debugging easier, and makes codebases more robust.

I'd personally argue that asynchronous non-blocking code should be reserved for systems programming in the future and more modern technology stacks should migrate to synchronous non-blocking runtimes for application development.

If I'm reading you right, you are asking for a synchronous programming model but a high-performance implementation. If so, that is already available to us in the form of green threads or processes, e.g. in Erlang or Haskell. So yes, it's an excellent idea, but retrofitting it onto existing languages can't always be as smooth as you would like.

I appreciate the question, and find the majority of answers to be merely defensive of the status quo. In the spectrum of low- to high-level languages, we've been stuck in a rut for some time. The next higher level is clearly going to be a language that is less focused on syntax (the need for explicit keywords like await and async) and much more focused on intention. (Obvious credit to Charles Simonyi, but thinking of 2019 and the future.)

If I told a programmer to write some code that simply fetches a value from a database, you can safely assume I also mean "and by the way, don't hang the UI" and "don't introduce other considerations that mask hard-to-find bugs". Programmers of the future, with a next generation of languages and tools, will certainly be able to write code that simply fetches a value in one line and goes from there.

The highest-level language would be plain English, relying on the competence of whoever carries out the task to know what you really want done. (Think of the computer in Star Trek, or asking something of Alexa.) We're far from that, but inching closer, and my expectation is that the language/compiler could do more to generate robust, intention-driven code without going so far as to need AI.

On one hand, there are newer visual languages, like Scratch, that do this and aren't bogged down with all the syntactical technicalities. Certainly, there's a lot of behind-the-scenes work going on so the programmer doesn't have to worry about it. That said, I'm not writing business-class software in Scratch, so, like you, I have the same expectation that it's time for mature programming languages to automatically manage the synchronous/asynchronous problem.

The problem you're describing is two-fold.

  • The program you're writing should behave asynchronously as a whole when viewed from the outside.
  • It should not be visible at the call site whether a function call potentially gives up control or not.

There are a couple of ways to achieve this, but they basically boil down to

  1. having multiple threads (at some level of abstraction)
  2. having multiple kinds of functions at the language level, all of which are called the same way, e.g. foo(4, 7, bar, quux).

For (1), I'm lumping together forking and running multiple processes, spawning multiple kernel threads, and green thread implementations that schedule language-runtime level threads onto kernel threads. From the perspective of the problem, they are the same. In this world, no function ever gives up or loses control from the perspective of its thread. The thread itself sometimes doesn't have control and sometimes isn't running but you don't give up control of your own thread in this world. A system fitting this model may or may not have the ability to spawn new threads or join on existing threads. A system fitting this model may or may not have the ability to duplicate a thread like Unix's fork.
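
As one concrete instance of (1) in the JavaScript world (a minimal sketch using Node's worker_threads module; the script spawns itself as the worker), the worker thread never gives up control mid-function from its own point of view; the operating system schedules the whole thread:

const { Worker, isMainThread, parentPort } = require('node:worker_threads');

if (isMainThread) {
    // Spawn a second thread running this same file.
    const worker = new Worker(__filename);
    worker.on('message', (msg) => console.log('from worker:', msg));
} else {
    // Code on the worker thread: it runs to completion without explicit yield points.
    parentPort.postMessage(21 * 2);
}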

(2) is interesting. In order to do it justice we need to talk about introduction and elimination forms.

I'm going to show why implicit await cannot be added to a language like Javascript in a backwards-compatible way. The basic idea is that by exposing promises to the user and having a distinction between synchronous and asynchronous contexts, Javascript has leaked an implementation detail that prevents handling synchronous and asynchronous functions uniformly. There's also the fact that you can't await a promise outside of an async function body. These design choices are incompatible with "making asynchrony invisible to the caller".

You can introduce a synchronous function using a lambda and eliminate it with a function call.

Synchronous function introduction:

((x) => {return x + x;})

Synchronous function elimination:

f(4)

((x) => {return x + x;})(4)

You can contrast this with asynchronous function introduction and elimination.

Asynchronous function introduction

(async (x) => {return x + x;})

Asynchronous function elimination (note: only valid inside an async function)

await (async (x) => {return x + x;})(4)

The fundamental problem here is that an asynchronous function is also a synchronous function producing a promise object.

Here's an example of calling an asynchronous function synchronously in the node.js repl.

> (async (x) => {return x + x;})(4)
Promise { 8 }

You can hypothetically have a language, even a dynamically typed one, where the difference between asynchronous and synchronous function calls is not visible at the call site and possibly is not visible at the definition site.

Taking a language like that and lowering it to Javascript is possible; you'd just have to effectively make all functions asynchronous, as sketched below.
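
A hedged sketch of what that lowering could look like (doubleUsers and getNbOfUsers are hypothetical names for illustration): every function becomes async and every call site is awaited, whether or not the callee actually does asynchronous work.

// Source in the hypothetical implicit-await language:
//   function doubleUsers() { return getNbOfUsers() * 2; }

// Stub for illustration; in the question this would hit a database.
async function getNbOfUsers() { return 42; }

// Lowered Javascript: the function is async and the call is awaited.
async function doubleUsers() {
    return (await getNbOfUsers()) * 2;
}

doubleUsers().then((n) => console.log(n)); // 84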

With Go's goroutines and the Go runtime, you can write all code as if it were synchronous. If an operation blocks in one goroutine, execution continues in other goroutines, and with channels you can communicate easily between goroutines. This is often easier than callbacks in Javascript or async/await in other languages. See https://tour.golang.org/concurrency/1 for some examples and an explanation.

Furthermore, I have no personal experience with it, but I hear Erlang has similar facilities.

So yes, there are programming languages like Go and Erlang which solve the synchronous/asynchronous problem, but unfortunately they are not very popular yet. As those languages grow in popularity, perhaps the facilities they provide will also be implemented in other languages.

There is a very important aspect that has not been raised yet: reentrancy. If you have any other code (i.e. an event loop) that runs during the async call (and if you don't, then why do you even need async?), then that code can affect the program state. You cannot hide the async calls from the caller, because the caller may depend on parts of the program state remaining unaffected for the duration of its function call. Example:

function foo( obj ) {
    obj.x = 2;
    bar();
    log( "obj.x equals 2: " + obj.x );
}

If bar() is an async function, then obj.x may change during its execution. This would be rather unexpected without any hint that bar is async and that this effect is possible. The only alternative would be to treat every function/method as potentially async, and re-fetch and re-check any non-local state after each function call. This is prone to subtle bugs and may not even be possible at all if some of the non-local state is fetched via functions. Because of that, the programmer needs to be aware of which functions have the potential of altering the program state in unexpected ways:

async function foo( obj ) {
    obj.x = 2;
    await bar();
    log( "obj.x equals 2: " + obj.x );
}

Now it is clearly visible that bar() is an async function, and the correct way to handle it is to re-check the expected value of obj.x afterwards and deal with any changes that may have occurred.

As already noted in other answers, pure functional languages like Haskell can escape that effect entirely by avoiding the need for any shared/global state at all. I do not have much experience with functional languages, so I am probably biased, but I do not think the lack of global state is an advantage when writing larger applications.

In the case of Javascript, which you used in your question, there is an important point to be aware of: Javascript is single-threaded, and the order of execution is guaranteed as long as there are no async calls.

So if you have a sequence like yours:

const nbOfUsers = getNbOfUsers();

You are guaranteed that nothing else will be executed in the meantime. No need for locks or anything similar.

However, if getNbOfUsers is asynchronous, then:

const nbOfUsers = await getNbOfUsers();

means that while getNbOfUsers runs, execution yields, and other code may run in between. This may in turn require some locking to happen, depending on what you are doing.

So, it's a good idea to be aware of when a call is asynchronous and when it isn't, as in some situations you will need to take additional precautions that you wouldn't need if the call were synchronous.
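
A minimal sketch of that difference (getNbOfUsers here is a hypothetical async operation, not the question's real function): the code after the call site runs during the await, so shared state can change underneath the awaiting function.

let counter = 0;

// Hypothetical async operation standing in for a database call.
const getNbOfUsers = () => new Promise((resolve) => setTimeout(() => resolve(42), 10));

async function report() {
    counter += 1;
    const nbOfUsers = await getNbOfUsers(); // execution yields here
    // counter is now 2: the code after the report() call ran while we were waiting.
    console.log(`users: ${nbOfUsers}, counter: ${counter}`);
}

report();
counter += 1; // runs before the awaited promise resolves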

There are languages that do what (I understand) you propose: you can write any amount of further code, and when you later access the variable that was supposed to be filled asynchronously, execution waits for it to finish.
Of course, if you access it right on the next line, you don't gain anything; the idea is to do other work first and then use the value when it is really needed, without worrying about whether it is ready in time.

I know that ABAP (the SAP language) can do that; but it is an interpreted language, so it is probably easier to handle. A compiled language would have to put more effort into it, but it should be possible.

However, hiding that in the language definition would make it harder to write optimally fast programs, as you no longer have control over what happens. C and C++ have the core principle of not adding any overhead ("you only pay for what you use"), at the price that you need to handle synchronization yourself, where you want or need it.

This is available in C++ as std::async since C++11.

The template function async runs the function f asynchronously (potentially in a separate thread which may be part of a thread pool) and returns a std::future that will eventually hold the result of that function call.

And with C++20, coroutines can be used as well.

Licensed under: CC-BY-SA with attribution