How do I know if my async function is truly asynchronous?

https://softwareengineering.stackexchange.com/questions/347789

11-01-2021

Question

I'm writing a function in the node.js-style asynchronous way, but how do I know if these functions are truly asynchronous, i.e., that they run in parallel and not sequentially?

Here is my code (modified to remove details):

// write the async function
function compileData(data, callback) {
  try {
    // ... do all the work ...
    callback(null, result); // instead of `return result;`, pass it to the callback
  } catch (err) {
    callback(err, null);
  }
}

// use the async function
compileData(mydata1, function (err, data) {
  if (err) console.error(err);
  else {
    // do something with compiled `data`
  }
});
compileData(mydata2, function (err, data) {
  if (err) console.error(err);
  else {
    // do something with compiled `data`
  }
});

When I use the async functions as above, how can I be sure they are running in parallel? How do I know the 2nd call of compileData() doesn’t wait to start until the first call is done?

Solution

The terms async and parallel are often used interchangeably, but they mean different things. Async means that something happens out of band relative to the main flow of execution (thus interrupting that flow); often this interruption services an I/O completion routine. Async therefore implies a (potentially arbitrarily timed) interruption of the main thread, and not necessarily a full-blown second thread. On some systems (but not JavaScript), the main thread is temporarily suspended and hijacked to service the I/O completion routine, after which the suspended work is resumed.

Parallel, by contrast, means that two or more things (threads) happen simultaneously. On a uniprocessor there is no true parallel execution; it is simulated by interrupting whichever thread is currently running (often at an arbitrary point) in order to give some CPU time to another thread. On a multiprocessor system, there is both simultaneous execution by different CPUs and interruption for the purpose of interleaving, so that more threads can run than there are CPUs.

(If you will, async is a lightweight form of (pseudo) parallelism, where one thing is the main thread and the other isn't necessarily a true full-blown thread, but rather an I/O completion routine.)

Let's also note that JavaScript is inherently single threaded. That is to say, there is no true parallel execution even on a multiprocessor chip, whereas in C, C++, Java or C#, you can have truly simultaneously executing threads on a multiprocessor chip. JavaScript has no notion of threads, but it does have callbacks.

Callbacks (function references) that you provide to some other routine may be executed synchronously or asynchronously. It depends on how those callbacks are used: you cannot tell from the code that passes the callback alone, without knowing the library routine to which the callback is supplied. So callbacks don't by themselves imply asynchronous execution.
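As a minimal sketch of that point, the snippet below passes a callback to two standard routines. `Array.prototype.forEach` invokes its callback synchronously, while `setTimeout` only schedules it. The `events` array is a name made up for this illustration.

```javascript
// Records the order in which things happen.
const events = [];

// forEach invokes its callback synchronously, before forEach itself returns.
[1, 2].forEach((n) => events.push(`forEach callback ${n}`));
events.push("after forEach");

// setTimeout only schedules its callback; it runs after the current code has finished.
setTimeout(() => {
  events.push("setTimeout callback");
  console.log(events);
}, 0);
events.push("after setTimeout");
```

Both calls look the same at the call site; only the recorded order shows which routine called back synchronously.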

(Callbacks are further complicated in that the function reference may actually be used zero or more times; there is basically no inherent guarantee from the semantics of the language as to when or how many times a callback is used.)

In your example, it would appear that the callback(s) are executed synchronously with respect to -- and under direct control of -- the main thread.

Asynchronous execution happens in JavaScript as a result of I/O (or timeout) completion routines. Since JavaScript is inherently single threaded, the way this works is that the (one) main thread is allowed to run to 100% completion -- that is, to return to its caller, which is the JavaScript scheduler (the event loop). At this point, nothing is running on the main thread anymore, and the JavaScript scheduler can invoke completion routines using that main thread. If multiple I/O requests trigger multiple completion routines, they are each run 100% to completion, one after another, in the order they are scheduled (which is the order of the I/O completion) without any interleaving.
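The run-to-completion behaviour described above can be observed with a rough sketch like the following, which uses timers as stand-ins for I/O completion routines. `busyWait` and `order` are hypothetical helpers invented for this example, and the delays are arbitrary.

```javascript
const order = [];

// Hypothetical helper: blocks the single thread for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Both timers are due almost immediately, but neither can run while main code is executing.
setTimeout(() => {
  order.push("A start");
  busyWait(20); // B is already due, yet it cannot interrupt A
  order.push("A end");
}, 0);

setTimeout(() => {
  order.push("B start");
  order.push("B end");
  console.log(order);
}, 0);

order.push("main start");
busyWait(50); // the timers stay pending until the main code returns
order.push("main end");
```

The recorded order should be main start, main end, A start, A end, B start, B end: each routine finishes before the next one begins.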

(Of course, the I/O completion routine may itself make additional I/O calls, passing subsequent I/O completion callbacks; the overall execution of that set of completion routines would be interleaved with other such I/O completion callbacks as described above.)

In your case, the callbacks are executed by function reference invocation, which makes them run under regular control of the main thread. The mechanism is indirect function call, so your callbacks are invoked using the standard function invocation mechanism. The main thread does not run to completion; the callbacks are not scheduled to run but are rather directly invoked. To see the difference, you might substitute:

callback(null, result);

with

setTimeout(callback,100,null,result);

This would also invoke the callback, but it would wait until some time after the currently executing routine (and the whole thread, in fact) has returned to its caller.
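To make the difference visible, here is a small sketch that runs both variants side by side. The upper-casing is only a placeholder for the elided "do all the work" step, and the names `compileDataSync`, `compileDataDeferred` and `log` are assumptions made for this illustration.

```javascript
const log = [];

// Variant from the question: the callback is invoked directly.
function compileDataSync(data, callback) {
  try {
    const result = String(data).toUpperCase(); // placeholder for the real work
    callback(null, result); // runs before compileDataSync returns
  } catch (err) {
    callback(err, null);
  }
}

// Variant with the substitution: the callback is scheduled instead of called.
function compileDataDeferred(data, callback) {
  try {
    const result = String(data).toUpperCase(); // placeholder for the real work
    setTimeout(callback, 100, null, result); // runs after the caller has returned
  } catch (err) {
    callback(err, null);
  }
}

compileDataSync("a", (err, data) => log.push(`sync callback: ${data}`));
log.push("after sync call");

compileDataDeferred("b", (err, data) => {
  log.push(`deferred callback: ${data}`);
  console.log(log);
});
log.push("after deferred call");
```

Note that in both variants the work itself still runs synchronously on the main thread; only the delivery of the result is deferred in the second one.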

Other tips

Asynchronous just means "In no particular order", as distinct from (ordinarily) synchronous code's "In exactly this order".

Parallel execution implies Asynchronous, as you don't specify which process|thread|task runs on which core|worker, and this means that to be safe the code should not depend on relative ordering, nor assume that shared values remain unchanged.

Imagine a scheduler for a single core machine, that maintains a queue of pending tasks. That scheduler will execute your program as if it were synchronous.

You're mixing up definitions. Asynchronous means that the function returns right after it is called, without blocking the code flow, and you get the return value later. It's like boiling water in a kettle: you put the water in, go away to do something else, and the kettle signals you when the water is hot. That is asynchronous water boiling.

Parallel means that the code can run simultaneously on more than one CPU core. So you put water in 2 or more kettles and turn them on at once, and that's parallel.

Now if you have 2 kettles AND you can leave and do something else until the water is boiling, it's parallel and asynchronous; if you need to keep your hands on the kettles, it's parallel and synchronous.

So, getting to Node.js: the code is NEVER parallel, because it's single threaded. You need to look at some other language, e.g. golang, if you want parallelism.

Node.js is only asynchronous. So it's like you can have 2 functions calculating pi at the same time, but they'll only be running on a single CPU core. On the surface 2 functions are "running", but only one is doing "work" at any given time.

There is a dirty hack which allows you to start new processes in Node, but it's so bad you shouldn't really consider it for anything. And the design of Node.js is so bad that it's an abomination against any sane programming practice. Code that has more than 100 lines can literally be unreadable. Even error handling is broken by design. You should look into golang, because it was designed to do the same thing and is as easy to use, BUT everything works and is designed correctly.

In golang, to make a function call asynchronous you just add the "go" keyword in front of it; that's all that is needed. If you call "a()" it's synchronous, and if you call "go a()" it's asynchronous. And, best of all, if you call "go a();" multiple times it's automatically asynchronous AND parallel.

You don't need to kill yourself and waste time working with something that is broken by design. I know it's a harsh judgement and many people have invested a lot of time into this, but really I'm surprised anyone is still using Node. It was good 8 years ago when there was no alternative. The whole thing is a dirty hack, IMO, which was once great compared to PHP, 8 years ago, for some very specific use cases.

Your code example is neither parallel nor asynchronous.

The code in your example uses a callback function for synchronous flow control.

Asynchronous

If you want to make it asynchronous you would have to do something that yields control while other work continues. Typical examples are...

Call a database or REST function that returns immediately and invokes a callback when the operation is complete.
Use setTimeout() or setInterval() to defer execution until after current work completes.

In your example you would have to add something asynchronous to it, say

function compileData(data, callback) {
    try {
        // databaseRequest stands for any asynchronous library call: it returns immediately.
        // When the request is complete, the library calls the anonymous function,
        // which calls your callback.
        databaseRequest(data,function(result) {
            callback(null, result);
        });
    } catch (err) {
        callback(err, null);
    }
}

This is probably what you meant, but to make the point clear, in a case like this, you know the call is asynchronous simply because it returns and continues execution immediately, and the result is returned later via the anonymous function.

If you are unconvinced, put in some console.log() statements and observe the order in which they execute.
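Beyond reading the log output by eye, the same check could be automated. The sketch below is one possible approach, not an established API: `callsBackSynchronously`, `direct` and `deferred` are hypothetical names, and the probe assumes the function under test takes its callback as the last argument and eventually calls it.

```javascript
// Resolves to true if `fn` invoked its callback before returning, false otherwise.
// Assumption: `fn` takes the callback as its last argument and calls it at some point.
function callsBackSynchronously(fn, ...args) {
  return new Promise((resolve) => {
    let returned = false;
    fn(...args, () => resolve(!returned));
    returned = true; // only reached once fn has returned to us
  });
}

// Two toy functions to probe: one calls back directly, one defers with setTimeout.
const direct = (data, cb) => cb(null, data);
const deferred = (data, cb) => setTimeout(cb, 0, null, data);

callsBackSynchronously(direct, "x").then((sync) =>
  console.log("direct calls back synchronously:", sync)
);
callsBackSynchronously(deferred, "x").then((sync) =>
  console.log("deferred calls back synchronously:", sync)
);
```

If the function under test never invokes its callback, the returned promise never settles, so a real check would probably want a timeout as well.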

Parallel

You know the code in question is not executing functions in parallel because the Node environment is single threaded. If you want them to run in parallel you will have to launch another Node process, or use one of the toolsets for this purpose. That is not a recommendation.
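For completeness, launching additional Node processes might look roughly like the sketch below, which uses `execFile` from Node's built-in `child_process` module. `runInChildNode` and the summing `job` are made up for this illustration, and whether the two children truly run in parallel depends on the operating system and the number of available cores.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: runs a snippet of JavaScript in a separate Node process
// and resolves with whatever that process printed to stdout.
function runInChildNode(source) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ["-e", source], (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout.trim());
    });
  });
}

// Placeholder CPU-bound job: sum the integers from 1 to 1000.
const job = "let s = 0; for (let i = 1; i <= 1000; i++) s += i; console.log(s);";

// Both child processes are started before either has finished.
Promise.all([runInChildNode(job), runInChildNode(job)]).then((outputs) =>
  console.log(outputs)
);
```

Each child has its own memory, so results come back only as text on stdout here. That overhead is one reason this is a sketch of what is possible, not advice to structure a program this way.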

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange