Question

As I understand it so far, JavaScript is single-threaded. If you defer the execution of some procedure, you just schedule it (queue it) to run the next time the thread is free. But Async.js defines two methods, Async::parallel and Async::parallelLimit, and I quote:

  • parallel(tasks, [callback])

Run an array of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback...

  • parallelLimit(tasks, limit, [callback])

The same as parallel only the tasks are executed in parallel with a maximum of "limit" tasks executing at any time.

As far as I understand English, "doing tasks in parallel" means doing them at the same time, simultaneously.

How can Async.js execute tasks in parallel in a single thread? Am I missing something?


Solution

How can Async.js execute tasks in parallel in a single thread? Am I missing something?

parallel starts all of its tasks without waiting for the earlier ones to finish. So if your tasks contain I/O calls (e.g. querying a DB), they will appear to have been processed in parallel.

How is this enabled in a single thread? That is what I could not make sense of.

Node.js is non-blocking. Instead of executing all tasks at the same instant, it switches from one task to another: when the first task makes an I/O call and becomes idle, Node.js simply switches to processing another one.

I/O tasks spend most of their time waiting for the result of the I/O call. In a blocking model, such as a classic Java thread, the task blocks its thread while it waits for the result. Node.js instead uses that time to process other tasks.

So that means that if the inner processing of each task is asynchronous, the thread is granted to each bit of these tasks, regardless of whether any of them has finished, until all have finished their bits?

Yes, it's almost as you said. Node.js starts processing the first task until it pauses to do an I/O call. At that moment, Node.js leaves it and grants its main thread to another task. So you may say that the thread is granted to each active task in turn.

Other tips

async.parallel is well documented here: https://github.com/caolan/async#parallel

async.parallel is about kicking off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
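To illustrate the "executed in series" case, here is a minimal sketch that does not use the Async library at all. The two tasks and the plain forEach loop are made up for illustration; the loop only approximates what any runner on a single thread has to do, which is to call the tasks one after the other.

```javascript
// Two hypothetical tasks with no timers and no I/O:
// each one calls its callback synchronously, before it returns.
const log = [];

function taskOne(cb) {
  log.push('one: start');
  cb(null, 1);
  log.push('one: end');
}

function taskTwo(cb) {
  log.push('two: start');
  cb(null, 2);
  log.push('two: end');
}

// A plain loop standing in for the runner: call every task in turn.
[taskOne, taskTwo].forEach((task) => {
  task((err, value) => log.push('callback got ' + value));
});

console.log(log.join('\n'));
// one: start
// callback got 1
// one: end
// two: start
// callback got 2
// two: end
```

Nothing overlaps here: taskTwo does not even start until taskOne has completely finished, because there was never anything to wait for.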

The functions are not executed simultaneously, but once the first function has handed its work off to an asynchronous operation (e.g. setTimeout, network, ...), the second one will start, even if the first function hasn't called the provided callback yet.

As for the number of parallel tasks: that depends on the limit you pick.

As far as I understand English, "doing tasks in parallel" means doing them at the same time, simultaneously.

Correct. And "simultaneously" means "there is at least one moment in time when two or more tasks are already started, but not yet finished".

How can Async.js execute tasks in parallel in a single thread? Am I missing something?

When some task stops for some reason (e.g. I/O), async.js executes another task and continues the first one later.

Your doubts make perfect sense. It's been a few years since you asked this question, but I think it's worth adding a few things to the existing answers.

Run an array of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback...

This sentence is not entirely correct. In fact it does wait for each function to have completed because it's impossible not to do so in JavaScript. Both function calls and function returns are synchronous and blocking. So when it calls any function it has to wait for it to return. What it doesn't have to wait for is the calling of the callback that was passed to that function.

Allegory

Some time ago I wrote a short story to demonstrate that very concept:

To quote a part of it:

“So I said: ‘Wait a minute, you tell me that one cake takes three and a half hours and four cakes take only half an hour more than one? It doesn’t make any sense!’ I thought that she must be kidding so I started laughing.”
“But she wasn’t kidding?”
“No, she looked at me and said: ‘It makes perfect sense. This time is mostly waiting. And I can wait for many things at once just fine.’ I stopped laughing and started thinking. It finally started to get to me. Doing four pillows at the same time didn’t buy you any time, maybe it was arguably easier to organize but then again, maybe not. But this time it was something different. But I didn’t really know how to use that knowledge yet.”

Theory

I think it's important to emphasize that in single-threaded event loops you can never do more than one thing at once. But you can wait for many things at once just fine. And this is what happens here.

The parallel function from the Async module calls the functions one by one, and each function has to return before the next one can be called; there is no way around it. The magic here is that a function doesn't really do its job before it returns: it just schedules some task, registers an event listener, passes some callback somewhere else, adds a resolution handler to some promise, etc.

Then, when the scheduled task finishes, a handler previously registered by that function is executed. This in turn executes the callback that was originally passed in by the Async module, and the Async module then knows that this function has finished, this time not only in the sense that it returned, but also that the callback passed to it was finally called.
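The mechanism described above can be sketched in a few lines. Note that miniParallel is a hypothetical, simplified stand-in written for this answer; it is not the Async module's actual implementation. The delayed helper and its delays are invented as well.

```javascript
// miniParallel: a simplified, hypothetical stand-in for async.parallel.
function miniParallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let failed = false;

  if (pending === 0) return done(null, results);

  tasks.forEach((task, index) => {
    // task(...) returns before the next task is called;
    // only the callback it was given is invoked later.
    task((err, value) => {
      if (failed) return;
      if (err) {
        failed = true;
        return done(err);
      }
      results[index] = value; // stored by position, not by finish time
      pending -= 1;
      if (pending === 0) done(null, results);
    });
  });
}

// Hypothetical task factory: calls back with `name` after `ms` milliseconds.
const completionOrder = [];
const delayed = (name, ms) => (cb) =>
  setTimeout(() => {
    completionOrder.push(name);
    cb(null, name);
  }, ms);

let finalResults = null;
miniParallel([delayed('slow', 60), delayed('fast', 20)], (err, results) => {
  finalResults = results;
  console.log('completion order:', completionOrder); // [ 'fast', 'slow' ]
  console.log('results:', results);                  // [ 'slow', 'fast' ]
});
```

The counter is what tells the runner that every callback has been called, and indexing the results by position is why they come back in the order the tasks were listed.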

Examples

So, for example let's say that you have 3 functions that download 3 different URLs: getA(), getB() and getC().

We will write a mock of the Request module to simulate the requests and some delays:

// Mock of the Request module: calls cb(err, response, body) after a per-URL delay.
function mockRequest(url, cb) {
  const delays = { A: 4000, B: 2000, C: 1000 }; // milliseconds
  setTimeout(() => {
    cb(null, {}, 'Response ' + url);
  }, delays[url]);
}

Now the 3 functions that are mostly the same, with verbose logging:

function getA(cb) {
  console.log('getA called');
  const url = 'A';
  console.log('getA runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getA calling callback');
    cb(err, body);
  });
  console.log('getA request returned');
  console.log('getA returns');
}

function getB(cb) {
  console.log('getB called');
  const url = 'B';
  console.log('getB runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getB calling callback');
    cb(err, body);
  });
  console.log('getB request returned');
  console.log('getB returns');
}

function getC(cb) {
  console.log('getC called');
  const url = 'C';
  console.log('getC runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getC calling callback');
    cb(err, body);
  });
  console.log('getC request returned');
  console.log('getC returns');
}

And finally we're calling them all with the async.parallel function:

const async = require('async'); // the Async module (npm install async)

async.parallel([getA, getB, getC], (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});

What gets displayed immediately is this:

getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns

As you can see this is all sequential - functions get called one by one and the next one is not called before the previous one returns. Then we see this with some delays:

getC calling callback
getB calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

So getC finished first, then getB and getA. As soon as the last one finished, async.parallel called our callback with all of the responses combined and in the correct order: the order in which we listed the functions, not the order in which the requests finished.

Also we can see that the program finishes after 4.071 seconds which is roughly the time that the longest request took, so we see that the requests were all in progress at the same time.

Now, let's run it with async.parallelLimit with the limit of 2 parallel tasks at most:

async.parallelLimit([getA, getB, getC], 2, (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});

Now it's a little bit different. What we see immediately is:

getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns

So getA and getB were called and returned, but getC has not been called at all yet. Then after some delay we see:

getB calling callback
getC called
getC runs request
getC request returned
getC returns

which shows that as soon as getB called the callback, the Async module no longer had 2 tasks in progress but just 1, so it could start another one, getC, and it did so immediately.

Then, after further delays, we see:

getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

which finishes the whole process just like in the async.parallel example. This time the whole process also took roughly 4 seconds because the delayed calling of getC didn't make any difference - it still managed to finish before the first called getA finished.
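The slot-based behaviour seen in that log can be sketched as follows. Again, miniParallelLimit is a hypothetical simplification written for this answer, not the Async module's real code, and the timed helper uses short made-up delays so the example finishes quickly.

```javascript
// miniParallelLimit: a simplified, hypothetical stand-in for async.parallelLimit.
function miniParallelLimit(tasks, limit, done) {
  const results = new Array(tasks.length);
  let next = 0;     // index of the next task to start
  let running = 0;  // tasks started whose callback has not been called yet
  let finished = 0;
  let failed = false;

  if (tasks.length === 0) return done(null, results);

  function startMore() {
    // Start tasks only while a slot is free.
    while (running < limit && next < tasks.length) {
      const index = next++;
      running += 1;
      tasks[index]((err, value) => {
        if (failed) return;
        if (err) {
          failed = true;
          return done(err);
        }
        results[index] = value;
        running -= 1;
        finished += 1;
        if (finished === tasks.length) return done(null, results);
        startMore(); // a slot was freed, so the next task can start now
      });
    }
  }

  startMore();
}

// Hypothetical task factory that records when each task starts and ends.
const events = [];
const timed = (name, ms) => (cb) => {
  events.push('start ' + name);
  setTimeout(() => {
    events.push('end ' + name);
    cb(null, name);
  }, ms);
};

let limitedResults = null;
miniParallelLimit([timed('A', 90), timed('B', 30), timed('C', 15)], 2, (err, results) => {
  limitedResults = results;
  console.log(events.join(', '));
  // start A, start B, end B, start C, end C, end A
  console.log(results); // [ 'A', 'B', 'C' ]
});
```

As in the log above, C is not started until B's callback frees a slot, and the results still come back in the order the tasks were listed.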

But if we change the delays to these:

const delays = { A: 4000, B: 2000, C: 3000 };

then the situation is different. Now async.parallel takes 4 seconds but async.parallelLimit with a limit of 2 takes 5 seconds, and the order is slightly different.

With no limit:

$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns
getB calling callback
getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

real    0m4.075s
user    0m0.070s
sys     0m0.009s

With a limit of 2:

$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getB calling callback
getC called
getC runs request
getC request returned
getC returns
getA calling callback
getC calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

real    0m5.075s
user    0m0.057s
sys     0m0.018s
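The 4 versus 5 seconds can also be worked out on paper: with a limit of 2, getC can only start when getB finishes at 2 s, so it finishes at 2 s + 3 s = 5 s. Here is a small sketch of that arithmetic. The schedule function is a hypothetical helper, and it assumes that a waiting task starts at the exact moment a running one finishes, ignoring the few milliseconds of overhead visible in the real timings.

```javascript
// Hypothetical helper: given task durations in ms (in the order the tasks are
// listed) and a concurrency limit, compute when each task starts and ends.
function schedule(durations, limit) {
  const running = []; // end times of the tasks currently in progress
  const timeline = [];
  for (const [name, duration] of Object.entries(durations)) {
    let start = 0;
    if (running.length >= limit) {
      // No free slot: wait for the task that finishes earliest.
      running.sort((a, b) => a - b);
      start = running.shift();
    }
    const end = start + duration;
    running.push(end);
    timeline.push({ name, start, end });
  }
  const total = Math.max(...timeline.map((t) => t.end));
  return { timeline, total };
}

const changed = { A: 4000, B: 2000, C: 3000 };
console.log(schedule(changed, Infinity).total); // 4000
console.log(schedule(changed, 2).total);        // 5000
console.log(schedule(changed, 2).timeline[2]);  // { name: 'C', start: 2000, end: 5000 }

// With the first set of delays the limit makes no difference to the total.
console.log(schedule({ A: 4000, B: 2000, C: 1000 }, 2).total); // 4000
```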

Summary

I think the most important thing to remember, whether you use callbacks as in this case, promises, or async/await, is that in single-threaded event loops you can do only one thing at once, but you can wait for many things at the same time.
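For completeness, the same idea with promises and async/await might look like the sketch below. The wait helper and its delays are invented for illustration; Promise.all is the standard built-in.

```javascript
// wait is a hypothetical helper: a promise that resolves with `name` after `ms` milliseconds.
const wait = (name, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(name), ms));

async function main() {
  const t0 = Date.now();
  // All three timers are created before anything is awaited, so the waits overlap.
  const together = await Promise.all([wait('A', 40), wait('B', 20), wait('C', 10)]);
  const togetherMs = Date.now() - t0;

  const t1 = Date.now();
  // Here each wait is created only after the previous one resolved, so the waits add up.
  const oneByOne = [];
  for (const [name, ms] of [['A', 40], ['B', 20], ['C', 10]]) {
    oneByOne.push(await wait(name, ms));
  }
  const oneByOneMs = Date.now() - t1;

  console.log('Promise.all:', together, 'took about', togetherMs, 'ms');
  console.log('one by one: ', oneByOne, 'took about', oneByOneMs, 'ms');
  return { together, oneByOne };
}

const outcome = main();
```

Both variants produce the same values in the same order. The first should take roughly as long as the longest wait and the second roughly the sum of all three, even though the thread never does two things at once in either case.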

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow