Run NodeJS event loop / wait for child process to finish

Question 1

Adding a third ( :) ) solution to your problem: now that you have clarified what behavior you are after, I suggest using Fibers.

Fibers let you do co-routines in nodejs. Coroutines are functions that allow multiple entry/exit points. This means you will be able to yield control and resume it as you please.
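To make the "multiple entry/exit points" idea concrete without installing anything, here is a minimal sketch that uses a native generator function instead of Fibers. It is not the Fibers API, only the same coroutine idea, and it assumes a Node.js version with generator support; the names worker, co and log are made up for illustration.

```javascript
// Illustrative only: a generator is a built-in coroutine, not a Fiber.
var log = [];

function* worker() {
    log.push('step 1');
    var received = yield 'paused';        // exit point: control goes back to the caller
    log.push('resumed with ' + received); // entry point: execution continues here
    return 'done';
}

var co = worker();          // creates the coroutine, nothing runs yet
var first = co.next();      // enter: runs until the yield
log.push('back in caller');
var second = co.next(42);   // re-enter: resumes right after the yield with the value 42
```

The caller decides when the paused function continues, which is the same property the Fiber examples rely on.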

Here is a sleep function from the official Fibers documentation. It yields control for a given amount of time and then resumes where it left off.

var Fiber = require('fibers');

function sleep(ms) {
    var fiber = Fiber.current;
    setTimeout(function() {
        fiber.run();
    }, ms);
    Fiber.yield();
}

Fiber(function() {
    console.log('wait... ' + new Date);
    sleep(1000);
    console.log('ok... ' + new Date);
}).run();
console.log('back in main');

You can place the code that does the waiting for the resource in a function, causing it to yield and then run again when the task is done.

For example, adapting your example from the question:

var Fiber = require('fibers');
var fork = require('child_process').fork;

var pausedExecution, importantData, importantDataCalculator;
function getImportantData() {
    while (importantData === undefined) {
        pausedExecution = Fiber.current;
        Fiber.yield();
        pausedExecution = undefined;
    }

    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have proper data now
        return importantData;
    }
}

function callback(partialDataMessage) {
    if (partialDataMessage.needsCorrection) {
        var theData = getImportantData();
        // use data to correct message
        process.send(correctedMessage); // send corrected result to main process
    } else {
        process.send(partialDataMessage); // send unmodified result to main process
    }
}

function executeCode(code) {
    // setup child process to calculate the data
    importantDataCalculator = fork("./runtime");
    importantDataCalculator.on("message", function (msg) {
        if (msg.type === "result") {
            importantData = msg.data;
        } else if (msg.type === "error") {
            importantData = null;
        } else {
            throw new Error("Unknown message from dataGenerator!");
        }

        if (pausedExecution) {
            // execution is waiting for the data
            pausedExecution.run();
        }
    });


    // wrap the execution of the code in a Fiber, so it can be paused
    Fiber(function () {
        runCodeWithCallback(code, callback); // the callback will be called from time to time when the code produces new data
        // this callback is synchronous and blocking,
        // but it will yield control to the event loop if it has to wait for the child-process to finish
    }).run();
}

Good luck! I always say it is better to solve one problem in 3 ways than to solve 3 problems the same way. I'm glad we were able to work out something that worked for you. Admittedly, this was a pretty interesting question.

Question 2

The rule of asynchronous programming is, once you've entered asynchronous code, you must continue to use asynchronous code. While you can continue to call the function over and over via setImmediate or something of the sort, you still have the issue that you're trying to return from an asynchronous process.

Without knowing more about your program, I can't tell you exactly how you should structure it, but by and large the way to "return" data from a process that involves asynchronous code is to pass in a callback; perhaps this will put you on the right track:

var fork = require('child_process').fork;

function getImportantData(callback) {
    var importantDataCalculator = fork("./runtime");
    importantDataCalculator.on("message", function (msg) {
        if (msg.type === "result") {
            callback(null, msg.data);
        } else if (msg.type === "error") {
            callback(new Error("Data could not be generated."));
        } else {
            callback(new Error("Unknown message from sourceMapGenerator!"));
        }
    });
}

You would then use this function like this:

getImportantData(function(error, data) {
    if (error) {
        // handle the error somehow
    } else {
        // `data` is the data from the forked process
    }
});
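To try this pattern without a real ./runtime script, the sketch below swaps the forked process for a plain EventEmitter. The fakeChild object, the extra child parameter and the outcome variable are assumptions for illustration only; they are not part of the code above.

```javascript
var EventEmitter = require('events').EventEmitter;

// Same callback contract as above, but the message source is passed in
// so a stand-in can be used instead of fork("./runtime").
function getImportantData(child, callback) {
    child.once('message', function (msg) {
        if (msg.type === 'result') {
            callback(null, msg.data);
        } else if (msg.type === 'error') {
            callback(new Error('Data could not be generated.'));
        } else {
            callback(new Error('Unknown message type: ' + msg.type));
        }
    });
}

var fakeChild = new EventEmitter(); // hypothetical stand-in for the child process
var outcome = {};

getImportantData(fakeChild, function (error, data) {
    outcome.error = error;
    outcome.data = data;
});

// Simulate the child reporting its result.
fakeChild.emit('message', { type: 'result', data: [1, 2, 3] });
```

The caller never gets the data as a return value; it only ever sees it inside the callback.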

I talk about this in a bit more detail in one of my screencasts, Thinking Asynchronously.

Question 3

What you are running into is a very common scenario that skilled programmers who are starting with nodejs often struggle with.

You're correct. You can't do this the way you are attempting (loop).

The main process in node.js is single threaded, and your loop is blocking the event loop. While the loop spins, the message from the child process can never be delivered.

The simplest way to resolve this is something like:

function getImportantData() {
    if(importantData === undefined){ // not set yet
        setImmediate(getImportantData); // try again on the next event loop cycle
        return; //stop this attempt
    }

    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have a proper data now
        return importantData;
    }
}

Here the function retries on the next iteration of the event loop, using setImmediate.

This introduces a new problem, though: your function returns a value. Since the data will not be ready on the first call, the value you get back is undefined. So you have to code reactively: you need to tell your code what to do when the data arrives.
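The problem can be seen in a small self-contained sketch. The retryResults array and the simulated arrival of the data are assumptions added purely so the behavior can be inspected; they are not part of the original code.

```javascript
var importantData;        // not produced yet
var retryResults = [];    // what the retries returned (for inspection only)

function getImportantData() {
    if (importantData === undefined) {
        setImmediate(function () {
            // the retry's return value goes nowhere unless we store it ourselves
            retryResults.push(getImportantData());
        });
        return;           // the original caller gets undefined
    }
    return importantData;
}

var firstResult = getImportantData(); // undefined: the data is not there yet
importantData = 'ready';              // the data "arrives" before the next loop iteration
```

The retry does find the data, but by then the original caller has long since moved on with undefined.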

Telling your code what to do when the data arrives is typically done in node with a callback:

function getImportantData(err,whenDone) {
    if(importantData === undefined){ // not set yet
        setImmediate(getImportantData.bind(null, err, whenDone)); // try again on the next event loop cycle, keeping both callbacks
        return; //stop this attempt
    }

    if (importantData === null) {
        err("Data could not be generated.");
    } else {
        // we should have a proper data now
        whenDone(importantData);
    }
}

This can be used in the following way:

getImportantData(function(err){
    throw new Error(err); // error handling function callback
}, function(data){ //this is whenDone in our case
    //perform actions on the important data
})

Question 4

Your question (updated) is very interesting. It appears to be closely related to a problem I had with asynchronously catching exceptions. (Also, Brandon and I had an interesting discussion about it! It's a small world.)

See this question on how to catch exceptions asynchronously. The key concept is that you can use (assuming nodejs 0.8+) nodejs domains to constrain the scope of an exception.

This will allow you to easily get the location of the exception since you can surround asynchronous blocks with atry/catch. I think this should solve the bigger issue here.

You can find the relevant code in the linked question. The usage is something like:

atry(function() {
    setTimeout(function(){
        throw "something";
    },1000);
}).catch(function(err){
    console.log("caught "+err);
});

Since you have access to the scope of atry, you can get the stack trace there, which would let you skip the more complicated source-map usage.
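The code from the linked question is not reproduced here. As a rough idea of what such a helper could look like, here is a minimal sketch built on the domain module. This is an assumed shape, not the code from that question, and the domain module is deprecated in current Node.js releases although it still ships with them.

```javascript
var domain = require('domain'); // deprecated, but still part of Node.js

// Hypothetical atry helper: runs fn inside its own domain and forwards any
// error raised there, even from a later timer, to the handler given to .catch().
function atry(fn) {
    var d = domain.create();
    var handler = function (err) { throw err; }; // default: rethrow if no handler was attached

    d.on('error', function (err) {
        handler(err);
    });

    // Defer the call so the caller has a chance to attach .catch() first.
    process.nextTick(function () {
        d.run(fn);
    });

    return {
        catch: function (onError) {
            handler = onError;
        }
    };
}
```

With a helper along these lines, the usage shown above would log "caught something" about a second after it starts.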

Good luck!
