Question

I'm trying to use Node.js to read files so that I can write them to a server and process them.

I've split it into the following steps:

  • Traverse directories to generate an array of all of the file paths
  • Put the raw text data from each of the file paths in another array
  • Process the raw data

The first two steps are working fine, using these functions:

var walk = function(dir, done) {
    var results = [];
    fs.readdir(dir, function(err, list) {
        if (err) return done(err);
        var pending = list.length;
        if (!pending) return done(null, results);
        list.forEach(function(file) {
            file = path.resolve(dir, file);
            fs.stat(file, function(err, stat) {
                if (stat && stat.isDirectory()) {
                    walk(file, function(err, res) {
                        results = results.concat(res);
                        if (!--pending) done(null, results);
                    });
                } else {
                    results.push(file);
                    if (!--pending) done(null, results);
                }
            });
        });
    });
};
function processfilepaths(callback) {
    // reading each file
    for (var k in filepaths) { if (arrayHasOwnIndex(filepaths, k)) {
        fs.readFile(filepaths[k], function (err, data) {
            if (err) throw err;
            rawdata[k] = data.toString().split(/ *[\t\r\n\v\f]+/g);
            for (var j in rawdata[k]) { if (arrayHasOwnIndex(rawdata[k], j)) {
                rawdata[k][j] = rawdata[k][j].split(/: *|: +/);
            }}
        });
    }}
    if (callback) callback();
}

Obviously, I want to call the function processrawdata() after all of the data has been loaded. However, using callbacks doesn't seem to work.

walk(rootdirectory, function(err, results) {
    if (err) throw err;
    filepaths = results.slice();
    processfilepaths(processrawdata);
});

This never causes an error. Everything seems to run perfectly except that processrawdata() is always finished before processfilepaths(). What am I doing wrong?

Solution

The problem is where your callback is invoked relative to the asynchronous calls: it runs before they have finished. I'd recommend using a library such as after-all to execute a callback once all of your functions have been executed.

Here's an example: the function done is called once all of the functions wrapped with next have been called.

var afterAll = require('after-all');

// Call `done` once all the functions
// wrapped with next() have been called
var next = afterAll(done);

// this timer fires second (after 500 ms)
setTimeout(next(function() {
  console.log('Step two.');
}), 500);

// this timer fires first (after 100 ms)
setTimeout(next(function() {
  console.log('Step one.');
}), 100);

function done() {
  console.log("Yay we're done!");
}

OTHER TIPS

For your problem, I think you can use the async module for Node.js:

async.series([
    function(callback){ ... },
    function(callback){ ... }
]);


To answer your actual question, I need to explain how Node.js works:
When you call an async operation (say, a MySQL query), Node.js sends "execute this query" to MySQL. Since the query takes some time (maybe a few milliseconds), Node.js performs it through the MySQL async library: it returns to the event loop and does something else, such as handling an HTTP request, while waiting for MySQL to get back to us. So in your case, the two functions are independent and execute almost in parallel: processrawdata() does not wait for the reads started in processfilepaths().
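
To make the ordering concrete, here is a minimal sketch. It assumes a hypothetical `fakeQuery` helper that stands in for a real async call (it just uses `setTimeout`), and the names `demo` and `order` are made up for illustration. The point is only that the line after the async call runs first.

```javascript
// Hypothetical stand-in for an async operation such as a database query.
function fakeQuery(sql, callback) {
    setTimeout(function () {
        callback(null, 'result of ' + sql);
    }, 10);
}

function demo(done) {
    var order = [];
    fakeQuery('SELECT 1', function (err, result) {
        // Runs later, once the event loop gets back to us.
        order.push('query callback');
        done(order);
    });
    // Runs immediately: fakeQuery has only scheduled its work.
    order.push('code after the call');
}

demo(function (order) {
    console.log(order.join(' -> '));
    // prints "code after the call -> query callback"
});
```

The same thing happens with fs.readFile: the call returns at once, and its callback only runs after the current code has finished.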

function processfilepaths(callback) {
    // reading each file
    for (var k in filepaths) { if (arrayHasOwnIndex(filepaths, k)) {
        fs.readFile(filepaths[k], function (err, data) {
            if (err) throw err;
            rawdata[k] = data.toString().split(/ *[\t\r\n\v\f]+/g);
            for (var j in rawdata[k]) { if (arrayHasOwnIndex(rawdata[k], j)) {
                rawdata[k][j] = rawdata[k][j].split(/: *|: +/);
            }}
        });
    }}
    if (callback) callback();
}

Notice that your code has this shape:

for ...
    fs.readFile(path, function (err, data) { ... })
if (callback) callback();

Node starts each readFile asynchronously, which only registers the operation and its callback. Once the loop has finished starting every readFile, Node runs the if, before any of the readFile callbacks can be invoked, because those only run after the current code has returned to the event loop.
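
If you'd rather not pull in a library, one way to fix this is the same pending counter that the walk function in the question already uses. This is only a sketch: `readAll` and `contents` are names I made up, and it stores plain strings rather than doing your splitting. Note it also uses forEach instead of `for (var k in ...)`, because with `var` every readFile callback would see the final value of `k`.

```javascript
var fs = require('fs');

// Read every file in `paths` and call `callback(err, contents)` exactly
// once: after the last read has finished, or on the first error.
function readAll(paths, callback) {
    var contents = [];
    var pending = paths.length;
    var failed = false;
    if (!pending) return callback(null, contents);

    paths.forEach(function (file, i) {
        // forEach gives each iteration its own `i`, so every callback
        // writes its data into the right slot.
        fs.readFile(file, 'utf8', function (err, data) {
            if (failed) return;            // an earlier read already failed
            if (err) {
                failed = true;
                return callback(err);
            }
            contents[i] = data;
            if (--pending === 0) callback(null, contents);
        });
    });
}
```

You would then call something like `readAll(filepaths, function (err, contents) { ... })` and run processrawdata() inside that callback, where all of the data is guaranteed to be loaded.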

You need to use either Promises or a control-flow module like async to coordinate the reads. With async, what you would do looks like:

async.XXXX(filepaths, processRawData, 
   function (err, ...) {
      // function for when all are done
      if (callback) callback();
   }
);

Where XXXX is one of the functions from the library, such as series, parallel, or each. The other thing you need to know is that async passes your processRawData function a callback, which it must call when it is done. Unless you really need sequential access (I don't think you do), use a parallel variant so that you can queue up as many I/O events as possible. It should execute faster, maybe only marginally, but it will make better use of the hardware.
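
For the Promises route, a rough sketch could look like the following. It assumes a Node.js version that provides `fs.promises`, and the names `readAllParsed` and `parseRaw` are hypothetical. The two regular expressions are copied from the question. All reads are started in parallel and `Promise.all` resolves once every one of them has finished.

```javascript
var fs = require('fs');

// Same splitting as in the question: break the text into lines,
// then split each line on the colon separator.
function parseRaw(text) {
    return text.split(/ *[\t\r\n\v\f]+/g).map(function (line) {
        return line.split(/: *|: +/);
    });
}

// Start all reads at once; the returned promise resolves with the parsed
// contents in the same order as `paths`, or rejects on the first error.
function readAllParsed(paths) {
    return Promise.all(paths.map(function (file) {
        return fs.promises.readFile(file, 'utf8').then(parseRaw);
    }));
}
```

Usage would be along the lines of `readAllParsed(filepaths).then(function (rawdata) { ... })`, with the processing step inside the `then` callback and a `.catch` for read errors.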

Licensed under: CC-BY-SA with attribution