NodeJS Events and Recursion (Readdir)
By Dan Baker, published 2010-09-16
Notice: This is confusing. The code is confusing and the concepts are confusing. But the results are very easy to use, and this is a nice pattern to follow when you need to do something similar.

The reason I built this function is as follows: I was working in node.js, processing an HTTP request. I needed to get two lists of files from different folders, and return both lists for the same request. One of the lists of files needed to be recursive (children folders, and children of children ...).

The good ol' way of doing this (like in C or Java) would be to write a single function that returns a collection of files for a given folder. That function would call itself when it found a folder. Finally, when everything finished (the nested recursive calls all returned), the initial call to the function would return ALL the results. Remember that this will not work in NodeJS using the awesome async calls.

So, what is the concept? How do we design this in NodeJS/JavaScript using async calls? The good ol' function signature would look like the following:

    var data = readDirectory(path)

But that will not work here. We know that we will have to call "readdir", which returns immediately. We do get to pass a function to readdir that will get called when readdir is finally finished. We want our new function to act just like readdir. The signature for our new function should end up being something like the following (Note: I've left out error handling to make this easier to read):

    readDirectory(path, fnc)
    fnc = a function that takes a single parameter, which will be the entire data returned.

    readDirectory(".", function(data) { ...work here... });

When the entire directory (path) has been read in, we'll want the "data" to be returned something like the following:

    data = [];         // an array of objects. one object per file or folder in the path
    data[N].name = file/folder name
    data[N].stat = the "stat" for the file/folder
    data[N].children = if exists, another "data" array of objects containing the children

Note: data[N].stat.isDirectory() will return true if data[N].name is a folder.

Now that we know what the signature looks like, and what data we are getting back, we can design how this new super awesome function is going to actually get its work done. Here is the outline: (Note: to reduce indentation, most nested functions stay at the same indentation level)

- call "readdir", and return
- the function that readdir calls back continues...
- we now have an array of filenames
- iterate over the filenames
- for each filename, run the file system "stat" function, which returns immediately
- the function that stat calls back continues...
- count the number of stat call backs, so we know the last file that calls back
- when the last file has called our call back function, we continue...
- at this point, we have the filename and the stats for the file
- create an object, and push it onto the data array that will get returned
- IF this filename is a folder then do the following:
- call ourselves (readDirectory) with the path + filename
- when readDirectory calls back, we continue...
- set the .children attribute of the object to the data returned by readDirectory
- count the number of children folders processed
- when the last child folder has been processed then continue...
- run the original callback function with the original returned data
- if we have processed all files AND we didn't find any folders then continue...
- run the callback function with the data we've gathered
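
The counting step is the heart of the outline, so it may help to see it on its own first. Here is a minimal sketch; waitForAll and delayed are hypothetical names made up for this illustration, not part of node or of the function below. Each task gets a callback, and the final callback only runs once the counter of outstanding tasks reaches zero.

```javascript
// Hypothetical helper (for illustration only): start every task, then call
// "done" exactly once, after the last task has called back.
function waitForAll(tasks, done) {
    var pending = tasks.length;   // callbacks we are still waiting for
    var results = [];
    if (pending === 0) {
        // nothing was started, so no callback would ever fire: finish now
        done(results);
        return;
    }
    tasks.forEach(function (task, index) {
        task(function (value) {
            results[index] = value;   // store by position in the task list
            pending -= 1;             // 1 fewer callback to wait for
            if (pending === 0) {
                done(results);        // this was the last one
            }
        });
    });
}

// Hypothetical async task: hands "value" to its callback after "ms" milliseconds.
function delayed(value, ms) {
    return function (callback) {
        setTimeout(function () { callback(value); }, ms);
    };
}

// "b" finishes first, but "done" still waits for "a".
waitForAll([delayed("a", 20), delayed("b", 5)], function (results) {
    console.log(results);
});
```

The full function below uses the same idea twice: one counter for the outstanding "stat" calls and one for the child folders still being read.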

var fs = require("fs");

/**
 * read a directory (recursively deep)
 * data[] = an object for each element in the directory
 *   .name = item's name (file or folder name)
 *   .stat = item's stat (.stat.isDirectory() == true IF a folder)
 *   .children = another data[] for the children
 */
var readDirectory = function(path, next) {
    // queue up a "readdir" file system call (and return)
    fs.readdir(path, function(err, files) {
        var count = files.length; // count of files in the current folder still waiting on "stat"
        var countFolders = 0;     // count of child folders still being read
        var data = [];
        if (count === 0) {
            // empty folder: no "stat" call will ever call back, so finish here
            next(data);
            return;
        }
        // iterate over each file in the dir
        files.forEach(function (name) {
            // queue up a "stat" file system call for every file (and return)
            fs.stat(path + "/" + name, function(err, stat) {
                var obj = {};
                obj.name = name;
                obj.stat = stat;
                data.push(obj);
                if (stat.isDirectory()) {
                    countFolders += 1;
                    // perform "readDirectory" on each child folder (which queues up a
                    // "readdir" and returns)
                    (function(obj2) {
                        // obj2 = the "obj" object
                        readDirectory(path + "/" + name, function(data2) {
                            // entire child folder info is in "data2" (1 fewer child
                            // folders to wait to be processed)
                            countFolders -= 1;
                            obj2.children = data2;
                            if (countFolders <= 0 && count <= 0) {
                                // sub-folders found. This was the last sub-folder to
                                // process, and every "stat" has already called back.
                                // (Without the "count" check, a fast child folder could
                                // call "next" before its siblings were processed.)
                                next(data);
                            } else {
                                // more children folders (or "stat" calls) to be processed.
                                // do nothing here.
                            }
                        });
                    })(obj);
                }
                // 1 more file has been processed.
                count -= 1;
                if (count <= 0) {
                    // all files have been processed.
                    if (countFolders <= 0) {
                        // no sub-folders are pending. DONE. call "next"
                        next(data);
                    } else {
                        // children folders were found. do nothing here.
                    }
                }
            });
        });
    });
};

Note: please see here for how to save variables for the next event-loop (The creation of a function, and running it).
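
To make the "create a function and run it" trick concrete, here is a small stand-alone sketch. It is not taken from the function above; the variable names are made up for the illustration. Callbacks created in a loop with "var" all see the same variable, while wrapping the body in a function that is called immediately gives each callback its own saved copy.

```javascript
// Without the wrapper: every callback shares the single "i" declared with var,
// so by the time they run, they all see its final value.
var shared = [];
for (var i = 0; i < 3; i += 1) {
    shared.push(function () { return i; });
}

// With the wrapper: each call of the outer function has its own "saved"
// parameter, so each callback keeps the value from its own iteration.
var captured = [];
for (var j = 0; j < 3; j += 1) {
    (function (saved) {
        captured.push(function () { return saved; });
    })(j);
}

console.log(shared.map(function (f) { return f(); }));   // [ 3, 3, 3 ]
console.log(captured.map(function (f) { return f(); })); // [ 0, 1, 2 ]
```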

var fs = require("fs");

/**
 * read a directory (recursively deep)
 * data[] = an object for each element in the directory
 *   .name = item's name (file or folder name)
 *   .stat = item's stat (.stat.isDirectory() == true IF a folder)
 *   .children = another data[] for the children
 * filter = an object with various filter settings:
 *   .depth = max directory recursion depth to travel
 *            (0 or missing means: infinite)
 *            (1 means: only the folder passed in)
 *   .hidden = true means: process hidden files and folders (defaults to false)
 *   .callback = callback function: callback(name, stat, filter) -- returns truthy to keep the file
 *
 * @param path = path to directory to read (".", ".\apps")
 * @param callback = function to callback to: callback(err, data)
 * @param [filter] = (optional) filter object
 */
exports.readDirectory = function(path, callback, filter) {
    if (filter) {
        // process filter. are we too deep yet?
        if (!filter.depthAt) filter.depthAt = 1; // initialize what depth we are at
        if (filter.depth && filter.depth < filter.depthAt) {
            callback(undefined, []); // we are too deep. return "nothing found"
            return;
        }
    }
    // queue up a "readdir" file system call (and return)
    fs.readdir(path, function(err, files) {
        if (err) {
            callback(err);
            return;
        }
        var doHidden = false; // true means: process hidden files and folders
        if (filter && filter.hidden) {
            doHidden = true; // filter requests to process hidden files and folders
        }
        var count = 0;        // count the number of "stat" calls queued up
        var countFolders = 0; // count the number of "folders" calls queued up
        var data = [];        // the data to return
        var failed = false;   // true once an error has been reported
        var fail = function(e) {
            // report only the first error, so "callback" never runs twice
            if (!failed) {
                failed = true;
                callback(e);
            }
        };
        // child folders get the same filter settings, one level deeper
        // (without this, the filter would only apply to the top folder)
        var childFilter;
        if (filter) {
            childFilter = {
                depth: filter.depth,
                hidden: filter.hidden,
                callback: filter.callback,
                depthAt: filter.depthAt + 1
            };
        }
        // iterate over each file in the dir
        files.forEach(function (name) {
            // ignore files that start with a "." UNLESS requested to process hidden files and folders
            if (doHidden || name.indexOf(".") !== 0) {
                // queue up a "stat" file system call for every file (and return)
                count += 1;
                fs.stat(path + "/" + name, function(err, stat) {
                    if (err) {
                        fail(err);
                        return;
                    }
                    var processFile = true;
                    if (filter && filter.callback) {
                        processFile = filter.callback(name, stat, filter);
                    }
                    if (processFile) {
                        var obj = {};
                        obj.name = name;
                        obj.stat = stat;
                        data.push(obj);
                        if (stat.isDirectory()) {
                            countFolders += 1;
                            // perform "readDirectory" on each child folder (which queues up a readdir and returns)
                            (function(obj2) {
                                // obj2 = the "obj" object
                                exports.readDirectory(path + "/" + name, function(err, data2) {
                                    if (err) {
                                        fail(err);
                                        return;
                                    }
                                    // entire child folder info is in "data2" (1 fewer child folders to wait to be processed)
                                    countFolders -= 1;
                                    obj2.children = data2;
                                    if (countFolders <= 0 && count <= 0) {
                                        // sub-folders found. This was the last sub-folder to process,
                                        // and every "stat" has already called back.
                                        callback(undefined, data); // callback w/ data
                                    } else {
                                        // more children folders (or "stat" calls) to be processed. do nothing here.
                                    }
                                }, childFilter);
                            })(obj);
                        }
                    }
                    // 1 more file has been processed (or skipped)
                    count -= 1;
                    if (count <= 0) {
                        // all files have been processed.
                        if (countFolders <= 0) {
                            // no sub-folders are pending. DONE.
                            callback(undefined, data); // callback w/ data
                        } else {
                            // children folders were found. do nothing here (we are waiting for the children to callback)
                        }
                    }
                });
            }
        });
        if (count <= 0) {
            // if no "stat" calls started, then this was an empty folder
            callback(undefined, []); // callback w/ empty
        }
    });
};
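
Once the callback fires, the nested data[] structure still has to be walked to be useful. As a rough sketch of how that might look, the hypothetical helper below (formatTree and fakeStat are names invented for this example) turns the structure into indented lines. The sample data is hand-built in the same shape, with a stand-in for the real "stat" object, so the sketch runs without touching the file system.

```javascript
// Hypothetical helper: turn a data[] structure into an array of indented lines.
function formatTree(data, indent) {
    indent = indent || "";
    var lines = [];
    data.forEach(function (item) {
        // folders get a trailing "/" so they stand out from files
        var isFolder = item.stat.isDirectory();
        lines.push(indent + item.name + (isFolder ? "/" : ""));
        if (item.children) {
            // children is another data[] array, so recurse one level deeper
            lines = lines.concat(formatTree(item.children, indent + "  "));
        }
    });
    return lines;
}

// Stand-in for a real fs "stat" result: only isDirectory() is needed here.
function fakeStat(isDir) {
    return { isDirectory: function () { return isDir; } };
}

// Hand-built sample in the same shape that readDirectory passes to its callback.
var sample = [
    { name: "app.js", stat: fakeStat(false) },
    { name: "lib", stat: fakeStat(true), children: [
        { name: "util.js", stat: fakeStat(false) }
    ] }
];

console.log(formatTree(sample).join("\n"));
```

With the real function, the same helper would be called from inside the callback, after checking err, on the data argument that readDirectory hands back.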