My First Look At Streams In Node.js

Last week, I started to look into Gulp.js as a possible build-system. It seems really cool; but, I immediately felt the pain of not understanding Node.js streams, which are at the core of Gulp.js. As such, I spent the last few days trying to build a Node.js stream. This turned out to be a bear of a task; but, I think I finally got something working! Since I love Regular Expressions, I thought it would be fun to build a stream that takes an input and then outputs matches of the given regular expression pattern.

I started out by building a Writable stream and just logging matches. When I got that working, I tried to add the "Readable" portion of the stream. But, that didn't work - I couldn't "inherit" from both the Writable and Readable classes.

Then, I tried to create a Duplex stream; but, that seemed janky since I had to listen for the "finish" event internally before finalizing the output stream. I think, in a Duplex stream, the input and the output are supposed to coexist with little cross-knowledge.

Finally, I tried a Transform stream, which seemed to be more inline with what I was trying to do. I was producing output based on the input, which is what the transform stream is doing - linking the input to the output. The transform stream also provides a "_flush" method, which gives us an easy hook into the end of the input stream and feels much cleaner than having to bind to the "finish" event internally.

When all was said and done, here's what I came up with. It's not supposed to be perfect - it was just a real-world case for stream investigation. It has clear flaws, like the inability to safely use look-aheads or to safely match across multiple chunks. But, I think it was sufficient to teach me the basics of how streams work.

// Include module references.
var fileSystem = require( "fs" );
var stream = require( "stream" );
var util = require( "util" );
var chalk = require( "chalk" );

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// I am a Transform stream (writable/readable) that takes input and finds matches to the
// given regular expression. As each match is found, I push each match onto the output
// stream individually.
function RegExStream( pattern ) {

	// If this wasn't invoked with "new", return the newable instance.
	if ( ! ( this instanceof RegExStream ) ) {
		return( new RegExStream( pattern ) );
	}

	// Call the super-constructor to set up proper options. We want to set objectMode
	// here since each call to read() should result in a single match, never a partial
	// match of the given regular expression pattern.
	stream.Transform.call(
		this,
		{
			objectMode: true
		}
	);

	// Make sure the pattern is an actual instance of the RegExp object and not just a
	// string. This way, we can treat it uniformly later on.
	if ( ! ( pattern instanceof RegExp ) ) {
		pattern = new RegExp( pattern, "g" );
	}

	// Since the pattern is passed-in by reference, we need to create a clone of it
	// locally. We're going to be changing the RegExp properties (lastIndex), and we
	// need to make sure the calling context can't observe those changes. The clone
	// must carry the "g" flag so that exec() will iterate through the matches.
	var flags = ( "g" + ( pattern.ignoreCase ? "i" : "" ) + ( pattern.multiline ? "m" : "" ) );
	this._pattern = new RegExp( pattern.source, flags );

	// I hold the unprocessed portion of the input stream.
	this._inputBuffer = "";

}

// Extend the Transform class so that our prototype methods below are picked up.
util.inherits( RegExStream, stream.Transform );

// I get called once the write-stream has finished. At this point, no further input can
// arrive, so this is our last chance to flush any match that was deferred because it
// touched the end of a chunk.
RegExStream.prototype._flush = function( flushCallback ) {

	var match = null;

	// Every remaining match is now safe to push into the output.
	while ( ( match = this._pattern.exec( this._inputBuffer ) ) !== null ) {
		logInput( "Push (flush):", match[ 0 ] );
		this.push( match[ 0 ] );
	}

	this._inputBuffer = "";

	// Signal the end of the output stream.
	flushCallback();

};

// I transform the given input chunk, pushing zero or more pattern matches onto the
// output stream.
RegExStream.prototype._transform = function( chunk, encoding, getNextChunk ) {

	// Append the chunk to the unprocessed portion of the input.
	this._inputBuffer += chunk.toString( "utf8" );

	// The offset at which the next round of matching should begin.
	var nextOffset = null;

	var match = null;

	// Iterate over the matches in the accumulated input.
	while ( ( match = this._pattern.exec( this._inputBuffer ) ) !== null ) {

		// If the current match is within the bounds (exclusive) of the input buffer,
		// then we know we haven't matched a partial input. As such, we can safely push
		// the match into the output.
		if ( this._pattern.lastIndex < this._inputBuffer.length ) {

			logInput( "Push:", match[ 0 ] );
			this.push( match[ 0 ] );

			// The next relevant offset will be after this match.
			nextOffset = this._pattern.lastIndex;

		// If the current match butts up against the end of the input buffer, we are in
		// danger of an invalid match - a match that may actually span across two (or
		// more) successive _write() actions. As such, we can't use it until the next
		// write (or finish) event.
		} else {

			logInput( "Need to defer '" + match[ 0 ] + "' since it's at the end of the chunk." );

			// The next relevant offset will be BEFORE this match (since we haven't
			// transformed it yet).
			nextOffset = match.index;

		}

	}

	// If we have successfully consumed a portion of the input, we need to reduce the
	// current input buffer to be only the unused portion.
	if ( nextOffset !== null ) {

		this._inputBuffer = this._inputBuffer.slice( nextOffset );

	// If no match was found at all, then we can reset the internal buffer entirely. We
	// know we won't need to be matching across chunks.
	} else {

		this._inputBuffer = "";

	}

	// Reset the regular expression so that it can pick up at the start of the internal
	// buffer when the next chunk is ready to be processed.
	this._pattern.lastIndex = 0;

	// Tell the source that we've fully processed this chunk.
	getNextChunk();

};

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// Create an input stream from the file system.
var inputStream = fileSystem.createReadStream( "./input.txt" );

// Create a Regular Expression stream that will run through the input and find matches
// for the given pattern - "words".
var regexStream = inputStream.pipe( new RegExStream( /\w+/i ) );

// When the regex stream is ready, start reading-in word matches.
regexStream.on(
	"readable",
	function() {

		var content = null;

		// Since the RegExStream operates in "object mode", we know that we'll get a
		// single match with each .read() call.
		while ( ( content = this.read() ) !== null ) {
			logOutput( "Pattern match: " + content.toString( "utf8" ) );
		}

	}
);

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// I log the given input values with a distinct color.
function logInput() {

	var chalkedArguments = Array.prototype.slice.call( arguments ).map(
		function( value ) {
			return( chalk.magenta( value ) );
		}
	);

	console.log.apply( console, chalkedArguments );

}

// I log the given output values with a distinct color.
function logOutput() {

	var chalkedArguments = Array.prototype.slice.call( arguments ).map(
		function( value ) {
			return( chalk.bgMagenta.white( value ) );
		}
	);

	console.log.apply( console, chalkedArguments );

}

The file that I am piping into the RegExStream contains the following content:

How funky is your chicken? How loose is your goose?

And, when we run the above Node.js file, using "\w+" as the regular expression pattern, each word in the input - How, funky, is, your, chicken, How, loose, is, your, goose - gets pushed onto the output stream and logged as its own pattern match.

Very cool stuff!

Now that I have a better understanding of Streams in Node.js, I think I can start to make some sense of the way Gulp.js is operating. Clearly, there's still going to be a lot of blind-spots; but, at least now I'll have some insight into why I couldn't even get a simple gulp-less plugin to work!

This post uses a Node.js Transform stream that I constructed by explicitly extending stream.Transform and defining prototype methods. It seems that this kind of stuff - in Gulp.js - is often encapsulated using the Through2 module. As such, I wanted to refactor this experiment using Through2:

I am the co-founder and lead engineer at InVision App, Inc — the world's leading prototyping,
collaboration & workflow platform. I also rock out in JavaScript and ColdFusion 24x7 and I dream about
promise resolving asynchronously.