Sunday, May 15, 2011

A Not Very Short Introduction To Node.js

Node.js is a set of asynchronous libraries, built on top of the Google V8
Javascript Engine. Node is used for server-side development in Javascript.
Do you feel the rush of the 90s coming through your head? It is not the
revival of LiveWire; Node is a different beast. Node is a single-threaded
process, focused on doing networking right. Right, in this case, means
without blocking I/O. All the libraries built for Node use non-blocking
I/O. This is a really cool feature, which allows the single thread in Node
to serve thousands of requests per second. It even lets you run multiple
servers in the same
thread. Nginx utilizes the same technique; compare its performance
characteristics with those of Apache.
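
To make "without blocking I/O" concrete, here is a minimal sketch using Node's built-in fs module. The events array and the choice of calling fs.stat on the current directory are only for illustration. The call returns immediately, and the callback runs later, once the result is ready.

```javascript
const fs = require('fs');

// Records the order in which things happen (illustrative helper).
const events = [];

// Ask for information about the current directory. The call returns at once;
// the callback is invoked later, when the operating system has answered.
fs.stat('.', function (err, stats) {
  events.push(err ? 'stat failed' : 'stat finished');
  console.log(events.join(' -> '));
});

// This line runs before the callback above, because nothing blocked.
events.push('stat requested');
```

While the request is pending, the single thread is free to do other work, which is what lets one thread serve many clients.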

So, all libraries that deal with I/O have to be re-implemented with this
style of programming. The good news is that even though Node has only
been around for a couple of years, there are more than 1800 libraries
available. The libraries are of varying quality but the popularity of
Node shows good promise to deliver high-quality libraries for anything
that you can imagine.

History

Node is definitely not the first of its kind. The non-blocking
select() loop that is at the heart of Node dates back to 1983.

Quite similar. A cool thing is that the servers can be started from the
same file and node will, happily, serve both HTTP and echo requests from
the same thread without any problems. Let's try them out!
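
The listings for the two servers are not reproduced here, so the following is only a minimal sketch, assuming Node's built-in http and net modules, of what starting both from one file can look like. The function name startServers and the response text are made up for illustration.

```javascript
const http = require('http');
const net = require('net');

// HTTP server: answers every request with a short text body.
const httpServer = http.createServer(function (request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello World\n');
});

// Echo server: sends back whatever the client writes.
const echoServer = net.createServer(function (socket) {
  // Ignore clients that disconnect abruptly.
  socket.on('error', function () {});
  socket.pipe(socket);
});

// Start both servers in the same process (hypothetical helper).
// Passing port 0 lets the operating system pick a free port.
function startServers(httpPort, echoPort, callback) {
  httpServer.listen(httpPort, function () {
    echoServer.listen(echoPort, function () {
      callback(httpServer.address().port, echoServer.address().port);
    });
  });
}
```

Calling startServers(8000, 8001, function () {}) would, under these assumptions, leave one process answering HTTP requests on port 8000 and echoing raw TCP input on port 8001.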

Templating Engines

Every time a new platform makes its presence felt, it brings along a couple
of new templating languages, and Node is no different. Along with the
popular ones from the Ruby world, like Haml and Erb (EJS in Node), come
some new ones like Jade and some browser templating languages like
Mustache and jQuery templates. I'll show examples of Jade and Mu
(Mustache for Node).

I like Jade, because it is a Javascript dialect of Haml and it seems
appropriate to use if I'm using Javascript on the server side.

Apart from that, there are at least 30 different testing frameworks to
use. I have chosen to use NodeUnit since I find that it handles
asynchronous testing well, and it has a nice UTF-8 output that looks
good in the terminal.

Deployment

There are already a lot of platforms providing Node as a service (PaaS,
Platform as a Service). Most of them use Heroku-style deployment by
pushing to a Git remote. I'll show three alternatives that all provide
free Node hosting.

Joyent (no.de)

Joyent, the employer of Ryan Dahl, gives you SSH access so that you
can install the modules you need. Deployment is done by pushing to
a Git remote.

Cloud Foundry

Cloud Foundry is one of the most interesting platforms in the cloud. It
was a stroke of genius by VMware to open-source the platform, allowing
anyone to set up their own cloud if they wish. If you don't want to set up
your own Cloud Foundry cloud, you can use the service hosted at
cloudfoundry.com.

With Cloud Foundry, you install the modules locally and then they are
automatically deployed as part of the vmc push. Push in this case does
not mean git push, but instead, copy all the files from my local machine
to the server.

Tools

There are, of course, a bunch of tools that come with a new platform.
Jake is a Javascript version of Rake, but I am happy with Rake and
I don't see the need to switch. There are, however, some tools that I
cannot live without when using Node.

Reloaders

If you use the vanilla node command then you have to restart it
every time you make a change to a file. That is awfully annoying and
there are already a number of solutions to the problem.

If you want a GUI debugger, it is possible to use the one that comes with
Chrome by installing node-inspector. It is started similarly to
the built-in debugger, but --debug is an option instead of
a subcommand.

Idioms

Idioms, patterns, techniques, call it what you like. Javascript code is
littered with callbacks, and even more so with Node. Here are some tips
on how to write good asynchronous code with Node.

Return on Callbacks

It is easy to forget to escape from the function after a callback has
been called. An easy way to remedy this problem is to call return before
every call to a callback. Even though the value is never used by the
caller, it is an easy pattern to recognize and it prevents bugs.

    function doSomething(response, callback) {
      doAsyncCall('tapir', function(err, result) {
        if (err) {
          // return on the callback
          return callback(err);
        }
        // return on the callback
        return callback(null, result);
      });
    }
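
To see what the return guards against, here is a small self-contained sketch. The names fakeAsyncCall, withoutReturn and withReturn are made up for illustration, and the fake call always fails so that the error branch is taken.

```javascript
// Stand-in for an asynchronous call that always fails (hypothetical).
function fakeAsyncCall(callback) {
  process.nextTick(function () {
    callback(new Error('boom'));
  });
}

// Bug: no return after the error callback, so execution falls through
// and the callback is invoked a second time.
function withoutReturn(callback) {
  fakeAsyncCall(function (err, result) {
    if (err) {
      callback(err);
    }
    callback(null, result);
  });
}

// Fix: return on the callback, so the function is left immediately.
function withReturn(callback) {
  fakeAsyncCall(function (err, result) {
    if (err) {
      return callback(err);
    }
    return callback(null, result);
  });
}

// Count how many times each variant invokes its callback.
const calls = { withoutReturn: 0, withReturn: 0 };
withoutReturn(function () { calls.withoutReturn++; });
withReturn(function () { calls.withReturn++; });

setTimeout(function () {
  console.log(calls);
}, 10);
```

The variant without return invokes its callback twice for a single failure, while the variant with return invokes it once.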

Exceptions in Callbacks

Exceptions that occur in callbacks cannot be handled the way we are used
to, since the context is different. The solution to this is to pass
along the exception as a parameter to the callback. In Node the
convention is to pass the error as the first parameter into the callback.
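
As a minimal sketch of this convention: a try/catch around an asynchronous call cannot catch an error that occurs later, inside the callback, so the error is handed to the callback instead. parseJsonLater is a made-up function for illustration.

```javascript
// Parses JSON asynchronously (hypothetical helper). Errors are not
// thrown to the caller; they are passed as the first callback argument.
function parseJsonLater(text, callback) {
  process.nextTick(function () {
    var value;
    try {
      value = JSON.parse(text);
    } catch (err) {
      return callback(err);
    }
    return callback(null, value);
  });
}

// Collect the outcomes so that they can be inspected afterwards.
const outcomes = [];

parseJsonLater('{"animal": "tapir"}', function (err, value) {
  outcomes.push(err ? 'error' : value.animal);
});

parseJsonLater('not json', function (err, value) {
  outcomes.push(err ? 'error' : value.animal);
});

setTimeout(function () {
  console.log(outcomes);
}, 10);
```

The caller checks the first argument before touching the second, which is the shape that most Node libraries expect.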

Parallel Execution

If you have multiple tasks that need to be finished before you take some
new action, this can be handled with a simple counter. Here is an
example of a simple function that starts up a bunch of functions in
parallel and waits for all of them to finish before calling the
callback.

    // Do all in parallel
    function doAll(collection, callback) {
      var left = collection.length;
      collection.forEach(function(fun) {
        fun(function() {
          if (--left === 0) callback();
        });
      });
    }

    // Use it
    var result = [];
    doAll([
      function(callback) { setTimeout(function() { result.push(1); callback(); }, 2000); },
      function(callback) { setTimeout(function() { result.push(2); callback(); }, 3000); },
      function(callback) { setTimeout(function() { result.push(3); callback(); }, 1000); }
    ], function() {
      // All three have finished; result is now [3, 1, 2]
      console.log(result);
    });
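
The counter idea can be combined with the error-first convention described under Exceptions in Callbacks. The following is only a sketch, and doAllWithResults and delayed are made-up names: each result is stored at the index of its task, so the output order matches the input order rather than the completion order, and the first error is reported exactly once.

```javascript
// Run all tasks in parallel and collect results by position (hypothetical helper).
function doAllWithResults(tasks, callback) {
  var left = tasks.length;
  var results = new Array(tasks.length);
  var failed = false;

  // Nothing to wait for: report an empty result on the next tick.
  if (left === 0) {
    return process.nextTick(function () { callback(null, results); });
  }

  tasks.forEach(function (task, index) {
    task(function (err, value) {
      if (failed) return;          // an error has already been reported
      if (err) {
        failed = true;
        return callback(err);
      }
      results[index] = value;      // keep input order, not completion order
      if (--left === 0) return callback(null, results);
    });
  });
}

// Produces a task that finishes after the given delay (illustrative helper).
function delayed(value, ms) {
  return function (callback) {
    setTimeout(function () { callback(null, value); }, ms);
  };
}

var ordered;
doAllWithResults([delayed(1, 20), delayed(2, 30), delayed(3, 10)], function (err, results) {
  ordered = results;
  console.log(results);
});
```

Even though the third task finishes first, the collected results are [1, 2, 3].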

Sequential Execution

Sometimes the ordering is important. Here is a simple function that
makes sure that the calls are executed in sequence. It uses recursion
to make sure that the calls are handled in the correct order. It also
uses the Node function process.nextTick() to prevent the stack from
getting too large for large collections. Similar results can be obtained
with setTimeout() in browser Javascript. It can be seen as a simple
trick to achieve tail recursion.

    function doInSequence(collection, callback) {
      var queue = collection.slice(0); // Duplicate

      function iterate() {
        if (queue.length === 0) return callback();
        // Take the first element
        var fun = queue.shift();
        fun(function(err) {
          // Pass the error along instead of throwing it
          if (err) return callback(err);
          // Call it without building up the stack
          process.nextTick(iterate);
        });
      }

      iterate();
    }

    var result = [];
    doInSequence([
      function(callback) { setTimeout(function() { result.push(1); callback(); }, 2000); },
      function(callback) { setTimeout(function() { result.push(2); callback(); }, 3000); },
      function(callback) { setTimeout(function() { result.push(3); callback(); }, 1000); }
    ], function() {
      // All three have finished in order; result is now [1, 2, 3]
      console.log(result);
    });

Library Support for Asynchronous Programming

If you don't want to write these functions yourself, there are a few
libraries that can help you out. I'll show two that I like.

Fibers

Fibers are also called co-routines. Fibers provide two functions,
suspend and resume, which allow us to write code in a synchronous-looking
style. In the Node version of fibers, node-fibers, suspend and
resume are called yield() and run() instead.

Fibers are a very nice way of writing asynchronous code but, in Node,
they have one drawback. They are not supported without patching the V8
virtual machine. The patching is done when you install node-fibers and
you have to run the command node-fibers instead of node to use it.

The async Library

If you don't want to use the patched version of V8, I can recommend the
async library. Async provides
around 20 functions that include the usual 'functional' suspects (map,
reduce, filter, forEach...) as well as some common patterns for
asynchronous flow control (parallel, series, waterfall...). All these
functions assume you follow the Node convention of providing a single
callback as the last argument of your async function.

    async.map(['file1', 'file2', 'file3'], fs.stat, function(err, results) {
      // results is now an array of stats for each file
    });

    async.filter(['file1', 'file2', 'file3'], path.exists, function(results) {
      // results now equals an array of the existing files
    });

    async.parallel([function() {...}, function() {...}], callback);

    async.series([function() {...}, function() {...}], callback);

Conclusion

Node is definitely an interesting platform. The possibility to have
Javascript running through the whole stack, from the browser all the way
down into the database (if you use something like CouchDB or MongoDB)
really appeals to me. The easy way to deploy code to multiple, different
cloud providers is also a good argument for Node.

Thanks for this post, it was very useful for me. I don't think I 'got' node.js before this, because all the tutorials that I had seen before now stopped shortly after introducing npm and didn't go as far as the whole stack.

@sandy You are right, the execution will not be parallel from the point of view of Node. But, if the execution takes place outside of node, the execution may be parallel if there are multiple cores on the executing computer. The callbacks will be handled one at a time by Node.