
10 Tips for Coding with Node.js #2: How to Avail & Beware of the Ecosystem


By David Mark Clements
On 2015-03-24

Welcome to part two of our ten-part series of recommended practices and tips to help you on your journey with Node.js. To refresh your memory, here are the ten tips under consideration:

[Develop debugging techniques][1]

Avail and beware of the ecosystem

[Know when (not) to throw][2]

[Reproduce core callback signatures][3]

Use streams

Break out blockers

Deprioritize synchronous code optimizations

Use and create small single-purpose modules

Prepare for scale with microservices

Expect to fail, recover quickly

In this post, we’ll be talking about making use of the extensive npm ecosystem, how to evaluate modules and how to avoid or protect against some common pitfalls.

The Ecosystem

$ npm install what-you-need

According to Module Counts, the Node ecosystem is the largest and one of the fastest growing of its peers. There are over 125,000 modules on npm, growing on average by 228 new modules per day. Compare this to Maven Central, Java’s package repository, which hosts just short of 98,000, growing at a rate of 60 packages per day. Similarly, Rubygems.org has around 96,500 packages, growing at a rate of 57 packages per day. These statistics should be qualified: the npm repository is not only for Node modules; there are also client-side JavaScript modules (and indeed a tiny proportion of bash/sh script modules) on npm. However, it is safe to say that the vast majority of modules on npm are intended for use with Node.

The Small Core Strategy

Since inception, the Node.js project has insisted on the smallest core possible, providing only the essential primitives required to build apps with file system, networking and command line abilities. Need WebSockets? Use a third-party module that attaches to a core HTTP server. Need to support different routes? Maybe use an HTTP framework. Want to use a template language? Take your pick. I believe the small core strategy has contributed massively to the success of Node.js and to the growth of the ecosystem. It allows the core team to focus on quality, security and performance. In fact, this very approach resulted in the Heartbleed bug being removed from Node.js a year before Heartbleed went public. They didn’t know it was a bug; it was simply decided that this feature of OpenSSL was superfluous.

Availing of the Ecosystem

When starting a new project I break up the logic into the smallest pieces I can. Each of these pieces is a problem to solve. A problem-solving attempt should begin with a search of npm. Before I write any code, I always like to make sure that a perfectly good existing solution isn’t already available. A module should be focused in purpose, its API should be obvious and its source code should be small enough to fit in my head. If such a module solves one of the problems of a project then the research time has paid for itself. If no module exists, there could be modules that solve part of the problem. This might mean the problem needs to be broken down further. There might also be a partial, unfinished module, and perhaps that can be the basis for a solution, or at least give some insight.


Caveat Utilitor

The great thing about npm is that anyone can publish to npm: all they have to do is register an account and run npm publish. This also helps to explain the massive growth of the ecosystem. However, the scary thing about npm is that anyone can publish to npm. The laissez-faire approach to ecosystem management has been fundamental to rapid growth. The trade-off is the increased burden of discovery and evaluation on the module user. For rapid prototyping, there’s no doubt that having 125,000 modules at our fingertips is an amazing thing. But somewhere before going live, someone has to check that the modules we’re using are production worthy.

Module Evaluation Tools

There have been some community initiatives to amortize these efforts, for instance the Node Security Project issues security advisories and has an accompanying command line tool called nsp. The retire module performs a similar role by making a project aware of out-of-date modules. There’s also Node Zoo, which pulls in a variety of metrics to provide confidence rankings for a module. Ultimately though, it’s down to us to ensure the packages we use are safe and fit for purpose. For an example of manual module evaluation, one of our nearForm architects, Guy Ellis, wrote a piece on his approach to selecting a package.

Dependents

This is one of the most powerful metrics – it carries a similar weight to word of mouth. The module pages of the npm website detail the dependents at the bottom of the page, however the npm command line tool doesn’t provide a way to retrieve these dependents. So I have put together a small command line tool for doing just that:

$ npm -g install npm-dependents

We can see how many published modules are depending on express by running npm-dependents express, which will output something like this:

Default behavior of npm-dependents

Or, we can list out all modules depending on seneca with npm-dependents seneca --list.

Using the --list flag with npm-dependents

When checking how many dependents a module has, there are a few things to bear in mind. It’s more powerful than download stats, because it means the module is beyond being played with, to being part of another tool. However, for command line tools this metric should be discounted – because unless a CLI tool is a build tool it’s unlikely to be used as a dependency of other packages.

Don’t Be Blinded by Popularity

To balance the previous section, just because something is popular doesn’t mean it’s right for every case. Sometimes a module is less popular because it’s extremely niche, but it may be the very thing that’s needed. Additionally, just because a module or framework is popular doesn’t mean we can assume it behaves sanely in every respect. For example, the wildly popular express framework does not set secure defaults; it favours rapid development over production security, leaving that as an exercise for the user. If you’re interested in more information on server hardening, see the helmet package, the Kraken framework or get in touch with us.
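As a rough illustration of what "leaving that as an exercise for the user" can involve, here is a minimal sketch that sets a few common hardening headers by hand using only Node's core http module. The helper name applySecurityHeaders and the choice of headers are assumptions made for this example; it is not a complete hardening policy, and it is not how helmet or Kraken are implemented.

```javascript
const http = require('http');

// Hypothetical helper: applies a few widely used hardening headers to a response.
// The header selection is illustrative only, not a complete policy.
function applySecurityHeaders(res) {
  res.setHeader('X-Content-Type-Options', 'nosniff'); // discourage MIME type sniffing
  res.setHeader('X-Frame-Options', 'SAMEORIGIN');     // limit framing by other origins
  res.removeHeader('X-Powered-By');                   // don't advertise the framework
  return res;
}

// Works with any object exposing setHeader/removeHeader, such as http.ServerResponse.
const server = http.createServer((req, res) => {
  applySecurityHeaders(res);
  res.end('ok');
});

// server.listen(3000); // uncomment to try it locally
```

The point is less the specific headers than the habit: check what a popular framework does by default rather than assuming it.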

Shiny Websites

Another smoke screen when evaluating modules can be a super-awesome-shiny website. If a module under evaluation is actually a framework maintained either by a large company or an active community, then a polished site is neither a red flag nor a green light. However, for small independent modules curated by one to three developers, a Readme.md file on Github is sufficient; in fact, it’s reassuring. If a small team has produced an amazing website to accompany a recently released module, it should raise a code quality concern. Shiny websites don’t always correlate strongly with good code quality; sometimes the opposite is true.

Who Wrote It?

Whilst it’s important to explore the ecosystem, we naturally begin to recognize prominent module authors. This is a healthy thing: learn who to trust and use their modules when you can.

Review the Source

Going through the source code of a third-party module is often a great educational exercise. Reading all the source code of all the modules and their sub-dependencies can be a daunting challenge, but quickly scanning the source for red lights can be a good way to catch potential issues. One thing to look out for is the use of eval, whether through calling eval directly or using new Function. Using eval with user input on the server side is very dangerous. It also has performance implications. Understanding the context is vital. For instance, some template engines use eval (for example dust, jade, Angular…). If we’re using a template engine, we have to accept that we’re trusting the engine to thoroughly clean user input, and we should understand the flow of data into and out of the eval. Another thing to look out for (also context-dependent) is whether the dependency makes proper use of streams, or whether it expects to buffer all data before processing it. In these cases, the question must be asked: what’s the largest possible amount of data that could be passed through this module? Buffering then synchronously processing data is a recipe for disaster with Node.js.
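A quick scan for eval can be partly automated. The following is a minimal sketch, assuming a hypothetical helper named findEvalUsage that takes a module's source as a string. It is a plain text match, so it may flag comments or strings and will miss obfuscated calls; treat any hit as a prompt to read the surrounding code, not as a verdict.

```javascript
// Hypothetical helper: report lines that appear to call eval or construct a Function.
// Plain text matching only; there is no parsing, so false positives are possible.
function findEvalUsage(source) {
  const pattern = /\beval\s*\(|\bnew\s+Function\s*\(/;
  return source
    .split('\n')
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((entry) => pattern.test(entry.text));
}

// Made-up sample source, used only to demonstrate the helper.
const sample = [
  'var tpl = compile(input);',
  'var fn = new Function("data", body);',
  'module.exports = function (s) { return eval(s); };'
].join('\n');

console.log(findEvalUsage(sample));
```

In practice the source string could come from fs.readFileSync on each file under node_modules/some-module, but how far to crawl into sub-dependencies is a judgment call.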

Shrinkwrapping

$ npm shrinkwrap

When installing a module with npm install --save, the version number is added to package.json as ^1.0.0 (assuming the current version is 1.0.0). The caret (^) is an instruction to npm, telling it to install the latest release that keeps the same major version. For 1.0.0, it is the equivalent of setting the version number to 1.x.x. This means that when the dependencies for a package are installed, they may be different versions to those originally installed and tested during development.

During development this is desired behaviour. We want bug fixes (that is, increases to the final part of the version number) and backwards-compatible API improvements (increases to the minor, or middle, part of the version number). This is not something we want in most production scenarios. There, we want our dependencies to stay static, and we’ll upgrade dependencies manually.

The npm shrinkwrap command will do a deep crawl of all dependencies, generating an npm-shrinkwrap.json file. For instance, here is the first dependency in the npm-shrinkwrap.json file generated for the npm-dependents module:

    {
      "name": "npm-dependents",
      "version": "1.0.1",
      "dependencies": {
        "JSONStream": {
          "version": "0.10.0",
          "from": "JSONStream@*",
          "resolved": "https://registry.npmjs.org/JSONStream/-/JSONStream-0.10.0.tgz",
          "dependencies": {
            "jsonparse": {
              "version": "0.0.5",
              "from": "jsonparse@0.0.5",
              "resolved": "https://registry.npmjs.org/jsonparse/-/jsonparse-0.0.5.tgz"
            },
            "through": {
              "version": "2.3.6",
              "from": "through@>=2.2.7 <3.0.0",
              "resolved": "https://registry.npmjs.org/through/-/through-2.3.6.tgz"
            }
          }
        },
        …snip…

A gist of the full npm-shrinkwrap.json file can be found here. When a module comes with an npm-shrinkwrap.json file, npm will use it in place of the version ranges in package.json, installing the specific versions it lists for all dependencies and sub-dependencies.
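To make the caret behaviour described above concrete, here is a small sketch of which versions a caret range accepts. npm itself relies on the semver module for this; the functions parseVersion and satisfiesCaret below are hypothetical, simplified stand-ins written for illustration, and they ignore pre-release tags.

```javascript
// Parse "1.2.3" into [1, 2, 3]; pre-release tags are not handled in this sketch.
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Hypothetical helper: does `version` satisfy the caret range `range` (e.g. "^1.0.0")?
function satisfiesCaret(version, range) {
  const [major, minor, patch] = parseVersion(version);
  const [baseMajor, baseMinor, basePatch] = parseVersion(range.replace(/^\^/, ''));

  if (baseMajor > 0) {
    // ^1.2.3 allows >=1.2.3 <2.0.0
    return major === baseMajor &&
      (minor > baseMinor || (minor === baseMinor && patch >= basePatch));
  }
  if (baseMinor > 0) {
    // ^0.2.3 allows >=0.2.3 <0.3.0
    return major === 0 && minor === baseMinor && patch >= basePatch;
  }
  // ^0.0.3 allows only 0.0.3
  return major === 0 && minor === 0 && patch === basePatch;
}

console.log(satisfiesCaret('1.4.2', '^1.0.0')); // true
console.log(satisfiesCaret('2.0.0', '^1.0.0')); // false
console.log(satisfiesCaret('0.3.0', '^0.2.3')); // false
```

Note the 0.x cases: for versions below 1.0.0 the caret is stricter, which is one more reason to pin exact versions for production rather than reason about ranges at deploy time.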

Keep a Filtered Cache Repository

Once a list of vetted modules is compiled, isolating these modules from the public npm repository can be a useful way to share validation work across a team, or just make it easier for a single developer to work on several projects with pre-validated modules. This in itself is a huge topic and outside the scope of this post; however, sinopia can be a good place to start. Sinopia acts as a private repository which also fetches and caches modules from npm. Essentially, the idea is to npm install all validated modules, then disable proxying to the public repo.

Production Ready Modules

To finish, here is a list of modules we at nearForm believe are production ready. Whilst we have confidence in these modules and their authors, bugs can still easily slip in between version releases, and this list shouldn’t be a replacement for the due diligence required when using any module in production.

Conclusion

In conclusion, time spent researching modules is not wasted time: it may save future headache and heartache. When evaluating a module, look for key indicators, but always be aware of the larger context. Know that all that glitters is not gold: a shiny website or lots of Github stars doesn’t necessarily indicate code quality. The best way to evaluate a module is to read the code. If it takes longer than an hour to understand, the module’s probably too big (unless you’re looking for a framework or utility library). That’s it for this post. I hope you found it helpful. If you know of any other modules that you’ve found to be useful, feel free to talk about them in the comments. The next tip will be ‘Know when (not) to throw’. In the meantime, post any comments or questions in the comment section below, and subscribe to this blog to be notified as soon as follow-on posts are published. Want to work for nearForm? We’re hiring.