node_modules in git

One of the many things we have been forced to rethink in the world of node is how we handle dependencies in applications.

One of the big changes that came with node 0.4.0 was support for node_modules directories. This change had major consequences: it elevated modules in a local directory above modules installed globally. Along with npm changing its default install preference from global to local, we've seen a nearly unanimous shift to local module installs that has made global installs somewhat distasteful.

The shift from global module installs to local ones sets node apart from previous-generation platforms. Ruby and Python fail horribly in this arena, and the fact that it is now standard practice to develop and deploy into entirely sandboxed copies of the whole platform (virtualenv, rvm) is an admission of that failure.

And it gets better. Node’s local module support accomplishes what no other platform I know of has done: it allows two dependencies to require entirely different versions of the same dependency without caveats or unforeseen failures. This required some fairly involved logic in core to resolve module names locally and recursively. But the most important requirement for this support was that nobody, ever, rely on module-level globals being global for an entire node process, a rule the community has actually enforced for some time.
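To make the "locally and recursively" part concrete, here is a minimal sketch of the directory search order, modeled on the NODE_MODULES_PATHS steps in Node's module documentation. It assumes POSIX paths and leaves out core modules, file extensions and global folders; the helper name nodeModulesPaths and the /app layout are hypothetical, not part of node's API.

```javascript
const path = require('path').posix;

// Hypothetical helper (not node's internal API): list the node_modules
// directories that require('x') searches, nearest first, for a file that
// lives in `startDir`. POSIX paths only; core modules and globals omitted.
function nodeModulesPaths(startDir) {
  const parts = path.resolve(startDir).split('/').filter(Boolean);
  const dirs = [];
  // Walk from the deepest directory up to the filesystem root.
  for (let i = parts.length; i >= 0; i--) {
    // Never produce a candidate ending in node_modules/node_modules.
    if (i > 0 && parts[i - 1] === 'node_modules') continue;
    dirs.push('/' + parts.slice(0, i).concat('node_modules').join('/'));
  }
  return dirs;
}

// Hypothetical layout: two dependencies, each with its own copy of "c".
//   /app/node_modules/a/node_modules/c   (c@1)
//   /app/node_modules/b/node_modules/c   (c@2)
console.log(nodeModulesPaths('/app/node_modules/a'));
console.log(nodeModulesPaths('/app/node_modules/b'));
```

Because a searches /app/node_modules/a/node_modules before /app/node_modules, require('c') inside a finds c@1 while the same call inside b finds c@2, and neither sees the other's copy. That is also why module-level state can't be assumed to be process-wide: two copies of c are two separate modules.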

Now we’re all hooked on local modules, and outside of some early bitching everyone seems to have come around, but we’re still holding on to a few habits from the old global days.

With global installs, and especially with name resolution giving preferential treatment to global modules over local ones, checking your dependencies into source control was a very bad thing. It’s mainly bad because it’s an outright lie: having the code there doesn’t mean it’ll actually be used if the module was installed globally. We developed huge deployment tools to ensure that when code is deployed to one place, and a week later the same code is deployed to a new location, both get exactly the same dependencies installed. These tools are all a pain, because the problem itself is kind of a pain.

But this isn’t Ruby or Python anymore; this is node.js, and we did modules much better. If you have an application that you deploy, check all your dependencies into node_modules. If you use npm to deploy, just list those modules in bundleDependencies. If you have dependencies that need to be compiled, you should still check in their source and just run $ npm rebuild on deploy.
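For the npm route, bundleDependencies is a package.json field holding an array of package names. The sketch below uses a hypothetical app manifest (names and versions are illustrative only) and a hypothetical helper, unbundledDependencies, that lists any runtime dependency you forgot to bundle; npm does not ship this check itself.

```javascript
// Hypothetical app manifest, written as a JS object; values are illustrative.
const pkg = {
  name: 'my-app',
  version: '0.0.1',
  dependencies: { express: '2.4.0', request: '2.0.0' },
  // Every runtime dependency is listed, so npm packs the checked-in copies.
  bundleDependencies: ['express', 'request'],
};

// Hypothetical helper: return dependency names that are not bundled.
// npm also honors the spelling "bundledDependencies", so accept both.
function unbundledDependencies(manifest) {
  const bundled = new Set(
    manifest.bundleDependencies || manifest.bundledDependencies || []
  );
  return Object.keys(manifest.dependencies || {}).filter(
    (name) => !bundled.has(name)
  );
}

console.log(unbundledDependencies(pkg)); // []
```

Running a check like this before publishing a deploy tarball catches the case where a new dependency was added to dependencies but never to the bundle list.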

Everyone I’ve told this to tells me I’m an idiot, and then a few weeks later tells me I was right and that checking node_modules into git has been a blessing for deployment and development. It’s objectively better, but here are some of the questions and complaints I seem to get.

Why can’t I just use version locking to ensure that all deployments get the same dependencies?

Version locking can only lock the version of a top-level dependency. Lock express to a particular version, deploy to a new machine three weeks later, and the install is going to resolve express’s dependencies again. It might pick up a new version of Connect that introduces subtle differences, breaking your app in super annoying, hard-to-debug ways because the failures only ever happen when requests hit that machine. This is a nightmare; don’t do it.
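A small sketch of that failure mode, using made-up version numbers (they are not real express or connect releases): the app pins express exactly, but express declares a range for connect, so each fresh install picks whatever connect release is newest at that moment. The range matcher here handles only exact versions and "N.x" ranges; npm's real resolver understands the full semver range grammar.

```javascript
// Made-up data for illustration only.
const appDeps = { express: '2.4.0' };     // pinned exactly by the app
const expressDeps = { connect: '1.x' };   // a range, chosen by express

// Compare "major.minor.patch" strings numerically.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Minimal matcher: an exact version, or "N.x" meaning any N.*.* release.
function satisfies(version, range) {
  if (range.endsWith('.x')) {
    return version.split('.')[0] === range.split('.')[0];
  }
  return version === range;
}

// Pick the highest published version that satisfies the range.
function resolve(range, published) {
  const matches = published.filter((v) => satisfies(v, range));
  matches.sort(compareVersions);
  return matches[matches.length - 1];
}

// What the registry offers on the first deploy, and three weeks later.
const week0 = { express: ['2.3.0', '2.4.0'], connect: ['1.5.2', '1.6.0'] };
const week3 = {
  express: ['2.3.0', '2.4.0', '2.4.1'],
  connect: ['1.5.2', '1.6.0', '1.6.1', '2.0.0'],
};

for (const [label, registry] of [['week 0', week0], ['week 3', week3]]) {
  const express = resolve(appDeps.express, registry.express);
  const connect = resolve(expressDeps.connect, registry.connect);
  console.log(`${label}: express@${express} connect@${connect}`);
}
// week 0: express@2.4.0 connect@1.6.0
// week 3: express@2.4.0 connect@1.6.1
```

The pin on express holds, yet the two machines end up running different connect code. A checked-in node_modules removes the second resolution step entirely, so there is nothing left to drift.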

Why don’t we encourage the maintainers of those libraries to lock all their dependency versions as well?

Today there are roughly 5,500 packages in the npm registry. Last month over 600 new packages were pushed and over 5,000 packages were updated. Some packages push multiple updates per week. As a community, we need to distribute some of the integration testing work. It’s not conceivable for most package maintainers to sit down and test their package with all the new updates that ship for their deps. This is why package maintainers should not version lock and should not check in their deps. We need new people to upgrade the deps locally and report bugs. We need to keep moving this community forward and stay on top of these new packages.

Only applications that are deployed should check in node_modules. Package maintainers should continue to define what they think are acceptable version ranges; it’s the only way we can keep the community up with the rate of change and improvement we see in node.js.

Doesn’t checking in node_modules create a lot of noise in the source tree that isn’t related to my app?

No, you’re wrong. This code is used by your app; it’s part of your app, and pretending it’s not will get you into trouble. You depend on other people’s code, and they are just as likely to write new bugs as you are, probably more so. Checking all of that code into source control gives you a way to audit every line that ever changed in your application. It allows you to use $ git bisect locally and be assured that what you’re testing is the same as production, and that every machine in production is identical. No more tracking down unknown changes in dependencies: every change, on every line, is viewable in source control.

Ok, fine, what do we do now?

To recap.

Only check in node_modules for applications you deploy, not for reusable packages you maintain.

For any compiled dependencies, check in their source, not the compile targets, and run $ npm rebuild on deploy.

All you people who added node_modules to your .gitignore: remove that shit, today. It’s an artifact of an era we’re all too happy to leave behind. The era of global modules is dead.