I’ve been using static publishing platforms for a while now. The output is enduring, easily archived, reliable and robust. As an author, there’s also a lot of truth to the unreasonable effectiveness of GitHub browsability, however much I disagree with the philosophy therein of committing build products in with the sources. I’ve used Jekyll; I’ve used Hexo, which is what I use to write this blog; and long ago I used Movable Type.

However, all these systems are more complex than I’d like, and prone to bit-rot far faster than the content they generate. Runtimes change. Dependencies rot as maintainers move on and can no longer account for those runtime changes. Development moves on to new major versions, or to a rewrite around a newer fad in software design. Hexo has treated me better than most, but it is large, and the configuration is rather arbitrary in places. Plugins have to be written specifically for Hexo, so there’s a balkanized ecosystem that doesn’t flourish as well as other parts do.

All these static publishing tools tend to have things in common. Builds have to happen as quickly as they can, and usually this is a bit too slow. The author will want to preview their work in context, so serving up the rendered pages is important. Live rebuilds by file monitoring reduce friction in the workflow for some people, though I personally don’t care much for them, preferring to run a build when I’m ready.

It turns out that building derived things from a list of inputs with dependencies is a thing that computers have been told to do for a long time. Nearly all compiled software is built this way. We have tools like make(1) and a host of other, more complex and less general tools for various programming languages. I’ve always wondered why we didn’t use those to build sites as well. People have, it turns out, but make(1) in particular is a bit messier for the task than one would hope. There are other tools, and I settled on building with one called tup.

This weekend I built a small static publishing platform, and you can too. I wanted to build a site using Tufte CSS, and the minimalism of the presentation is a great fit for a super tiny static publishing platform.

A site like this needs to output:

Each post as an HTML file

An index page listing posts

Its CSS and any assets needed to render

This really isn’t a huge list.

First, let’s reach for a tool that can take a list of files and build all the derived things. make(1) is annoying here, because you have to tell it what to build, and it works backward to figure out how to make it. We don’t have that information easily encoded, but we do have a list of sources, and can make a list of what to do with them. If you’re writing a post, you probably have a reason for it, right? And an asset is going to get used; why else would it be there? Starting at the source makes a lot more sense, and as it turns out, it makes incremental builds a lot faster. Enter our first player: tup.

$ brew cask install osxfuse
$ brew install tup

I’m not sure why tup now depends on FUSE, but that’s a task for another day.

Let’s start a directory for our project.

$ mkdir my-static-site
$ cd my-static-site
$ npm init
$ mkdir posts

Make a sample markdown file in the posts directory.

Next we create a Tupfile to describe how we’re going to build this site. Then we can just type tup to build the site, or tup monitor on Linux for that live building mode. First, let’s handle each post as HTML. We can use an off the shelf markdown renderer at first.

$ npm install marked

Here’s a Tupfile

: foreach posts/*.md |> marked %f -o %o |> public/%B.html

This means that for each post in the posts directory, we’ll make an equivalent HTML file.
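To make the rule concrete, here is a small sketch in JavaScript of the input-to-output mapping it describes. This is not how tup works internally, and `outputFor` and `commandsFor` are names made up for illustration. In the rule, `%f` is the input file, `%o` is the output, and `%B` is the input’s basename without its extension.

```javascript
const path = require('path');

// Hypothetical helper: mirrors `public/%B.html`, where %B is the input's
// basename with the extension removed.
function outputFor(source) {
  const base = path.basename(source, path.extname(source));
  return path.posix.join('public', base + '.html');
}

// Hypothetical helper: the command the rule would run for each input.
function commandsFor(sources) {
  return sources.map((source) => `marked ${source} -o ${outputFor(source)}`);
}

console.log(commandsFor(['posts/hello.md', 'posts/second-post.md']).join('\n'));
```

Each source produces exactly one command, which is the "start from the sources" direction described above.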

Let’s take a look at some of these rendered files. We’ll need to serve this directory by HTTP if we want to see it as we will on the web.
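Any static file server will do for this. As one possibility, here is a minimal sketch using only node’s core modules; `fileFor` and `createStaticServer` are hypothetical names, the content type table is deliberately tiny, and this is meant for local previewing only.

```javascript
const fs = require('fs');
const http = require('http');
const path = require('path');

// A few content types; extend as the site grows.
const TYPES = { '.html': 'text/html', '.css': 'text/css', '.svg': 'image/svg+xml' };

// Map a request URL onto a file under `root`, refusing to escape it.
// Returns null for malformed URLs or paths outside the root.
function fileFor(root, url) {
  const base = path.resolve(root);
  let pathname;
  try {
    pathname = decodeURIComponent(url.split('?')[0]);
  } catch (err) {
    return null;
  }
  if (pathname.endsWith('/')) pathname += 'index.html';
  const file = path.join(base, pathname);
  return file.startsWith(base + path.sep) ? file : null;
}

// Build (but do not start) a server for the rendered site.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const file = fileFor(root, req.url);
    if (!file) {
      res.writeHead(403);
      return res.end('Forbidden');
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        return res.end('Not found');
      }
      const type = TYPES[path.extname(file)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}

// To preview the site: createStaticServer('public').listen(8080);
```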

Just a directory full of HTML, and ‘full’ is just our one test post, but we should be able to navigate to one. We have a static site, if a lousy one! That HTML is pretty spartan, so let’s add some assets.

Copy the et-book directory of fonts from the Tufte CSS package into the root of the project, and the tufte.css file.

Let’s add a few rules to publish those as part of the site, too. Added to the Tupfile:

Now about that index! The index needs to know the post’s title, and really, posts don’t even have titles yet. Let’s add some to our test post as YAML front matter. Add this at the top of the markdown file.

---
title: My Post
date: 2017-12-04 01:51:43
---

Every post gets a title and the date.
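Reading that metadata back out is not much work. Here is a rough sketch, assuming the front matter is delimited by lines of three dashes and contains only flat `key: value` pairs; `parsePost` is a hypothetical helper, and anything more elaborate should go through a real YAML parser.

```javascript
// Split a post into its front matter and its markdown body.
// Only flat `key: value` lines are handled; real YAML needs a YAML parser.
function parsePost(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { metadata: {}, body: text };

  const metadata = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    // Split on the first colon only, so times like 01:51:43 survive.
    metadata[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { metadata, body: match[2] };
}

const post = parsePost('---\ntitle: My Post\ndate: 2017-12-04 01:51:43\n---\nHello, world.\n');
console.log(post.metadata.title, '/', post.metadata.date);
```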

Let’s change our renderer to put the title on the page so we don’t have to reduplicate it.

To the template calls in both render.js and index.js, let’s add the require function, so that templates can require their own stuff. Where there’s template({ metadata }), let’s change that to template({ metadata, require })
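A minimal sketch of what that buys, assuming a template is just a function of its arguments. The `template` below is a hypothetical stand-in rather than the one in render.js, and the `source` field on the metadata is an assumption made for the example.

```javascript
// Hypothetical template: because it is handed `require`, it can load its
// own helpers instead of relying on the renderer to supply them.
function template({ metadata, require }) {
  const path = require('path');
  const slug = path.basename(metadata.source, '.md');
  return `<article id="${slug}"><h1>${metadata.title}</h1></article>`;
}

const html = template({
  metadata: { title: 'My Post', source: 'posts/my-post.md' },
  require,
});
console.log(html);
```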

This is roughly the same code. I ported it away from ES6 modules, used node’s core assert instead of chai (it has all the functionality being used!), and removed the Flow type annotations. It works in node 8 easily, and should work in node 4.

I work in constrained environments: page load time is very important to me. If I’m loading even a fraction of this in a browser, I’ve blown my budget. I run a bunch of hobby projects on a very inexpensive server. RAM is at a premium. All of these things have costs.

Today I had an interview; not the super intense kind, but grab coffee with a recruiter, chat about goals and desires and see if companies she represents are a match for my skillset.

I missed the appointment. There are myriad reasons, including my being bad with dates and time in general, but I’ve got a system that usually works for me. I delegate carefully to my computer and set every appointment to vibrate my phone. However, this particular event was vexed from the beginning by several failures, each individually insufficient to make me miss the appointment, but together they did the job nicely.

We chatted briefly the other day to set up the appointment. I put it in my calendar, and she sent me a calendar invite via Google calendar to my personal email address (which is not a gmail account, however it is the email I sign in to Google with). Failure number one: There’s no way not to have a google calendar, so Google auto-added it to my calendar there, and I strongly suspect events are considered somewhat confirmed (at least receipt of invite shown) at this point.

Her invite had more information (like location) than my hand-entered entry, so I opened the .ics file that Google emails to my address when someone sends me a Google calendar invite. I use Apple’s Calendar app, since it’s got a much faster user interface than Google calendar, and syncs with iCloud quite nicely. It’s a tiny bit more in my control than Google is. When I opened the .ics file, it added it to my calendar on the screen, and I deleted my copy of the event.

Failure number two: events added from an .ics file sent by Google can’t be edited. Including the ability to set up a notification.

Failure the third: immediately after, I get a message from the Calendar app that it couldn’t sync the event, error “403” (HTTP for “Forbidden”, which in this case tells me about as much as the word “potato”). Apple has chosen a protocol called CalDAV for its calendars, and has not put effort into making sure all the error messages are meaningful. It then presents me with three opaque options: “Retry”, “Ignore” and “Revert to Server”. The first fails with 403 again. The second will leave the entry on my computer, but not sync it to iCloud, and I only know this from a little experimentation and knowledge of how these systems work under the hood. The third removes the entry from my calendar. Failure the fourth: none of these options are useful. I eventually ignore the error and set about making it work right.

I copy and paste the event to another calendar in the Calendar app. This time it works, and I copy it back to the correct calendar, the one I have set up to sync to iCloud and my phone. It works. Or so it seems. I move on with my day. I have an event in Calendar, that hasn’t given me a sync error, that has a notification, and the time, date and location of my meeting. It does, however, try to send an invite to the recruiter who invited me, making a second meeting at the exact same time and place. I decline to do so. Points to Apple for giving me the option.

This morning I wake up, glance at my phone’s calendar, see I have no events until afternoon, and sleep late. I miss my appointment.

Failure the fifth: It turns out that appointment didn’t sync to the phone. I had checked the original, hand-entered appointment, since I’m insecure about calendars, but that one got deleted way up at the start of this fiasco. The app I use to synchronize an Android phone with an iCloud calendar is a little ugly in the user interface, but it is a normally robust piece of software that had not betrayed me until today. There was no error, so I don’t know whether this event didn’t sync fully in some way or whether the sync program is broken, even though it shows my event later in the day. It shows on my husband’s phone, which subscribes directly via iCloud since it’s an Apple device. It made it to Apple’s servers.

Failure the sixth: My computer froze last night, and so, it also did not show any hint that I might have an event today.

All in all, I missed a relatively trivial event. However, if this had been a later interview, it might well have cost me a job. This is where the ethics of software design come in. These are all failures of engineering, and some of them quite foreseeable. Software must plan to have bugs, to fail gracefully. The failure case here was silent, and may well be costly to users who experience it. However, at the end of the day, there is no accountability: aside from the chance they read this blog post, engineers at Apple and Google will never know about this failure. I have no options for managing this data that do not involve third parties short of hand-entering calendar entries into multiple devices.

There were also a number of preventable failures, mostly in the design of these pieces of software.

Why can I not have a Google Calendar, and interact with Google Calendar users entirely by email? They seem awfully certain I’ve received invites when I have not, though in this case that part worked out.

Apple’s engineers did not account for getting error messages to humans, and so we end up with opaque, low-level errors like “403” with no meaning and no way to correct whatever condition caused them. We just guess at what might be wrong and try to act accordingly. I may well have guessed wrong.

Apple’s calendar program is not designed as a distributed system. It assumes networks are reliable, bugs do not exist, and that errors are transient. The reality is that none of these things are true. Its design does not expose details of what it’s doing, does not expose the state of sync clearly, and does not let you inspect what’s going on. It sweeps its design flaws under a very pleasant user interface rug.

Google’s dominance of the industry has left users with few working alternatives, and its products do the bare minimum to interoperate, if at all, and usually only when Google owns the server portion. Their calendar application on my phone does not speak the standard protocols used by Apple.

Apple’s extensions to CalDAV with push notifications for added events are also private, and third-party applications cannot use those features.

None of these applications center the user’s agency and let them make a fallback plan when these services fail, and these services do fail, often silently.

My needs are modest: enter events in calendar on whichever device I’m using, particularly the ones with good keyboards. Have my phone tell me where I need to be.

Modes of analysis that surface these kinds of design and user experience issues are central to designing good applications. It’s highly technical work, requiring the expertise of engineers and designers, especially as evaluating potential solutions to these design problems is part of the task.

Centering ethics in the design would have changed the approach most of these engineers took in the design of these applications. Error messages would have been a focus. A mode for working when the network is down or server is misbehaving may have been created. A trail of accountability to diagnose the failure would have been built. Buzzwords like ‘user agency’ aren’t just words in UX design textbooks (though they should be), but the core of the reason software exists. Engineering that centers its users, analyzes their needs, and evaluates the ways potential solutions fail and solves those problems is what engineering should be.

My apologies to the recruiter I stood up today, I hope you enjoyed a latte without me, and talk to you Monday.

What is a module?

I’m actually going to spend some time on this one because while it’s an
everyday word in our industry, it’s one we don’t often hear defined. I want you
to think about what it means to make software modular.

A module is a bit of software that has an interface defined between it and
the rest of the system.

This is one of the simplest definitions I could come up with. There are some
implications here: There’s a separation between the module and the rest of the
system. I’m not saying how far, but it’s actually a separate entity. I will
get into what “interface” can mean later in this post. The bit I think is
really interesting is the word defined. This means we’ve made decisions in
making a module. Extracting something blindly into a separate file probably
only counts on a technicality. Defining something is intentional.

I’m a programmer working for a package manager company, and I think of code as
art, and I’ve been making open source my entire professional life, so I’ve also
got a particular bunch of things I also mean when I say module.

A piece of software with a defined interface and a name that can be shared
with or exposed to others.

I won’t advocate sharing everything, since I’m talking about radical
modularity and not radical transparency here, but I want the option. The rest,
though, are where things get interesting. In particular, I want to talk about
names.

When we name something, it takes on a life of its own. It’s now an object in
its own right. This happens when we name a file, it happens when we name a
package. A name is the handle we can grab onto something with mentally and
start treating it independently.

A defined interface is the first step of independence. It’s the boundary that
gives a thing a separate internal life and external life. Things outside a
module get a relationship with the boundary, and inside the module, anything
not exposed by the boundary can be re-arranged and edited without changing
those relationships.

I named her. The power of a name. That’s old magic.
—Tenth Doctor, “The Shakespeare Code”

Not every module even gets published or becomes a package on a package registry
like npm or crates. We usually push things to GitHub early, but source control
isn’t quite the same thing as publishing things for others to use. Just
separating things into a separate file — there’s the naming — and choosing what
constitutes the interface to the rest of the system is modularizing.

We can commit to names more firmly by publishing and giving version numbers,
and breathe life into something as a fully separate entity, but that’s not
required, and that alone isn’t often enough to make a whole project.

Self-sustaining open source projects have to be bigger than tiny modules, and
so you can either enlarge modules until they become self-sustaining, or your
project is a group of related modules, like Hoodie, where there are a bunch of
small, named parts.

There’s another option, which is to make modules so small they trend toward
finished, done, names for a now-static thing. Maybe they are bestowed upon
someone who tidies them up, finishes a few pieces that we left ragged, maybe
just left in a little library box for someone else to discover. Maybe they’re
published, maybe they’re widely used and loved, maybe not. Maybe they end up in
a scrap heap for our later selves or others to build something new from.

Art does not reproduce what is visible; it makes things visible.
—Paul Klee

Something the open source movement did that isn’t all that widely acknowledged
is make a huge ecosystem for the performance of software as a social art. Not
only that, but since then, the explosion of social openness in the creation of
software has created a new, orthogonal movement less concerned with copyright
law and open engineering, but open sharing of knowledge and techniques, and as
a side effect of that and the rise of the app, software engineering now
includes the practice of software as art and craft.

I practice code as an art.

A good portion of that is making concepts visible and ultimately that often
means making it named. With art, though, there’s some tension with engineering:
sometimes we do things to show instability, to test a limit, or to reveal the
tensions within our culture or systems we build. We can create a module of code
only to abandon it once it‘s served its cultural purpose — be it connecting two
things together, mashup style, or just moving on because there’s a better way
to do things.

One of the interesting differences about software artifacts as a medium of art
contrasted with other fine arts is that despite working in a very definite
medium, though abstract, much of what we make is never finished. It exists in
our culture — yes, software creation is a reflection of our culture, and a
culture of its own — and as web workers, especially working as artists, a lot
of what we create straddles the lines of engineering, fine art, performance
art, and craft.

Sometimes, too, a destructive act can be an artistic act: in the unpublishing of
left-pad, Azer Koçulu revealed that some of us have been choosing dependencies
with little thought, and revealed just how interdependent we are with each
other when we work in the open and rely on each other’s software.

So it goes that even the biggest pieces of software are made up of smaller
parts. It’s the natural way to approach problems and make them solvable. A
large module is nothing more than a collective name for little modules that may
not have their full names and final forms.

I really like small modules as a norm, because I think of things in terms of
named objects. I’m happy to abandon a thing I think no longer suits, and it’s
easier to abandon a small module than a big one. They approach done, so I’m
happy to use a three year old tiny module, but a big project that’s three years
unmaintained is likely to be bug-ridden and poorly integrated.

Back to the thesis here:

What if we make everything a module?

What happens when we break off pieces and name them well? What happens when we
do and when we can, publish them, share those names and let others wield that
power over them? What does this do to our culture as programmers?

Practical approach to building modularity

I talk about npm a lot, but this can be extended: open your mind and projects
and think about making interfaces around new things.

It’s quite possible to take the module system that node uses and extend it to
new contexts, and as we’ve seen with projects like browserify, it’s possible to
keep the same abstract interface but package things up in new ways for contexts
they were not designed for.

Modularizing CSS

When I started at npm we had a monolith of old CSS built up, like most web
projects start accreting – a lot of styles we weren’t positive were unused, a
lot of pieces and parts that depended on each other. Since then and with a huge
shout-out to Nicole Sullivan and her huge body of work on this, in particular
go watch her talk or read “Our best practices are killing
us”,
we’ve started tearing apart the site and rebuilding it with, get this, modules
of CSS, with defined interfaces between them and the rest of the system.

They all have names — package names — and versions. So we’ll have a module like
@npmcorp/pui-css-tables (PUI is because we forked this system from a
component system used at Pivotal Labs).

In this case we’re using a tool called dr-frankenstyle. It’s pretty simple.
It looks at all the node modules installed in our web project and then
concatenates any CSS they export with a style property in the package metadata,
in dependency order.
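As a rough illustration of "concatenate in dependency order", here is a sketch over an in-memory description of some packages. It is not dr-frankenstyle’s implementation: the package names are hypothetical, and the `css` field stands in for the contents of whatever file each package’s `style` property points at.

```javascript
// Hypothetical stand-in for installed packages: the CSS each one exports
// (if any) and the names of the packages it depends on.
const packages = {
  'pui-css-tables': { css: '.table { width: 100%; }', dependencies: ['pui-css-typography'] },
  'pui-css-typography': { css: 'body { font-family: serif; }', dependencies: [] },
  'not-css': { dependencies: [] },
};

// Depth-first walk, so every package's dependencies are emitted before it.
// The `seen` set also keeps a dependency cycle from looping forever.
function concatStyles(pkgs) {
  const seen = new Set();
  const out = [];
  function visit(name) {
    if (seen.has(name) || !pkgs[name]) return;
    seen.add(name);
    (pkgs[name].dependencies || []).forEach(visit);
    if (pkgs[name].css) out.push(pkgs[name].css);
  }
  Object.keys(pkgs).sort().forEach(visit);
  return out.join('\n');
}

console.log(concatStyles(packages));
```

The typography rules come out ahead of the table rules because the tables package declares the dependency, not because of file naming or import order.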

This means our CSS actually has dependencies annotated into it, in the package
metadata, and it’s in pieces and parts. Because of this, and because it’s
named, we can start grappling with these things individually, and start making
sense of what otherwise becomes a huge mess.

There’s another project called atomify-css that can do similar things, and
both of these systems will do one set of fixups as they build a final CSS
stylesheet: they identify assets that those stylesheets refer to, and copy
those over and adjust the path to work in the new context. Atomify in
particular has modules for several languages that bring this style of name
resolution.
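The path adjustment half of that fixup can be sketched in a few lines. This is an assumption-laden illustration, not what either tool actually does: `rebaseUrls` is a hypothetical name, it only handles `url(...)` references, and it leaves the copying of assets out entirely.

```javascript
const path = require('path');

// Rewrite relative url(...) references in `css`, which lives in `fromDir`,
// so they still point at the same files when the CSS is served from `toDir`.
function rebaseUrls(css, fromDir, toDir) {
  return css.replace(/url\(\s*(['"]?)([^'")]+)\1\s*\)/g, (whole, quote, ref) => {
    // Leave absolute paths, fragments, data URIs, and full URLs alone.
    if (/^(\/|#|[a-z][a-z0-9+.-]*:)/i.test(ref)) return whole;
    const rebased = path.posix.relative(toDir, path.posix.join(fromDir, ref));
    return `url(${quote}${rebased}${quote})`;
  });
}

const css = 'h1 { background: url("img/lens.png"); }';
console.log(rebaseUrls(css, 'node_modules/camera/css', 'public/css'));
```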

This turns out to be super powerful, because now it leads us into wanting to
modularize and make explicit the dependencies between all the things.

Now, CSS has some pitfalls: browsers still see all of a page’s CSS as a single
namespace, a single heap of rules to apply. This isn’t a clean interface, so
modularizing everything doesn’t automatically solve all your problems. It can
give us some new tools though.

Modularizing SVG

<svg><use xlink:href="./wonky.svg#camera-lens"/></svg>

What happens if we put SVG files into packages? What’s the interface to an SVG? The text? Parsed XML? Just the file name?

We’ve got dependencies. SVG files can load other SVG files. xlink attributes
could be followed, and postprocessing tools could inline those, making
production-ready and browser-ready SVGs from more modular ones.

Now that we have HTML5 support in browsers, we can embed SVG directly into HTML, too.

That brings me to…

Modularizing Templates & Helpers

We don’t just build raw HTML anymore, but we use templating systems to break
those apart. What if we published those as packages for others to consume, what
if we made them modules?

In the process of reworking npm’s website, we had components of CSS whose
interface is the HTML required to invoke them: A media block has a left or
right bit of image, and then some text alongside. A more sophisticated
component might string several of those together, add a heading banner and some
icons. The HTML required was getting complex and fragile, so that any change to
the component would require the consumer to update the HTML to match. Icons
were inlined, so changing an icon would mean editing a large blob of SVG.

In semantic versioning terms, every change became a major. While integers are
free, time to check what’s needed to update isn’t, so this wasn’t going to be a
scalable approach.

We started moving the handlebars templates we use that have the HTML to invoke
a component on our website into the modules. This moved the interface boundary
into something more stable. Now we can change what that component needs and
does as the design evolves without having to go propagate those changes to an
unknown number of callers.

You’ll remember I mentioned SVG icons. It turns out that inlining small icons
is one of the most efficient ways to use them, but it doesn’t scale very well
in the development process. The alternatives, icon fonts, require a lot of
infrastructure and are brittle enough that it stifles the act of moving things
into a module. Icons have to be in large groups with that approach, and that
trends toward very large modules, and probably to less efficient ways to do
things.

What I ended up doing was making a small handlebars helper like the
fromPackage helper I just showed the call to, and made a couple helpers for
loading SVG from packages. Called from our handlebars templates, a single
helper invocation can load and parse SVG from a package, do simple
optimizations, and cache the result, and inline it. SVGs, too, then, became
modules we can publish separately or in small groups.
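The shape of such a helper is roughly the following. This is a sketch rather than the helper used on the npm website: `makeSvgLoader` is a hypothetical name, the "optimization" is just stripping the XML prolog and comments, and the resolver and reader are parameters so they can be swapped out.

```javascript
const fs = require('fs');

// Build a loader that finds an SVG inside an installed package, strips the
// XML prolog and comments (a stand-in for "simple optimizations"), and
// caches the result by package path.
function makeSvgLoader(resolve = require.resolve, read = (file) => fs.readFileSync(file, 'utf8')) {
  const cache = new Map();
  return function inlineSvg(id) {
    if (!cache.has(id)) {
      const svg = read(resolve(id))
        .replace(/<\?xml[\s\S]*?\?>/g, '')
        .replace(/<!--[\s\S]*?-->/g, '')
        .trim();
      cache.set(id, svg);
    }
    return cache.get(id);
  };
}

// With Handlebars, this could be registered along the lines of:
//   const inlineSvg = makeSvgLoader();
//   Handlebars.registerHelper('svg', (id) => new Handlebars.SafeString(inlineSvg(id)));
```

Caching by package path means a template that uses the same icon twenty times only reads and cleans it once.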

A bit of an aside:

React Changed Everything

There is a reason that React talks have been so popular for the last couple
years. It really did make a radical shift in how we design frameworks, and more
importantly to me, it helped give components better interfaces. Stateless
components have well defined inputs and outputs. Side effects are reduced or
eliminated. This means modules can more easily declare their dependencies and
give simple interfaces.

This also means that React components fit into packages really neatly, and
automatically give an interface that’s like a function call. If you’re a React
programmer, you’ll probably recognize my fromPackage helper as very similar
to node’s require, which is how most of us use React these days, as webpacked
or browserified modules.

What can we steal from React?

That modularity and clear boundary on interfaces changed so much. Let’s
re-think how we integrate things to have interfaces that simple and clean.
There have been a lot of experiments, too, on having React components
automatically namespace the CSS they require, and then emit HTML that uses the
namespaced version. By moving the module boundary from the raw CSS to something
that gets called, an active process, CSS namespacing woes can be solved by
separating what the humans type from what the browser interprets a little bit.

How radically can we change the complexity of an API by changing what kind of
thing we export?

What else can be a module?

At PayPal, I did work to make translations be separately loadable things, which
leads rapidly into separating those pieces into entirely separate packages with
their own maintenance lifecycle. When you have a separate team working on
something, having a clean boundary can be a great way to let work progress at a
more independent pace. What else can we modularize?

That last one is really interesting. Kyle Mitchell is a lawyer who uses a lot
of software in his work to draft legal texts. In so doing, he’s published a lot
of tiny modules of interesting stuff. Mostly they’re JSON files, cleanly
licensed and versioned, or small tools for assembling legal language out of
smaller pieces, re-using tested and tried phrasings of things. Sounds familiar,
right?

Text itself can be a module with an interface, even if that interface is
concatenation of a bit of text.

We can even make modules that are nothing but known good configurations of
other modules, combined and tested.

Making Modularity

Hands on!

A lot of this is going to be specific to node, but I like node not just because
it’s JavaScript and I think JavaScript is a lot of fun, but because its
dependency model is actually pretty unique. That’s actually a lot of what drove
me to node in the first place.

The underappreciated feature of node modules is that they nest (this really
bugs Windows users, since their tools can’t deal with deep paths). This means
that a module can get a known version of something, defined entirely on its
terms, so what a package depends on is either a less important part of the
interface to a module or not part of it at all. We
spend a lot less time wrangling which versions of things all have to be
available at once, and we can start putting dependencies behind the boundary
that a module defines.

Most of what I build is built on @substack’s resolve module

resolve.sync('@mad-science/luncheon-meats/baloney.svg')

give me the file baloney.svg from the @mad-science/luncheon-meats package

This is a really simple module that implements node’s module lookup strategy
for files that aren’t javascript. You can say “give me the file baloney.svg
from the @mad-science/luncheon-meats package”, and it will find it, no matter
where it got installed into the tree — remember node modules let you share
implementations if two things require compatible versions of a module — and we
name the file, in this case the actual interface of this hypothetical module is
to just read the file once you figure out where it is.
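The same lookup can be tried without installing anything extra, because node’s built-in `require.resolve` accepts a `paths` option (in reasonably recent versions of node) that plays a role similar to the `basedir` option of the resolve package. The sketch below swaps in that built-in; `findPackageFile` is a hypothetical wrapper, and the package it mentions is the made-up one from above.

```javascript
// Find a file inside an installed package, starting the node_modules lookup
// from `basedir` and walking up the tree from there. This uses node's
// built-in resolver rather than the resolve package.
function findPackageFile(id, basedir) {
  return require.resolve(id, { paths: [basedir] });
}

// Example, assuming the hypothetical package is installed somewhere above
// this file:
//   const file = findPackageFile('@mad-science/luncheon-meats/baloney.svg', __dirname);
//   const svg = require('fs').readFileSync(file, 'utf8');
```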

That’s our primitive building block. I like this one because it matches how
everything else in the runtime I use most works.

There’s another thing that’s common to do: Add a new field to package.json. Dr.
Frankenstyle uses the property style rather than main to say which file is
the entry point to the package. This means that modules can do dual-duty:
grouping different aspects of a thing together into a single component, rather
than making the caller assemble the pieces when the pieces all go together
anyway.

One of the things I ask when building interfaces in general, and module systems
in particular is “how many guarantees can we make?”

Dependencies isolated

Deduplication

Local paths are relative to the file

Single entry point

One of the guarantees I love most is that local paths in node modules are
relative to the file. This is one of the ways that make it possible to break
things into modules without breaking down the interface they had as a
monolithic unit. It really makes me sad that most templating languages don’t
maintain filenames deep enough into their internals to implement this. It’s
good for source mapping and it’s good for modularity.

A lot of people fought this – they keep fighting this in node development, but
I think it exposes how people think about modularity: This is a symptom of
making the whole project a single module and giving the components just enough
name to navigate but not enough that they can live on their own.

I keep building similar models. I often make path rewrites for resources, so
that things relative to the file where it’s actually stored will work when
loaded into a new context where it’s used. Sometimes that’s inlining. Sometimes
that’s copying modules into a destination and making sure their assets come
along for the ride.

This is replicating the guarantees of node’s module system, because they give
me some flexibility and durability in what I make. If things have their own
namespaces, their own dependencies, then I can break them less often or not at
all.

Going Meta

If everything’s a module, what can we do with that?

Have we simplified things enough to start giving our programs the vocabulary to
start extending themselves? Can we start talking about constructing programs
out of larger building blocks, even if they’re sometimes special purpose?

Can modules or remotely loaded packages be first-class objects in our programs?

What about generating new modules in the course of using our programs, and
letting our users share them?

What other kinds of interfaces can we give? Web services? Data sets with
guarantees about how they’ll develop in the future?

How radically can we simplify the interface of something?

One of the most influential concepts in my career was that phrase a lot of us
have heard about UNIX: everything is a file. Now, that’s a damn lie. There’s a
lot of things in unix that aren’t files at all. IPC. System calls. Locks. Lots
of things can be file descriptors, like sockets, but if you want to see more
things shoehorned into that interface, you could go install Plan 9, but there’s
not very much software out there for Plan 9.

Even so, UNIX took off in a huge way thanks to a bunch of factors, and even
Plan 9 and Inferno and systems derived from them have this really outsized
longevity in our minds because of one thing: They simplified their interfaces.
Radically.

They defined their interfaces so simply that you can sum them up in a few words.

“Text file. Delimited by colons.”

“Line delimited log entries.”

“Just a plain file, a sequence of bytes”

These are super durable primitives. They had all their edge cases shaved off.
No record sizing to write a file, no special knowledge of what bytes were
allowed or not. Very few things imbued with special meaning.

This means these systems last because they give us building blocks to build
better things out of.

I love to pick on unix because for all its ancient cruft there’s an elegant
system inside. It’s not the only super simple interface that really took off
either.

Chris Neukirchen made the rack library for ruby, and little did they know but
it suddenly got adopted by all the frameworks and all the servers because at
the core of it, a web request got simplified down to a single function call:
environment in, response code, headers, and body out. It’s adapted from the
Python WSGI but it was a great distillation of the concepts.

node modules also have this ridiculously simple interface. They get wrapped in
a function with five parameters, and are provided a place to put their exports
and a require function for their own use. It turned out to be a great thing to
build even more complicated module systems out of.
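
To give a feel for how small that interface is, here is a rough sketch of a loader built on the same five-parameter wrapper. `loadModuleFromSource` is a hypothetical helper written for illustration, not node’s actual loader, which also handles caching, path resolution and a per-module `require`.

```javascript
const path = require('path');
const vm = require('vm');

// Hypothetical helper: evaluates CommonJS-style source text inside the
// five-parameter wrapper and returns whatever the source exported.
function loadModuleFromSource(source, filename) {
  const module = { exports: {} };
  const wrapper =
    '(function (exports, require, module, __filename, __dirname) {\n' +
    source +
    '\n})';
  // Compile the wrapper text into a real function.
  const compiled = vm.runInThisContext(wrapper, { filename: filename });
  // Simplification: the loaded code shares this file's require.
  compiled.call(module.exports, module.exports, require, module,
    filename, path.dirname(filename));
  return module.exports;
}

const loaded = loadModuleFromSource(
  'exports.greet = function (name) { return "hello, " + name; };',
  '/tmp/greeter.js' // hypothetical path, only used for naming
);
console.log(loaded.greet('world')); // prints "hello, world"
```

Everything a module sees comes in through those five arguments, which is a large part of why it is easy to build other loaders on top.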

node streams, too. By making them pretty generic, it turns out that thousands
and thousands of packages all use the shared interface and all work together.

My first objection to these is that commits are not always past tense. In a world of CVS and Subversion, they are: reworking and recommitting things is far too much work, but this is git. They are not just a record of what we did, but they are actual objects that we are going to talk about, they are proposals and often they are speculative. git is an editor.

It doesn’t feel particularly natural to be more descriptive here because we’re basically adding labels to a timeline. If we do get descriptive here, it’ll be as sentence fragments awkwardly broken up into bullet lists at best, and talking more about what we did than why we did it. Let’s talk about them in the present tense:

fixes bug in display code

case where display list is null

or

improves caching behavior for edge case

sometimes we write the empty entry first

A step in the right direction. Those start looking like objects we are going to talk about. However, they don’t make a lot of sense without context. Commits come with only two pieces of context: their parent commit, and the tree state they refer to.

These messages assume context that someone spelunking in the history later will not necessarily find. “fixes bug” implies there was a bug to fix, but says little about it. We still are talking more about history than about what we changed. One has to compare the states before and after, and there’s not a lot of incentive in this format to continue and describe the bug. The context is assumed. In talking about these commits, we’d say things like “this commit deadbeef was the problem”. We don’t really refer to the commit so much as the state it brings, and even then only weakly, in the form of what’s different about that state from the previous one, not what it is.

We can describe a little more but we’re still describing what we’re doing and not the state of the world.

In a world where we may rebase them, move them around and combine them, something a little more durable needs to happen. Let’s treat commit titles as names.

improvement in caching behavior for edge case

a check to skip writing empty entries in the cache, preventing the case
where empty entries would be returned instead of a cache miss.

Now the description we’ve left out starts feeling obvious. Now I want to know more about this bug, I want to know more about the fix, and I want to know about this improvement. These are nouns, and we have a lot of language for describing nouns.

These make sense even if rebased, and if we were to read the source code associated with this change, we would find that this describes the code added and removed, not the change from some unknown previous state. We know almost everything about the contents of this commit without having to infer it from context, and discussing it as the actual code becomes much easier. Code reviews can be improved, and we can refer to these commit hashes (or URLs) as objects and refer to them meaningfully later. “This improvement was very good”, or “this improvement introduced a bug”

Now we have objects to talk about, and detail about the state that differentiates it from other states, even without being directly attached to the history. With the need for context reduced, we can now use these commit messages in new contexts without rewording them. We add some tags with some machine-readable semantics: Tools like conventional-changelog-cli can generate change logs for summary to a user and semantic-release can bump version numbers in meaningful ways, dependent on the changes being released. We’ve pushed that decision out to the edges of the system, where all the context for doing it right lives. The result:

fix: improvement in caching behavior for edge case

a check to skip writing empty entries in the cache, preventing the case
where empty entries would be returned instead of a cache miss.

BREAKING CHANGE: empty cache entries are not saved so negative caching
must be handled in another layer.

And in changelog format:

v2.0.0 (2016-04-16)

fix: bug in display code 886a50c

fix: improvement in caching behavior for edge case 9bce4c5

BREAKING CHANGE

empty cache entries are not saved so negative caching must be handled in another layer.

This is super useful, but I think the context reducing style of commit message is a good prerequisite for actually getting good change logs that make sense.

A side note. I think GitHub’s new squash and merge feature is going to be the perfect place for this style: individual commits are often not quite the right granularity for tagging. The style notes here apply otherwise, but tags I think are most useful on a merge-by-merge basis.

In the absence of squashing, a change to conventional-changelog that only looked at merge commits would be excellent, leaving the small state changes visible for code review, but the merges visible as external changes in the log.
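
To make the tagging idea concrete, here is a minimal sketch of how a tool might turn tagged messages into a version bump. It assumes the common convention that `fix` means a patch release, `feat` a minor one, and a `BREAKING CHANGE:` footer a major one. `bumpForMessage` and `bumpForRelease` are hypothetical names, and this is not how semantic-release or conventional-changelog are actually implemented.

```javascript
// Ordered from least to most significant.
const BUMPS = ['none', 'patch', 'minor', 'major'];

// Decide the bump implied by a single commit message.
function bumpForMessage(message) {
  const title = message.split('\n')[0];
  // Matches "type: " or "type(scope): " at the start of the title.
  const match = /^(\w+)(\([^)]*\))?: /.exec(title);
  const type = match ? match[1] : null;
  // A breaking-change footer outranks whatever the type says.
  if (/^BREAKING CHANGE: /m.test(message)) return 'major';
  if (type === 'feat') return 'minor';
  if (type === 'fix') return 'patch';
  return 'none';
}

// The release takes the most significant bump among its commits.
function bumpForRelease(messages) {
  return messages.map(bumpForMessage).reduce(function (highest, bump) {
    return BUMPS.indexOf(bump) > BUMPS.indexOf(highest) ? bump : highest;
  }, 'none');
}

const release = bumpForRelease([
  'fix: bug in display code',
  'fix: improvement in caching behavior for edge case\n\n' +
    'BREAKING CHANGE: empty cache entries are not saved so negative caching\n' +
    'must be handled in another layer.'
]);
console.log(release); // prints "major"
```

The decision needs nothing beyond the messages themselves, which is why messages that carry their own context matter so much here.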

In part one of this series I talk about Why MVC doesn’t fit the web
from the point of view of writing web services, in the vein of Ruby on Rails
and Express. This time I’m continuing that rant aimed at the modern GUI: The
Browser.

MVC originated from the same systems research that gave rise to Smalltalk,
which then had ideas imported into Ruby and Objective C that we use today. The
first mention of an MVC pattern that I’m aware of was part of the original
specifications for the Dynabook – a vision that has still not been realized in
full, but that laid out a fairly complete vision for what personal computing
could look like, a system that any user can modify and adjust. The software
industry owes a great deal to some of this visionary work, and many concepts we
take for granted today like object oriented programming came out of this
research and proposal.

The biggest part of the organizational pattern is that the model is the ‘pure
ideal’ of the thing at hand – one of the canonical examples is a CAD model for
an engineering drawing: the model represents the part in terms of inches and
parts and engineering terms, not pixels or voxels or more specific
representations used for display.

The View classes read that model and display it. Its major components are in
terms of windows and displays and pixels or the actual primitives used to
display the model. In that canonical CAD application, a view would be a
rendered view, whether wire-frame or shaded or parts list data displayed from
that model.

The way the two talk is usually that the model emits an event saying that it
changed, and the view re-reads and re-displays. This lives on today in systems
like React, where the pure model, the ‘state’, when it updates, triggers the
view to redraw. It’s a very good pattern, and the directed flow from model to
view really helps keep the design of the system from turning into a
synchronization problem.
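
That one-way flow can be sketched in a few lines, using node’s `EventEmitter` as a stand-in for whatever notification mechanism a real system has. `PartModel` and `PartsListView` are hypothetical names made up for the CAD example. The model knows about inches and parts, and the view only knows how to turn that into lines of display text.

```javascript
const EventEmitter = require('events');

// The model speaks in domain terms only: parts and inches, never pixels.
class PartModel extends EventEmitter {
  constructor() {
    super();
    this.parts = [];
  }

  addPart(name, lengthInches) {
    this.parts.push({ name: name, lengthInches: lengthInches });
    this.emit('change'); // announce the change; say nothing about display
  }
}

// The view re-reads the whole model whenever it is told something changed.
class PartsListView {
  constructor(model) {
    this.model = model;
    this.lines = [];
    model.on('change', () => this.render());
    this.render();
  }

  render() {
    this.lines = this.model.parts.map(function (part) {
      return part.name + ': ' + part.lengthInches + ' in';
    });
  }
}

const model = new PartModel();
const view = new PartsListView(model);
model.addPart('bracket', 2.5);
console.log(view.lines); // prints [ 'bracket: 2.5 in' ]
```

Nothing ever flows back from the view into the model, so there is no second copy of the data to keep in sync.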

In a 1980s CAD app, you might have a command-line that tells the model to add
a part, or maybe a mouse operating some pretty limited widgets on screen,
usually separate from the view window. Where there is interaction directly on
the view, the controller might look up in the view what part of the model got
clicked, but it’s a very thin interface.

That’s classic MVC.

To sum up: separate the model logic that operates in terms of the business
domain, the actual point of the system, and don’t tie it to the specifics of
the view system. This leaves you with a flexible design where adding features
later that interpret that information differently is less difficult – imagine
adding printing or pen plotting to that CAD application if it were stored only
as render buffers!

Last we come to controllers. Controllers are the trickiest part, because We
Don’t Do That Anymore. There are vestigial bits of a pure controller in some
web frameworks, and certainly inside the browser. Individual elements like an
input or text area are most recognizable. The model is a simple string: the
contents of the field. The view is the binding to the display, the render
buffers and text rendering; the controller is the input binding – while the
field has focus, any keyboard input can be directed through something that is
written much like a classic controller, and updates the model at the position
in the associated state. In systems dealing with detached, not-on-screen
hardware input devices, there’s certainly a component that directs input into
the system. We see this with game controllers, and even the virtual controllers
on-screen on phones emulate this model, since the input is usually somewhat
detached from the view.

In modern web frameworks, you’ll find a recognizable model in most if not all.
Backbone did this, giving a structured base class to work from, since it is
commonly mapped to a REST API in the form of its Backbone.Model class.
Angular does this with the service layer, a pretty structured approach to
“model”. In a great many systems, the model is the ‘everything else’, the
actual system that you’re building a view on top of.

Views are usually templates, but often have binding code, read from the
model, format it, make some DOM elements (using the template) and substitute it
in, or do virtual DOM update tricks like React does. Backbone.View is an
actual class that can render templates or do any other DOM munging to display
its model, and can bind to change events in a Backbone.Model; React
components, too, are very much like the classic MVC View, in that they react to
model or state updates to propagate their display adaptation out to the
viewer.

The major difference from MVC comes in event handling. The DOM, in the large,
is deeply unfriendly to the concept of a controller. We have a lot of systems
that vaguely resemble one if you squint right: navigation control input and
initial state from the URL in a router; key bindings often look a lot like a
controller. To make a classic MVC Controller, though, input would have to be
routed to a central component that then updates models and configures views;
this split rarely exists cleanly in practice, and we end up with event handlers
all directly modifying model properties, which reflect their state outward into
views and templates.

We could wrap and layer things sufficiently to make such a system, but in the
guise of ideological purity, we would have lost any simplicity our system had
to begin with, and in the case of browsers and the web, we would be completely
divorced from native browser behavior, reinventing everything, and losing any
ability to gracefully degrade without javascript.

We need – and have started to create – new patterns. Model-View-ViewModel,
Flux, Redux, routers, and functional-reactive approaches are all great ways to
consider structuring new applications. We’re deeply integrating interactivity:
elements and controls are not just clickable and controllable with a keyboard,
but respond to touch input, pen input, eye-tracking and gesture input. It’s time to
keep a critical eye on the patterns we develop and continue to have the
conversations about what patterns suit what applications.

Yesterday I wrote a small http client library starting from the bare node.js
net module to make a TCP connection, and in a series of steps built up a
working client for a very basic dialect of HTTP/1.1. It’s built to have some
similar design decisions to node’s core http module, just for real-world
relatedness.

I did this as a teaching exercise for a friend – he watched me work it up in a
shared terminal window – but I think it’s interesting as a learning example.

For background, this relies on core concepts from node.js: streams, and
connecting to a TCP port with the net module; I’m not doing anything
complicated with net, and almost entirely ignoring errors.

I start almost every project the same way: git init
http-from-first-principles then cd !$ then npm init and mostly accept the
defaults. I usually set up the test script as a simple tap test.js. After the
init, I run npm install --save-dev tap. tap is my favorite test framework,
because it has pretty excellent diagnostics when a test fails, runs each test
file in a separate process so they can’t interfere with each other so easily,
and because you can just run the test file itself as a plain node script and
get reasonable output. There’s no magic in it. (the TAP protocol is pretty
neat, too)

Next, I created just enough to send a request and a test for it. The actual
HTTP protocol is simple. Not as simple as it once was, but here’s the essence
of it:

GET /a-thing-i-want HTTP/1.1
Host: example.org
Accept: text/html

That’s enough to fetch a page called /a-thing-i-want from the server at
example.org. That’s the equivalent of http://example.org/a-thing-i-want in
the browser. There’s a lot more that could be added to a request – browsers
add the user-agent string, what language you prefer, and all kinds of
information. I’ve added the Accept: header to this demo, which is what we’d
send if we want to suggest that the server should send us HTML.
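
As a minimal sketch of that first step (not the code from the repository), sending such a request with the net module could look something like this. `formatRequest` and `sendRequest` are hypothetical names. Errors are ignored entirely, and the caller is left to decide when to end the socket.

```javascript
const net = require('net');

// Build the request text. HTTP lines end in CRLF, and an empty line
// marks the end of the header section.
function formatRequest(host, path, headers) {
  const lines = ['GET ' + path + ' HTTP/1.1', 'Host: ' + host];
  Object.keys(headers || {}).forEach(function (name) {
    lines.push(name + ': ' + headers[name]);
  });
  return lines.join('\r\n') + '\r\n\r\n';
}

// Open a TCP connection on port 80, write the request, and hand every
// chunk of the response to onData as it arrives.
function sendRequest(host, path, onData) {
  const socket = net.connect({ host: host, port: 80 }, function () {
    socket.write(formatRequest(host, path, { Accept: 'text/html' }));
  });
  socket.setEncoding('utf8');
  socket.on('data', onData);
  return socket;
}

const requestText = formatRequest('example.org', '/a-thing-i-want', {
  Accept: 'text/html'
});
console.log(JSON.stringify(requestText));
```

Calling `sendRequest('example.org', '/a-thing-i-want', console.log)` would print the raw response, headers and all, in however many chunks the network delivers it.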

The server will respond in kind:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 35

<p>this is the page you wanted</p>

That may not come in all at once – the Internet is full of weird computers
that are low on memory, and networks that can only send a bit at a time. Since
node.js has streams, we get things as they come in, and we have to assemble the
pieces ourselves. They come in in order, though, so it’s not too hard. It
does complicate making a protocol handler like this, but it does give us the
chance to make a very low-memory, efficient processor for HTTP messages.

So if we get that back all as a chunk as users of our HTTP library, that’s not
that useful – nobody wants to see the raw HTTP headers splattered at the top
of every web page. (Okay, I might. But only because I love seeing under the
hood. It’d be like having a transparent lock so you can see the workings. Not
so great for everyday use.)

A better interface would be to have headers come out as a javascript object,
separate from the body. That’s what is done in the next commit. Or at least
the interface is exposed – we don’t actually parse the headers yet. That’s
going to be trickier.

We got a complete header line, and the newline that ends the header section

We got a complete response all at once

We got a complete header, and part of the body

Having already received a part of a header, we get another part of a header

Having already received a part of a header, we get the remainder and more…

And so on. There’s a lot of ways things can and will be broken up depending on
where the packet boundaries fall in the stuff we care about. We have to handle
it all.

The best approach is to start at the beginning and see if you have a complete
thing. If not, store it for later and wait for more. If you do have a complete
thing, process it, take that chunk off the beginning of what you’re processing,
and loop this process and see if there’s more. Repeat until complete or you
have an error. That’s what the first part of the header parser does.
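
Here is a rough sketch of that loop, written for this explanation rather than copied from the repository. `createHeaderParser` is a hypothetical name. It assumes CRLF line endings, keeps everything as strings, and skips malformed header lines instead of reporting them.

```javascript
// Returns a parser object. Call write() with each chunk as it arrives.
function createHeaderParser() {
  const state = { statusLine: null, headers: {}, complete: false, rest: '' };
  let buffered = ''; // leftovers that do not yet form a complete line

  state.write = function (chunk) {
    if (state.complete) {
      state.rest += chunk; // headers are done; this is all body
      return state;
    }
    buffered += chunk;
    let end;
    // While there is a complete line at the front, take it off and process it.
    while (!state.complete && (end = buffered.indexOf('\r\n')) !== -1) {
      const line = buffered.slice(0, end);
      buffered = buffered.slice(end + 2);
      if (line === '') {
        // The empty line ends the header section; the remainder is body.
        state.complete = true;
        state.rest = buffered;
        buffered = '';
      } else if (state.statusLine === null) {
        state.statusLine = line;
      } else {
        const colon = line.indexOf(':');
        if (colon > 0) {
          const name = line.slice(0, colon).toLowerCase();
          state.headers[name] = line.slice(colon + 1).trim();
        }
      }
    }
    return state;
  };

  return state;
}

const parser = createHeaderParser();
// Chunks deliberately split in awkward places.
parser.write('HTTP/1.1 200 OK\r\nContent-Ty');
parser.write('pe: text/html\r\nContent-Length: 35\r\n\r');
parser.write('\n<p>this is');
parser.write(' the page you wanted</p>\n');
console.log(parser.headers);
// prints { 'content-type': 'text/html', 'content-length': '35' }
```

Because anything incomplete just waits in `buffered`, the same loop copes with every way of splitting the input, including a chunk that ends in the middle of the CRLF.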

That first pass at the problem was a little naive and doesn’t stop at the end
of the header properly. So next we put in a temporary hack to put that missing
chunk somewhere.

It’s everything I’d hoped it would be – I have excellent coworkers. The work is interesting. The business makes sense. The plans for the future are exciting, but not mind-bendingly ambitious.

Working on open source is familiar, and the workflow of using private repositories on GitHub is definitely smoother than bouncing between two separate instances using GitHub Enterprise. I think this is overlooked by corporate systems designers and security folks. Trading this ease out costs a lot in productivity and maintenance that I think is under-appreciated.

Being remote is imperfect. The tools for remote face to face meetings leave something to be desired, and doubly so with some hearing damage that makes it hard to understand words if there’s any interference or background noise. There’s still a lot of room out there for someone to get multi-party video conferencing right. Being remote from an office that has a majority of my coworkers colocated has some downsides, but my team and the company as a whole is gracious and thoughtful and caring, and that smooths over the vast majority of the rough edges.

The biggest difference is how much more processes make sense when everyone is involved and cares. So far, every decision has made sense, and it’s getting easier to trust that things are the way they are for a reason, and that if they cause a problem they can be changed. In comparison to a corporate bureaucracy that only occasionally manages to challenge its tendency to ossify, it’s a world of difference – without a tyranny of structurelessness. In so many ways, npm is a traditionally structured company. A simple hierarchy of managers and reporting. Employees doing the work have the most visibility into that work, the executives have the most comprehensive ability to steer and direct, but rely on us for the insight into the details. No special organization to teams – grouped by project, people allocated according to company goals. All of this, though, has an element of trust that I’ve not seen since I worked at Wondermill in 2001. People genuinely like each other, support each other, and go out of their way to make sure things work for each other. In so many ways, it feels like working with a net. A proper safety net, not something rigged up to be good enough at the moment but precarious to trust long term.

Git at first seems to be an ideal tool for deploying web sites and other things that don’t have object code. However, it’s never been that simple, and where there’s programming, there’s automating the tedious bits and creating derivative pieces from more humane sources.

With the addition of receive.denyCurrentBranch = updateInstead in git 2.3.0, possibilities opened up for really reliable, simple workflows. They’ve since been refined, with a push-to-checkout hook allowing built objects to be created on the receiving server, but I want a more verifiable, local approach.

There are two main strategies in git for dealing with this, and before git 2.3.0, those were really the only things available. In the first, git holds only the source material, and any built products are managed outside of git, whether as a directory of numbered tarballs or in a service meant for such things. Some services like the npm registry bring a lot of value, with public access and hosting and replication available; some are little more than object storage like Amazon S3. In the second approach, built products are committed back, and git becomes a dumb content tracker – conflicts in built files are resolved by regenerating them from merged source material, and the build process becomes integral to every operation on the tree of files.

I’ve long wanted a third way, using the branching, fast, and stable infrastructure of git, while keeping the strict separation of source material and built material. I want to be able to inspect what will be deployed, and inspect the differences between what was deployed each time, and separately, analyze the changes to the source material, yet still be able to relate it to the deployed, built objects. To that end, this tool can be considered a first attempt at building tools that understand the idea of a branch derived from another.

The design is simple enough: given a branch (say master) checked out in your repository, with a build process for whatever objects need to exist in the final form, but those products ignored by a .gitignore file, like so:

source.txt:

aGVsbG8sIHdvcmxkCg==

and a build script:

build.sh:

#!/bin/sh
base64 -D < source.txt > built.txt

and an ignore file, with both the built object and other things like editor cruft:

.gitignore:

built.txt
*.swp
*~

we create a file listing the ignored files that should nonetheless be included when creating the derived branch, like so:

.gitdeploy:

built.txt

The initial version of the tool is very simple, and doesn’t support wildcards or any other features of any complexity in the .gitdeploy file. This is not out of a strong opinion, but as a matter of implementation simplicity, given that my prototype is written using bash.

You can install it with npm:

npm install -g git-create-deploy-branch

To create the deploy branch, we’ll run the build, then create the deploy branch with those objects present in our working directory:

./build.sh && git create-deploy-branch

Our first run gives output like so:

[new branch] 8acba8787306 deploy/master

and a branch deploy/master is created, in this case with commit ID 8acba8787306. We can show that it includes the built files:

The commit also has the parent commit set to the current commit on master, so we can track the divergence between master and deploy/master, both expected (with the built objects) and unexpected (errant commits made on the deploy branch).

One of the most frustrating things that happens in a large node.js application is a double callback bug. They’re usually simple mistakes that are super tricky to track down. You may have seen one and not recognized it as such. In Express, one manifestation is Error: Can't set headers after they are sent; another one I’ve seen is an EventEmitter with an error event handler registered with ee.once('error', handler) that crashes the process saying it has an unhandled error – the first callback fires the error handler, the second triggers another error and since it was bound with once, it crashes. Sometimes they’re heisenbugs, where one path through a race condition resolves successfully, but another will manifest a crash or strange behavior.

This version works more acceptably if fs.readFile gives us an error. Now let’s consider what happens when there’s a JSON parse error: This crashes, since an exception thrown by JSON.parse will unwind up the stack back to fs.readFile‘s handler in the event loop, which has no try/catch and will crash your process with an uncaughtException. Let’s add an exception handler.

Whoops. That last line throws TypeError: Cannot read property 'thing' of undefined.

That goes back to the callback function and the try/catch block, and we call back again with the error. Our callback gets called twice – which isn’t so bad with things that don’t care like console.log and console.warn, but even then, the output is confusing:

It both worked and didn’t work! That’d crash our program if something throws an exception for a double callback. It’ll eat the error and we’d wonder why our program was misbehaving if the thing we’re calling ignored second callbacks.
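
Here is a small sketch of the shape of that bug, reconstructed for illustration rather than taken from the original listing. `deliverParsed` is a hypothetical helper split out of `readJsonAsync` so the demonstration can run without a file. The mistake is that the callback is invoked inside the try block, so an exception thrown by the callback is caught as though JSON.parse had failed.

```javascript
const fs = require('fs');

// Buggy on purpose: cb runs inside the try, so anything it throws lands
// in the catch block and cb gets called a second time.
function deliverParsed(err, text, cb) {
  if (err) return cb(err);
  try {
    cb(null, JSON.parse(text));
  } catch (e) {
    cb(e);
  }
}

function readJsonAsync(filename, cb) {
  fs.readFile(filename, 'utf8', function (err, text) {
    deliverParsed(err, text, cb);
  });
}

// Demonstration: a callback with a bug of its own.
const calls = [];
deliverParsed(null, '{}', function (err, data) {
  calls.push(err ? 'error: ' + err.name : 'success');
  if (!err) {
    // data.missing is undefined, so reading .thing throws a TypeError.
    return data.missing.thing;
  }
});
console.log(calls); // prints [ 'success', 'error: TypeError' ]
```

One call to `deliverParsed` produced both a success and an error, and the error has nothing to do with parsing.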

We’ve also made a tricky conundrum here. There’s a lot of ways to solve it, from ignoring multiple callbacks like so: (this example uses the once module)

This means that the caller of readJsonAsync is on their own to handle their exceptions. No warranties, if it breaks, they get to keep both pieces, et cetera. But there’s no double callbacks!
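
A minimal sketch of that arrangement, assuming the fix is to keep only the parse inside the try and to call back outside it. `parseThenCall` is a hypothetical helper, split out so the behavior can be shown without reading a file.

```javascript
const fs = require('fs');

// Only JSON.parse is guarded. The callback is invoked outside the try,
// so an exception it throws is the caller's problem, not ours.
function parseThenCall(err, text, cb) {
  if (err) return cb(err);
  let data;
  try {
    data = JSON.parse(text);
  } catch (e) {
    return cb(e); // parse failure: exactly one call, with the error
  }
  cb(null, data); // success: exactly one call, with the data
}

function readJsonAsync(filename, cb) {
  fs.readFile(filename, 'utf8', function (err, text) {
    parseThenCall(err, text, cb);
  });
}

// A parse error is reported once.
let parseErrorCalls = 0;
parseThenCall(null, 'not json', function (err) {
  parseErrorCalls += 1;
  console.log(err.name); // prints "SyntaxError"
});

// A throwing callback is called once, and its exception propagates.
let throwingCalls = 0;
try {
  parseThenCall(null, '{}', function (err, data) {
    throwingCalls += 1;
    return data.missing.thing; // throws a TypeError
  });
} catch (e) {
  console.log('caller saw', e.name); // prints "caller saw TypeError"
}
```

In the real asynchronous case there is no caller left on the stack to catch that TypeError, so it surfaces as an uncaught exception, which is at least an honest report of where the bug is.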

So this gets tricky when you have a whole chain of things – someone’s made a mistake in something “so simple it can’t go wrong!” like a readFile callback that parses JSON, but the double callback comes out miles away, in a callback to something in a callback to something in a callback to something in a callback that calls readJsonAsync. This isn’t an uncommon scenario – every Express middleware is a callback, every call to next calls another. Every composed callback-calling function is another layer. The distance can get pretty severe sometimes. This is one of the less-loved benefits of promises: errors are much more isolated there, and the error passing is much more explicit. I think it’s a more important point than a lot of things about promises. But that’s neither here nor there. What we’re asking is:

How do we debug doubled callbacks?!

My favorite way is to write a function that will track a double callback and log the stack trace of both paths. This is a bit like the once package, but with error logging.

Here’s a simple version.

function justOnceLogIfTwice(cb) {
  var last;
  return function () { // return a new function wrapping the old one.
    if (!last) {
      last = new Error(); // Save this for later in case we need it.
      cb.apply(this, arguments); // Call the original callback
    } else {
      var thisTime = new Error("Called twice!");
      console.warn("Callback called twice! The first time is", last.stack, "and the next time is", thisTime.stack);
      // optionally, we might crash the program here if we want to be loud about errors. Like so:
      setImmediate(function () {
        // This is an "async throw" -- it can only be caught by error domains or the `uncaughtException` event on `process`.
        throw thisTime;
      });
    }
  };
}

Now we just have to trigger the error, and we should get two stack traces, once with the success path, and once with the error path.

Other ways? Set breakpoints on the calls to cb. See what the program state is at each of them.

Try to make a reproduction case. Good luck: it’s hard.

Add once wrappers to callbacks until you find the problem. Move them deeper and deeper until you find the actual source.

Give extra scrutiny to non-obvious error paths. If you can’t spot where errors go, I’d bet money on finding part or all of the bug in there.

Add an async-tracking stack trace module like long-stack-trace or longjohn. They slow your program down and can change the behavior because of the tricks they do to get long traces, but they can be invaluable if they don’t disturb things too much.

Consider using this eslint rule to catch the simpler cases – it won’t catch all of them, but it’ll at least catch the missing return case.

This is part announcement, part job advertisement, part musing on what it’s like to work with a really amazing team.

I’m leaving PayPal in the first week of August to join the fine people at npm, inc as the architect of the web site. It was actually one of the toughest decisions I’ve had to make, because while npm is the company I absolutely most want to work for, I really, really like my team at PayPal. I can’t think of any other company I’d leave my team for. They are kind, hard-working, honest, visionary but not obnoxiously opinionated. I’ve been given a huge amount of trust while I was there, and I’ve produced some great work. As one of my last acts for the team, I want to find someone to replace me.

For the past year, I’ve been working on KrakenJS at PayPal, doing largely open source development, and supporting application teams internally. The Kraken team is a unique team in a unique spot in the company. Our job is the open source project, advocacy for our internal developers, technological leadership, and creating common infrastructure when we can identify problems that multiple teams have. We do research and experiment with new technologies – both to vet them for stability, and to find places that will be error-prone and require caution or will impact long term maintenance.

I spent most of my year working on internationalization components. This wasn’t exactly assigned work – though someone really did need to do that work, so I jumped in and did it – but there’s a lot of things that need attention and the point of the project is to serve its users’ needs. It’s not there to enforce an opinion, just to solve problems, and so it does and we do. The team has worked a lot on rough consensus and running code. If someone has an idea, they prototype it and show it off to the team. Ownership is collective, but everyone takes responsibility.

Originally, Kraken was a prototyping tool used internally. The original team was taking a rough stab at some early componentizing and tooling for purely front-end work, but as time passed, the real niche showed up: an enterprise-friendly, structured but not too restrictive framework for making front-end services, first as a prototype for Java services that were not yet ready, and later, to replace those services with a node.js-based front tier. Application teams are now integrated, full-stack teams, building both in-browser and server-side components together. This has allowed a pretty unprecedented pace of development within PayPal, and in the past two and a half years, nearly every customer-facing application has been rewritten. That’s a huge amount of success enabled by the experimentation and resourcefulness of this small team. There are recordings of conference talks about this.

Recently, the team has been merged with some of the core node.js infrastructure team, now responsible for both internal architecture modules and the open source project. While the split loyalties to open source and to the internal company work are annoying, it actually works really well that way. PayPal is credibly the single largest enterprise use of node.js. I think we’ve got more developers using it than any other company, and certainly have based a large portion of our architecture on it. If someone’s having a problem with node, chances are we’ve seen the error and may well have found patterns or workarounds for development problems, and we work on getting bugs fixed upstream.

An example of one of the trickier bugs was diagnostics of a memory leak in io.js. You can see the back-and-forth with Fedor Indutny and my team on that issue, trying to diagnose what’s going on. Credit to Fedor: he knows the source of io.js better than anyone I know, particularly the TLS parts, and made tidy work of fixing it, but instrumenting, diagnosing and tracing that leak was a weeks-long process, starting in-house with monitoring noticing that a service running iojs behaved differently than the version running node 0.10 or 0.12. From there, making a diagnostic framework to track what’s going on, and really digging in, let us make a bug report of this caliber. Not every bug, or even many, involves that kind of to-the-metal investigation, but the team can figure out anything. They are great, kind, wonderful people.

It’s not all roses. There’s a lot of legacy baggage within the company, as any company that size and age is going to have. Enterprise constraints and organization have their own weight. Some people are resistant to change, and not every developer wants to do an amazing job in the company. Moving to new technologies and ways of doing things still requires backward compatibility and migration paths, but having tools like semantic versioning and node.js’s module structure has helped a lot. Tools like GitHub Enterprise, Asana and Slack and HipChat have their roles in enabling this kind of change.

Today someone asked on the node.js mailing list why the URL that Express.js gave them to access their application had a port number in it, and if they could get rid of it (since other sites don’t have it.)

My explanation is this:

There are some interesting details to this!

Each service on the Internet has a port assigned to it by a group called IANA. http is port 80, ssh is 22, https is 443, xmpp is 5222 (and a few others, because it’s complicated), pop3 is 110 and imap is 143. If the service is running on its normal port, things don’t usually need to know the port because it can just assume the usual one. In http URLs, this lets us leave the port number out – http://example.org/ and http://example.org:80/ in theory identify the same thing. Some systems treat them as ‘different’ when comparing, but they access the same resource.
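
Node’s own URL parser shows the default-port behavior directly. This is a small sketch using the standard `URL` class. `effectivePort` is a hypothetical helper, and its table of defaults only covers the two schemes that matter here.

```javascript
const { URL } = require('url');

// Default ports for the schemes discussed here (a deliberately tiny table).
const DEFAULT_PORTS = { 'http:': 80, 'https:': 443 };

// Hypothetical helper: the port a client would actually connect to.
function effectivePort(urlString) {
  const url = new URL(urlString);
  return url.port === '' ? DEFAULT_PORTS[url.protocol] : Number(url.port);
}

// The parser drops a port that matches the scheme's default.
const explicit = new URL('http://example.org:80/');
console.log(explicit.href); // prints "http://example.org/"
console.log(explicit.port === ''); // prints true

console.log(effectivePort('http://example.org/')); // prints 80
console.log(effectivePort('http://localhost:8080/')); // prints 8080
```

A comparison of the raw strings would still call the two forms different, which is where the systems that treat them as ‘different’ come from.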

Now if you’re not on the default port, you have to specify – so Express apps in particular suggest you access http://localhost:8080/ (or 3000 – there are a couple of common ports for “this is an app fresh off of a generator, customize from here”). This is actually just a hint – usually they listen to more than localhost, and the URL they report back is not very robust, but it works enough to get people off the ground while they learn to write web services.

If you run your app on port 80, you won’t need that.

However!

Unix systems restrict ports under 1024 as reserved for the system – a simple enough restriction to keep a user from starting up something in place of a system service at startup time, in the era of shared systems. That means you have to run something as root to bind port 80, unless you use special tools. There’s one called authbind that lets you bind a privileged port (found most commonly on Debian-derived Linuxes). Alternatively, one can call process.setuid and process.setgid to relinquish root privilege after binding (a common tactic in classic unix systems), though there are some fiddly details there that could leave you exposed if someone manages to inject executable code into what you’re running. And finally, one can proxy from a ‘trusted’ system daemon to your app on some arbitrary port – nginx is a popular choice for this, as are haproxy, stunnel and others.
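A minimal sketch of the bind-then-drop tactic. The helper name `dropPrivileges` and the `www-data` account are assumptions for illustration, and the process object is passed in as a parameter only so the function can be tried without actually being root. It does not deal with supplementary groups, which is one of the fiddly details:

```javascript
// Hypothetical helper: give up root once the privileged port is bound.
// `proc` is normally the global `process`.
function dropPrivileges(proc, user, group) {
  // getuid does not exist on Windows, and there is nothing to drop
  // unless we are currently root (uid 0).
  if (typeof proc.getuid !== 'function' || proc.getuid() !== 0) {
    return false;
  }
  // Group first: once the uid is no longer 0 we lose the right to change gid.
  proc.setgid(group);
  proc.setuid(user);
  return true;
}

// Usage sketch (needs root, so it is left commented out here;
// 'www-data' is an assumed account name and `handler` is made up):
// const http = require('http');
// http.createServer(handler).listen(80, function () {
//   dropPrivileges(process, 'www-data', 'www-data');
// });
```

The order matters: calling `setuid` before `setgid` would leave the process unable to change its group afterwards.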

Now as to why it’s just a hint: the problem of an app figuring out its own URL(s) is actually very hard, unsolvable often even in simple cases, given the myriad of things we do to networking – NAT and proxies in particular confuse this – and that there’s no requirement to be able to look up a hostname for an IP address, even if the hostname can be looked up to get the IP address. None of this matters for localhost though, which has a nice known name and a nice known IP and most people do development on their own computers, and so we can hand-wave all this complexity away until later, after someone has something up and running.
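To make the "just a hint" point concrete, here is a sketch of the kind of guess an app can make from what `server.address()` reports. The function name `hintUrl` is hypothetical, and this is not how Express or any generator actually builds its message:

```javascript
// Hypothetical helper: turn the { address, family, port } object that
// server.address() returns into a URL a developer can click on.
function hintUrl(addr) {
  // A wildcard bind means "every interface", so there is no single right
  // hostname. Fall back to localhost, which at least works for whoever
  // is developing on the same machine.
  const wildcard = addr.address === '0.0.0.0' || addr.address === '::';
  let host = wildcard ? 'localhost' : addr.address;
  if (host.includes(':')) {
    host = '[' + host + ']'; // IPv6 literals need brackets in a URL
  }
  // Leave the port out when it is the http default.
  const port = addr.port === 80 ? '' : ':' + addr.port;
  return 'http://' + host + port + '/';
}

console.log(hintUrl({ address: '::', family: 'IPv6', port: 3000 }));
// http://localhost:3000/
console.log(hintUrl({ address: '127.0.0.1', family: 'IPv4', port: 80 }));
// http://127.0.0.1/
```

Nothing here knows about NAT, proxies, or public hostnames, which is exactly why the result is only good for local development.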

Temporal coupling is the reliance on a certain sequence of calls or checks to function, rather than having them explicitly called in order in a function. “this, then this, then this have to be called before the state you look at here will be present” is how it works out.

That’s reasonably well guarded, because it checks that it’s not there, and sets one up if it’s not already there. But if it was cached previously, and so already set, we’re now dependent on that state, which could have been set in an entirely different way. The only thing that saves us is that the cache is pretty well private.

So now we’ve got temporal coupling between the view’s constructor setting an instance variable and our calling code. This error check is performed synchronously after the construction of the object, which is sad, because that coupling means that any asynchronous looking up of that path is now not available to us without hackery. This is exactly what’s being introduced in Express 5, and so this calling code has to be decoupled.

This is a minor case of temporal coupling, but those pieces of Express know way too much about each other, in ways that make refactoring it more invasive.

I think this comes from a style of programming where the inner components are written first, then the outer ones are written assuming the inner ones are append-only: a sort of one-way coupling.

So now the render method is only safe to call if this.path is set, and we’re temporally coupled to this sequence:

var view = new View(args);
if (view.path) {
  view.render(renderArgs);
}

Without that sequence – instantiate, check for errors, render if good or error if not – it’ll explode, having never validated that this.path is set.

It’s okay to temporally couple to instantiation in general – it’s not like you can call a method without an instance, not sensibly – but to that error check being required by the outside caller? That’s a terrible convention, and the whole thing would be much better enveloped in a method that spans the whole process – and in this case, an asynchronous one, so that the I/O done validating that the path exists doesn’t have to be synchronous.

So to fix this case, what I would do is to refactor the render method to include all the checks – move the error handling out of the caller, into render or something called by it. In this case, the lookup method is a prime candidate, since it’s what determines whether something exists, and the error concerns whether or not it exists.
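A rough sketch of that shape. This `View` is hypothetical and only loosely modeled on the discussion here; it is not Express's actual code. The constructor stores the path without validating it, `lookup` checks asynchronously whether the file exists, and `render` owns the whole sequence so the caller has nothing to check first:

```javascript
const fs = require('fs');

// Hypothetical View: nothing is validated at construction time.
function View(path, engine) {
  this.path = path;
  this.engine = engine; // function (path, options, callback)
}

// Asynchronously decide whether the template exists.
View.prototype.lookup = function (callback) {
  const path = this.path;
  fs.access(path, function (err) {
    if (err) {
      return callback(new Error('Failed to lookup view "' + path + '"'));
    }
    callback(null, path);
  });
};

// render spans the whole process: look up, then report failure or render.
View.prototype.render = function (options, callback) {
  const engine = this.engine;
  this.lookup(function (err, path) {
    if (err) {
      return callback(err);
    }
    engine(path, options, callback);
  });
};
```

The calling code shrinks to `new View(path, engine).render(options, callback)`, and the missing-template case arrives as an ordinary error in the callback.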

synchronous code, and throw is usually limited to application logic, synchronous decisions being made from information already on hand. They can also arise from programmer error – accessing properties or functions of undefined are among the most common errors I see.

If you are calling a callback in an asynchronous context provided by another module or user, it’s smart to guard these with try/catch blocks, and direct the error into your own error emission path.
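As a minimal sketch of that guard, assume a made-up `TaskRunner` class that calls user-supplied functions and reports failures as its own `error` event:

```javascript
const EventEmitter = require('events');

// Hypothetical class that runs callbacks handed to it by other code.
class TaskRunner extends EventEmitter {
  run(userCallback, value) {
    try {
      userCallback(value);
    } catch (err) {
      // Route the failure into our own error path instead of letting it
      // unwind through whatever happened to be calling us.
      this.emit('error', err);
    }
  }
}

const runner = new TaskRunner();
runner.on('error', function (err) {
  console.error('callback failed:', err.message);
});

// Programmer error in the callback: reading a property of undefined.
runner.run(function (v) { return v.missing.property; }, {});
```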

This is important since a synchronous throw in an asynchronously called function ends up becoming the next kind of error:

asynchronous calls and throw will crash your process. If you’re using domains, then it will fall back to the domain error handler, but in both cases, this is either uncatchable – a try/catch block will have already exited the block before the call is made – or you are completely without context when you catch it, so you won’t be able to usefully clean up resources allocated during the request that eventually failed. The only hope is to catch it in a process.on('uncaughtException') handler or domain handler, clean up what you can – close or delete temp files or undo whatever is being worked on – and crash a little more cleanly.

Anything meant to be called asynchronously should never throw. Instead, callbacks should be called with an error argument: callback(new Error("Error message here")); This makes the next kind of error,

asynchronous calls with an error parameter in the callback receive the error as a parameter – either as a separate callback for errors, or in node, much more commonly the “error first” style:

doThing(function (err, result) {
  // Handle err here if it's a thing, use result if not.
});

This forces the programmer to handle or propagate the error at each stage.

The reason the error argument is first is so that it’s hard to ignore. If your first parameter is err and you don’t use it, you are likely to crash if you get an error, since you’ll only look at the success path.
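Here is a small sketch of both sides of that convention. The function `parseJsonLater` is made up for illustration: it never throws to its caller, and reports failure through the first callback argument instead.

```javascript
// Hypothetical asynchronous function: failures travel through the callback.
function parseJsonLater(text, callback) {
  process.nextTick(function () {
    let value;
    try {
      value = JSON.parse(text);
    } catch (err) {
      return callback(err);
    }
    // Called outside the try block, so an exception thrown by the
    // callback itself is not mistaken for a parse failure.
    callback(null, value);
  });
}

parseJsonLater('{"ok": true}', function (err, result) {
  if (err) return console.error('failed:', err.message);
  console.log(result.ok); // true
});

parseJsonLater('{not json', function (err, result) {
  if (err) return console.error('failed:', err.message);
  console.log(result); // not reached for this input
});
```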

With the iferr module, you can get promise-like short-circuiting of errors:
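A minimal sketch of the idea, using a hand-written stand-in so it runs without installing anything. The `ifErr` function below is an assumption that mimics the `iferr(fail, succ)` shape; it is not the module's source. `loadSettings`, `readConfig` and `parse` are made up too:

```javascript
// Stand-in for iferr(fail, succ): errors go straight to `fail`,
// anything else is passed on to `succ` without the error argument.
function ifErr(fail, succ) {
  return function (err, ...args) {
    if (err) return fail(err);
    succ(...args);
  };
}

// Hypothetical two-step operation. Any error short-circuits to `callback`,
// and the success handlers never have to look at `err`.
function loadSettings(readConfig, parse, callback) {
  readConfig(ifErr(callback, function (text) {
    parse(text, ifErr(callback, function (settings) {
      callback(null, settings);
    }));
  }));
}
```

Each step names the error path once, rather than repeating an `if (err) return callback(err);` line in every callback.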

Using promises also gives this short-circuit error behavior, but you get the error out of the promise with the .catch method. In some implementations, if an error happens and you haven’t set up what happens to it, it will throw after a process tick. Similarly, event emitters with unhandled error events throw an exception. This leads to the fourth kind of error:

asynchronous event emitters or promises, and error handlers

An event emitter that can emit an error event should have a handler set up.

If you don’t do this, your process will crash or the domain handler will fire, and you should crash there. (Unless your promises don’t handle this case, in which case your error is lost and you never know it happened. Also not good.)
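A short sketch of the difference a handler makes. The try/catch only works here because `emit` is called synchronously in the same block; when the emit happens later from I/O, there is nothing around it to catch the throw:

```javascript
const EventEmitter = require('events');

const unguarded = new EventEmitter();
try {
  // No 'error' listener is registered, so emit throws the error itself.
  unguarded.emit('error', new Error('nobody is listening'));
} catch (err) {
  console.log('thrown:', err.message); // thrown: nobody is listening
}

const guarded = new EventEmitter();
guarded.on('error', function (err) {
  console.log('handled:', err.message); // handled: someone is listening
});
guarded.emit('error', new Error('someone is listening'));
```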