Building Robust Node Applications: Error Handling

I’ve crashed more Node processes in production than I’d like to admit. Thankfully, I’ve since learned how to build robustness into my Node applications. So, what can you build into your applications to stay informed of errors and, ultimately, keep them running?

“Robustness” encompasses many aspects of application development, such as handling untrusted input or gracefully rolling back state in response to an unexpected failure. In this article, we will focus on robustness in terms of keeping a Node application from crashing and building structures to handle and stay informed about errors. Specifically, we will look at application-level errors, which is where most issues happen.

Before we look more into handling these errors, let’s first step back and look at why applications terminate and how to be notified in order to address “crashers” (bugs that take down a Node process).

The uncaught exception

A Node process will terminate on any uncaught exception (explicit or implicit). This may be acceptable and preferable for short-running scripts; it is troublesome if you intend to keep the process running for a while (as you would in web servers, watch scripts, proxies, etc.). Node chooses to terminate the process because it’s likely in an unstable state and may have leaked connections, files, and other I/O.

However, you can override this behavior by adding an uncaughtException handler on the process object. The following illustrates programmatically what Node does on your behalf when an uncaught exception occurs:
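A minimal sketch of that default behavior, written as an explicit handler. The name handleUncaught is only for illustration, and Node's real implementation lives in its internals, so treat this as an approximation: print the stack trace to stderr, then exit with a non-zero code.

```javascript
// Approximates Node's default reaction to an uncaught exception:
// write the stack trace to stderr, then exit with a failure code.
function handleUncaught(err) {
  console.error(err && err.stack ? err.stack : err);
  process.exit(1);
}

process.on('uncaughtException', handleUncaught);
```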

An uncaughtException handler should be treated as a last opportunity to say your goodbyes before calling process.exit. It is not advised to keep the process running.

Why exit from an uncaughtException handler? The handler is triggered away from the original source of the exception. All you receive back is the stack trace of the originating error. Most likely you have no reference back to the objects surrounding the error, so you cannot do damage control (cleaning up state or I/O). It’s best to just exit and have your service manager or monitor restart the process.

I will typically add an uncaughtException handler to send me an alert with the stack trace and other pertinent process information before shutting down. Receiving notifications for crashers is incredibly important in addressing issues quickly in production.

Ensure you are in production as you don’t want to get a ton of emails while in development.

Log the stack trace.

Email the error with the stack trace.

Exit the process.
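The four steps above could be sketched as follows. The sendAlert function is a hypothetical placeholder for whatever mailer or paging service you use, describeCrash is an illustrative helper, and checking NODE_ENV is one common (assumed) way to detect production.

```javascript
const os = require('os');

// Hypothetical alert transport. Replace the body with a real mailer or
// paging client; it must call done() once the alert has been handed off.
function sendAlert(subject, body, done) {
  console.error('[alert] ' + subject + '\n' + body);
  done();
}

// Collects the stack trace plus some basic process information.
function describeCrash(err) {
  return [
    err && err.stack ? err.stack : String(err),
    'host: ' + os.hostname(),
    'pid: ' + process.pid,
    'uptime (s): ' + Math.round(process.uptime()),
    'rss (bytes): ' + process.memoryUsage().rss
  ].join('\n');
}

// Only alert in production, to avoid a flood of emails during development.
if (process.env.NODE_ENV === 'production') {
  process.on('uncaughtException', function (err) {
    const report = describeCrash(err);
    console.error(report);                 // log the stack trace
    sendAlert('Node process crashed', report, function () {
      process.exit(1);                     // exit once the alert is sent
    });
  });
}
```

Exiting inside the sendAlert callback, rather than right after calling it, gives an asynchronous transport a chance to finish before the process goes away.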

Ultimately, the goal is to get as few of these emails as possible. Still, they are a great way to alert yourself to anything that is taking down your Node process.

The infamous ‘error’ event

Even after you receive a lovely email with a stack trace, some exceptions are incredibly vague, and you may have absolutely no clue where they came from. Many of these originate from unhandled ‘error’ events. Node treats ‘error’ as a special event: if it is left unhandled, Node throws an exception instead of silently ignoring it.
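A small sketch of the difference, using a bare EventEmitter. The event name 'update' and the variable names are made up for the illustration.

```javascript
const EventEmitter = require('events');

const emitter = new EventEmitter();

// An unhandled event with any other name is silently ignored.
const delivered = emitter.emit('update', { id: 1 });

// An unhandled 'error' event is thrown instead.
let thrown = null;
try {
  emitter.emit('error', new Error('nobody is listening'));
} catch (err) {
  thrown = err;
}

console.log(delivered);        // false
console.log(thrown.message);   // nobody is listening
```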

Many 3rd-party modules will bubble up ‘error’ events or other errors from Node core modules as well as emit their own. For example, the redis module may emit ‘error’ events triggered by the underlying core net module:

console.trace() is a handy way to let yourself know where you are in the code. Here we also labeled the stack trace for more context.

Handle and log the error.
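A sketch of such a handler. To keep it self-contained, a plain EventEmitter stands in for the redis client, and the failure is simulated by emitting the event by hand; with the real module you would attach the same kind of listener to the client object it gives you.

```javascript
const EventEmitter = require('events');

// Stand-in for a client object such as the one the redis module provides.
// A real client would emit 'error' itself, for example when the
// underlying socket fails.
const client = new EventEmitter();
const handled = [];

client.on('error', function (err) {
  // Labeled stack trace of the current position in the code.
  console.trace('redis client error');
  // Handle and log the error instead of letting it throw.
  console.error(err.message);
  handled.push(err.message);
});

// Simulate a connection failure bubbling up from the net module.
client.emit('error', new Error('connect ECONNREFUSED 127.0.0.1:6379'));
```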

I’ve found out (the hard way) that even if I think it is unlikely for an ‘error’ event to occur, it may happen at some point (usually in production). The ones I’m most tempted to leave out are pipe operations, since it looks so pretty to just say:

input.pipe(output)

However, a more robust approach adds error handling on both the input and output streams:
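One way this might look for a file copy. The copyFile name and its callback shape are illustrative, not taken from any library. Note that pipe() does not forward errors from the source to the destination and does not close the destination when the source fails, which is why each stream gets its own handler.

```javascript
const fs = require('fs');

// Copies source to target and reports the outcome through done(err).
function copyFile(source, target, done) {
  const input = fs.createReadStream(source);
  const output = fs.createWriteStream(target);
  let settled = false;

  function finish(err) {
    if (settled) return;      // report only the first outcome
    settled = true;
    done(err || null);
  }

  input.on('error', function (err) {
    output.destroy();         // pipe() does not close the destination for us
    finish(err);
  });

  output.on('error', function (err) {
    input.destroy();          // stop reading once writing has failed
    finish(err);
  });

  output.on('finish', function () {
    finish();
  });

  input.pipe(output);
}
```

Calling copyFile with a source path that does not exist should invoke the callback with an ENOENT error instead of crashing the process.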

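Domains offer another way to handle this. If an ‘error’ event is emitted by any EventEmitter while inside the domain, the domain’s own ‘error’ handler receives it and can log it. To put a section of code inside the domain, run it with d.run().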

Here, if either the input or output streams were to have an error (like the file not existing), the domain would capture the error in one spot. This may or may not be what you want. Sometimes, it is helpful to recover from errors individually.

With EventEmitters and domains, you must explicitly add any EventEmitter that was created outside the domain. You can add one using d.add(eventEmitter).
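A sketch of both cases, with plain EventEmitters standing in for the input and output streams so the example stays deterministic. The names input, output, and captured are illustrative. Be aware that the domain module is marked as deprecated in the current Node documentation, so this is best read as an illustration of the mechanism rather than a recommendation for new code.

```javascript
const domain = require('domain');
const EventEmitter = require('events');

const captured = [];
const d = domain.create();

// Any 'error' event from an emitter bound to the domain ends up here.
d.on('error', function (err) {
  console.error('domain caught:', err.message);
  captured.push(err.message);
});

// Created outside the domain, so it has to be added explicitly.
const output = new EventEmitter();
d.add(output);

d.run(function () {
  // Created inside run(), so it is bound to the domain implicitly.
  const input = new EventEmitter();
  input.emit('error', new Error('input failed'));
});

output.emit('error', new Error('output failed'));
```

Neither emitter has an ‘error’ listener of its own, yet neither emit throws, because the domain’s handler receives both errors.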

Catching implicit exceptions

We can’t forget about common implicit exceptions. A great example of this is the SyntaxError thrown when using JSON.parse.
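A minimal sketch of guarding the call with try/catch. The parseJSON wrapper and its return shape are made up for the illustration.

```javascript
// Wraps JSON.parse so malformed input yields an error value
// instead of an uncaught SyntaxError.
function parseJSON(text) {
  try {
    return { error: null, value: JSON.parse(text) };
  } catch (err) {
    return { error: err, value: undefined };
  }
}

const good = parseJSON('{"port": 8080}');
const bad = parseJSON('{port: 8080}');   // keys must be quoted in JSON

console.log(good.value.port);   // 8080
console.log(bad.error.name);    // SyntaxError
```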

The error argument

Node follows a convention for callbacks. I like the term nodeback: a callback function that receives an error as its first argument. If an error occurs in the asynchronous operation, the error object will be populated. Otherwise, it will be null.
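To make the convention concrete, here is a sketch of a function that follows it. parsePort is a hypothetical example, not a Node API; process.nextTick is used only to make it asynchronous.

```javascript
// Hypothetical asynchronous operation written in the nodeback style:
// the callback receives (err) on failure and (null, result) on success.
function parsePort(text, callback) {
  process.nextTick(function () {
    const port = Number(text);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      return callback(new Error('invalid port: ' + text));
    }
    callback(null, port);
  });
}

parsePort('8080', function (err, port) {
  if (err) return console.error(err.message);
  console.log('listening on', port);
});
```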

A Node process will never crash directly because of an error argument, as it’s just an argument. However, it can easily cause implicit exceptions down the road if not handled.

Admittedly, I’ve neglected the error argument to my own peril and have come to see that ignoring it is unwise. Now that I see how seriously Node treats unhandled ‘error’ events, I try to treat the error argument just as seriously.

For example, we can assume we will always get a file buffer back from fs.readFile, but if that assumption is wrong we may crash our server with a TypeError:
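A sketch of the guarded version. The readText wrapper is illustrative. When the read fails, buf is undefined, so an unguarded buf.toString() throws a TypeError inside the callback, which then surfaces as an uncaught exception.

```javascript
const fs = require('fs');

// Reads a file as text and passes any failure along as the error argument.
function readText(file, callback) {
  fs.readFile(file, function (err, buf) {
    // Without this check, buf is undefined on failure and
    // buf.toString() throws inside the callback.
    if (err) return callback(err);
    callback(null, buf.toString());
  });
}
```

Checking err first and returning early keeps the failure in the error argument, where the caller can decide what to do with it.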