A few years ago, our engineers began migrating our site from Python to Node.js. The process is almost complete now and it’s been a good move for us.

Nigel Kibodeaux, one of our backend engineers, is here to share some of the lessons we’ve learned in the process.

Background

Moving from Python to Node has dramatically increased the number of requests we can handle per server and decreased the response time for individual requests. Using Node.js also lets us share code between the front and back ends, so we can do cool things like render pages on either the client or the server.

We got most of the kinks and pain points figured out pretty quickly, but the one thing we’ve been refining the longest is error handling.

For our web servers, we use the Express framework with the Sentry error handling middleware. We have Sentry set up to catch uncaught exceptions and also handle errors sent to next().

When an error is sent to next(), it gets logged in Sentry and the client receives an error message. In development, the client gets the stack trace in response to a request that errors, but in production the client gets a generic message like “Internal Server Error.” We have alerting set up in Slack for new errors and for errors that pass a certain frequency threshold.
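As a rough sketch of that behavior (not our actual middleware), an Express error handler could look like the following. It assumes the environment is signaled through NODE_ENV, which is a common convention but an assumption here, and it leaves out the Sentry reporting call entirely.

```javascript
// Express recognizes error-handling middleware by its four-argument signature.
// It would be registered after all routes: app.use(errorHandler)
function errorHandler(err, req, res, next) {
  // Reporting to an error tracker would happen here or in the tracker's own
  // middleware; that call is intentionally omitted from this sketch.

  // Assumption: NODE_ENV distinguishes development from production.
  const isProduction = process.env.NODE_ENV === 'production';

  // Development: expose the stack trace. Production: generic message only.
  const body = isProduction ? 'Internal Server Error' : String(err.stack || err);

  res.status(500).send(body);
}
```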

We’ve been using this setup since we first started using Node and it’s continued to work well for us. Now that we have the stage set with our environment, let’s look at some pitfalls we’ve encountered.

1. Don’t ignore errors

This is the most obvious mistake you can make when it comes to errors, but we’ve made it a few times, especially when we were starting out with Node.

Let’s say that we have an endpoint that returns something like: {name: 'Bill', hobbies: 'tennis, swimming'} when you request it at localhost/user?user_id=123.
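A handler that makes this mistake might look roughly like the sketch below. The getUserData stub and its sample data are hypothetical stand-ins for a real database call, and the handler is written as a plain function with Express’s (req, res, next) signature.

```javascript
// Hypothetical data-access stub: succeeds for user 123, otherwise reports
// a database connection error through the callback.
function getUserData(userId, callback) {
  if (userId === '123') {
    return callback(null, { name: 'Bill', hobbies: ['tennis', 'swimming'] });
  }
  callback(new Error('database connection error'));
}

// Would be mounted with something like: app.get('/user', getUser)
function getUser(req, res, next) {
  getUserData(req.query.user_id, function (err, data) {
    // BUG: err is never checked, so data may be undefined here.
    const hobbies = data.hobbies.join(', ');
    res.send({ name: data.name, hobbies: hobbies });
  });
}
```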

If there were a database connection error, for example, data would be undefined, data.hobbies.join(', ') would throw an exception, and the whole process would crash.

Unless you’re testing for this error state (and ideally you would be in unit tests), you’re probably not going to notice this until it happens in production. If that happens, the request will hang forever and the user will be confused and annoyed.

Here’s a better way: We handle the potential error from the getUserData function and prevent .join() from blowing up if hobbies is not an array. This will send an error to the client and to our error handling middleware.
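A minimal sketch of that safer handler follows. The getUserData stub, its sample users, and the error text are all hypothetical; note that the stub reports failure with a plain object rather than a real Error.

```javascript
// Hypothetical data-access stub. It reports failure with a plain object.
function getUserData(userId, callback) {
  const users = {
    '123': { name: 'Bill', hobbies: ['tennis', 'swimming'] },
    '456': { name: 'Ann' } // record with no hobbies
  };
  if (!users[userId]) {
    return callback({ message: 'could not load user' });
  }
  callback(null, users[userId]);
}

// Would be mounted with something like: app.get('/user', getUser)
function getUser(req, res, next) {
  getUserData(req.query.user_id, function (err, data) {
    // Hand the error to the error-handling middleware and stop here.
    if (err) return next(err);

    // Only call .join() when hobbies really is an array.
    const hobbies = Array.isArray(data.hobbies) ? data.hobbies.join(', ') : '';
    res.send({ name: data.name, hobbies: hobbies });
  });
}
```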

2. Use real errors

In the example above, I’m returning an object to the callback instead of an error. The problem with returning a regular object is that you won’t have a stack trace to help you find where the error originated.

We’ve done this a time or two and gotten the totally unhelpful error description of [object Object] in our error tracking system.
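The difference is easy to see in a small sketch. The message text is made up for illustration; the behavior shown is standard JavaScript.

```javascript
// A plain object used as an "error": no stack trace, and it stringifies badly.
const plain = { message: 'could not load user' };

// A real Error: carries a message and a stack trace from where it was created.
const real = new Error('could not load user');

console.log(String(plain));     // "[object Object]"
console.log(real.message);      // "could not load user"
console.log(typeof real.stack); // "string"

// So in callback-style code, prefer:
//   callback(new Error('could not load user'));
// over:
//   callback({ message: 'could not load user' });
```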

3. Wrap errors

Since we’re using real errors, they all have stack traces. Unfortunately, if the error is passed through a chain of callbacks, the stack trace tells you where the error was created but not which of your code paths led to it. That’s a little abstract, so let’s talk about an example.

We’ve seen errors with a message like “invalid query” and a stack trace that begins and ends in some low-level Node module that we use to access our database. We had no idea which query was invalid and were only able to figure it out by correlating when the error showed up with what code we released at that time.
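The pattern that produces this kind of error can be sketched as follows. The thirdPartyModule object, its query method, and the SQL string are hypothetical stand-ins for a real database driver.

```javascript
// Hypothetical stand-in for a low-level database module.
const thirdPartyModule = {
  query: function (sql, callback) {
    callback(new Error('invalid query'));
  }
};

function getStuff(callback) {
  thirdPartyModule.query('SELECT * FROM stuff WHERE', function (err, rows) {
    // The module's error is passed along unchanged, so nothing in it says
    // that getStuff (or this particular query) was involved.
    if (err) return callback(err);
    callback(null, rows);
  });
}
```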

In the example above, you’ll just see the error returned from the third-party module but not know where in your code you called that module. If you’re using thirdPartyModule in a lot of places in your code, it will be tricky to figure out which call is causing errors.

What you want to do in this situation is wrap the error before passing it up to the callback so your stack trace includes your application code. For this, we use the Contextualizer module like so:
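As a rough stand-in that does not use Contextualizer’s API, the same idea can be sketched with the standard cause option of the Error constructor (available in recent Node versions; older versions ignore it). The thirdPartyModule stub, the SQL string, and the userId parameter are hypothetical.

```javascript
// Hypothetical stand-in for a low-level database module.
const thirdPartyModule = {
  query: function (sql, callback) {
    callback(new Error('invalid query'));
  }
};

function getStuff(userId, callback) {
  thirdPartyModule.query('SELECT * FROM stuff WHERE', function onQueryDone(err, rows) {
    if (err) {
      // The new Error is created in application code, so its stack trace
      // starts here, and the original error is kept as `cause`.
      const wrapped = new Error(
        'getStuff failed for user ' + userId + ': ' + err.message,
        { cause: err }
      );
      return callback(wrapped);
    }
    callback(null, rows);
  });
}
```

The wrapped error now names the application function and the user involved, while the original error stays reachable for anyone who needs the low-level details.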

4. Don’t swallow errors

The pitfalls I’ve discussed so far are pretty straightforward. A more subtle problem that we’ve seen is handling errors in a way that hides them from our users and ourselves.

Here’s an example of hiding an error from everyone. Let’s modify the example app and assume that every user must have a list of hobbies when they are created. Here’s an endpoint that will allow a user to email their list of hobbies to a friend (totally something I do in the real world frequently). Let’s assume you’ve coded this endpoint defensively like in the example above:
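Such a defensively coded endpoint might look like this sketch. The getUserData and sendEmail stubs, the outbox array, and the friend_email query parameter are hypothetical.

```javascript
// Hypothetical stubs: a user record that has lost its hobbies, and an email
// sender that just records what it was asked to send.
const outbox = [];

function getUserData(userId, callback) {
  callback(null, { name: 'Bill' }); // hobbies are missing
}

function sendEmail(to, body, callback) {
  outbox.push({ to: to, body: body });
  callback(null);
}

function emailHobbies(req, res, next) {
  getUserData(req.query.user_id, function (err, data) {
    if (err) return next(err);

    // Defensive, but it silently turns "hobbies are missing" into "no hobbies".
    const hobbies = Array.isArray(data.hobbies) ? data.hobbies.join(', ') : '';

    sendEmail(req.query.friend_email, hobbies, function (err) {
      if (err) return next(err);
      res.send({ sent: true });
    });
  });
}
```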

If hobbies is missing, everything still works. That’s great, but in our imaginary app hobbies are always supposed to be there. The app doesn’t crash, but the user unwittingly emailed an empty list of hobbies to their friend. More insidious is that something must be broken elsewhere in the code that’s removing hobbies, but we won’t be alerted to that.
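A stricter version could treat missing hobbies as an error, roughly like this sketch. The stubs, the outbox array, the friend_email parameter, and the error wording are hypothetical.

```javascript
// Hypothetical stubs, as before: the user record has lost its hobbies.
const outbox = [];

function getUserData(userId, callback) {
  callback(null, { name: 'Bill' }); // hobbies are missing
}

function sendEmail(to, body, callback) {
  outbox.push({ to: to, body: body });
  callback(null);
}

function emailHobbies(req, res, next) {
  const userId = req.query.user_id;

  getUserData(userId, function (err, data) {
    if (err) return next(err);

    // Hobbies are required, so their absence is reported, not papered over.
    if (!Array.isArray(data.hobbies)) {
      return next(new Error('hobbies missing for user_id ' + userId));
    }

    sendEmail(req.query.friend_email, data.hobbies.join(', '), function (err) {
      if (err) return next(err);
      res.send({ sent: true });
    });
  });
}
```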

Now if hobbies is missing for some reason, there’s no crash but an error gets sent to next(), where the developers and the user will be made aware of it. Your error handling middleware should log the error and send some generic error to the user. Notice that I added the user_id to the error to help with troubleshooting.

5. Let the user know about errors

You might be questioning the wisdom of sending the error to the user in the previous example since you don’t want them to know you have bugs in your app!

This is where I think transparency is the best policy. Your users will eventually figure out that their hobby emails aren’t being sent and they’ll be unhappy. I think it’s better to let them know immediately so they can attempt to fix the issue and let you know if they can’t.

I’m not saying that you should always show your users a white “Internal Server Error” screen. That’s a reasonable fallback, but try to make your error messages as helpful and informative as possible. You might ask them to try again or let them know how to contact support.
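One way to sketch this is to let errors carry an optional message meant for the user and fall back to a generic one. The userMessage property, the withUserMessage helper, and the wording are hypothetical conventions invented for this example, not something Express provides.

```javascript
const FALLBACK_MESSAGE =
  'Something went wrong. Please try again, or contact support if the problem continues.';

// Hypothetical helper: attach a user-facing message to an error.
function withUserMessage(err, userMessage) {
  err.userMessage = userMessage;
  return err;
}

// Express-style error handler: app.use(errorHandler), after all routes.
function errorHandler(err, req, res, next) {
  // Show the helpful message when one was provided, the fallback otherwise.
  res.status(500).send({ error: err.userMessage || FALLBACK_MESSAGE });
}
```

A route could then call next(withUserMessage(new Error('hobbies missing'), 'We could not send your hobbies. Please try again.')) to give the user something actionable.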

6. When to log errors, when to pass them

At this point, we’ve covered the juiciest problems. Here’s one that’s maybe more of an annoyance than a problem, but let’s talk about it anyway.

In our early Node code we would sometimes log the same error multiple times. It wasn’t horrible, but it made sorting through the errors harder. Here’s an example of that happening:
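The double logging can be sketched like this. The logError function is a hypothetical stand-in for reporting to an error tracker, and queryDatabase is a stub that always fails.

```javascript
// Hypothetical stand-in for the error tracker: records every reported error.
const logged = [];
function logError(err) {
  logged.push(err);
}

// Stub that always fails.
function queryDatabase(callback) {
  callback(new Error('invalid query'));
}

function getStuff(callback) {
  queryDatabase(function (err, rows) {
    if (err) {
      logError(err);        // first log...
      return callback(err); // ...and the same error is also passed up
    }
    callback(null, rows);
  });
}

function stuffRoute(req, res, next) {
  getStuff(function (err, stuff) {
    if (err) return next(err);
    res.send(stuff);
  });
}

// Error-handling middleware that logs whatever reaches next().
function errorHandler(err, req, res, next) {
  logError(err); // second log of the same error
  res.status(500).send('Internal Server Error');
}
```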

At first, we would tend to log errors to Sentry whenever they showed up. We log the error to Sentry in the getStuff function, then pass it to the callback. The route handling function receives the error and sends it to next(), which logs it to Sentry and responds to the client. So the error ends up getting logged to Sentry twice.

This is easy enough to avoid. The rule is: don’t log an error if you’re going to pass it to the callback. Log an error only if it’s recoverable and you proceed in spite of it.
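Here is a sketch of the “recoverable” case, where logging locally is appropriate because the error goes no further. The getRecommendations and getPageData functions and the logError stand-in are hypothetical.

```javascript
// Hypothetical stand-in for the error tracker.
const logged = [];
function logError(err) {
  logged.push(err);
}

// Stub for an optional feature that is currently failing.
function getRecommendations(userId, callback) {
  callback(new Error('recommendation service unavailable'));
}

function getPageData(userId, callback) {
  getRecommendations(userId, function (err, recommendations) {
    if (err) {
      // Recoverable: log it here because it is NOT passed to the callback,
      // then carry on with an empty list.
      logError(err);
      recommendations = [];
    }
    callback(null, { userId: userId, recommendations: recommendations });
  });
}
```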

The end

I hope this was helpful and that it might save you from some of the bumps we’ve experienced. We’ve got one more error strategy that has to do with user-facing errors, so stay tuned. Thanks for reading!