node-report – first failure data capture for Node.js

A summary diagnostic report for Node.js was introduced in "Future directions for diagnostics in Node.js production environments" as one of the contributions that the IBM Runtimes development team are focusing on in 2016. This article describes the diagnostic report in more detail, and shows how it can be used as a primary diagnostic for Node.js applications.

First failure data capture (FFDC) covers a range of techniques used in software systems to allow rapid diagnosis of problems with minimal impact on application availability and performance. Techniques used in FFDC include logging of warning and error messages, production of dumps, and always-on tracing (a flight recorder). The aim is to capture sufficient data on the first occurrence of a failure to diagnose and fix the problem without having to reproduce it. Any solution must keep the overhead of error detection and data collection low, so that it can be used in production, and the code that runs after the point of failure must itself be fail-safe.

The node-report module produces a summary dump, written as a text file, containing key information about the JavaScript application and the Node.js runtime. Content includes JavaScript and C++ stacks, V8 heap statistics, per-process and per-thread CPU usage, environment variables and resource limits. It can be triggered on unhandled exceptions, fatal errors, signals and by calling a JavaScript API. The runtime overhead of enabling the report is very low. Existing callbacks in Node.js are used to intercept exceptions and fatal errors. There is an additional thread for handling signals, which remains idle unless a report is triggered. The example report shown below is 12 KB in size and was written to disk in less than 10 ms. The implementation is in C++ and can run when the JavaScript environment is no longer available, for example in JavaScript heap out-of-memory conditions.

Installing and using the node-report npm module

The node-report npm module is available for Node.js v4, v6 and v7 on Linux, macOS, Windows and AIX. It can be installed as follows:

npm install node-report

Adding a require() call for node-report to your JavaScript application configures a report to be triggered automatically on unhandled exceptions, fatal error events (for example out-of-memory errors), and signals (Unix and macOS platforms). A report can also be triggered via an API call from your application:

var nodereport = require('node-report');
nodereport.triggerReport();
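As an illustration of where the API call might sit in application code, the sketch below wraps an operation so that a report is written if the operation throws. The helper name withReportOnFailure is hypothetical and not part of node-report; the module is passed in as a parameter so that the helper makes no assumptions beyond the triggerReport() call shown above.

```javascript
// Hypothetical helper (not part of node-report): run `fn`, and if it throws,
// ask the supplied node-report module to write a report before rethrowing.
function withReportOnFailure(nodereport, fn) {
  try {
    return fn();
  } catch (err) {
    // triggerReport() is the API call shown above.
    nodereport.triggerReport();
    throw err;
  }
}

// Typical wiring, assuming node-report is installed and riskyOperation
// is a function defined by your application:
//   var nodereport = require('node-report');
//   withReportOnFailure(nodereport, function () { return riskyOperation(); });
```

Rethrowing keeps the application's normal error handling intact; the report is an extra artefact captured at the point of first failure.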

The report consists of a header section containing the event type, date, time, PID and Node version; sections containing JavaScript and native stack traces; a section containing V8 heap information; a section containing libuv handle information; and an OS platform information section showing CPU and memory usage and system limits. An example report can be triggered using the Node.js REPL.

When a report is triggered, start and end messages are issued to stderr and the filename of the report is returned to the caller. The default filename includes the date, time, PID and a sequence number. Alternatively, a filename can be specified as a parameter on the triggerReport() call.

nodereport.triggerReport('myReportName');
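To make the default naming scheme concrete, here is a minimal sketch of how a name could be assembled from the date, time, PID and a sequence number. The function buildReportName, the "node-report" prefix, the dot separators and the ".txt" suffix are all assumptions for illustration; the module's actual format may differ.

```javascript
// Zero-pad a number to a fixed width.
function pad(value, width) {
  return String(value).padStart(width, '0');
}

// Hypothetical helper: build a name from date, time, PID and sequence number.
// The prefix, separators and ".txt" suffix are assumptions for illustration.
function buildReportName(date, pid, sequence) {
  const day = pad(date.getFullYear(), 4) + pad(date.getMonth() + 1, 2) + pad(date.getDate(), 2);
  const time = pad(date.getHours(), 2) + pad(date.getMinutes(), 2) + pad(date.getSeconds(), 2);
  return ['node-report', day, time, pid, pad(sequence, 3), 'txt'].join('.');
}

console.log(buildReportName(new Date(), process.pid, 1));
```

Including the PID and a sequence number means that several reports written by different processes, or by one process within the same second, do not overwrite each other.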

To see examples of reports generated from these events you can run the demonstration applications provided in the node-report GitHub repo. These are Node.js applications which prompt you to access them via a browser to trigger the required event.

Configuration is also available via environment variables. These are read when the module is initialised, i.e. when the require('node-report') statement is processed. The configuration can be altered subsequently via the JavaScript API.
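The sketch below shows one way a setting read from the environment at initialisation could be turned into a set of enabled triggers. It is not the module's code. The variable name NODEREPORT_EVENTS, the "+"-separated value format, the event names and the all-enabled default are assumptions made for this example and should be checked against the module's README.

```javascript
// Event names chosen to match the triggers described in this article (assumed spellings).
const KNOWN_EVENTS = ['exception', 'fatalerror', 'signal', 'apicall'];

// Hypothetical parser: turn "exception+signal" into { exception: true, ... }.
function parseEvents(value) {
  const enabled = {};
  // Assumed default: with no setting, every trigger stays enabled.
  if (value === undefined) {
    KNOWN_EVENTS.forEach(function (name) { enabled[name] = true; });
    return enabled;
  }
  const requested = value.split('+').map(function (s) { return s.trim(); });
  KNOWN_EVENTS.forEach(function (name) {
    enabled[name] = requested.indexOf(name) !== -1;
  });
  return enabled;
}

// Read the setting once, at start-up (variable name is an assumption).
console.log(parseEvents(process.env.NODEREPORT_EVENTS));
```

Reading the environment once at initialisation lets operators change behaviour in production without touching application code.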

Implementation internals

The node-report module has two internal components, both written in C++, including some platform-specific code for Linux, macOS and Windows:

A module control component processes configuration options and sets up the interception of exception and error events and signals. It also provides the JavaScript NAN APIs for configuring and triggering reports. The V8 SetFatalErrorHandler() API is used to intercept fatal error events. The V8 --abort-on-uncaught-exception option and the SetAbortOnUncaughtExceptionCallback() API are used to intercept uncaught exceptions.
Signal handling is implemented using a semaphore to hand off to a watchdog thread, which then schedules the report on the main Node.js event loop. A combination of the V8 RequestInterrupt() and uv_async_send() APIs is used, so that the report is scheduled whether JavaScript code is running or the event loop is idle.

A report writing component constructs the filename, then writes the various sections in the report. V8 Message::PrintCurrentStackTrace(), GetStackSample() and StackTrace::CurrentStackTrace() APIs are used to obtain and print JavaScript stack traces. V8 GetHeapStatistics() and GetHeapSpaceStatistics() APIs are used to obtain V8 heap information. Platform-specific APIs are used to obtain process information, native stack traces, environment variables and resource limits.
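Some of the same information can be inspected from JavaScript, which may help when reading a report. The sketch below uses Node.js built-ins (the v8 module, process.cpuUsage() and an Error stack) to gather a small snapshot. It is an illustration only: the function collectSnapshot is hypothetical, and unlike the C++ implementation described above it cannot run once the JavaScript heap is exhausted.

```javascript
const v8 = require('v8');

// Hypothetical helper: gather a JavaScript-level snapshot that loosely
// mirrors a few sections of the report.
function collectSnapshot() {
  return {
    pid: process.pid,
    nodeVersion: process.version,
    // Stack of the current JavaScript call, without the leading "Error" line.
    jsStack: new Error().stack.split('\n').slice(1).map(function (line) {
      return line.trim();
    }),
    heap: v8.getHeapStatistics(),            // whole-heap totals, in bytes
    heapSpaces: v8.getHeapSpaceStatistics(), // one entry per V8 heap space
    cpu: process.cpuUsage(),                 // user/system CPU time, in microseconds
    envNames: Object.keys(process.env)
  };
}

const snapshot = collectSnapshot();
console.log('PID ' + snapshot.pid + ', Node ' + snapshot.nodeVersion);
console.log('Heap used: ' + snapshot.heap.used_heap_size +
            ' of ' + snapshot.heap.total_heap_size + ' bytes');
snapshot.heapSpaces.forEach(function (space) {
  console.log(space.space_name + ': ' + space.space_used_size + ' bytes used');
});
```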

Example report for Out of Memory error in a JavaScript Express application

In this example the Node.js instance has terminated because of a memory allocation failure. The report has identified the problem as a failure to allocate space in the JavaScript heap, and shows the JavaScript and C++ stack traces of the code that was running when the failure occurred. It also shows that approximately 51 MB of memory has been allocated to the V8 heap old space region, with less than 1 MB of available space. The environment variables listed in the system information section of the report show that the --max-old-space-size option was set to 64 MB.
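The same reasoning, looking for a heap space with almost no room left, can be sketched against the live heap using the Node.js v8 module. The helper spacesRunningLow and the 1 MB threshold are assumptions for illustration; the field names are those returned by v8.getHeapSpaceStatistics().

```javascript
const v8 = require('v8');

// Hypothetical helper: name the heap spaces whose available size is below
// the given threshold. Spaces with no allocated size are ignored.
function spacesRunningLow(spaces, minAvailableBytes) {
  return spaces
    .filter(function (space) {
      return space.space_size > 0 && space.space_available_size < minAvailableBytes;
    })
    .map(function (space) {
      return space.space_name;
    });
}

const ONE_MB = 1024 * 1024;
console.log(spacesRunningLow(v8.getHeapSpaceStatistics(), ONE_MB));
```

A low figure for one space is not a failure on its own, since V8 can grow most spaces up to their configured limits; it becomes significant when, as in the report above, the space is already close to the --max-old-space-size setting.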
