Identify website availability issues with Real User Monitoring

Knowing whether your website is available and reachable for users is one of the primary reasons to monitor a website. To track availability and reliability, website and application owners rely on a range of solutions such as synthetic monitoring, real user monitoring (RUM), and application performance monitoring (APM). Synthetic monitoring can find broad availability issues, and APM can show errors once a client has reached the server. But what happens if a regional or micro-outage occurs and real users are unable to reach your site? How can you monitor all the errors received by end users?

Synthetic monitoring solutions currently provide a more reliable way to detect reachability and availability issues that prevent a user from accessing a website or service. Testing applications from globally distributed points of presence can alert you when problems start occurring. Synthetic monitoring measures performance and availability from predetermined locations by simulating users. It does this well because we have full control over the machine simulating users, which gives us a complete set of data on performance and on any errors, including those that result in availability issues.

But synthetic monitoring doesn’t show you how real end users are experiencing the application. This is where RUM comes in. Real user monitoring captures data from real end users, revealing regional differences in performance. Detecting changes in pageviews from established baselines can help you approximate availability changes.

A combination of these monitoring solutions has been necessary as no single method can measure real-time performance and availability data directly from all end users.

For example, an application owner who uses only APM to measure errors from the application running on the server has limited or no ways to detect reachability problems such as DNS failures, TCP timeouts, or sometimes even server errors on the CDN. Since these types of errors can’t be detected server-side, synthetic monitoring is used to simulate end users via nodes in predetermined geographic locations. Synthetic monitoring is an effective way to proactively monitor from the regions where end users are located, but it does not give a complete view of the experience of all users. RUM gives deep insight into the response times that end users experience, but it has not been able to provide availability metrics.

If a user can’t connect to a website, the page won’t load in the browser, the browser won’t be able to execute the JavaScript RUM code, and data collection is impossible. There has been no way to measure real end user availability, until now.

Detect website availability issues with Site Reachability Diagnostics

A new method for reporting these errors, called Network Error Logging (NEL), is now available in Chrome 69. For the first time, availability can be measured from real users. The first time a user visits a website, the site registers that it wants to report errors to Catchpoint. On future visits, if the user fails to load the site, those failures will be reported. Problems resolving DNS, TCP failures, page load abandonment, and 4xx/5xx HTTP errors received by the browser are now trackable.
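To make the reporting step more concrete, here is a minimal sketch of how a collector might tally incoming NEL reports by failure phase and error type. It assumes the JSON report shape described in the W3C Network Error Logging specification (a list of reports whose `body` carries `phase`, `type`, and `status_code`). The function name `summarize_nel_reports` and the sample payload are hypothetical illustrations, not Catchpoint's implementation.

```python
import json
from collections import Counter


def summarize_nel_reports(payload: str) -> Counter:
    """Count network-error reports by (phase, type).

    `payload` is the JSON text a browser POSTs to the reporting
    endpoint: a list of report objects.
    """
    counts = Counter()
    for report in json.loads(payload):
        # The endpoint may also receive other report types; skip them.
        if report.get("type") != "network-error":
            continue
        body = report.get("body", {})
        key = (body.get("phase", "unknown"), body.get("type", "unknown"))
        counts[key] += 1
    return counts


# Hypothetical sample payload with one DNS, one TCP, and one HTTP failure.
sample_payload = json.dumps([
    {"age": 10, "type": "network-error", "url": "https://www.example.com/",
     "body": {"phase": "dns", "type": "dns.name_not_resolved",
              "status_code": 0, "elapsed_time": 48}},
    {"age": 25, "type": "network-error", "url": "https://www.example.com/",
     "body": {"phase": "connection", "type": "tcp.timed_out",
              "status_code": 0, "elapsed_time": 30000}},
    {"age": 40, "type": "network-error", "url": "https://www.example.com/cart",
     "body": {"phase": "application", "type": "http.error",
              "status_code": 503, "elapsed_time": 120}},
])

summary = summarize_nel_reports(sample_payload)
for (phase, error_type), count in sorted(summary.items()):
    print(f"{phase:12} {error_type:24} {count}")
```

Grouping by phase separates failures that never reached the server (DNS and connection) from those the server or CDN answered with an error (application), which is the distinction server-side monitoring cannot make on its own.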

Site Reachability Diagnostics with Catchpoint RUM accepts NEL data and enriches it with additional information about end users, including geolocation with city-level granularity, ISP, browser, operating system, device model, and much more. This helps you understand the full scope of user experience by highlighting critical information about both positive and negative user experiences.

Without using NEL, Catchpoint customers can detect changes in website availability with Catchpoint’s unique Outage Analyzer, which establishes baseline statistical models of traffic and notifies you when traffic decreases from that baseline. Now, by sending NEL data to Catchpoint, customers will be able to go further by pinpointing why any given user is unable to reach the site and by revealing previously unknown trends of missed opportunities. This vital data helps you understand where errors are occurring and how they can be addressed, so you can improve DNS routing, network management, and application availability.
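The idea of comparing traffic against a baseline can be illustrated with a deliberately simple sketch. This is not how Outage Analyzer works internally, which the article does not describe. It is a basic z-score check under the assumption that pageview counts per interval are roughly stable, and the function name `is_traffic_drop` and the threshold of 3 standard deviations are illustrative choices.

```python
from statistics import mean, stdev


def is_traffic_drop(history, current, threshold=3.0):
    """Return True if `current` is unusually far below the baseline.

    `history` holds pageview counts for comparable past intervals;
    `threshold` is the number of standard deviations below the mean
    at which the drop is flagged.
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current < baseline
    return (baseline - current) / spread > threshold


# Hypothetical pageviews per five-minute interval.
past_pageviews = [1020, 980, 1005, 995, 1010, 990]

print(is_traffic_drop(past_pageviews, 985))  # ordinary fluctuation
print(is_traffic_drop(past_pageviews, 600))  # sharp decline
```

A check like this can signal that fewer users are arriving, but not why. That gap is what NEL error reports are meant to fill.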

Implementing NEL requires no code additions to the application and no agent deployments; the only requirement is a set of HTTP response headers. You can capture errors from many servers, including CDNs, with minimal effort. Simply configure the application and/or CDN to include the NEL policy response headers that direct error reports to Catchpoint. Our intelligent logging system will record the errors, making them available for reporting and alerting.
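As an illustration of what that policy looks like, the sketch below builds the two response headers defined by the W3C specifications: `Report-To`, which names a reporting endpoint group, and `NEL`, which points the error logging policy at that group. The endpoint URL, the group name, and the `build_nel_headers` helper are placeholders. The actual collector URL would come from your Catchpoint configuration.

```python
import json


def build_nel_headers(endpoint_url, max_age=2592000, group="nel-endpoint",
                      failure_fraction=1.0):
    """Build the Report-To and NEL response header values.

    endpoint_url      HTTPS URL that receives the reports (placeholder here)
    max_age           policy lifetime in seconds (30 days by default)
    group             name linking the NEL policy to the endpoint group
    failure_fraction  share of failed requests to report (1.0 = all)
    """
    report_to = {
        "group": group,
        "max_age": max_age,
        "endpoints": [{"url": endpoint_url}],
    }
    nel = {
        "report_to": group,  # must match the Report-To group name
        "max_age": max_age,
        "failure_fraction": failure_fraction,
    }
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


# Hypothetical collector URL, used for illustration only.
headers = build_nel_headers("https://reports.example.com/nel")
for name, value in headers.items():
    print(f"{name}: {value}")
```

The printed lines can be added to the response configuration of the origin server or CDN. Browsers only honor NEL policies delivered over HTTPS, so the headers have no effect on plain HTTP responses.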

True to our mission, to help companies deliver the best digital experiences, we work tirelessly to bring to market the most complete set of solutions to satisfy the needs of the new customer-centric IT organizations. We are very honored to have worked with the team at Google to implement this new standard in our Real User Monitoring. We hope other browsers will soon implement this standard.