A First Look at Opserver, Stack Exchange's Monitoring Solution

Opserver is an open source monitoring solution released by Stack Exchange, the company behind Stack Overflow. Unusually for a monitoring tool, it is built on top of the .NET Framework.

Opserver aims to provide a quick overall view of each monitored system's health while allowing the user to drill down into the details. As Nick Craver, one of Opserver's creators, told InfoQ:

We believe a monitoring system should show you systems at a high level, present what’s wrong, and allow you to drill in for more detail.

Opserver is organized around web dashboards, each specialized in a given system. Opserver currently supports SQL Server, ElasticSearch, HAProxy, StackExchange.Exceptional and Redis. It also uses SolarWinds' Orion, a commercial tool, to provide infrastructure and network monitoring. An Opserver installation does not require all of these systems, as each one can be enabled on an opt-in basis.

Taking SQL Server as an example, Opserver provides high-level information such as CPU and memory consumption and the overall health of databases.


Below the 10,000-foot view, Opserver provides additional data. For instance, it lists the top queries, which can be sorted by criteria such as total duration or average CPU consumption. For each query, it provides more detailed information, including the query execution plan (a detailed breakdown of the steps taken to execute the query).
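To make the idea of ranking queries by different criteria concrete, here is a minimal sketch that builds a T-SQL statement against SQL Server's `sys.dm_exec_query_stats` dynamic management view. It is an illustration only and does not reproduce Opserver's own queries; the function name, the sort keys and the column aliases are assumptions made for this example.

```python
# Hypothetical sketch, not Opserver's code. It only builds a T-SQL string
# that ranks cached queries using SQL Server's sys.dm_exec_query_stats view.

# Maps a user-facing sort key (assumed names) to a column alias used below.
SORT_COLUMNS = {
    "total_duration": "total_duration_us",
    "avg_cpu": "avg_cpu_us",
}


def build_top_queries_sql(sort_by="total_duration", limit=10):
    """Return a T-SQL statement listing the top `limit` queries by `sort_by`."""
    if sort_by not in SORT_COLUMNS:
        raise ValueError(f"unknown sort key: {sort_by!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Only whitelisted aliases reach the SQL text, so no user input is inlined.
    order_column = SORT_COLUMNS[sort_by]
    return (
        f"SELECT TOP ({limit})\n"
        "    qs.execution_count,\n"
        "    qs.total_elapsed_time AS total_duration_us,\n"
        "    qs.total_worker_time / qs.execution_count AS avg_cpu_us,\n"
        "    st.text AS query_text\n"
        "FROM sys.dm_exec_query_stats AS qs\n"
        "CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st\n"
        f"ORDER BY {order_column} DESC;"
    )


# Example: the five queries with the highest average CPU time.
print(build_top_queries_sql("avg_cpu", 5))
```

The resulting string could be run with any SQL Server client. Restricting the sort key to a fixed set of aliases keeps the statement safe to assemble as text.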


Setting up Opserver takes a few steps. Besides the Opserver readme file on GitHub, some users have described their setup experiences. In a nutshell, the code must be cloned from GitHub, compiled, and published to an IIS server. Two types of configuration are also needed: security settings and per-system settings. Opserver provides an example for each settings file, based on the ones used at Stack Exchange itself. These examples can be found at <site root>\Config.

The SecuritySettings.config file is where items such as the authentication methods are defined.

If Opserver does not cover a given scenario, there are some extensibility points to augment the tool with additional dashboards and configuration options. There are plans to make this process easier and more powerful in the future:

The biggest upcoming change as time allows is putting in a plugin model. People will be able to add tabs, views, pollers etc. that others can use. For example you could put a MongoDB monitoring tab up top with any level of detail you want inside.

The team also has other goals on the tool's roadmap:

It’ll also integrate heavily with our monitoring solution, keeping data history and not just real-time data.

I plan on including functionality for other third-party tools in the base install to enhance Opserver if you’re using them. For example, sp_WhoIsActive is already integrated, things like sp_Blitz, sp_AskBrent, and larger products like SQL Sentry will be tied in. They’ll absolutely not be required, just add views and details if they’re there...since the information they provide will then be available.

Opserver also exposes almost all the data it has via JSON in a REST-feeling way. I plan to make all data available this way so the UI is totally optional. This allows whomever to write scripts against routes returning JSON to use in other ways, it really opens up many use cases.
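As an illustration of the kind of script this enables, the following sketch fetches JSON from a route and counts monitored nodes by status. The route passed to `fetch_json`, the payload shape and the `name` and `status` field names are all hypothetical; the routes and fields an actual installation exposes would need to be checked against that installation.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download and decode a JSON document from `url` (assumed reachable)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_status(nodes):
    """Count nodes per status; the 'status' field name is an assumption."""
    return Counter(node.get("status", "unknown") for node in nodes)


# Stand-in payload with a hypothetical shape, used here instead of a live call.
sample_payload = json.loads(
    '[{"name": "sql01", "status": "Good"},'
    ' {"name": "sql02", "status": "Critical"},'
    ' {"name": "redis01", "status": "Good"}]'
)

summary = summarize_status(sample_payload)
for status, count in sorted(summary.items()):
    print(f"{status}: {count}")
```

Against a real server, `sample_payload` would be replaced by the result of `fetch_json` called with whichever JSON route the installation provides.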

InfoQ asked why Stack Exchange decided to build its own monitoring tool. Nick told us that it grew organically:

It started out as a central exception log viewer from our StackExchange.Exceptional database, a central log location for all our applications. From there as a spare time project I started adding aspects of monitoring that didn’t exist, or didn’t exist correctly already (e.g.: an issue with SQL Server 2012's AlwaysOn monitoring).

From there I started adding SQL features for things we like to keep an eye on because I wanted a single place to view all our systems. After that, I started adding all the systems we use at Stack Exchange...the goal shifted from filling the gaps in existing monitoring to having a single pane of glass view of our infrastructure.
