Hardware malfunction: monitoring data unaffected

Shortly after 9am CET today, Friday December 7, 2012, some of our systems started to experience downtime. This included our pingdom.com homepage and the customer control panel at my.pingdom.com. All systems have now been restored and are operating at normal capacity and functionality.

Monitoring of our customers’ websites was not affected, but alerts, unfortunately, were delayed. We are as passionate about uptime as our customers are, and this sort of thing should not happen. Since this is a rare occurrence for us, we want to take this opportunity to explain what happened, at least insofar as we know right now.

We received internal alerts as soon as things started to go wrong earlier today, and proceeded to investigate. It became apparent that this incident was caused by faulty hardware. Here are some of the actions we’ve taken so far:

We have made sure that monitoring, downtime alerts, public reports and all our other systems are now operating normally.

We have confirmed that monitoring data from all our probes and for all our customers has been collected as normal throughout the day. This data is secure, but there is a considerable amount of it. We’re currently processing it so that it can be rolled out to customers’ accounts. Until that processing is finished, there may still be delays in sending out alerts and in data appearing in the my.pingdom.com control panel.

We kept a dialogue going with our hardware and software providers during the entire incident, which helped us to quickly address the underlying issues and move key hardware components to new systems.

This is obviously not a complete postmortem. That will have to wait until we’ve had an opportunity to perform a detailed investigation. When that is complete, we may return with an updated post.

We want to assure all our customers that although mistakes happen, we do not consider them acceptable. We take considerable pride in delivering a service that you can depend upon.