“At approximately 7:30 am Pacific time this morning, Google began experiencing slow performance and dropped connections from one of the components of App Engine. The symptoms that service users would experience include slow response and an inability to connect to services. We currently show that a majority of App Engine users and services are affected. Google engineering teams are investigating a number of options for restoring service as quickly as possible, and we will provide another update as information changes, or within 60 minutes.”

Google’s dashboard showed that the App Engine service was still having problems as I wrote this, at about 10:55 Pacific.

During the recent Amazon Web Services (AWS) outage, real-time API monitoring could have warned AWS customers like Reddit, Heroku, and Foursquare earlier so they could work to mitigate the problem or work with Amazon to resolve it. Moreover, real-time API monitoring can help to strengthen service-level agreements (SLAs) for cloud computing, providing enterprise customers the transparency they need before they can trust the cloud with their mission-critical applications.

Conducting real-time API monitoring should be the domain of application performance management (APM) technologies, but traditional APM tools are simply not up to the task in today’s increasingly complex and distributed environments. API-dependent businesses need a new approach to API monitoring…

Weigh in using the comments section below: Are cloud outages just par for the course going forward, or will the cloud players be able to get them under better control with better monitoring and similar measures?