Why did it happen:
A bug in the system used to predict VM demand caused the management system responsible for pre-booting VMs to stop requesting new resources, which in turn caused a drop in available capacity.

What did we do to fix it:
We manually set the VM demand values and allowed our cloud to catch up.

What are we doing to prevent it from happening again:
We've corrected the original bug in our prediction service and hardened the management system so that it is easier to debug and reacts more quickly to bad inputs.

Posted Dec 07, 2017 - 10:25 PST

Resolved

Wait times have returned to normal levels. All services are fully operational.

Posted Nov 30, 2017 - 08:24 PST

Monitoring

We've deployed a fix that should bring VM wait times back to normal, and we're continuing to monitor it.

Posted Nov 30, 2017 - 08:00 PST

Investigating

We are seeing long wait times for automated and manual testing and are taking immediate action to resolve the issue.