Troubleshooting Cloud Sandboxes with Jira

Posted by Pascal Joly August 16, 2017

In the grand scheme of DevOps, troubleshooting and support automation often get the short end of the stick, giving up the limelight to their more glorious cousins: pipelines, deployment, and builds. However, when we consider what it really takes to implement these processes and "automate everything", this space deserves some scrutiny.

In this integration we address a painful problem that happens all too often in the lab: a user who needs to complete a test or certification reserves an environment, but a device or piece of equipment fails to respond. Unlike most data center production environments, labs rarely have a streamlined process for handling such issues: the user calls the IT administrator about the problem, gets at best a non-committal estimate of when it will be fixed, and in some cases never hears back. It might take escalation and long delays to eventually get things done.
When operating at scale on highly sensitive projects or PoCs, organizations expect a streamlined process to address these issues. Support for mission-critical testing infrastructure should be aligned with SLAs, and downtime should be kept to a minimum.

So what does it take to make it happen?

Getting Rid of the Friction Points

The intent of the integration plugin between Quali's CloudShell Sandbox platform and Atlassian's industry-leading Jira issue tracking system is quite simple: eliminate the friction points that slow down a device or application certification cycle when a failure occurs. It provides an effective way to manage and troubleshoot device failures in sandboxes, with built-in automation for both the end user and the support engineer.

The process goes as follows:
Phase 1: A user reserves a blueprint in CloudShell, and the sandbox setup orchestration detects a faulty device through its health check function.
This in turn generates a warning message asking the user to terminate the sandbox because of the failed equipment. The user is also prompted to launch a new sandbox, since the abstracted component in the blueprint will now pick a different device, which hopefully will pass the health check.
The device at fault is then retired from the pool of equipment available to end users and put into a quarantine usage domain. In the process, a ticket is opened in Jira with the device information, along with the description and context of the detected failure.
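
To make the Phase 1 flow concrete, here is a minimal sketch in Python. It is only an in-memory model, not the plugin's actual code: the Device and Lab classes, the "users" and "quarantine" domain names, the "LAB" project key, and the "Bug" issue type are all assumptions made for illustration. The ticket dictionary follows the general shape of the request body used by Jira's REST "create issue" call, but nothing is sent to a real Jira or CloudShell server.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    domain: str = "users"   # hypothetical name for the end-user pool
    healthy: bool = True    # stand-in for the result of a real health check


@dataclass
class Lab:
    devices: list
    tickets: list = field(default_factory=list)


def build_ticket(device, failure):
    """Build a ticket body shaped like a Jira 'create issue' request."""
    return {
        "fields": {
            "project": {"key": "LAB"},     # assumed project key
            "issuetype": {"name": "Bug"},  # assumed issue type
            "summary": f"Health check failed: {device.name}",
            "description": failure,
        }
    }


def reserve(lab):
    """Resolve the abstract blueprint component to the first device in the user pool.

    Returns (device, message). On a failed health check the device is moved
    to quarantine, a ticket is recorded, and the user is told to relaunch.
    """
    device = next((d for d in lab.devices if d.domain == "users"), None)
    if device is None:
        return None, "No device available in the user pool"
    if device.healthy:
        return device, "Sandbox ready"
    device.domain = "quarantine"
    lab.tickets.append(build_ticket(device, "Device did not respond during sandbox setup"))
    return None, f"{device.name} failed its health check; end this sandbox and launch a new one"
```

Calling reserve a second time after a failure skips the quarantined device, which mirrors how relaunching the sandbox lets the abstracted component pick a different device.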

Phase 2: Once a support engineer is ready to work on the equipment, they can simply open the Jira ticket and, from there, directly create a sandbox containing the faulty device. This gives them console access through CloudShell, along with other automation functions if needed, to perform standard root cause analysis and attempt to solve the problem. Once they close the ticket, the device is automatically returned to the user domain pool for consumption in sandboxes.
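
The Phase 2 transitions can be sketched the same way. Again, this is a simplified in-memory model under stated assumptions rather than the plugin's real implementation: the Ticket class, the status strings, the domain names, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    domain: str = "quarantine"  # the faulty device starts out quarantined


@dataclass
class Ticket:
    key: str
    device: Device
    status: str = "Open"


def open_support_sandbox(ticket):
    """Describe a troubleshooting sandbox built around the ticket's device."""
    return {
        "ticket": ticket.key,
        "devices": [ticket.device.name],
        "domain": ticket.device.domain,  # still quarantined while under repair
    }


def close_ticket(ticket):
    """Close the ticket and return its device to the end-user pool."""
    ticket.status = "Closed"
    ticket.device.domain = "users"  # hypothetical name for the user domain
    return ticket.device
```

Tying the return of the device to the ticket closing, rather than to a separate manual step, is what keeps the device from lingering in quarantine after the fix.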

To sum it all up, by combining the power of CloudShell Sandbox orchestration with the Jira help desk platform, this simple end-to-end process provides a predictable way to save time and improve productivity for the end user by removing the friction points, and it automates key transitions to streamline the process for the support engineer.