When Clouds Behave Badly

In an earlier blog post, we looked at an interesting fact: a year ago about 50% of the software applications we use at Riverbed ran in the cloud; today it is about 80%, and this percentage will continue to rise. In reality, cloud or SaaS-based applications like Microsoft Office 365, Salesforce.com and WebEx have become the norm for modern companies, and the day is not far off when most businesses will run 100% of their applications in the cloud. In fact, my colleague Ben Haines, CIO of Box, is already operating at 100% SaaS.

Our growing dependence on cloud providers raises some serious questions and brings to bear a new set of challenges that didn’t exist just a few years ago when companies ran almost all their applications ‘on premise.’ For example, what happens if one of your cloud providers goes out of business? Does it take your data and server capacity with it?

Other issues invariably crop up when you run a SaaS-intensive environment. For example, how many times have your users updated a browser (or OS) only to discover that their favorite SaaS app won’t work with the new version? And having dozens of SaaS providers means you have dozens of individual business relationships to manage, along with dozens of different (and sometimes incompatible) architectural elements that you need to manage and troubleshoot.

Architectural collisions

In these non-centralized, multi-SaaS environments, problems with your ‘clouds’ can pop up unexpectedly because of what I call ‘architectural collisions.’ Issues can range from compatibility conflicts between your cloud service and existing systems, to data integration problems, or simply poor performance. Chances are these are issues that you had no way of anticipating when you made the initial decision to bring on a particular SaaS provider.

More often than not, problems like this show up at the user experience level, when a partner or business user experiences slowdowns or spotty connectivity. In less severe cases, workers waste time, productivity suffers and opportunities are lost. In the worst cases, architectural collisions can cripple a business—like when your systems are so slow that you can’t complete a customer transaction or book revenue internally or externally through a partner.

Typically the burden falls on IT organizations, like my own, to sort out all these issues. We spend time tracking down root causes and isolating problems. We also spend time cleaning, migrating and integrating data across multiple cloud and on-premise systems. That translates to real operational costs through resource loads and support calls.

Bad clouds/dealing with bad behavior

So what do you do ‘when clouds behave badly?’ Part of the answer lies in the proper vetting of SaaS apps when you decide to bring a new cloud service into your ecosystem. Increasingly companies will need to create a ‘decision matrix’ to help IT and business jointly decide which new SaaS applications can be brought into the company, and which are not compatible for user experience ‘mash ups.’
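One way to picture such a decision matrix is as a weighted scorecard. The sketch below is only an illustration: the criteria names, the weights, the 1-to-5 rating scale and the approval threshold are all assumptions made up for this example, not an established Riverbed process. In practice IT and the business would agree on their own.

```python
# Hypothetical criteria and weights (they sum to 1.0); adjust to your own priorities.
WEIGHTS = {
    "browser_os_compatibility": 0.30,
    "data_integration": 0.30,
    "performance": 0.25,
    "roadmap_visibility": 0.15,
}


def score_app(ratings):
    """Return the weighted average of 1-5 ratings, one rating per criterion."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def decide(ratings, threshold=3.5):
    """Approve the candidate app only if its weighted score meets the threshold."""
    return "approve" if score_app(ratings) >= threshold else "reject"


if __name__ == "__main__":
    candidate = {
        "browser_os_compatibility": 4,
        "data_integration": 3,
        "performance": 5,
        "roadmap_visibility": 2,
    }
    print(round(score_app(candidate), 2), decide(candidate))
```

The value of a scorecard like this is less the arithmetic than the conversation it forces: IT and the business have to agree up front on what matters and how much.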

Another solution is for companies to start demanding greater visibility into their cloud service providers’ product and upgrade roadmaps. Too often cloud vendors push out collision-producing updates without consulting their customer base. That situation should change when more customers demand a seat at the table before SaaS providers decide to release new products and updates.

Performance vs. availability

Availability is the legacy metric: table stakes. Performance is the new differentiator, the basis for compelling service guarantees. Ultimately, guaranteeing high performance should be part of every SaaS service level agreement (SLA). These SLAs should come with a clear statement of how you will be made whole when your cloud providers fall short of specific performance targets. But how do you measure performance, and who’s really responsible for any given slowdown or disconnect? It’s a world of many clouds and many interconnections.
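To make the measurement question a little more concrete, here is a minimal sketch of checking a performance target rather than plain uptime. It assumes the SLA is written as a percentile of response time (for example, "95% of requests complete within 800 ms"). The function names, the 800 ms target and the sample data are hypothetical, and real SLAs may define performance quite differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sla_report(response_ms, target_ms=800.0, pct=95.0):
    """Compare the observed percentile response time against a (hypothetical) SLA target."""
    observed = percentile(response_ms, pct)
    return {"observed_ms": observed, "target_ms": target_ms, "met": observed <= target_ms}


if __name__ == "__main__":
    # Made-up response times in milliseconds for one measurement window.
    window = [100, 200, 300, 400, 1000]
    print(sla_report(window))
```

Note that a service like the one in this sample could report 100% availability for the window and still miss the performance target, which is exactly the gap between the two metrics.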

For example, one SaaS provider told me about the company’s constant battles with customers over the cause of sluggish online performance. Whose network was to blame: the customer’s or the SaaS provider’s? Or was it somebody in between, like the telecom carrier? Network architects on all sides worked overtime tracking down the ‘culprit.’ Not really the most efficient way to solve the problem! Murphy’s Law is always opportunistic, and these things seem to occur at critical periods like quarter end. IT lives Murphy’s Law.
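The finger-pointing described here becomes easier to settle when each party’s segment of the path is measured separately. The sketch below assumes you already have per-segment latency figures and a normal baseline for each. The segment names and numbers are invented for illustration, and this is not a description of how any particular product works.

```python
def likely_culprit(baseline_ms, observed_ms):
    """Return the segment whose latency grew the most over its baseline, and by how much."""
    excess = {seg: observed_ms[seg] - baseline_ms[seg] for seg in baseline_ms}
    worst = max(excess, key=excess.get)
    return worst, excess[worst]


if __name__ == "__main__":
    # Hypothetical latency per segment in milliseconds: normal day vs. slow day.
    baseline = {"customer_lan": 5, "carrier": 40, "saas_provider": 120}
    observed = {"customer_lan": 6, "carrier": 45, "saas_provider": 400}
    segment, extra = likely_culprit(baseline, observed)
    print(f"largest slowdown: {segment} (+{extra} ms)")
```

With numbers like these on the table, the customer and the carrier can be ruled out in minutes rather than after days of overtime.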

Mean time to innocence

Missing in this situation were reliable technologies and methods for objectively tracking down and isolating the root cause of service bottlenecks and outages. To put it simply, visibility was lacking. This leads us to Riverbed’s amazing products and successful deployments, where exactly these kinds of problems have been solved…for ourselves as well. In all kinds of SaaS scenarios, Riverbed can narrow down the problem and speed issue resolution—see Riverbed products.

In fact, Riverbed products become an invaluable component of service level agreements, providing the visibility necessary for contracting parties to rapidly determine who’s responsible. In other words, the ‘mean time to innocence’ is reduced drastically—and that leads to a faster ‘mean time to resolution.’ This saves us all money and cloud headaches.

Here again, Riverbed’s products are ideally positioned to play a leading role in a burgeoning market—in this case, the proliferation of cloud applications in businesses worldwide. The fact that our products could also help Riverbed manage its own mix of cloud services only adds to our success story.