Battle scars in the cloud

History is filled with examples of abuse and misuse of abundant resources. This is true of IT infrastructures that rely even partially on the cloud. While the seemingly endless supply of computing resources on demand creates a mirage of cloud perfection, there are innumerable examples of an ill-disciplined approach to the cloud creating serious problems.

I have seen many customers who have taken to the cloud with ample enthusiasm – yet the outcomes have not always been positive. The ability to scale resources as needed, avoid tiring and tedious change control processes, eliminate delays in standing up a new environment and other benefits present the perfect double-edged sword.

Midmarket and enterprise companies too often lack the processes and governance to avoid cloud sprawl, in the form of unneeded or underutilised resources. Not only does this result in sky-high bills from the provider, but it also creates security risks from connections and resources that are still up and running but not used or monitored. There is also a myriad of preventable deployment issues; it's easy to underestimate the complexity of growing the cloud environment 10X overnight, until a few weeks later when performance problems begin to rear their ugly heads.

Adding to the confusion is the question of accountability. With multiple cloud purchasers across your organisation – inside of IT and within lines of business – it's hard to know what's being provisioned where, at any given moment. Working with more than one cloud provider makes things even more complicated, and it becomes hard to determine whom to contact when there's a problem. This free-for-all infrastructure environment is a recipe for disaster. First, let's break down how problems occur, and then we'll discuss an approach to making all of this easier, more manageable and more cost-effective.

Sticker shock

In the olden days of on-premises IT, companies regularly followed formal processes for purchasing, configuring, documenting and deploying networking equipment, servers, storage and security systems. With the cloud, it's been more like the Wild West. It's so easy, quick and relatively cheap to spin up a server in the cloud that anyone can do it – and in some companies, hundreds of people are doing it.

This delivers an illusion of business agility, yet the price can be high. IT quickly loses all control of what is being provisioned, where, how, and for what purpose: there’s no cohesive plan and a lot of waste. At one customer, monthly variations in cloud bills were in excess of $100,000. Here are a few reasons why cloud spending goes haywire:

Cloud resources are purchased in an ad hoc manner, and provisioning and procurement aren't managed centrally. This makes total cloud costs unpredictable.

Services are over-subscribed compared to actual need.

Services are not ramped down when they are no longer needed.

Data is difficult to migrate from high-performing resources, such as storage, to lower-performing resources that are cheaper. Cloud providers do not provide simple tools for managing the process, and developers and IT managers may not know when to move data either.
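The "ramp down" problem in particular lends itself to a simple automated audit. The sketch below is illustrative only: it assumes resource usage records have already been exported from the provider's billing or monitoring API, and the field names (`last_activity`, `monthly_cost_usd`) and the 14-day idle threshold are hypothetical choices, not any provider's actual schema.

```python
from datetime import datetime, timedelta

def find_idle_resources(resources, max_idle_days=14, now=None):
    """Flag resources with no activity within max_idle_days,
    most expensive first, so they can be reviewed and ramped down."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_idle_days)
    idle = []
    for r in resources:
        # Hypothetical record format exported from a billing/monitoring API.
        last_used = datetime.fromisoformat(r["last_activity"])
        if last_used < cutoff:
            idle.append((r["id"], r["monthly_cost_usd"]))
    return sorted(idle, key=lambda pair: pair[1], reverse=True)

resources = [
    {"id": "vm-web-01", "last_activity": "2024-01-02T00:00:00", "monthly_cost_usd": 420.0},
    {"id": "vm-dev-07", "last_activity": "2024-03-29T00:00:00", "monthly_cost_usd": 95.0},
]
report = find_idle_resources(resources, now=datetime(2024, 4, 1))
# vm-web-01 has sat idle for months at the highest cost, so it tops the list.
```

Run on a schedule and routed to resource owners, even a crude report like this surfaces the "forgot to turn it off" spend before it compounds into a six-figure monthly variation.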

Production issues

Engineering heads and development leads have a different set of issues operating in the cloud, which often lead to delays in production:

Developers use the cloud to quickly create a test/dev environment, even if the application ultimately resides in-house. Fundamental differences in underlying technology, management and supportability between on-premises technologies and cloud infrastructure can slow down production. For example, a customer developing in the cloud can access a database as a service, whereas in-house they need to plan for database capacity, licences, upgrade cycles and availability.

If the ultimate hosting environment for the new app or service is in the cloud, some of these issues are mitigated. However, organisations still need to address management issues and integration with their in-house APM (application performance management) and other systems management tools.

A "create and kill" approach can leave IT without a true understanding of the root causes of performance issues and failures, and it feeds cloud sprawl. It's a quick fix to spin up another server when CPU usage starts to reach the threshold, but that's not always the best response. The problem might be more readily (and cheaply) solved through code optimisation.

Cloud technology is constantly evolving. Developers tend to grab the latest tools without thinking through how those tools integrate with existing IT management and operations platforms. Tool sprawl, along with a lack of skills and support for those tools, hinders effective adoption.

Insecure for no reason

Last but definitely not least, the overall process by which resources are created, configured, reconfigured and abandoned leaves many cloud orphans behind, and that is a huge security concern for security officers as well as CIOs. Too often, IT managers setting up systems in the cloud do not follow corporate standards for configuration, patching, monitoring and upgrade cycles, leaving vulnerabilities. One of our customers using a very popular cloud provider was hacked because a system had been created with an open port.
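The open-port failure mode above is exactly the kind of thing a mechanical audit catches. Here is a minimal sketch, assuming inbound firewall rules have been exported from the provider's API as simple dicts; the field names, the rule data and the allow-list of publicly exposed ports are all hypothetical examples of a corporate policy, not any provider's real format.

```python
# Example policy (an assumption): only HTTPS may face the whole internet.
ALLOWED_PUBLIC_PORTS = {443}

def audit_inbound_rules(rules):
    """Return rules that expose a non-approved port to the entire internet."""
    findings = []
    for rule in rules:
        # Hypothetical rule format: one inbound firewall rule per dict.
        world_open = rule["source_cidr"] == "0.0.0.0/0"
        if world_open and rule["port"] not in ALLOWED_PUBLIC_PORTS:
            findings.append(rule)
    return findings

rules = [
    {"name": "web", "port": 443,  "source_cidr": "0.0.0.0/0"},   # approved
    {"name": "ssh", "port": 22,   "source_cidr": "0.0.0.0/0"},   # violation
    {"name": "db",  "port": 5432, "source_cidr": "10.0.0.0/8"},  # internal only
]
violations = audit_inbound_rules(rules)
```

The point is less the code than the practice: encoding corporate configuration standards as a scheduled check means an orphaned system with an exposed port is flagged automatically, rather than discovered after a breach.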

Now what?

A large proportion of our customers regularly experience some or all of the above issues. Avoiding these issues requires a comprehensive cloud operations and usage program. Such a program should include the following processes:

Track cloud subscriptions and monitor cloud service pricing to maximise cost efficiency in the cloud;

Schedule regular security awareness training and best-practices sessions with development teams;

Perform frequent, randomly scheduled security scans, as well as on-demand scans.

The cloud is no longer an experiment but a permanent extension of the current IT infrastructure. Executives who accept this fact early on and put processes in place for effective governance will have much less to worry about in terms of cost, time to market and reaching positive business outcomes.