Understanding public cloud governance best practices

IT governance in general is complex. Cloud governance is even more so because the whole point of cloud is to give up some level of control to developers. That means instead of a small handful of trusted admins performing every action, a wide range of individuals in many different roles may have self-service capabilities.

I’ve seen a lot of models for maintaining governance over IT environments, and they all share the common denominator of being complex and difficult to really understand. I like a nice, simple view:

Governance is meant to ensure things are cheap, easy to manage, and secure.

Management (or operations management) is making sure things are working the way you want, while governance is making sure that things are easy to manage.

Security management is making sure you keep private what should be private, while governance is making sure that it is an attainable goal.

Cost management is controlling and reducing spend, while governance is making sure that people follow rules that make cost management achievable.

I’ve never seen it expressed quite that way, but I hope you find that way of looking at it useful. Note how each splits management — which I am defining as the actions a team takes daily to make things work the desired way — from governance, where a set of rules is defined and enforced to make management possible and efficient.

I will split this discussion into three parts: make it easy, make it cheap, and make it secure. These three are highly related. For example, overly complex environments (a failure to make it easy) are difficult to secure, and it is hard to find ways to reduce spend in them. This insight especially applies to multi-cloud environments, where an enterprise might have two major public clouds, with multiple PaaS services, in addition to a couple of virtualization platforms, disparate automation / private cloud stacks, etc. Finding the right mix of choice and control is hard. Clearly defining your governance goals makes it a lot easier.

First, to make managing an environment easy we need to reduce complexity. Some of the simple examples of where complexity creeps in are OS options, network configurations, and the authentication mechanisms allowed.

For OS options it is best to limit the choices to a handful of approved standards across the business and make it difficult to seek exemptions. Then, for each approved option, define a single standard way of deploying that OS. We get into a rabbit hole here quickly, with pre-baked images vs. deploying components at provisioning time, but regardless of how you approach this problem you want to define a single standard and stick with it.
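As a rough illustration of what checking an OS standard could look like, here is a minimal sketch. The image names and the inventory format are hypothetical; in practice the inventory would come from whatever asset or cloud inventory tooling you already use.

```python
# Hypothetical allowlist of approved OS images; the names are illustrative only.
APPROVED_OS_IMAGES = {"corp-linux-standard", "corp-windows-standard"}


def find_unapproved(instances):
    """Return the IDs of instances whose OS image is not on the approved list."""
    return [
        inst["id"]
        for inst in instances
        if inst["os_image"] not in APPROVED_OS_IMAGES
    ]


# Assumed inventory format: one dict per instance.
inventory = [
    {"id": "vm-001", "os_image": "corp-linux-standard"},
    {"id": "vm-002", "os_image": "hobby-distro"},
]

print(find_unapproved(inventory))
```

The point is not the code itself but that a short, explicit list of standards is something you can check automatically.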

Next is consideration of network configurations. In a previous article I talked about multi-account strategies for Amazon Web Services and part of that strategy relates to your network topology. The biggest single thing you can do for network governance is to have an IP Address Management (IPAM) tool that can be used across your cloud environments. This stores information about what networks are in which environments and used for what purpose. This information becomes invaluable for managing the environments.
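To make the value of that information concrete, here is a small sketch of one question IPAM data can answer: do any allocated networks overlap? The records below are invented for illustration, and a real IPAM tool would expose this through its own interface rather than a Python list.

```python
import ipaddress
from itertools import combinations

# Hypothetical IPAM records: (environment, purpose, CIDR block).
ALLOCATIONS = [
    ("aws-prod", "application tier", "10.0.0.0/16"),
    ("azure-dev", "developer sandboxes", "10.1.0.0/16"),
    ("aws-dev", "developer sandboxes", "10.0.128.0/20"),
]


def find_overlaps(allocations):
    """Return pairs of environments whose CIDR blocks overlap."""
    overlaps = []
    for (env_a, _, cidr_a), (env_b, _, cidr_b) in combinations(allocations, 2):
        net_a = ipaddress.ip_network(cidr_a)
        net_b = ipaddress.ip_network(cidr_b)
        if net_a.overlaps(net_b):
            overlaps.append((env_a, env_b))
    return overlaps


print(find_overlaps(ALLOCATIONS))
```

Here the hypothetical aws-dev block sits inside the aws-prod block, which is the kind of conflict that becomes painful once the two environments need to be connected.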

Finally, for authentication, one of the biggest issues with managing multiple clouds and different environments is managing credentials. Having a governance standard that defines a single identity provider per class of user (e.g., one for employees, one for customers, one for IoT devices) makes management a lot easier.
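A standard like this is easy to audit once it is written down. The sketch below assumes you can export a list of (user class, identity provider) pairs from your environments; the class and provider names are made up for the example.

```python
from collections import defaultdict


def classes_with_multiple_idps(bindings):
    """Return user classes that are served by more than one identity provider."""
    idps_by_class = defaultdict(set)
    for user_class, idp in bindings:
        idps_by_class[user_class].add(idp)
    return {
        user_class: sorted(idps)
        for user_class, idps in idps_by_class.items()
        if len(idps) > 1
    }


# Hypothetical export of which identity provider each environment uses.
bindings = [
    ("employees", "corporate-directory"),
    ("employees", "legacy-ldap"),
    ("customers", "customer-idp"),
]

print(classes_with_multiple_idps(bindings))
```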

These are but some examples of reducing complexity by defining standards as part of your governance regime. The next step is ensuring compliance. This can be achieved through the use of either cloud-native tools like AWS Config or a third-party SIEM.

Keeping costs down

The next high-level governance goal I’d like to discuss is making it cheap. This is a very rich area for discussion that could (and probably will) have its own blog post. But some basic guidelines are essential from day one.

One example is limiting the types (sizes) of instances (or servers) that can be created by end users. Both AWS and Azure have a lot of options, but due to the way that cost controls work in both clouds, limiting users to a handful of options (4 or 5) makes cost management drastically more effective.
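In a self-service portal or provisioning pipeline, that limit can be as simple as an allowlist check before a request is submitted. This is a minimal sketch; the specific instance types listed are only an example of what a short list might contain, not a recommendation.

```python
# Example allowlist; which instance types belong here is a business decision.
ALLOWED_INSTANCE_TYPES = {"t3.small", "t3.medium", "m5.large", "m5.xlarge"}


def is_allowed_instance_type(instance_type):
    """Return True if the requested instance type is on the approved list."""
    return instance_type in ALLOWED_INSTANCE_TYPES


for requested in ("t3.small", "x1e.32xlarge"):
    verdict = "allowed" if is_allowed_instance_type(requested) else "rejected"
    print(f"{requested}: {verdict}")
```

The same list can usually also be enforced on the cloud side through the provider's own policy mechanisms, so that the rule holds even when someone bypasses the portal.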

Another place you can cut costs is by defining and enforcing what a dev environment actually is. Do you need a high availability database for dev? Probably not. The same goes for high performance clusters, which are better suited to pre-prod/QA where load testing occurs. Maybe everyone insists they need Oracle Enterprise Edition, but they probably don’t, and could get by just fine with Amazon Aurora, which would save money.

This plays into the same themes as the previous section about making it simple. We want to reduce choice intelligently, without stifling innovation.

One thing that is very important is tagging all your resources to make cost allocation easier (this is very much a governance goal!). Another thing that will help is creating a standard for how to express the business value of a project. That model should include things like the opportunity cost of choosing a platform that would require more development time, and the value that specific features of common platforms bring to the project. Yes, it can be difficult, but it lets you make informed decisions about what to allow, where and when to allow it, and why, based on what’s best for the business.
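A tagging standard only helps if it is checked. Here is a minimal sketch of such a check, assuming a hypothetical set of required tag keys; your own standard will likely differ.

```python
# Hypothetical required tag keys for cost allocation.
REQUIRED_TAGS = {"cost-center", "owner", "environment"}


def missing_tags(resource_tags):
    """Return the required tag keys that a resource does not carry, sorted."""
    return sorted(REQUIRED_TAGS - set(resource_tags))


# Assumed format: a dict of tag key to tag value for one resource.
tags = {"owner": "data-team", "environment": "dev"}

print(missing_tags(tags))
```

Running a check like this at provisioning time, rather than at the end of the month, means untagged spend never shows up on the bill in the first place.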

Lock it down

Finally, we get to the third high-level governance goal: make it secure. This is where I’d like to draw on the previous two sections to establish some important ideas.

Governance always prioritizes simplicity, visibility, and compliance. These are especially important to securing multiple environments. We want to take a two-pronged approach. We want to intelligently limit options at the front end so that obeying the rules is the easy option. We also want visibility into who is doing what so we can ensure people are making compliant decisions.

If we go back through nearly every point in the previous two sections, it is possible to draw a connection to how the governance goal improves our ability to secure the environment. A simple example is how limiting our OS standards makes it easier to patch, harden, and monitor activity on those operating systems.

Standardizing identity providers and having a standard view of network information allows you to more easily understand the who and where of actions being taken. Reducing the deployment options in dev for cost control purposes also reduces the types of acceptable events you expect to see in your security information and event management (SIEM) solution, which drastically simplifies your security engineers’ work.

By making the environment simpler and easier to understand for an engineer or architect, the entire security discussion becomes far easier to have.

While there are a ton of specific examples I could hit at this point, I think it’s best to close here. By thinking of governance as the act of defining and enforcing policies and standards that make management easier, you can focus at the front end on creating the right balance between choice and simplicity.

One word of warning. Don’t let perfect be the enemy of good. There will be valid exceptions to the rule, and defining how to determine when an exception is warranted is an important piece of your governance strategy.
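One way to keep exceptions from quietly becoming the norm is to record them explicitly. The sketch below assumes a simple exception register in which every exception names a resource, the rule it is exempt from, and an expiry date. The expiry date is my own assumption about how you might manage exceptions, and all names are hypothetical.

```python
from datetime import date

# Hypothetical exception register; in practice this would live in a ticketing
# system or a configuration repository.
EXCEPTIONS = [
    {"resource": "vm-legacy-01", "rule": "approved-os", "expires": date(2030, 1, 1)},
]


def is_exempt(resource, rule, today):
    """Return True if the resource has an unexpired exception for the rule."""
    return any(
        exc["resource"] == resource
        and exc["rule"] == rule
        and today <= exc["expires"]
        for exc in EXCEPTIONS
    )


print(is_exempt("vm-legacy-01", "approved-os", date(2029, 6, 1)))
print(is_exempt("vm-legacy-01", "approved-os", date(2031, 6, 1)))
```

Because each exception is tied to a specific rule and a date, compliance checks can skip it for now and pick it up again once it lapses.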

I hope you found this useful, and I wish you the best of luck in making your cloud experience easier, cheaper, and more secure!

Eric Moore is Chief Technologist for the AWS Integrated Practice at DXC Technology. He is experienced in cloud computing, architecture and automation with a background in operations, business consulting, and security.