Building a Cloud Factory

Few areas of human endeavor can match the pace of change in IT. Even by IT standards, the change being driven by cloud computing sometimes seems surprising. To refer to a virtual environment that has only recently been deployed as "legacy," as some organizations are now doing, underscores the fact that the only thing constant in the data center is change. To deal with change of this magnitude, which can involve transforming the workload hosting model of an entire organization, some industrial-strength thinking is required.

In order to tackle this challenge, it's important to properly frame the cloud transformation problem. Many associate cloud with agility, flexibility, cost transparency and other end-user-oriented benefits. But many of these attributes are primarily associated with new infrastructure requests, and specifically, the use of self-service portals to "spin up" infrastructure to host new applications or host transient processing demands. When it comes to migrating hundreds or thousands of existing workloads into cloud infrastructure, agility is not a benefit that is typically experienced. In fact the opposite is often the case: because clouds require a higher degree of standardization (i.e., a finite catalog of sizes and software options), migrating existing physical and virtual servers into cloud models can actually be quite difficult. In other words, the very features that make clouds agile for new workload deployments can actually make them less agile from a transformation perspective.

This is where the notion of a factory comes in. In industrial processes, factories are the epitome of scalability, repeatability and productivity. Although they may take some effort to "tool up," once they are up and running they can handle a higher flow of activity, efficiently processing inputs to provide consistent output. This notion is also key to large-scale transformation. By applying a common approach that has been properly engineered to give repeatable results, organizations can greatly reduce the time and effort required to migrate to cloud infrastructure.

Within this concept, it is important to expand on what is meant by "properly engineered." Many organizations tackle these kinds of problems from a grassroots perspective, using spreadsheets and smart people to determine action. The problem with this approach is it rarely evolves to the point where it can generate truly accurate answers, mainly because the problem is too complex. Migrating workloads into clouds requires processing volumes of historical data, analyzing configuration information on the servers and applications being migrated, modeling target instance sizes and software stacks, enforcing corporate and regulatory requirements, honoring SLA and data protection rules, etc. Spreadsheets are not well suited to this, in much the same way that they are ill suited for use as corporate accounting platforms. Even if they can be coaxed into giving a decent answer for simple environments, they will not generate the reports needed to satisfy stakeholders, management, engineering, operations, etc., all of whom need significant detail surrounding the decisions being made in order to ensure benefits are achieved and risk is minimized.

Buried in the list of migration analysis requirements is a key concept linking them all together. This is the notion of policy, which represents the ground rules on how workloads should be hosted, where they should and should not go, how many resources they should be allocated, etc. Without properly modeled policies, hosting decisions are left to the practitioner performing the migration, and it can be hit-or-miss whether they do the right thing (or even follow the same policy twice in a row). Planning and managing cloud infrastructure without proper policies is like trying to fill out a tax return without instructions: there are just too many variables to get it right.

With all of these concepts in mind, the exact nature of the cloud factory becomes clearer. It divides the problem into a series of logical steps that combine data, target models and cloud planning and management policies in order to automate the process of deciding exactly where things go and how big to make them. These steps that make up the factory are:

Candidate Qualification: This process determines whether a given set of workloads is suitable to be hosted in a given cloud environment. This is both qualitative and quantitative in nature and designed to separate true candidates from the workloads that are better suited to go elsewhere (more on this later, in the Exception Handling step). Examples of quantitative criteria include maximum I/O rates, context switching limitations, maximum CPU and memory sizes, etc. Qualitative criteria include data sensitivity, SLA requirements, backup strategy and other considerations. By applying a policy capturing all of these factors, a rapid and accurate assessment can be made.
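As a rough illustration of qualification, a policy can be captured as data and applied to each workload in turn. The sketch below is only one way to do it; the field names, limits and sensitivity labels are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    peak_iops: float
    cpu_cores: int
    memory_gb: float
    data_sensitivity: str  # hypothetical labels: "public", "internal", "restricted"


@dataclass
class QualificationPolicy:
    max_iops: float
    max_cpu_cores: int
    max_memory_gb: float
    allowed_sensitivity: frozenset


def qualify(workload, policy):
    """Return the reasons a workload fails the policy; an empty list means it qualifies."""
    reasons = []
    # Quantitative criteria
    if workload.peak_iops > policy.max_iops:
        reasons.append("peak I/O rate exceeds limit")
    if workload.cpu_cores > policy.max_cpu_cores:
        reasons.append("CPU size exceeds limit")
    if workload.memory_gb > policy.max_memory_gb:
        reasons.append("memory size exceeds limit")
    # Qualitative criteria
    if workload.data_sensitivity not in policy.allowed_sensitivity:
        reasons.append("data sensitivity not permitted")
    return reasons
```

Returning the reasons, rather than a simple yes or no, gives stakeholders the detail behind each decision and feeds the later exception handling step.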

Sizing: This takes the qualified candidates and determines what cloud instances are best suited to host them given their historical levels and patterns of utilization. This again is subject to policy, which governs how much history is considered, target utilization levels, etc. The result is a detailed specification of the instance sizes needed and the projected utilization levels in the "to be" environment. Note the use of benchmarks is critical in this step, as the translation of CPU utilization from the current environment to the cloud depends on the relative speeds of the CPUs employed in each.
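To make the role of benchmarks concrete, here is a minimal sizing sketch. It assumes each source server and each cloud instance type has a single benchmark score (a stand-in for whatever CPU benchmark is actually used), and that policy supplies a target utilization ceiling. The catalog entries and scores are invented for illustration.

```python
def size_workload(peak_cpu_util, source_score, peak_mem_gb, catalog, target_util=0.7):
    """Pick the smallest instance whose projected CPU utilization stays within policy.

    peak_cpu_util is a fraction (0.0 to 1.0) observed on the source server.
    source_score and each catalog "score" are benchmark ratings in the same units.
    Returns (instance_name, projected_utilization), or None if nothing fits.
    """
    # Convert utilization on the source into benchmark-normalized demand.
    demand = peak_cpu_util * source_score
    for inst in sorted(catalog, key=lambda i: i["score"]):
        projected = demand / inst["score"]
        if projected <= target_util and peak_mem_gb <= inst["mem_gb"]:
            return inst["name"], projected
    return None


# Hypothetical catalog: names, scores and memory sizes are illustrative only.
CATALOG = [
    {"name": "small", "score": 100, "mem_gb": 4},
    {"name": "medium", "score": 200, "mem_gb": 8},
    {"name": "large", "score": 400, "mem_gb": 16},
]
```

For example, a server that peaks at 60% CPU on hardware rated 250 represents 150 units of demand. With a 70% target, the medium instance would run at 75% and is rejected, so `size_workload(0.6, 250, 6, CATALOG)` selects the large instance at a projected 37.5%.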

Load Balancing: Also a sizing step, this is focused on the load balancers and clusters being migrated. Because cloud environments offer different sizing options, and can even offer more advanced "elasticity" features, it is not always desirable to do a straight one-to-one translation of these servers into cloud capacity. For example, an 8-way IIS cluster might translate onto 12 small, 6 medium or 3 large instances. Of these options, the one that meets the policy criteria (e.g., size for yearly peak activity, allow for N+1 resiliency) at the lowest cost will be the winner. This result is combined with the general sizing results from the previous step to provide a complete sizing plan.
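The cluster comparison can be sketched as a small cost search. The example below assumes the cluster's yearly peak has already been expressed in benchmark-normalized capacity units, and that policy calls for one spare node (N+1). Capacities and prices are hypothetical, with prices in whole cents per hour to avoid rounding noise.

```python
import math


def cheapest_cluster(peak_demand, instance_types, spare_nodes=1):
    """Return (type_name, node_count, hourly_cost_cents) for the lowest-cost option.

    Each option is sized to carry peak_demand and then padded with spare_nodes
    extra instances for resiliency.
    """
    options = []
    for inst in instance_types:
        count = math.ceil(peak_demand / inst["capacity"]) + spare_nodes
        options.append((inst["name"], count, count * inst["cents_per_hour"]))
    # Lowest total cost wins.
    return min(options, key=lambda option: option[2])


# Hypothetical instance types: capacities and prices are illustrative only.
TYPES = [
    {"name": "small", "capacity": 1, "cents_per_hour": 5},
    {"name": "medium", "capacity": 2, "cents_per_hour": 9},
    {"name": "large", "capacity": 4, "cents_per_hour": 20},
]
```

With a peak demand of 12 units, the options become 13 small, 7 medium or 4 large instances once the spare node is added, and under these made-up prices the medium option is cheapest. Different prices or a different resiliency policy could easily change the winner.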

Software Stack Mapping: This step considers the OS and software configurations of the source servers and maps them onto the "closest" configuration available in the cloud. Because cloud catalogs only offer a finite set of software options, this is effectively a standardization analysis. For Infrastructure-as-a-Service (IaaS), this step is typically limited to the OS-level configuration and matches the OS attributes of the existing servers and VMs to the operating systems that are on offer in the cloud (which is typically a much shorter list). For Platform-as-a-Service this step also includes scrutiny of the actual software inventory and applications installed. The result may say "server X looks the most like an IIS v6 server, but differs from the standard image in the following ways..." This not only provides the optimal stack to deploy, but also generates a remediation list that is critical for reducing risk during implementation.
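One simple way to picture the "closest configuration" analysis is to compare a server's attributes against each catalog image and keep the image with the fewest differences. The attribute names and image definitions below are hypothetical, and a real analysis would likely weight attributes rather than count them equally.

```python
def map_stack(server, images):
    """Return (best_image_name, remediation), where remediation maps each
    differing attribute to a (current_value, standard_value) pair."""

    def differences(image):
        return {
            key: (server.get(key), standard)
            for key, standard in image["attrs"].items()
            if server.get(key) != standard
        }

    # The closest image is the one with the fewest differing attributes.
    best = min(images, key=lambda image: len(differences(image)))
    return best["name"], differences(best)


# Hypothetical catalog images and source server.
IMAGES = [
    {"name": "iis6-standard",
     "attrs": {"os_family": "windows", "web_server": "IIS 6", "patch_level": "SP2"}},
    {"name": "linux-apache",
     "attrs": {"os_family": "linux", "web_server": "apache", "patch_level": "n/a"}},
]
SERVER_X = {"os_family": "windows", "web_server": "IIS 6", "patch_level": "SP1"}
```

Here `map_stack(SERVER_X, IMAGES)` picks the `iis6-standard` image and reports the patch level as the one item to remediate, which is the kind of "looks most like, but differs in the following ways" result described above.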

Placement: Once the final specification is arrived at (through sizing, balancing and software mapping), the next step for internal cloud environments is determining exactly where the workloads should be placed in the infrastructure actually hosting the cloud environment. Because most clouds are based on virtual environments, the key is to fit the new VMs into the environment in a way that optimally leverages server resources. This step looks somewhat similar to placement of workloads in virtual environments (which tends to resemble placing Tetris blocks in available server capacity), but the policy regarding overcommit has a large influence on the resulting placements. If the policy is to strictly reserve the capacity for each cloud instance, then the environment will be very safe but relatively inefficient, as the workload density will be quite low (think of playing Tetris with the blocks wrapped in bubbles). If the policy is to fully overcommit resources, then the end customer may have a higher risk of contention if they place unanticipated demands on the environment, but the resulting higher density can significantly lower costs (think Tetris blocks packed tightly together, requiring far less capacity).
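The effect of overcommit policy on density can be shown with a deliberately simplified packing sketch. It uses a first-fit-decreasing heuristic on a single resource dimension and treats overcommit as a multiplier on host capacity. Real placement would consider several resources and actual utilization patterns, so this is an illustration of the idea, not a placement engine.

```python
def hosts_needed(vm_sizes, host_capacity, overcommit_ratio=1.0):
    """Count hosts required to place VMs using first-fit decreasing.

    overcommit_ratio=1.0 strictly reserves each VM's allocation; higher values
    let the sum of allocations on a host exceed its physical capacity.
    """
    effective = host_capacity * overcommit_ratio
    free = []  # remaining allocatable capacity on each host in use
    for size in sorted(vm_sizes, reverse=True):
        if size > effective:
            raise ValueError(f"VM of size {size} does not fit on any host")
        for index, room in enumerate(free):
            if size <= room:
                free[index] -= size
                break
        else:
            # No existing host has room, so bring another host into use.
            free.append(effective - size)
    return len(free)
```

For six VMs allocated 8, 8, 8, 8, 4 and 4 units on 16-unit hosts, strict reservation needs three hosts, while a 2:1 overcommit policy fits the same VMs on two. The saving comes at the cost of contention risk if all VMs demand their full allocation at once.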

Exception Handling: Going back to Candidate Qualification, there are typically components of an application or business service that may not be suitable for hosting in the cloud. For these systems, it is necessary to evaluate other hosting options in order to determine what to do with them. Because there is often an order of precedence with respect to the hosting options, this step involves the systematic qualification of the rejected workloads against an ordered set of hosting strategies. These strategies can include using cloud instances with customized allocations, using dedicated cloud servers, hosting in a virtual environment, using dedicated blades, using dedicated rack mount servers or leaving the workloads alone (a last resort). By passing the rejected candidates through this gauntlet of options, each will arrive at a viable outcome.
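The gauntlet of options can be modeled as an ordered list of strategies, each with its own acceptance rule, where the first match wins. The strategy names follow the list above, but the acceptance rules and workload attributes in this sketch are hypothetical placeholders for an organization's actual policies.

```python
# Ordered by precedence: most cloud-like first, "leave as-is" as the last resort.
# Every acceptance rule here is an illustrative assumption.
STRATEGIES = [
    ("custom cloud instance",
     lambda w: w["cpu_cores"] <= 32 and w["data_sensitivity"] != "restricted"),
    ("dedicated cloud server",
     lambda w: w["data_sensitivity"] != "restricted"),
    ("virtual environment",
     lambda w: w["virtualizable"]),
    ("dedicated blade",
     lambda w: not w["needs_special_io_cards"]),
    ("dedicated rack mount server",
     lambda w: w["platform_supported"]),
    ("leave as-is",
     lambda w: True),
]


def choose_hosting(workload, strategies=STRATEGIES):
    """Return the name of the first strategy whose rule accepts the workload."""
    for name, accepts in strategies:
        if accepts(workload):
            return name
    raise ValueError("no strategy accepted the workload")
```

Because the final strategy accepts everything, every rejected candidate is guaranteed an outcome, and reordering the list is all it takes to change the order of precedence.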

The result of applying these steps is a methodical, exhaustive and rapid process for planning cloud migrations. By taking a data-centric, policy-driven approach, fewer mistakes are made, less rework is required, and application owners and other stakeholders will have much higher confidence they will arrive on the other end unscathed. This transparency, combined with the detailed specifications and implementation details that emerge, can rapidly accelerate cloud initiatives. This not only reduces time-to-value, but also enables IT organizations to keep up with the pace of technology innovation, which shows no sign of letting up.
