There are three main approaches to managing applications in a cloud environment: Orchestration, PaaS (Platform as a Service), and CMP (Cloud Management Platform). The end goal of all three is fairly similar - a fully managed application on the cloud. It is therefore not surprising that users have a hard time choosing which approach best suits their needs.

In this post, I wanted to make a simple distinction between these three options and offer some guidance on when you should consider each of the approaches.

We will use the following categories as the basis for the comparison of the three different cloud management approaches:

Application Workloads

In this category we compare the three approaches based on the kind of application workload that best fits each platform, for example: web applications, stateful applications, big-data, legacy, etc.

Support for Advanced IaaS Services

In this category we compare the three approaches based on the depth of IaaS support, for example: how much each platform can utilize advanced IaaS services such as DBaaS, LBaaS, EMR, etc.

Support for DevOps Processes

DevOps processes such as continuous deployment tend to be application specific, as they touch not just the application itself, but also services that are external to the application, such as the build system, support systems, etc. A typical DevOps process involves creating new environments for QA, Production, etc., or updating existing deployments. There are different techniques, such as Canary, Blue/Green, or Immutable deployments, that are often used to handle those processes. In this category we compare the three approaches based on the degree of support for those DevOps patterns.

Support for Containers and Cloud Native Stack

Containers can be viewed as lightweight VMs or application packages. The use of containers removes a large part of the application configuration management and packaging complexity. In this category, we compare the three approaches based on the degree of support for container technologies such as Docker, Swarm, Kubernetes, Mesos, and Fleet.

Support for Bare Metal

Bare metal machines are often used to allow maximum performance and utilization of hardware resources by bypassing the virtualization layer overhead. Bare metal is becoming a popular target for application deployment with the introduction of new bare metal clouds, as well as containers that provide a cost-effective way to use bare metal resources. In this category we compare the three approaches based on their support for a bare metal environment.

Support for Network Orchestration

Many enterprises and clouds support virtual networking through SDN. In this category we compare the three approaches based on the degree of support for network orchestration.

Target Users

In this category we compare how well each approach fits developers in a DevOps role as well as operations engineers.

Let’s take a deep dive into the different cloud management options.

Orchestration Defined

Application Workloads: The automation approach is fairly un-opinionated and therefore can address any use case that can be mapped into distinct manual steps. Because of this, it can be applied to a large variety of workloads, from simple web applications to Big Data and analytics, or even legacy applications.

Support for Advanced IaaS Services: Orchestration/automation doesn't prevent you from using any IaaS service, as it doesn't impose a layer of abstraction in order to manage a hybrid cloud environment.

Support for DevOps Processes: Orchestration models the application into blueprints, which makes it easy to create new instances of the same blueprint for Dev/Test or Production in a consistent way (a blueprint represents an environment meta-model). Orchestration also provides a way to interact with existing deployments through workflows, which are used to implement the DevOps processes. Since workflows are basically execution scripts, they can easily interact with any external system, such as a build system or support system, as part of the continuous deployment process.
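
To make this concrete, here is a minimal sketch of what such a blueprint might look like, written in TOSCA-style YAML (the node names, input, and property values are illustrative only, not taken from any specific product):

```yaml
tosca_definitions_version: tosca_simple_yaml_1_0

topology_template:
  inputs:
    instance_count:        # e.g. 1 for Dev/Test, 3 for Production
      type: integer
      default: 1

  node_templates:
    app_host:
      type: tosca.nodes.Compute
      capabilities:
        host:
          properties:
            num_cpus: 2
            mem_size: 4 GB

    web_app:
      type: tosca.nodes.SoftwareComponent
      requirements:
        - host: app_host   # the app is hosted on the compute node above
```

The same blueprint can be deployed repeatedly with different inputs, which is what gives each environment (QA, staging, production) a consistent, reproducible shape.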

Support for Containers and Cloud Native Stack: Most of the container-based platforms (Docker, CoreOS) come with built-in orchestration. There are also other types of container-centric orchestration, such as Kubernetes, which provides a more complete solution for managing applications.

Other “pure play” orchestration tools such as Cloudify and Terraform provide support for setting up a container-based infrastructure, and also integrate a hybrid stack of multiple container technologies, such as Kubernetes and Mesos, alongside non-container stack technologies such as databases, Big Data and legacy apps.

Support for Bare Metal: Orchestrators are basically automation tools and can orchestrate applications on a cloud-based environment or on bare metal. Some orchestrators use a resource pool mechanism to allow dynamic allocation of application resources across a pool of bare metal machines.

Support for Network Orchestration: Network orchestration is considered a core piece of NFV. There are two modeling languages that are commonly used for network orchestration: TOSCA, a standard for application topology definition, and YANG, a standard modeling language for the configuration of network devices. The two modeling languages can also be combined to deploy network-centric services.
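
As an illustration, the TOSCA Simple Profile includes networking types that let a blueprint model networks and ports alongside the application itself. A minimal hedged sketch of such a template fragment (type names follow the TOSCA Simple Profile networking extension; the values are made up):

```yaml
node_templates:
  mgmt_network:
    type: tosca.nodes.network.Network
    properties:
      network_name: mgmt
      cidr: 192.168.1.0/24

  app_port:
    type: tosca.nodes.network.Port
    requirements:
      - link: mgmt_network   # attach the port to the network above
      - binding: app_host    # bind it to a compute node defined elsewhere
```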

Target Users: The automation approach fits well for developers in a DevOps role as well as operations engineers.

PaaS Defined

Definition

PaaS takes a more developer-centric approach. It was built as an abstraction layer aimed at helping developers focus on writing code by abstracting away all the infrastructure and operational aspects. To achieve this goal, most PaaS platforms come with a fairly opinionated approach to how applications should run on those platforms (Cloud Foundry refers to this as the twelve-factor app).

Application Workloads:

PaaS products are mostly suited for greenfield applications that fit the twelve-factor definition. They are also suited mostly to web applications, and less so to Big Data or legacy applications.

Support for Advanced IaaS Services:

The abstraction approach provided by most PaaS products aims to “hide” the complexity of the infrastructure from the developers through a relatively “thick” layer of abstraction. One of the side effects of that approach is that it limits the use of the infrastructure to a basic pool of compute and storage resources, whereas in reality many modern cloud infrastructures provide much more advanced services such as DBaaS, LBaaS, EMR, etc. It also causes lots of duplication of logic for handling things like user management, billing, and quotas that are already provided by the infrastructure.

In addition, a PaaS often comes with its own set of logging, monitoring and development tools. DevOps users, on the other hand, tend to build their own tool chain, and the set of tools they use tends to vary between users, and over time, quite rapidly.

Support for DevOps Processes:

Most PaaS products provide built-in implementations of some of the continuous deployment processes, e.g. Canary, Blue/Green. However, those implementations tend to be tied to the platform with only a fairly limited degree of customization as the user is not expected to have access to the internals of the platform such as the load-balancer.

Support for Containers and Cloud Native Stack:

Both OpenShift and Cloud Foundry are built on top of containers. OpenShift, specifically, is built on top of Kubernetes. Having said that, PaaS doesn’t provide direct control over the underlying container orchestration and therefore limits the use of container orchestration within the PaaS platform.

On top of that, a PaaS relies on a specific container technology, so users don't have the flexibility to use the container orchestration of their choice, such as CoreOS Fleet, Docker Swarm, or Mesos.

Support for Bare Metal:

Most of the PaaS solutions were designed to run on top of a virtualized cloud-based environment and were not built to run on bare metal.

Support for Network Orchestration:

A PaaS doesn't expose most aspects of the network configuration to the end user, and therefore comes with its own, opinionated, network configuration architecture.

CMP Defined

Definition

CMP stands for Cloud Management Platform. CMPs take an infrastructure-centric approach where the main focus is monitoring and managing infrastructure resources such as virtual machines, storage, network, etc. A CMP can be used indirectly to manage applications by combining some orchestration capabilities as part of the platform.

Application Workloads:

Similar to automation, CMPs are less opinionated and can therefore address a wide range of applications. Having said that, many CMPs were designed to provide infrastructure management and control the virtual machine that hosts the application, but not the application itself. CMPs don't come with strong application management and automation capabilities to handle aspects such as dependency management, discovery, configuration management, application management, etc.

Automating the full lifecycle of a given application, including configuration management and automation of post-deployment aspects such as failover and scaling, would require more integration work with third-party tools in order to fill this gap.

Support for Advanced IaaS Services:

The main value of Multi-Cloud CMPs is that they often provide a “single pane of glass” for controlling and monitoring cloud infrastructure across different cloud providers. To achieve this goal, many of the CMPs have to force some degree of “least common denominator” abstraction where they view the cloud infrastructure as a simple pool of compute and storage resources. By doing so, they limit the use of more advanced services provided by most of the modern cloud infrastructures today, as mentioned above.

Support for DevOps Processes:

Most of the CMP products focus on managing infrastructure resources. Continuous deployment processes tend to be application-centric by definition. Handling DevOps processes through a CMP is often non-trivial and requires integration with third party tools to handle this task, but quite often, the level of interaction needed between the two systems requires tight integration.

In addition, CMPs often come with a fairly monolithic architecture that includes their own monitoring, logging, billing, etc., and therefore don't fit well in a DevOps environment. DevOps users tend to build their own tool chain, and the set of tools they use tends to vary between users, and over time, quite rapidly.

Support for Containers and Cloud Native Stack:

Most CMPs can run containers on top of VMs.

Support for Bare Metal:

Most of the CMP solutions were designed to manage virtualized, cloud-based environments and were not built to manage bare metal resources.

Support for Network Orchestration:

Most of the multi-cloud CMP solutions provide a limited set of network configuration options, mostly related to security groups and load balancers. Advanced networking configuration, such as creating a private network per application, micro-segmentation, routers, firewalls, and WAN gateways, is often not supported.

Application Delivery

The purpose of this comparison was to demonstrate the differences between the three approaches for managing applications on a cloud-based environment. All of these approaches are a means to an end, and one of the most common use cases for these management frameworks is to allow faster delivery of applications.

To allow faster delivery of applications we need to provide developers with the right tools for the job. Every developer may have different needs and, therefore, different tools needed to serve their application, as I pointed out in one of my previous posts entitled “What Developers Want”.

Based on the above analysis, we can measure each of the approaches under the following criteria:

The Key Benchmarks

How fast can you serve the tools that developers need to become productive?

The open-source world offers plenty of new frameworks and tools which are being announced almost on a daily basis. Developers want to have access to them all, as, in many cases, those tools will allow them to gain better productivity and thus speed up the development process. What developers care less about is the process of managing and configuring those tools.

Choosing a specific tool, such as a specific PaaS, is not a strategy.

Don't build a strategy based on a specific tool, as by the time you are ready to support it, a new platform will emerge. Instead, be ready to support multiple platforms and tools, even if they overlap, to allow maximum flexibility for developers to choose the right tool for the job rather than forcing them to fit into a specific platform stack.

As we can see in the following OpenStack surveys from 2015 and 2016, the popularity of the various platforms tends to change rapidly. In this specific case, we can see that Kubernetes grew almost 30% at the expense of Cloud Foundry. Cloudify grew 20% in use with OpenStack in production, according to users.

So, what we can understand from this analysis is that, in order to serve those developers well, the challenges are:

How fast can you introduce a new framework into your cloud?

How capable are you of managing many different types of frameworks?

It becomes clear now that in order to achieve this goal we need to have a generic way that will allow us to take any application and framework and offer it as a managed service, as shown in the diagram below:

Conclusion

In my personal opinion, the ability to introduce new frameworks fast is directly related to the flexibility of the management platform. Orchestration frameworks are better targeted at this goal because they are built to automate manual processes (in this case installation, configuration, etc.) directly, not indirectly as with the PaaS or CMP options.

Having said that, the three options may not always be mutually exclusive, and it is also common to see a combination of some of the tools. For example, a generic Orchestration tool can be used to configure and setup a PaaS like Cloud Foundry or Kubernetes. A CMP can have an Orchestration framework as an add-on service that runs through the CMP thereby providing both the application and infrastructure view combined.

July 25, 2016

In my previous post, I discussed the differences between hybrid cloud and cloud portability, as well as how to achieve true hybrid cloud deployments without compromising on infrastructure API abstraction, by providing several use cases for cloud portability.

Cloud Portability Defined (again)

For the sake of clarity, I thought it would be a good idea to include my definition of cloud portability again here: “Cloud portability is the ability to run the same application on multiple cloud infrastructures, private or public. This is basically what makes hybrid cloud possible.”

Clearly, the common infrastructure API abstraction approach forces too many restrictions on the user which makes it fairly useless for many of the cloud portability use cases.

In this post, I would like to propose another method for making cloud portability, and therefore true hybrid cloud, a reality.

An Alternative Approach

One of the use cases I previously mentioned for allowing application deployment portability across environments that don't conform to the same set of features and APIs is iOS and Android. With operating systems, we see that software providers are able to successfully solve the portability problem without forcing a common abstraction.

What can we learn about cloud portability from the iOS/Android use case?

Treat portability differently between the application consumer and the application owner - One of the main observations from the iOS/Android case is that, while the consumer is often completely abstracted from the differences between the two platforms, the application developer is not, and often needs to treat each platform differently, sometimes even duplicating certain aspects of the application's components and logic to suit the underlying environment. The application owner, therefore, has an incentive to support and even invest in portability, as this increases the application's overall market reach.

Minimizing the differences, not eliminating them - While the application owner has more incentive to support each platform natively, it is important to use cloud portability as a framework that will allow for minimizing but not eliminating the differences to allow simpler development and maintenance.

The main lesson from this use case is that, to achieve a similar degree of cloud portability, we need to make a distinction between the application consumer and the application owner. For cloud portability, in order to ensure a native experience for the application consumer, we need to assume that the application owner will be required to duplicate their integration effort per target cloud.

This is the same approach we should take with cloud application portability!

In this section, I will use a specific project, ARIA, as a means to illustrate the principles mentioned above in more concrete terms.

Project ARIA is a new Apache-licensed project that provides simple, zero-footprint, multi-cloud orchestration based on TOSCA. It was built originally as the core orchestration for Cloudify and is now an independent project.

The diagram below provides an inside look at the ARIA architecture.

There are three pillars, upon which ARIA is built, that are needed to manage the entire stack and lifecycle of an application:

1) An infrastructure-neutral, easily extensible templating language

2) Cloud plugins

3) Workflows

TOSCA Templating Language vs. API Abstraction

ARIA utilizes the TOSCA templating language in its application blueprints which provides a means for deploying and orchestrating a single application on multiple infrastructures through individual plugins, thereby circumventing the need for a single abstraction layer.

Templating languages such as TOSCA provide far greater flexibility than API abstraction, as they allow easy extensibility and customization without the need to develop or change the underlying implementation code. This is done by mapping the underlying cloud API into types and allowing the user to define how those types are accessed and used through scripts.
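
For example, a custom type can be derived from a base type and bound to user-provided scripts, so changing how a node is installed means editing a script rather than the orchestrator's code. A hedged sketch in TOSCA-style YAML (the type name and script paths are made up for illustration):

```yaml
node_types:
  my.nodes.NginxServer:
    derived_from: tosca.nodes.WebServer
    interfaces:
      Standard:                             # TOSCA's standard lifecycle interface
        create: scripts/install_nginx.sh
        configure: scripts/configure_nginx.sh
        start: scripts/start_nginx.sh
```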

With Cloudify, we chose TOSCA as the templating language because of its inherently infrastructure-neutral design, and because it was designed as a DSL with many of the characteristics of a real language, including support for inheritance, interfaces, and a strong typing system.

Cloud Plugins

Built-in plugins for a wide range of cloud services provide out of the box integration points with the most common of these services, but unlike the least common denominator approach (i.e. a single API abstraction layer), they can be easily extended to support any cloud service.
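
As an illustration of the plugin approach, the same application tier can be bound to a different cloud simply by choosing a different plugin-backed node type in the blueprint. The type names below follow the convention of Cloudify's OpenStack and AWS plugins of that period, but treat this as a hedged sketch rather than exact, version-accurate names:

```yaml
node_templates:
  # Target OpenStack: the node type is implemented by the OpenStack plugin
  app_host:
    type: cloudify.openstack.nodes.Server

  # Target AWS instead: swap the type; the application layer above stays the same
  # app_host:
  #   type: cloudify.aws.nodes.Instance
```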

Workflows

Workflows enable interaction with the deployment graph and provide another way to abstract common cloud operational tasks such as upgrades, snapshots, scaling, etc.
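
In Cloudify-flavored TOSCA, for example, a blueprint can declare custom workflows that map to plugin code. A hedged sketch (the module path and parameter are hypothetical):

```yaml
workflows:
  rolling_upgrade:
    mapping: my_plugin.workflows.rolling_upgrade   # hypothetical plugin module
    parameters:
      batch_size:
        default: 1                                 # upgrade one node at a time
```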

Putting It All Together

By combining the three aforementioned elements, the user is given a set of building blocks for managing the entire application stack and its lifecycle. It also provides a richer degree of flexibility that allows users to define their own degree of abstraction per use case or application.

In this manner, cloud portability is achievable without the need to change your underlying code, and, in doing so, you enable true hybrid cloud.

See the hybrid cloud orchestration demo below

July 10, 2016

Surveys say that hybrid cloud, once perceived as virtually mission impossible, is becoming pretty much mainstream. According to this survey, and this survey, some users are currently running on as many as 6 clouds simultaneously on average per organization, with an even split between private and public clouds, both for real deployments as well as for experimentation, and 74% of enterprises are currently leveraging two or more cloud infrastructure vendors; making the need for robust cloud portability mission critical, not just nice to have.

From my experience though, when people refer to hybrid cloud, they oftentimes don't mean the same thing. With the diversity of use cases for hybrid cloud, where each one drives a different strategy or approach, it is difficult to lump hybrid cloud into one simplistic category.

On top of this, most of the discussion, and even the tooling built for this purpose, is focused on bridging specific aspects of the infrastructure, e.g. compute and networking as an example, and quite often caters to a wider market, forcing a “least common denominator” approach which inherently limits the use of the underlying cloud (I will get into this more below).

In addition, many of the existing tools are missing the key part in such deployment models; the actual application itself. Running an application in a hybrid cloud environment requires the handling of the entire application stack in which the infrastructure is really only one component. This includes the configuration management, containers, monitoring, logging, and policies as well as maintenance of the application itself through its entire lifecycle.

I find myself writing a post on just this subject every couple of months, since the landscape changes so rapidly (see my disruption cycle post if you want more on that). That said, recent developments have gotten me thinking again, and I wanted to revisit the different hybrid cloud use cases and suggest a different approach to hybrid cloud that doesn’t force a least common denominator and handles the entire application lifecycle and stack.

So just to set the stage, and make sure we’re all on the same page, let’s start with the obvious - the definition of hybrid cloud and identifying the diversity of use cases.

Few know what hybrid cloud actually means

What the experts say:

In layman’s terms:

Hybrid cloud is, in simple terms, the use of multiple clouds simultaneously, where cloud portability is the enabler of this deployment model.

Where Cloud Portability Comes In

Cloud portability is the ability to run the same application on multiple cloud infrastructures, private or public. This is basically what makes hybrid cloud possible.

The distinction between the two is important: in the case of hybrid cloud we're talking about multiple clouds attempting to act as one unified infrastructure, while with cloud portability we're basically talking about the option to run on multiple clouds, but not necessarily at the same time.

This post will dive into use cases for both.

Cloud portability use cases

We often tend to associate cloud portability with cloud bursting, and many times it's even erroneously used interchangeably with hybrid cloud, but these only represent a couple of use cases, and, truthfully, not even the most common ones. In fact, the need for cloud portability spans a vastly wider set of use cases that are much more common but, ironically, less known. Let me explain with the following:

Future proofing - With the uncertainty around VMware and OpenStack and the emergence of a new class of cloud native infrastructure, it has become abundantly clear that we're going to continue to experience more disruption at the infrastructure level. In order to “future proof” your application against those changes, and keep your options open to benefit from new developments as they happen, it is important to decouple the application from the underlying infrastructure by designing the application for cloud portability.

Application deployment portability - Many software vendors that develop software applications need to allow application portability to give their customers a simple way to provision and deploy their software products on their cloud of choice. In this context, cloud portability can be analogous to operating system portability between Windows, Linux and Mac or even mobile app portability across iOS and Android. Cloud represents a market and by designing for portability you maximize the reach of your products to those markets.

Same application across multiple clouds - The previous use case describes a situation in which we allow portability at deployment time, i.e. users are able to choose the target environment for deploying their application, but once the application has been deployed to a given environment, it is considered a completely separate deployment.

There are a number of cases in which the same application would need to span its resources and services across multiple clouds at the same time. Here are a few:

Cloud Bursting - Probably the most common use case for spanning application resources across clouds is known as cloud bursting. This use case is aimed at handling the need for on-demand capacity and optimizing the cost of those resources by running on a fixed pool of resources during steady-state periods and spanning out to on-demand cloud resources during peak loads.

Migration - Another lesser known use case for cloud portability is cloud migration. A common example would be an organization that is migrating from their VMware environment into OpenStack, or from private cloud into public cloud. In this case, portability allows you to smooth out the process and reduce risk by providing a common management layer across the two environments, thus allowing the organization to selectively transition the application between the two environments while at the same time managing them as one.

Portability between versions of the same cloud - Another lesser known, but probably the most common, cloud portability use case is the move between versions of the same infrastructure. One common strategy for upgrading the infrastructure is to create new instances (cloud sites) of a newer version, and then gradually transition apps onto this new version. Cloud portability makes that process simpler, as it decouples the application from the infrastructure, and in this way from the changes between versions.

The least common denominator approach

The number of use cases that would benefit from cloud portability could be fairly vast. As noted above, though, the reality is that many of the existing solutions for cloud portability are fairly limited and are not well suited to fit into all of those use cases.

One of the main reasons for this is that most solutions take a least common denominator approach in which they rely on a common layer of API abstraction (mostly around the compute API and to a lesser degree storage and networking) across all clouds and by doing so force limited use of the underlying cloud infrastructure.

Cloud is much more than Compute, Storage, Network

The common API abstraction already limits itself to Compute, Storage and Network and even at that layer the abstraction tends to be fairly simplistic and quite often doesn't expose many of the more advanced features of the underlying infrastructure, and there are many exciting features constantly being rolled out in the cloudsphere.

In addition to features, cloud infrastructure today provides a rich set of services - database services, analytics services, LBaaS, you name it, the list goes on - that just cannot be easily abstracted.

The result is that relying on this layer of abstraction comes with a high toll of compromising on the least common denominator, one size fits all model, and thus losing many of the benefits that modern clouds provide today. And we are rarely one size fits all.

In my next post, I’ll dive into how to achieve true cloud portability without forgoing all of the benefits hybrid cloud deployments actually make possible.

May 13, 2016

Over the past few months I've been involved in various forums and discussions on what the right approach should be for achieving a common orchestration modeling language.

I felt that while there's a growing consensus that TOSCA is currently best positioned to fit the bill, most notably in the NFV space, there are still different views on the right approach to implementing the standard. Some view TOSCA simply as an abstraction layer able to wrap any orchestration framework, and thereby provide standard modeling on top of proprietary modeling languages.

Others view TOSCA as more than a language or abstraction layer - as representing a whole new set of philosophies on how to build truly portable orchestration.

Regardless of the approach, I think that most people would agree that the main promise of TOSCA is in its portability.

In this post, I wanted to examine more closely which of the approaches is best suited to enable true portability.

The state of TOSCA & the promise of portability

TOSCA is at the center of many of the new open NFV focused orchestration projects such as Cloudify, Tacker, Open-O, OSM just to name a few, and is quickly becoming the clear winner of common orchestration modeling languages in the Telecom NFV domain.

For those choosing TOSCA, one of the primary highlights is its promise of portability through its inherent technology agnosticism, by providing a common modeling language that can be ported across different clouds, and even orchestration frameworks. This much-coveted portability proclamation, at least theoretically, will allow users to define their application once and then run it anywhere. That said, the reality is that most of the orchestration tools that claim to support TOSCA are not really fully compatible with the specification yet.

More importantly the true value of portability isn’t just achieved by reducing lock-in, but also by enabling a greater degree of collaboration - let me explain:

Portability - The big picture

To fully understand the value of having a common standard and a portable deployment model, I would like to use what I believe to be a useful analogy from the manufacturing industry.

Computer Aided Design (CAD) provided a common modeling language to describe the manufacturing of parts. Having a common modeling language enabled much more advanced collaboration in the globalization era. It enabled different companies to define a product design in one part of the globe, and then simply and cost efficiently manufacture it on the other side of the world - exactly as originally designed. This sort of collaboration was the enabler of a revolution in design, and made it possible to manufacture much larger and more complex systems, such as the Boeing 787 Dreamliner, at scale.

“The 787 program has more than 50 Tier 1 partners located around the world, including in the United States, Australia, Canada, France, Germany, Italy, Japan, Russia and the United Kingdom. There are suppliers to the 787 program in 38 U.S. states, including significant contributions from California, Kansas, Ohio, Oklahoma, South Carolina and Washington. In all, suppliers to the 787 program are located in 19 countries and were selected based on their ability to do the work with high quality, affordability and reliability.”

The IT industry is now undergoing an industrial revolution similar to the one the manufacturing industry underwent a couple of decades ago. As in the manufacturing industry, the lack of a common modeling language has been a major inhibitor, limiting the scale and speed at which the IT and NFV industries are able to collaborate, and consequently innovate.

When we think of portability in this context, it becomes clear that not having true portability is simply not a viable option.

There are two main approaches for handling TOSCA portability that can be classified as follows:

TOSCA Abstraction - In this option we rely on TOSCA as a spec and let each provider implement the spec in their own way.

Common Runtime - The other approach is having a common language runtime in which we use not just a common spec but also a common runtime that implements the TOSCA spec.

To compare which of the two approaches is best suited to deliver on the portability promise, I will start by looking at how other languages have handled the portability challenge.

Lessons from the past on language portability - (C++ vs. Java)

Common language specification - different runtime

C++ is a good example of the first approach. We had a common spec, and different providers implemented their own compilers following that spec. The result of that experiment is that C++ never really achieved true portability. Only when GCC became the common compiler did we ultimately get much closer to true portability on Linux.

Common runtime and spec

Java on the other hand was born on the promise of portability - “write once run anywhere”. The reason why Java was much more successful than C++ in this regard is the use of a common runtime (JVM), which provided a common substrate for executing the Java bytecode format and the Java language. On top of this, the ability to define a clear separation between the Java language and the common runtime made the Java runtime a great platform for supporting multiple languages in addition to Java, at a later stage.

What can we learn from C++ and Java?

TOSCA is a DSL and comes with many characteristics of a true language, such as interfaces, inheritance, etc. The C++ and Java experience is therefore a useful lesson when we look for the right option for achieving true portability with TOSCA.

Based on this analogy, I would argue that if we want to achieve true portability we shouldn't rely solely on a common spec, but rather on a common runtime that implements the spec. By doing so we ensure that there's going to be one common way to interpret the spec, especially in areas where the spec is vague (and there are still many of those). A common runtime is also a useful platform for feeding new features into the spec, though only after they have been tested and validated. Without a common runtime it's going to be much harder to experiment with new features before they become part of the spec.

Using a common runtime with other orchestration engines and data models

The challenge with having a common orchestration runtime is that while it may provide a common way to interpret the spec, it may be limited in its ability to support other orchestration platforms. In the complex world of IT we cannot assume that there will forever be a single orchestration engine to rule them all, and we need to have a way to integrate with other orchestration platforms. Some examples are network orchestration (ODL), data modeling languages such as YANG, or container orchestration such as Kubernetes.

There are multiple ways in which this can ultimately be achieved.

Using TOSCA as an abstraction layer

In this approach, we can use the TOSCA DSL (without the common runtime) and integrate it as an abstraction layer on top of other orchestration APIs.

The challenge with this approach is that it achieves only limited portability, as it leaves a high degree of freedom for interpreting the spec, which ultimately results in different ways of actually applying portability in the real world. In addition, adding TOSCA support to systems that were not designed for it presents a fairly high barrier of complexity, because with this approach such platforms need to implement ALL of the TOSCA features, and not just the relevant subset that maps natively onto their platforms.

Extending the common runtime through plugins

The other option would be to open the common orchestration runtime to support other orchestration platforms and data models through a set of plugins.

In this specific case, plugins should provide a simple way to map other orchestration platforms as yet another TOSCA type library. A good example of this is the work we at Cloudify have done around integrating Kubernetes into TOSCA, as well as integrating other data models, for example TOSCA/YANG integration.
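
To illustrate the idea, a Kubernetes-backed service could appear in a blueprint as just another node template whose type comes from a plugin-provided type library. The type and property names below are hypothetical, meant only to show the shape of such a mapping:

```yaml
node_templates:
  web_microservice:
    type: cloudify.kubernetes.Microservice   # hypothetical plugin-provided type
    properties:
      image: nginx:latest
      replicas: 3
    relationships:
      - type: cloudify.relationships.connected_to
        target: mongo_db                     # a non-containerized node managed by another plugin
```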

Mapping the current options for TOSCA Runtime

OpenStack TOSCA parser projects

The OpenStack TOSCA Parser - This OpenStack project maps TOSCA into the Heat orchestration project, with Heat serving as the execution platform. The main limitation of this approach is that Heat was designed as OpenStack-specific orchestration, and by tying TOSCA tightly to OpenStack through Heat, we are unable to leverage the TOSCA portability promise. We also inherit other limitations of Heat, which, unlike TOSCA, wasn't designed to support full application lifecycle management.

Project ARIA - Announced during Mobile World Congress 2016, this new open source project aims to provide a reference implementation of TOSCA and make it easier for both network providers and carriers to embrace the standard through a simple Python library.

Final notes

The IT industry is going through an industrial revolution similar to the manufacturing industry a few decades ago. Without a portable and standard CAD model in the manufacturing industry we wouldn’t have been able to build complex systems such as the Boeing 787 in a way that is economically possible.

Having a similar standard and portable modeling language is just as critical for the IT and Telco industries, to enable full automation of complex systems, open the door to more intelligent self-managed systems, and provide better cost efficiency.

TOSCA provides a standard modeling language that is a front runner in this regard; however, to realize true portability we need to learn from the lessons of other languages such as C++ and Java on the best approach to make this truly achievable. Having a common spec is often not enough, as we experienced with C++, while the addition of a common runtime (the JVM, in Java's case) proved to be a significantly more successful model.

Standards bodies such as OASIS, ETSI, and MEF, as well as projects Open-O and Open Source MANO (OSM), provide a great platform for bringing vendors and customers together to agree on this and other supporting standards, making this a true collaborative effort.

I hope this post will help advance this discussion toward a conclusion.

Kubernetes gained almost 30% in popularity (from 21% to 27%), while Cloud Foundry is down by 30% (from 23% to 16%) and Mesos is down by 25% (from 16% to 11%), making Kubernetes the leading platform for container management in OpenStack.

Docker Swarm went down in popularity by 80%, from 16% to 2%! Cloudify's popularity increased by 20%!

Outside the tent

What's interesting about this list is that none of the application frameworks listed in this survey are specific to OpenStack or even developed under OpenStack's Big Tent.

There are some interesting lessons to be learned in this regard:

Developers tend to choose an application platform that is not tied to a specific infrastructure.

The fact that OpenStack is open source makes it possible for external projects to provide native integration with OpenStack without being part of OpenStack, thanks to the rich set of OpenStack APIs, as described in this session: How to Develop for OpenStack APIs

The main question that comes from all this is: Should the OpenStack community continue to invest in building application framework projects or simply focus on making the integration of external projects with OpenStack simpler?

Interestingly enough, during the summit, Thomas Morin, Network Architect from Orange Labs, provided great insight into why they tend to select open source tools from the broader ecosystem rather than limiting themselves to OpenStack projects, in a panel discussion on Open Source NFV Lessons Learned from End Users, which also included members from AT&T, NTT and China Mobile. I think that Thomas' view is a good representation of how most OpenStack developers think when they make their choice of application platforms and tools.

This was also echoed by Mark Shuttleworth, founder of Canonical, in his comment about the future of Big Tent:

“If you look at OpenStack it has this frame big tent, well the truth is the tent will collapse and that's going to be traumatic for everybody,” said Shuttleworth.

This is not to say that the whole of OpenStack will fail, as he believes that open source will continue to grow stronger as an Infrastructure as a Service. The core of network and compute will remain while all of the complexity on top with the as-a-Service components will disappear. (Source)

Cloudify updates from the summit

Anyway, the Austin summit was probably the best event yet for the Cloudify team with more than 6 talks, 12 people on the ground, a book signing, back to back meetings and lots of new stuff - here is a short summary for those who missed the party.

Cloudify was noted as one of the top 5 application and container frameworks according to the OpenStack survey!

The following sessions from the summit provide a good overview about how Cloudify works with OpenStack:

Cloudify integration with Kubernetes - Demonstrates how you can use Cloudify to install Kubernetes on OpenStack and other clouds, manage Kubernetes micro-services composition and dependencies, and manage a hybrid deployment of micro-services and non-micro-services applications. We also announced the availability of the new Cloudify/Kubernetes as a service

Project ARIA - Simple TOSCA-based NFV Orchestration - demonstrating how you can deploy NFV applications and service chaining through a simple Python library. See also the ARIA session from the summit here

Cloudify Playlist from the summit - We've put together all the Cloudify-related talks from the summit in a YouTube playlist

This week we also announced the NFV Lab on demand, which shows how you can get your own private OpenStack environment with Cloudify pre-integrated, on demand, and experiment with orchestrating your first NFV service.

I would like to close by thanking the members of the community for their support.

I really enjoyed being in Austin again, and we're now fully energized to continue the OpenStack momentum as co-organizers of the OpenStack Day Israel event and the first OpenStack Days East in NYC!

April 28, 2016

Preface

The Innovator's Dilemma faced by vendors of proprietary networking stacks is well documented. The blogosphere and the trade media have thoroughly covered the disruptive, one-two punch of open source NFV and white box hardware that threatens to undermine their business models.

But just this week, at OpenStack Summit in Austin, we saw Sorabh Saxena, Senior Vice President of Software Development & Engineering at AT&T showcase on a keynote stage before more than 7,500 people (plus the online audience) just how mature this disruption is. His company, faced with multiplying demand for networking capacity over the next four years, has embraced NFV and white box hardware not so much as an inoculation against vendor lock in (though this is part of it), but as a necessary tool for agility and speed.

And yet, the existential threat to companies like Cisco is present, real and almost certainly the end of an era. In this post, I’ll explain what is happening and what we all should do.

But first, a bit of background.

Open Source has been at the center of some major disruptions in the high-tech industry: Linux changed the entire operating system landscape; Hadoop, MongoDB, Cassandra and now Spark lead the disruption in the way we manage data; and Android changed the entire mobile world, to name a few.

The networking world has been very late to join this open source disruption and was kept for a long time behind a walled garden. It didn’t take long for the wall around this garden to come down.

We are now facing a time in which open source disruption is hitting the networking world and making big waves in the form of NFV and SDN, and, like other similar disruptions, the first to be hit by that disruption are the incumbents like Cisco.

As in other, similar disruptions, the first reaction by those incumbents is denial, which is later followed by other tactics such as camouflage (if you can't beat them, join them). But at the root of many of those disruptions sits the big elephant in the room: those publicly traded companies have an inherent conflict of interest with this open movement that is rooted in their business and revenue model - a conflict that can't be bridged easily just by sponsoring or even contributing to open source projects.

As with previous disruptions, this revolution will spawn a completely new generation of companies that were born to lead it.

For the past three years, I’ve been actively involved in this movement and I’m still seeing lots of confusion that mostly comes out as fear of change by some carriers who want to join the movement but don’t know how.

So, for this post, I wanted to share my perspective on how carriers should respond to this change and avoid the risk of moving too late to a technology that is already fundamentally transforming telecom.

I’ve chosen Cisco as the representative of the incumbents, although their use case can be applied to the stories of other vendors of proprietary networking stacks.

Open Source NFV Initiatives Get Real

The NFV open source movement has gained serious momentum over the past two years, with OpenStack being the main driving force behind this, by providing an open source substrate to sustain the NFV movement.

This has led to an influx of projects with a more specific NFV focus that have emerged over the past year, such as OSM, which is led by ETSI, as well as OPNFV and Open-O, which are under the auspices of The Linux Foundation.

Other projects such as Tacker (a Brocade-led NFV orchestration project for OpenStack) and ARIA (a reference implementation of TOSCA) have also been created to support those initiatives. The fact that open solutions for complex, full-fledged NFV environments are even available on-demand demonstrates the speed at which the NFV revolution is moving.

Open Source Reshapes the Way Standards are Being Defined

What’s interesting is that standards bodies such as ETSI and MEF, which traditionally used to work in long cycles and in fairly closed discussion groups, are now redefining themselves to fit into the open source paradigm, and are embracing open source technology as a means to drive new standards.

The MEF, for example, will even be running an open Hackathon around a new flagship initiative, LSO (Lifecycle Service Orchestration), as a way to define their new API. Clearly, this is a big change in the way standards are defined in the telecom world.

AT&T Leads the Open NFV Agenda, a Tectonic Shift

AT&T recently announced their Enhanced Control, Orchestration, Management and Policy project (ECOMP) during the Open Networking Summit. The project proves that open source can actually support the core backbone of one of the biggest carriers.

At the OpenStack Summit yesterday, AT&T won the OpenStack SuperUser Award for their contribution - the announcement includes fascinating statistics that show the scale at which AT&T has invested in adopting an open source based stack, including 70+ OpenStack deployments with plans for 90% growth. They also trained more than 100 of their employees to work with and contribute to OpenStack.

This, on top of the Orange Labs functional testing of OPNFV (although much smaller in scale) - another great use case, presented at OPNFV last year - highlights another big shift that is happening as a result of this open source movement. The Orange Labs example illustrates how open source enables telecoms to build their own infrastructure without the “high touch” vendor dependency of yesteryear.

Open Standards for NFV Orchestration and Modeling Get Wider Acceptance

TOSCA is at the center of many of the new open orchestration projects and is becoming the clear winner of common orchestration modeling languages. Project ARIA, which was announced during Mobile World Congress 2016, is a new, open source project that aims to provide a reference implementation of TOSCA and make it easier for both network providers and carriers to embrace the standard through a simple library.

YANG is also known to be a popular modeling language for defining networking devices, widely embraced by the networking community. Until recently, there was a big debate in the telecom world over whether YANG should also become the standard modeling language for orchestration. The combined TOSCA and YANG integration is gaining wider acceptance now, as this approach seems to provide the best of both standards - where TOSCA is responsible for the service lifecycle and YANG controls the network configuration of the VNFs.

The Open Source Disruption in NFV and Cisco’s Built-in Conflict

Cisco is one of the leading network providers, so it is no surprise that NFV needs to sit at the center of its strategy. Cisco is also making fairly big investments in aligning itself with the open source movement, specifically around OpenStack, but there's one issue with this that stands out.

The Elephant in the Room

A quick look at Cisco's revenue model for 2015, outlined here, reveals an internal conflict with their joining the open source movement (the 2016 report shows a similar breakdown).

We can clearly see that Cisco's primary streams of revenue come from selling network devices, and therefore moving to a software-defined networking model would come at the expense of their current physical device business model.

Selling open source based solutions drives a completely different business and revenue model, and, despite all the effort that Cisco has invested in open source, these solutions will never serve as a new growth engine at the revenue levels Cisco was used to.

Another point that is important to note in this regard is that open source often comes with a demand from the users - in this case, carriers - for an open ecosystem.

Cisco, on the other hand, was used to “playing solo” when it comes to networking, and their solution was built as a full Cisco stack. Cisco even went as far as buying companies that led new standards, such as Tail-f, which led the YANG model and its support.

Obviously, this model doesn't fit well with the open movement, in which carriers are looking for the flexibility to choose their own stack, rather than being dependent on a single vendor.

This puts Cisco in serious conflict between where the market is heading and its current business model. For publicly traded companies that are measured by their quarterly bottom line, it is hard to see how Cisco could make a shift into this new world without a complete shakeup similar to the one we've seen recently with Dell & EMC.

Cisco is Not Alone

While I chose Cisco as a means to demonstrate the dissonance faced by leading networking vendors in the wake of an open source revolution, they obviously are not the only player that will suffer a big hit from this disruption at the heart of their core business. Other players such as Ericsson, Nokia, and Amdocs, which, like Cisco, are used to selling proprietary stacks and turnkey solutions, are also exposed to this disruption.

Maturity is Not Enough

The only advantage that Cisco and the other “high touch” players have is the maturity of their stack and brand.

Indeed, one of the leading deficiencies of choosing the open source route right now is the maturity of the stack, as many of the open source players are still fairly new, and there is no such thing as a “fast forward” on maturity.

Open source solutions are also often times built out of many moving parts, which requires a completely different skillset and organizational structure for adopting and embracing an open source strategy.

Having said that, it is clear that if we had followed the maturity argument in any of the examples I mentioned before, we would have chosen Oracle over Hadoop or MongoDB, or BlackBerry over Android. We now know clearly that those who took that route lost the game completely.

In addition, maturity can be a fairly tricky thing as many of the “mature” solutions are, by definition, built out of old architecture and concepts. So, by choosing those solutions you're basically betting on building your future stack on technology of the past. That’s not going to work - sorry.

AT&T serves as a great example of how telecom organizations can overcome this limitation, by building an infrastructure and an organizational structure that fits with this open movement.

Final Notes - The Blackberry vs Android Dilemma in NFV

Many of the carriers are now evaluating their first steps toward NFV. For many of them this journey looks scary, as they have gotten accustomed to being “hugged” by the big vendors, and all of a sudden need to take ownership of their own stack.

At the same time, starting a new NFV initiative using old habits is worthless, as it will probably cost even more than the current system.

Instead, carriers should learn a lesson from previous disruptions, follow in AT&T's footsteps, and be careful not to be tempted to buy into the “big hug” from their old partners, who now find themselves in a position that conflicts with their business goals.

Let me explain what I mean by lessons from previous disruption:

Learn to work with new vendors - Carriers should be ready to work with new players that come with open source DNA and an open source business model. While many of these players may be new to the market, and less mature by old-world standards, they have much stronger alignment with the carrier's business goals and can serve better as a “change agent”.

Partnership with multiple vendors vs. a turnkey solution from a single vendor - Carriers should adopt a partnership strategy with their vendors, rather than the old customer/vendor relationship in which the expectation was to find a single vendor that would deliver a turnkey solution for all their needs. To do that, carriers need to adopt an integration strategy that allows faster adoption of new technologies from different vendors, as well as simplifying and shortening the business engagement process, thereby reducing the barrier for new vendors, increasing competition, and reducing costs.

Control your stack and cost - In this world, carriers also need to have the development skillset in order to have much stronger ownership of their stack, and therefore of their costs.

Embrace webscale best practices - The shift toward cloud-based infrastructure requires a shift toward a cloud-native strategy. Rather than reinventing the wheel, it's best to learn from the best practices of other industries that have made the shift toward cloud, such as Netflix, and embrace webscale best practices such as continuous deployment, automating everything, containers, etc.

A Path Forward—What We Do Next Matters

The bottom line of all this is that carriers that want to survive this disruption have to learn how to control their stack as this is the only way in which they could control their costs and speed of innovation. Without it, they stand no chance of winning at this game.

Unfortunately, there's no shortcut, and this move is going to be painful. But we can still make it smoother by introducing new ideas to shorten the learning curve, such as the new NFV lab, by making the product simpler to use, and by better aligning and collaborating with others who face similar challenges, even if they are competitors, because the challenge is far greater than any one organization can solve.

Major consumers of networking stacks have shown us a path forward. Forged in the hackathons of open source projects, and powered by the silicon of white box ODMs, NFV is no longer an idea with a few, isolated test cases. As AT&T showed yesterday before an audience of thousands, the technology is mature enough to deploy globally in dozens of data centers, forming the framework of networking plant that will manage network traffic for, arguably, the world’s most trusted brand in network performance.

So, what should we do? We should do all we can to join hands with telecom operators worldwide to advance these open source technologies. Incumbent vendors have nearly zero incentive to do so. If we want to be a part of the revolution that will redefine networking, we need to help the side with the most to gain from success.

Let’s not set ourselves up for a similar destiny to that of Blackberry, or Nokia for that matter.

December 13, 2015

While this has been said many times before, I believe that 2015 will likely be remembered as the year of the cloud, largely due to the most notable event of the year (read: decade/century...), the whopping Dell/EMC deal, which clearly marks a change of era. This is a blatant demonstration that literally all the traditional IT players are now fighting for their survival, as I noted in a post titled "The Disruption Cycle: A Dime a Dozen".

This (r)evolution puts OpenStack in a fairly interesting spot, as many of these "enterprise" players - from Cisco to HP to IBM - are also some of the strongest driving forces behind OpenStack. New players such as Mirantis have established themselves as the *new* leaders, with a second Intel investment of yet another $100M! Meanwhile, Red Hat is still big in the game, pretty much using its Linux distribution as a way to lock in its existing customer base.

We have also seen emerging startups such as Platform 9 and Stratoscale that are now aiming to disrupt not just the traditional incumbents, but also the new leaders, i.e. Mirantis and Red Hat. So, bearing all this in mind, it definitely makes for a very interesting scene for my annual prediction exercise. :)

Before I jump into 2016 predictions let me start with a quick recap from my previous 2015 predictions.

My first prediction was that one of the core services in OpenStack (and any cloud), the compute service, would go through a massive transformation, shifting from being hypervisor-centric to a container/bare metal combo.

Indeed, according to the latest OpenStack user survey, a total of 31% are using a combination of bare metal, LXC and containers. That move became the main news of the OpenStack Tokyo summit; the fact that OpenStack can support all flavors of compute resources is clearly one of its biggest differentiators in the cloud space. This also fits well with the 2016 predictions of Rob Whiteley, VP of Marketing at Hedvig, which anticipate that Docker will become the #2 hypervisor in OpenStack.

In addition, the 2015 report saw the rise of NFV's popularity within the OpenStack community, which is also part of a bigger convergence between telco and enterprise IT.

The latest survey shows that containers and NFV are now considered the top areas of innovation within the OpenStack community, even more than PaaS (don't worry... I'll come back to that shortly).

2014 was the year in which Docker burst onto the scene and basically disrupted everything in its path. It became too big to be controlled by a single owner. The move by Google and CoreOS in April of 2015 forced Docker to loosen its control over the core engine behind Docker and open the rest of the stack to plugins, in order to avoid competition from its own ecosystem. In June 2015 Docker announced that it had established an industry coalition under the Linux Foundation.

In retrospect, I think that my 2015 predictions weren't a wild guess after all, which brings me to the topic of this post - 2016 OpenStack predictions. In the paraphrased words of Adrian Cockcroft: "Follow developer adoption, not IT spend. If you're looking at spend over adoption, then you're looking back in time".

As in the previous case, I will use the OpenStack user survey in conjunction with a few other reports and predictions as the main basis for my analysis. (The list of resources is provided at the bottom of this post).

2016 OpenStack/Private Cloud Predictions

Prediction 1 - The elephant in the room: most enterprises lack the skills and culture to operate their own cloud

I remember the days when OpenStack just started, and we used to compare it with the Linux project. Indeed there are lots of similarities - both are big open source projects that fall right at the heart of our data center architecture. After six years on this journey, I have realized that this analogy may have a significant flaw. A cloud is a significantly more complex beast than an operating system, and the biggest challenge isn't necessarily a technology or cost challenge. In order to be successful in the cloud business you have to have a skill set, organizational structure and culture that are much closer to those of Amazon, Google and the like. Clearly most enterprises, including those who run big data centers, don't have that skill set and culture, and more importantly, the ROI for building them isn't clear.

What that means is that the entire assumption that building an OpenStack distro can follow the Linux distro model may be completely broken at its core. It assumes that enterprises have IT that will be able to operate the cloud by taking a distro and managing it in typical IT fashion, with vendors like Red Hat, Ubuntu and Mirantis backing it up with support. The reality, though, is that most IT organizations are not sufficiently skilled for such an undertaking, and the effort of transforming them into a cloud business doesn't seem to have an apparent ROI behind it.

This was echoed by other predictions, such as the aforementioned one from Rob Whiteley: talent, not technology, is the No. 1 inhibitor to OpenStack success.

Another interesting indication that may point to this flaw is the declining number of small organizations that are adopting OpenStack vs. a growing number of large organizations, as demonstrated in the diagram below:

Given the current barriers to building any private cloud (OpenStack included), this trend makes sense - i.e. for small organizations, it is more economical to use a public cloud or a simple Docker-based infrastructure than to invest in OpenStack. What's more, even large organizations are still running fairly small OpenStack deployments, as indicated in the diagram below, which makes the OpenStack ROI argument questionable even for fairly sizable organizations.

In the immediate/short term, that means organizations will have to rely on others to provide a fully operational OpenStack environment. We will therefore start to see a bigger shift from delivering distros to delivering pre-packaged appliances that come with an opinionated version of OpenStack coupled with its hardware. Indeed, Mirantis, HP and Cisco are leading the trend in this regard, alongside new startups such as Stratoscale.

I expect that we will also see more organizations relying on a hosted version of OpenStack, in which case they will outsource the entire private cloud operation to external experts in this domain. Rackspace, alongside traditional system integrators such as CSC, Accenture et al., will lead this category.

Having said all that, the above solutions may address the skill set challenge of operating an OpenStack cloud, but they don't come close to addressing the question of why you would want to run OpenStack in the first place, which is often to increase business agility and reduce costs. In order for OpenStack to remain a relevant proposition, it needs to go beyond the boundaries of its software distribution and combine other low-cost clouds. Which leads me to the next prediction...

Prediction 2 - The rise of new OpenStack platforms which use OpenStack as a universal abstraction on other private/public clouds

VMware has announced its support for OpenStack with VIO. The interesting thing about VIO is that it uses OpenStack as an abstraction layer on top of the traditional VMware stack, allowing VMware users to move to OpenStack while leveraging the maturity of the VMware infrastructure. It also allows those users to leverage their existing VMware skill set, which makes the transition from VMware to OpenStack a simpler task.

Platform 9 is a new and promising startup that offers an OpenStack service on top of Amazon, or any other cloud resources. What’s interesting about the Platform 9 approach is that it also reduces the barrier for running OpenStack significantly, and in addition, it leverages the maturity and cost benefit of other clouds such as AWS, GCE, VMware as the underlying cloud resources, and thus uses OpenStack as a universal abstraction for all those clouds.

Prediction 3: PaaS is Dead. Long live the new PaaS

Docker provides a ubiquitous deployment model, which makes the traditional opinionated PaaS offering less attractive, as I noted in the post "Do I Need PaaS if I Use Docker?". The recent OpenStack user survey shows that Kubernetes, which was designed purely around Docker and microservices, is quickly gaining momentum, alongside Docker Swarm, Mesos and Cloudify. What's interesting about this list is that all of them represent a new and fairly different approach to application management and automation than the "Heroku"-style PaaS. CloudFoundry is still in the lead, with OpenShift coming in a close third; however, both CloudFoundry and OpenShift had to pivot their entire product to fit the shift to containers. The most notable is OpenShift, which has been re-engineered around Kubernetes.

Another interesting observation about this list is that none of these projects is part of the OpenStack distribution. One possible explanation for the latter is that users tend to decouple their application platform from that of the infrastructure.

I expect that in 2016, all of the application platforms will have support for containers and Kubernetes, and that Kubernetes will take a clear lead in this category. Mesos and Kubernetes have largely been considered complementary to one another so far; I expect that as the two continue to grow up the stack, the overlap between them will make them competing alternatives more than complementary ones. The uncertainty around Pivotal due to the Dell/EMC deal, along with the rise of Docker/Kubernetes, will be the catalyst for a swift shift in adoption away from Pivotal/CloudFoundry. I believe OpenShift is in a better position in this regard, as it has already made moves to align itself with Kubernetes. Having said that, it remains to be seen whether the added value that OpenShift provides on top of Kubernetes is worth the risk of adding another layer of abstraction.

While we can see clear indications of widespread adoption of containers, I expect that the transition to a fully containerized world will take a good couple of years, and even longer for microservices. I therefore expect that in 2016 most deployments won't be able to run their entire workload in a microservices/Kubernetes style, and will end up with a mixed-workload environment instead. A good example of this is the use of Kubernetes as a deployment target for microservices, with other orchestration running the database, big data, event processing and other complex workloads. Cloudify and OpenStack/Magnum are being positioned as the integration platforms that will allow gluing these different kinds of tools and workloads together, as mentioned in the post "Cloudify Meets Kubernetes".
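For illustration only, a mixed-workload blueprint along these lines might look roughly like the sketch below, in TOSCA-style YAML. Every type name and property here is invented to show the shape of the idea; none of them is taken from an actual plugin.

```yaml
# Hypothetical mixed-workload blueprint sketch: stateless microservices
# are delegated to a Kubernetes cluster, while the stateful database is
# orchestrated directly on VMs. All type names are illustrative.
node_templates:

  kubernetes_cluster:
    type: example.nodes.KubernetesCluster    # assumed type
    properties:
      master_count: 1
      worker_count: 3

  frontend_service:
    type: example.nodes.KubernetesService    # assumed type
    properties:
      image: 'myorg/frontend:1.2'            # hypothetical image
      replicas: 4
    relationships:
      - type: cloudify.relationships.contained_in
        target: kubernetes_cluster

  mongodb_cluster:
    # Runs outside Kubernetes, managed by the orchestrator itself.
    type: example.nodes.MongoDBCluster       # assumed type
    properties:
      replica_set_size: 3
```

The point of the sketch is the split of responsibility: the orchestrator treats the Kubernetes cluster as just another deployment target, while keeping direct control over the workloads that don't fit the container model yet.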

Prediction 4: Telcos are going to lead the adoption of OpenStack simply because they have no other choice.

While the case for OpenStack ROI in the enterprise domain is being challenged by public clouds, for telcos OpenStack is probably the only viable option, as they are in direct competition with many of the public cloud providers. That's why we're seeing a significant rise of Network Function Virtualization among OpenStack users, as noted in the graph below:

I expect that by the end of 2016, telcos will be the clear leaders of OpenStack deployments, ahead of enterprises.

Prediction 5: 2nd generation of OpenStack? Using Docker natively with OpenStack will require more fundamental changes in the OpenStack core.

The approach to integrating Docker into OpenStack has so far included adding Docker as a compute node, as well as adding support for the layers up the stack, such as Kubernetes and Mesos, using Magnum.

While this seems to be the right first step, it misses some of the fundamental value of containers, which offer a complete alternative compute, network and storage stack.

To fully embrace containers, OpenStack will have to go beyond the current OpenStack core and think of a potentially new core designed with native support for containers. Red Hat seems to be ahead of the curve in this regard, offering a container-native stack that was announced in Tokyo. (Note that in this announcement there's very little correlation to the current OpenStack services.) I believe that in 2016 we'll start to see a bigger push toward native integration of Docker into the core of OpenStack, and that the outcome of this thinking will set the groundwork for the next generation of OpenStack - one that takes a container-native approach and therefore becomes simpler and lighter than its predecessor.

Prediction 6: Hybrid cloud takes center stage (finally)

When OpenStack started, it was thought that it could be an alternative to AWS and other clouds.

In reality, however, OpenStack is always used in conjunction with other clouds, the most popular ones being VMware on the private side and AWS on the public cloud front.

Having said that, most users have taken a fairly loose approach toward hybrid cloud, operating OpenStack, VMware or Amazon completely separately, with very little integration between the environments. Part of the reason for that approach was that, in reality, building a hybrid cloud deployment was fairly hard.

There are three things that are changing this:

Tooling - new orchestration frameworks and network virtualization now make it possible to automate the process of deployment of applications across multiple clouds.

Containers provide portable packaging units, which makes running the same software package on multiple clouds significantly simpler than before, removing the dependency on a proprietary image format.

Built-in support for containers by most major clouds provides a common substrate that makes it easier to move workloads from one cloud to another.

Another strong motivation for a hybrid cloud strategy is ROI. As I mentioned earlier, the cost of building an OpenStack cloud can be fairly high. One way to reduce the cost of the OpenStack infrastructure is to integrate OpenStack with other public clouds, which enables extending private cloud capacity with on-demand instances, thus increasing business agility and reducing the overall cost.

All of the above leads me to believe that in 2016 the majority of OpenStack projects will treat a tighter hybrid cloud approach as a baseline requirement.

The use of containers and Kubernetes, in conjunction with OpenStack's support for TOSCA (as demonstrated through the TOSCA Parser project), coupled with Cloudify, will lead the way in this regard.

Final Notes: The only constant is change

In this dynamic world, prediction becomes a fairly challenging task.

What I like about OpenStack is that it is certainly getting better at the way it listens and responds to its users' demands. The investment in the OpenStack user survey is evidence of that. The latest user survey is by far the most comprehensive and insightful report. Kudos to the Foundation for a great delivery!

In this prediction exercise I relied on the user survey, along with the other prediction reports I've listed below, and hopefully this process will yield more accurate predictions.

Regardless of all this, what's clear is that the only constant is change - so if I have one piece of advice to give, it would be: embrace change. When planning your 2016 OpenStack strategy, make sure that your choices allow you to adopt new technology faster, as your speed of innovation is going to be determined by your speed of adoption.

October 22, 2015

Before the dust even settled around the whopping Dell/EMC deal, another bombshell was dropped over the weekend with Red Hat's acquisition of Ansible. While I was still processing my thoughts on what the Dell deal means for the enterprise IT industry, I had to make an abrupt about-face and think about what this acquisition of Ansible, the poster child of the community-driven, bottom-up startup approach, means for the IT industry as a whole, and how this may be an even more substantial win.

The EMC/Dell deal seems to be centered more on the merits of a financial transaction, i.e. grabbing billions of dollars from the existing customer base, refinancing debt, latching onto VMware as the only publicly traded company, and so forth. The following diagram summarizes this quite nicely:

What seems to be clear is that while the EMC/Dell deal is good for its shareholders, it's equally bad news for their customers, as noted in David Vellante's report on Wikibon, since both companies will have to squeeze their customer base to justify the ROI for the deal.

It's hard not to notice that there's nothing in this deal that tells a clear story on how the new company will transform into the new cloud/web-scale business model.

So, if anything, I think that the Dell/EMC deal marks the end of an era more than a new beginning, as Cade Metz put it nicely on wired.com:

“HP. Cisco. Dell. EMC. IBM. Oracle. Think of them as the walking dead. Oh, sure, they'll shuffle along for some time. They'll sell some stuff. They'll make some money. They'll command some headlines. They may even do some new things. But as tech giants, they're dead...When someone asked what we should call that IBM-HP-EMC-Dell-Cisco merger, his response was wonderfully descriptive. He suggested we call the company Fucked By The Cloud.”

It's also interesting to read the reactions to this post on Twitter.

The case for Red Hat/Ansible is the exact antipode of the EMC/Dell deal. The financial factor didn't really play any role at all; what really mattered was the developer mindshare, adoption rate and popularity of Ansible in the DevOps community.

The Ansible stats tell the story plain and simple: high adoption rate, low revenue - and I would add, in essence, a high adoption rate at the expense of revenue.

The ROI on this deal is a no-brainer for Ansible. The rationale for the Red Hat acquisition isn't so much the direct revenue from Ansible as a product, but rather the potential monetization of its impressive adoption. This is interesting, as it clearly puts the benchmark for a successful startup these days back on eyeballs and popularity, and less on having a sustainable business model. But this seems to be the new game, and technology companies are all scrambling to get in line with this new model.

What's also interesting is that we're starting to see new companies emerging to disrupt not just the dinosaurs of the world, but also the early cloud companies such as Puppet and Chef, which are now being disrupted themselves... and there's a long list of such companies. Salesforce is a good example of one that is also being threatened by this rapid disruption.

We can see a clear vicious cycle that applies to all companies, old and new: new-generation companies rise on a wave of adoption by providing free and open source tooling; as they start to monetize and build a sustainable business model, they begin to lose popularity, and a call for a new disruptive tool is heard.

That explains the timing behind this acquisition: Ansible had reached a ceiling when it comes to adoption, and had failed to build a sustainable business model out of its product. Containers/Docker have disrupted the entire configuration management market, and to a large degree rendered it obsolete (the adoption curve shows a clear declining trend for all the other configuration management tools).

With this in mind, Ansible is clearly at its peak right now, so its best option was to be acquired by a bigger company like Red Hat. Red Hat is probably the only company with a sustainable business model around open source, so for them the direct revenue from Ansible isn't that interesting; what matters more is the combined revenue that can be generated by bundling Ansible into a bigger DevOps tool set. I'm sure the fact that Ansible's founders are former Red Hat people played no small part in the overall decision.

Final words

What have we learned from all this?

Cisco, EMC, Dell, HP, IBM are “walking dead”

Adoption/popularity is the new king

The only constant is change!

Disrupt yourself before your competitors do --> Embrace change

Steve Jobs' main mantra was "disrupt yourself before your competitors do", and indeed, what's clear in both deals is that change is inevitable; it can and will happen to big companies just as it can happen to smaller ones. Therefore, the assurance we used to think we had by relying on big companies isn't a safe bet anymore; similarly, the risk of using new technologies from startups isn't as great as it used to be.

In this world, the only survivors will be those who are set up to adopt new technologies fast. There's a direct correlation between the speed of adoption and the speed of innovation. The challenge is actually even bigger for companies that grow fast. Chef and Puppet are a good example in this regard: they created a new disruption and grew fairly fast, but then stagnated for a long time and failed to respond to the new disruption in their domain.

This is where I see the biggest shift in the value chain, specifically in the new data center. The value lies less in how we build the tools and more in how we put them together, and in how fast we can adopt new tools or replace existing ones, in a way that allows us to always be ready for the next disruption. This explains the rise of orchestration tools, as I pointed out at the last OpenStack Summit and in my previous post, Orchestration Tool Roundup; orchestration plays an important role as a "gluing" mechanism in the area of DevOps and automation.

This achievement demonstrates how a public cloud provider can embed Cloudify as an integrated orchestration service, including a custom user interface for both the web interface and the command line as well as an authentication service, and turn it into a native extension of its existing cloud service. While the service is built on Cloudify, we were able to extend it to expose some of the vCloud-specific services, such as networking (NSX) and database services, demonstrating that Cloudify doesn't force a least common denominator. What's more, because the Blueprinting Service is built directly into vCloud Air as an embedded service, it does not require any setup or installation by end users.

What is the difference between Cloudify and other cloud orchestrators?

Cloudify is an open source, TOSCA-based orchestration platform. That means no vendor lock-in, with the ability to automate deployments on multiple cloud infrastructures. Moreover, the fact that it's the only open source orchestration platform that integrates natively with the entire VMware stack allows users to customize their DevOps environment any way they like on their VMware environment.

Extend to your private cloud

Users can download Cloudify into their own environment and use the same orchestration blueprint template to automate their applications on their private vCloud, vSphere or VIO environment. Similarly, vCA users can also use the Cloudify vCloud plugin to create their own custom blueprinting service and take advantage of additional monitoring, logging, auto-scaling, and self-healing features in their vCA environment.

A summary of the key features is provided below:

Key features

Available as an embedded service with vCloud Air, free for vCloud Air users

Zero footprint - the service doesn’t install an agent or other resources on the application environment and doesn’t require any software download and setup

Supports vCloud Air services, such as networking and database as a service

Extensible through a simple script-based plugin

Includes built-in service discovery to handle service chaining and dependency injection

Full application life-cycle management

Designed for web-scale

Extend the Blueprinting Service with Cloudify Premium to create your own custom blueprinting service on vCloud Air or bring your vCA environment into the private cloud and add extended workflow, monitoring, and auto-scaling capabilities.

Demo

Below is a demo that shows how to create and deploy an application using the TOSCA blueprint on vCloud Air.

vRealize and Cloudify

The vRealize Suite is a cloud management platform that provides a comprehensive management stack for the entire VMware product portfolio. Paired with Cloudify, vRealize users can manage not only their VMware assets, but also popular DevOps tools such as Kubernetes/Docker for containers; Chef, SaltStack and Puppet for configuration management; tools like Elasticsearch, Logstash, InfluxDB and Grafana for logging and monitoring; as well as Fabric. In addition, Cloudify can use information from vRealize Operations to correlate the state of the application with the state of the infrastructure, optimize resource utilization, and ensure that the application meets its desired SLA.

Wait… There’s More!

Cloudify is not just an integrated service. Our Premium Edition offers the ability to extend your vCloud environment into a private vCloud, vSphere, or VIO environment as well. To learn more about that, see our Premium page.

And stay tuned for the rest of the week as we expand more on our adventure of building Cloudify into an Orchestration-as-a-Service.

The discussion was centered around the Forrester report OpenStack Is Ready — Are You?, an excellent research piece led by Lauren, as well as around Kristian's talk "45 Minutes of OpenStack Hate", in which he laid out the challenges from his experience building a web-scale service using OpenStack.

Bringing both Lauren and Kristian onto this podcast helped to paint a fairly accurate picture of where OpenStack is today and what it really takes to make it enterprise-ready. The ability for a community and a research firm like Forrester to run such an open dialogue, exposing not just the positive side but also many of the challenges and difficulties involved with the adoption of OpenStack, is a sign of maturity in itself.

When we finished the discussion I couldn't avoid thinking about what seems to be a serious paradox with the way enterprises are adopting OpenStack - let me explain.

In the diagram below, which reflects the results from the user survey, it's clear that 82% of users use OpenStack in conjunction with other clouds.

Similarly according to the Forrester report, enterprises choose OpenStack to enable interoperability and avoid lock-in:

An increasing number of large enterprises are seeking open source technology to launch this transformational journey. The goal is to avoid vendor lock-in and mitigate expensive licensing costs. Others see it as the promise of portability and interoperability of applications embracing a "design-once, run anywhere" solution — a reality that hasn't come to fruition yet.

At the same time the reality shows that OpenStack is used primarily to drive only new cloud initiatives as noted in the Forrester report:

OpenStack isn't your only private cloud or virtual environment designed to be your orchestrator across your traditional workloads. Rarely does one place OpenStack in front of legacy or traditional workloads in lieu of a proprietary private cloud suite. In reality, OpenStack sits behind net-new environments designed to launch your enterprise into a revolutionized continuous development experience.

The OpenStack Interoperability Paradox

One of the main reasons enterprises choose OpenStack is to enable better interoperability and portability; however, the reality clearly shows that OpenStack is not there yet. If I add to that OpenStack's current maturity, it becomes clear that there are more fundamental issues OpenStack needs to deal with to better fit the enterprise environment, and that executing on its "design-once, run anywhere" promise is a luxury at this point in time.

Bridging the OpenStack Interoperability & Portability Gap

The good news is that we don't have to wait for OpenStack to solve the portability and interoperability issues. This is where the ecosystem can come in fairly handy.

As I laid out in one of my previous posts, Cloud Migration in the Enterprise, there are a couple of techniques to address the interoperability and portability challenge, ranging from nested virtualization through API portability to automation and orchestration. Quite often it is necessary to combine a few of these techniques to achieve the best results.

In this post, I wanted to touch specifically on the portability between VMware/OpenStack environments, as this is probably the most common use case where such portability is needed.

VMware announced last year that it would join the OpenStack distribution war with its own OpenStack distribution on top of VMware infrastructure. This year it will be announcing VIO v2. This solution provides portability of the technology stack - i.e. if I'm a VMware user and I want to move to OpenStack to reduce my exposure to lock-in, I can use VIO as a more gradual step in that direction. This allows me to leverage the maturity of the VMware technology stack, as well as the existing skill set I've developed in my organization, to make the transition smoother.

As Lauren mentioned in the podcast, a large part of VIO is still OpenStack, and many of the pitfalls noted as maturity gaps lie within that layer, not in the virtualization layer or the lower infrastructure layers. This basically means that I would still face many of the OpenStack maturity gaps if I chose to go down that path.

In addition, while this solution provides portability of the technology stack, it doesn't allow me to take my existing application running on a VMware stack onto VIO, meaning I still have to take care of the portability at the application layer.

OpenStack & VMware Portability Using TOSCA-based Orchestration

Another approach to cloud portability is to use orchestration as an abstraction layer between the VMware and OpenStack environments. The main advantage of this approach is that it can work natively with each environment and provide abstraction at the application management layer. That means I can "templatize" (or "templatify") my application using TOSCA and deploy it on OpenStack, vSphere, vCD, etc. The orchestrator takes care of mapping the template onto the underlying environment's infrastructure.
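To illustrate, here is a simplified sketch of such a template in the Cloudify DSL style. The node types shown follow the plugins' naming conventions but may differ across plugin versions, so treat this as a sketch rather than a working blueprint:

```yaml
# Simplified portability sketch: the application layer stays identical,
# and only the compute node is swapped per target environment.
inputs:
  image: {}
  flavor: {}

node_templates:

  host:
    # OpenStack target; for vSphere or vCD, this type (and its
    # properties) would change to the respective plugin's server type,
    # while 'web_app' below stays untouched.
    type: cloudify.openstack.nodes.Server
    properties:
      image: { get_input: image }
      flavor: { get_input: flavor }

  web_app:
    type: cloudify.nodes.WebServer
    relationships:
      - type: cloudify.relationships.contained_in
        target: host
```

The design choice that makes this work is that the relationship, not the application node, carries the binding to infrastructure; retargeting the blueprint is mostly a matter of swapping the host node type and its inputs.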

Looking into the Future - Interoperability with Containers and Kubernetes

It is clear that containers are going to be the dominant layer for cloud native applications.

Container architecture provides a great way to simplify the portability task for those applications, simply because the application packaging format is portable across OpenStack, VMware and other clouds.

While containers solve the challenge of application packaging portability, most container-based solutions assume an environment that relies completely on containers, and don't fit as well with heterogeneous environments in which containers are only part of the blend.

To address this gap we can use the same TOSCA-based container orchestration approach to extend our OpenStack or VMware environment into the next generation container-based environment, as outlined in the diagram below. What's interesting is that we can also deploy the entire stack on plain bare metal environments in just the same way.

You can read the full details in DeWayne Filppi's post, which includes an example of provisioning Kubernetes on bare metal: "Cloudify Meets Kubernetes - Container Management & Orchestration on Bare Metal".

Worth Noting New Portability Features

We've positioned Cloudify as pure-play, open source orchestration. I believe that right now it's the only open source orchestration platform that integrates natively with both OpenStack and VMware. There are a couple of things we've been working on lately to make this interoperability story stronger. The first thing to note is that we've released our vSphere plugin as part of Cloudify 3.2.1, which supports both vSphere 5.1 and 5.5. We also made a major update to our vCloud support, the result of a long joint effort with VMware - expect a big announcement on this front next week at VMworld!

We're also making efforts to make it easy for users to leverage our TOSCA engine independently of Cloudify. We'll be announcing more on this front at the upcoming OpenStack Summit in Tokyo.

June 21, 2015

I find myself constantly involved in dialogues around the various approaches to enterprise application management in the cloud and DevOps era. One thing I've found particularly difficult in these discussions is explaining the fundamental shift from the way we used to manage applications in a pre-DevOps/cloud world to the way we manage them today. And then I realized why.

A large part of the reason this has been so hard is that we still use the same terminology (logging, monitoring, orchestration...) that has historically been used to describe the management of these applications, even though the way we actually implement and consume these features is fundamentally different. Then I discovered how I could qualify, quantify, and even reconcile the gap between the traditional data center management model and modern DevOps frameworks.

In this post, I wanted to lay out these thoughts, as I'm sure that many others who are also in the thick of such discussions, especially in the enterprise world that is still heavily controlled by traditional IT groups, can leverage some of these lessons for their own discussions.

DevOps is Not a Feature!

Faced with this challenge, I believe what best explains where we have gone wrong is the tendency to start thinking of DevOps as a feature.

Quite often you hear the IT guys point to their Tivoli, BMC, CA or other management solution that they use to manage their data center as the basis for a solution for DevOps - because it's a “standard” within the organization.

They do realize that there are special gaps and needs that they need to bridge to satisfy their DevOps team such as continuous deployment scenarios, better automation, and such. However, I’ve found that, initially, to bridge these gaps, the most common approach has been a gradual extension of traditional data center management tools with new features such as continuous deployment tools, automation tools and other “DevOps add-ons, features, and so-called facilitators”.

How is data center and application management different in the DevOps world?

To answer this question we first need to understand the shift that we're going through, and where we're heading.

I like to use the following slide to describe the shift that we're going through.

In a pre-DevOps world our entire data center was built under the assumption that each organization has its own special needs, and therefore requires a tailored approach and solution for almost every challenge. We used to be very proud of how special our data center is, and even kept the way it's run very secretive, and rarely talked about it publicly.

In a post-DevOps world, data centers are built as agile infrastructure that is optimized for the speed in which we can release new products and features, as well as the cost it takes us to achieve this. When we optimize for these goals, being special becomes a barrier, as it results in significantly higher costs and slower processes.

This is where the slide comes in. I find this analogous to the shift that happened in the car industry when it moved from building custom cars to production lines like those of Ford or Toyota. That was a major shift not just in the way cars were designed, but also in the way car organizations were structured. When we optimize for speed and cost we cannot afford silos, and we cannot afford high-end devices that are optimized for the extreme scenario. Instead, we have to break silos and use commodity resources.

This also leads to a huge culture change. We're now seeing all the "web scale" companies on stage speaking about their solutions, and even sharing them as open source projects. And they're doing this not because they see the things that they are doing as less valuable. Quite the contrary, they do so because they believe that by doing so, they can get better at a faster pace. Bigger, better, faster.

Ok, so how does this map to concrete features?

Even when I present the slide above, and all the heads in the room start to nod in agreement, it is still never enough.

People nod their heads in agreement, but still continue with the approach of adding features to their existing management tools to bridge this gap.

Some of the vendors in this space even went a step further, and have rebranded their solutions, thinking that by calling them by a different name and adding a new bundle it would make these tools fit into this new DevOps world.

I therefore had to find a way to quantify this gap for the product managers in the room. To do that, I used the table below, which maps the differences between management solutions in a pre- and post-DevOps world:

Pre-DevOps vs. Post-DevOps

Monolithic vs. Tool Chain

Closed Source vs. Open Source

Limited Scale (x100s, relying on a centralized database) vs. Web Scale (everything needs to scale out)

Manage Hosts/Devices vs. Managing Infrastructure Systems and Clusters

Infrastructure Centric vs. Application Centric

Limited Plug-ins vs. Future Proof

Monolithic vs. Tool Chain

In a pre-DevOps world, if you wanted to provide a management solution you had to develop your own logging, monitoring, billing, alerting and any other proprietary systems, simply because there was no other way to do it. This resulted in a fairly monolithic management solution.

In a post-DevOps world, we're looking for a best of breed approach where we select a tool chain that keeps on changing and growing fairly rapidly. Every DevOps group tends to select their own set of tools in this chain, which are for the most part from the open source community. They wouldn’t consider a solution that provides a suite of mostly closed source services all coming from the same provider, because by definition, that will both limit their ability to select and integrate new tools into their processes as they are being introduced, and it would also lead them to compromise on the quality of each service. This is because most of the monolithic solutions do a fairly average job with each layer (e.g. logging, monitoring...), as opposed to the individual projects that tend to be best in their domain.

Closed vs. Open Source

In the DevOps world, open source has become a key criterion, whereas many of the traditional management solutions were built as closed source. Contrary to what most people think, the popularity of open source isn't just because its entry level is by definition free. Open source determines how well one can use or customize a given framework in areas where they see gaps. What's more, it creates a community of users who develop skill sets around these tools, it allows more natural integration between tools, and it brings many other aspects that at the end of the day have a direct impact on the ability to achieve higher productivity and speed of innovation.

Limited Scale vs. Web Scale

Most traditional management solutions were designed to handle tens or hundreds of services and applications at best. Quite often, they are built around a centralized database such as MySQL, whereas in the web-scale world we need to scale to thousands, or even hundreds of thousands, of nodes in a typical environment. To reach this level of scale, the architecture of the management framework needs to scale out across the entire stack. This can be achieved by separating services such as provisioning, logging, load balancing, and real-time monitoring into independent services that can scale independently. It also needs to use other scalability best practices, such as message brokering and asynchronous scale-out.

Manage Host/Devices vs. Managing Infrastructure Systems and Clusters

Likewise, this traditional tooling was designed to manage hosts and devices, whereas modern tooling needs to manage more sophisticated systems such as software containers, along with application-level monitoring. In the traditional world, applications were mostly built as a layer on top of these hosts. This basic assumption starts to break when we need to manage infrastructure systems and clusters.

If you think about what is required to manage a Hadoop or MongoDB cluster, for example, you’ll find that the process of installing and setting up those clusters requires a much more sophisticated process. Let’s take MongoDB orchestration as an example for this.

For the deployment and installation phase, you'd need to start by creating the MongoDB cluster machines and then setting up the network, which also usually requires:

Opening the client, replication and control ports, and then

Creating an isolated network for the MongoDB cluster data nodes

Next you'd need to create the relevant number of instances of the MongoDB master and slaves per host, populate the data into MongoDB, and finally publish the peer hosts (i.e. the MongoDB endpoints). A rough sketch of these deployment steps as an orchestration template appears below.
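Sketched as an orchestration template, the deployment phase above might look roughly like this. The type names, the start script and the exact properties are illustrative assumptions, not any actual plugin's API; the port numbers follow common MongoDB conventions.

```yaml
# Rough sketch of the MongoDB deployment phase described above.
# Type names and the lifecycle script are illustrative assumptions.
node_templates:

  mongo_security_group:
    type: example.nodes.SecurityGroup        # assumed type
    properties:
      rules:
        - port: 27017                        # client port
        - port: 27019                        # replication/control port

  mongo_network:
    type: example.nodes.IsolatedNetwork      # assumed type; isolates the
                                             # MongoDB data nodes

  mongod_host:
    type: example.nodes.Server               # assumed type
    instances:
      deploy: 3                              # one primary, two secondaries
    relationships:
      - type: cloudify.relationships.connected_to
        target: mongo_network
      - type: cloudify.relationships.connected_to
        target: mongo_security_group

  mongod:
    type: example.nodes.MongoDatabase        # assumed type
    interfaces:
      cloudify.interfaces.lifecycle:
        start: scripts/start-mongod.sh       # hypothetical script; a
                                             # post-start step would load
                                             # the seed data and publish
                                             # the peer hosts (endpoints)
    relationships:
      - type: cloudify.relationships.contained_in
        target: mongod_host
```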

That's just the beginning: most orchestration these days doesn't end with the deployment phase; the post-deployment phase is equally important. And this, too, requires quite a bit of orchestration, including monitoring, workflows and policies, self-healing, auto-scaling, and maintenance (e.g. snapshots, upgrades, etc.), among other considerations.

Orchestrating such a cluster doesn't just require setting up the infrastructure, i.e. compute, storage and networking, but also a process that can interact with the MongoDB or Hadoop cluster, relay the context of the environment, and even continue the interaction by calling the Hadoop or MongoDB cluster manager. This delegation process is fairly complex, and most traditional tooling was not designed to handle such processes.

In the DevOps world, on the other hand, managing infrastructure systems and clusters such as database clusters (Mongo, Hadoop) or an IP Multimedia Subsystem (IMS) is fairly common. Most of these systems and clusters come with their own orchestration and management. That makes the management challenge quite different, as we now need to allow more delegation of responsibility between the various layers of management, rather than assuming a single source of control for everything.

Infrastructure vs. Application Centric

Once upon a time, in a pre-DevOps world, most management tools were designed to manage compute, storage and network services. Application management was a layer on top and, quite often, was built in as an afterthought, i.e. under the assumption that the application is not even aware that it's being managed. Therefore, the focus has been on adding management and monitoring capabilities through complex discovery, or even code introspection.

The brave new DevOps world tends to be more application-centric, and the management tasks begin as an integrated part of the development process. In more advanced scenarios it is also common to use modeling languages such as TOSCA, and other similar languages, to orchestrate not just the configuration and installation of applications, but also to manage "Day-2" operations (i.e. post-deployment). We do so simply because that's the only way we can achieve real automation and handle complex tasks such as self-healing and auto-scaling. A rough sketch of such a policy appears below.
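As a rough illustration, a Day-2 self-healing policy in such a model can be expressed declaratively. The sketch below is loosely modeled on the Cloudify 3.x policy style; the policy and trigger type names are approximations, not an exact schema.

```yaml
# Hedged sketch of a declarative self-healing policy: when a host stops
# reporting its monitored signal, execute the heal workflow on the
# affected instance. Names approximate the Cloudify 3.x policy style.
groups:
  app_hosts:
    members: [host]
    policies:
      heal_on_failure:
        type: cloudify.policies.types.host_failure      # assumed name
        properties:
          service: ['host_status']                      # monitored signal
        triggers:
          heal_trigger:
            type: cloudify.policies.triggers.execute_workflow
            parameters:
              workflow: heal
              workflow_parameters:
                node_instance_id: { get_property: [SELF, node_id] }
```

An auto-scaling policy follows the same pattern: a threshold policy on a load metric triggering a scale workflow instead of heal.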

Limited Plug-Ins vs. Future proofing

In a world in which the "only constant is change", we need to be able to continually introduce new frameworks and services - those we know, as well as those we don't yet know exist but are probably under development as we speak. Traditional management solutions have been known to come with a concept of plug-ins, but quite often these plug-ins are fairly limited and complex to implement, and therefore need specific support from the management solution owner.

To really be future-proof, we need to be more open than that, and allow integration throughout all the layers of the stack, i.e. compute, network, infrastructure, monitoring and logging. On top of this, all that integration needs to happen without modifying the management solution itself. In addition, we also need runtime integration, in which we can easily deploy applications that use a different set of cloud resources and infrastructure, or even different versions of the same underlying cloud.

All this needs to happen with complete isolation, and without the need to bring the management layer down every time that we want to introduce a new plug-in. For example, a development team could have a local OpenStack cloud running in its own data center with an application that can scale-out using OpenStack resources, but when the local OpenStack has insufficient capacity it can scale-out into the AWS cloud.

In addition, most traditional management servers come with a set of predefined plugins that are pre-integrated with the manager, while real-world use often requires introducing new plugins, or running different versions of the same plugin side by side. Most traditional management tools do not support this kind of behavior, and are fairly static in this regard.

So, where are we heading?

Ok. By now I hope I've got you thinking that DevOps is far more than yet another feature to tack on to an existing toolset. Having said that, it is still unclear where we are heading with all this disruption - is there an end in sight?

To answer this question, I will again allude to an analogy from the car industry.

The car industry is now moving from a point in which the production line was centered around the car manufacturing process, and once the car was released it was managed mostly through manual processes. We are now moving to an autonomous process that is continually monitored and equipped with a set of sensors that report continuously on the state of the car and its surrounding environment. With this, we are now able to continuously manage and control the car after it has been released from the manufacturing facility.

Similarly, we're heading to the point in which we're moving from a data center in which most of the processes were centered on getting things installed and deployed to a data center that will ultimately be completely self-managed.

In this data center most of our application management tasks that are performed manually today can become completely automated. This includes capacity management (through auto-scaling), continuous deployment, self-healing, etc. This isn't science fiction, there is already a growing list of organizations, starting with Google and Netflix, that run their data centers in this exact way.

May 20, 2015

The OpenStack Summits, which I have been attending twice a year for the past five years, continue to serve as the most formative and influential moments of my year - especially with everything related to the growth around OpenStack, which is notable from summit to summit. So, right in the thick of OpenStack Vancouver, I'd like to take a moment and outline some of the important trends around where cloud is going, and specifically around OpenStack in 2015.

OpenStack & the Move Away from Hypervisors

When I think about cloud and OpenStack trends for 2015, the primary trend of note to me is the move away from hypervisors to a fusion of containers and bare metal. I'm basing this on the hard numbers from the last OpenStack User Survey, released following the OpenStack Summit in Paris in November, which to me represents only the beginning of a growing movement.

Part of the reason this migration is becoming a popular option is that with containers it's simpler to run dynamic workloads on bare metal while still ensuring isolation between one workload and another. This suddenly makes running containers on bare metal an attractive option that comes with a lot of performance and utilization benefits, not to mention simplicity, especially in cases where a full-blown cloud may be overkill - for example, running a Hadoop cluster or managing dev/test environments. So I believe this is only the start of such a trend, especially with the addition of Ironic in the latest OpenStack Kilo release, which will simplify this even further, coupled with the addition of Sahara in Juno and its added features in Kilo for I/O-intensive workloads.

The Convergence of NFV and Enterprise IT

The adoption of OpenStack as the private cloud of choice in both enterprises and telcos brings the two industries much closer together. If telco IT and infrastructure previously looked quite different from that run by enterprises, with OpenStack we're seeing the IT and backend stacks for running telcos' actual core services become very similar to those of enterprises. As a matter of fact, at the last OpenStack summit, which is traditionally driven by enterprises, we saw significant growth in the influence of telcos who are adopting OpenStack. This can be noted in the significance of NFV in the OpenStack Juno release, and it received even more emphasis in the OpenStack Kilo release, including such features as port security for OpenVSwitch, VLAN transparency and MTU API extensions. In the last OpenStack & Beyond podcast on NFV & SDN with Axel Clauberg of Deutsche Telekom, who is leading one of the most ambitious NFV projects on OpenStack, he discussed the need to optimize IT environments through NFV and to simplify the complexity through automation.

Another very important element in this respect is TOSCA (Topology and Orchestration Specification for Cloud Applications), which has been gaining adoption. Since telcos are a very standards-driven industry, we're seeing the adoption of a standard like TOSCA become an important criterion in the choice of orchestration. The direction I see things going in this space in the upcoming year is that the formerly disparate IT worlds of NFV and enterprises will begin to converge, and TOSCA will play a leading role in becoming the de facto standard for NFV orchestration for both industries.

Orchestration is the Next Big Thing

On that note, the recent moves by both Google and Amazon to add support for orchestration as an independent service layer of the stack, along with Docker's recent release of Machine, Swarm and Compose for orchestrating containers, mark the move of orchestration to center stage; it is becoming an important official component in all private and public cloud offerings. Therefore, it's not surprising to see Canonical/Juju, as well as a growing list of orchestration tools, adopting TOSCA as their official templating language, joining existing contributors such as IBM, Huawei, Cloudify, FastConnect and Alcatel-Lucent, not to mention the progress being made in the OpenStack heat-translator project. My talk at the OpenStack Vancouver summit will focus on this growing diversity of orchestration tools, and will dive into the differences and synergies. You can also read a sneak preview of this session in the recent interview by Jason Baker here.

The Future of Containers and Who’s in Control

It’s no news that containers are gaining huge momentum and popularity, and will continue to grow as this becomes a central piece in a modern cloud stack. I don’t think anyone put it better than Adrian Cockcroft (and I’m paraphrasing - as I heard him say this in one of his many talks) - “Docker is the technology that has gone the quickest from disruptive to legacy.” It’s almost easy to forget that Docker 1.0 was only launched in June last year.

Who controls the future of containers is now becoming a hot topic. As long as Docker was a small piece in the stack it was easy to swallow, but as it grows up the stack, it has started to compete with its own ecosystem. It's therefore not surprising that we're starting to see alternatives to Docker; one such example is the launch of Rocket earlier this year as an alternative to Docker. Google just announced a partnership with CoreOS and is, in essence, providing a full-stack alternative to the Docker-based approaches.

Personally, I believe that for Docker to be successful it needs the industry's support behind it, and it won't be able to get this support unless it opens itself up through some sort of foundation model that gives other companies a better chance to be part of the revolution, and with that, some level of control over their own destiny.

The OpenStack Distro War

Clearly the Mirantis $100M investment has put them in the spotlight as a rising star in the OpenStack distro world. As OpenStack is primarily positioned for private cloud and large enterprises, Red Hat is well placed to maintain its leadership based on its current enterprise footprint, while Ubuntu will have challenges gaining large market share within the enterprise world. It also remains an open question whether HP's recent bold move of finally going all-in on OpenStack is too little, too late. As mentioned in my previous post on VMware/OpenStack, VMware has also joined the OpenStack distro war, with an aim to keep its existing customer base. This is potentially the main competition for Red Hat, as both compete in the same enterprise space, as Boris Renski noted in his post. I believe Mirantis will be a rising star in 2015, and I expect to see more bold cat fights on the enterprise side between VMware, Red Hat and HP. The lack of a clear winner in this space will raise the demand for an independent, pure-play application deployment and orchestration framework that can work with any distribution.

Where should OpenStack be heading? - My personal thoughts

My personal belief is that OpenStack will be much more successful if it finds a way for more cloud providers to add support for OpenStack, including those considered rivals to OpenStack such as AWS, Google, and Azure. Similarly, it will be more successful if it finds a way to encourage native support for OpenStack by other popular open source projects.

I think that so far there has been a lot of focus on making OpenStack an alternative to Amazon. In my opinion, that strategy has spread the OpenStack project too thin across many projects. I think that the VMware support for OpenStack is a good example of how a potentially rival infrastructure provider can become compatible with OpenStack. If we make it easier for other infrastructure providers to add compatibility with the OpenStack API, we will gain much more than if we focus only on making OpenStack a viable competing alternative. Basically, I'd say inclusivity is of the essence, and not exclusivity: the open source way.

Final notes:

Overall, I think OpenStack will continue to demonstrate growth and be a serious business and investment driver across the cloud industry. It will continue to serve as a disruptive technology for many aspects of the cloud that have previously been a black box of closed source. By opening the APIs to all, new and innovative forms of cloud automation suddenly become possible. I personally am looking forward to seeing how 2015 unfolds, and what the upcoming OpenStack summit in Tokyo, and the Liberty release, will hold in store.

For those who can’t make it out to Vancouver - we’ll be broadcasting live from the ground with an episode of OpenStack & Beyond, and you’re also welcome to join us for the OpenStack Israel event taking place June 15th in Tel Aviv, where you’ll be able to hear Axel Clauberg and Boris Renski firsthand.

February 04, 2015

Background

You don't need to be an expert to realize that a failure of an eCommerce site during Black Friday or Cyber Monday is a disastrous event, leading to huge losses in revenue and reputation for the retailer. As the share of eCommerce increases to more than 8% of total US retail sales this year, the impact of failure becomes more significant - not just for the site itself, but for the overall economy. A study on the subject, compiled by Joyent and New Relic, showed that 86% of companies experienced one or more episodes of downtime last holiday season. At the same time, 58% of customers will not use a company's site again after experiencing site errors.

Another study by Radware measured not just the impact of downtime on eCommerce sites, but also the impact of slowness - an even more common and less measured metric. According to this study a one-second delay correlates to:

2.1% decrease in cart size

3.5-7% decrease in conversions

9-11% decrease in page views

8% increase in bounce rate

16% decrease in customer satisfaction

a 2.2-second slowdown equals a 7.7% hit to conversion rate.

Meanwhile, KISSmetrics illustrated how page loads longer than three seconds lead to a 40% bounce rate.
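To put these percentages in perspective, here is a back-of-the-envelope calculation in Python; the traffic, conversion and order-value figures are entirely hypothetical and only illustrate the scale of the impact:

visits = 1000000            # hypothetical Cyber Monday sessions
conversion_rate = 0.03      # hypothetical baseline: 3% of visits convert
avg_order_value = 100.0     # hypothetical average order, in USD

baseline = visits * conversion_rate * avg_order_value

# apply the low end of the reported 3.5-7% conversion drop per second of delay
slowed = visits * conversion_rate * (1 - 0.035) * avg_order_value

print("revenue lost to a one-second delay: $%.0f" % (baseline - slowed))
# => roughly $105,000 on a $3M day

Even at the conservative end of the reported range, a single second of latency on a single peak day is a six-figure problem under these assumptions.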

Obviously there is enough business incentive here to invest in handling both downtime and latency issues. Meanwhile, looking at a typical retailer's traffic during this season (source: Akamai), we notice that traffic spikes by at least 500%:

In this post I will share our specific experience and lessons learned from the 2014 holiday season, which turned out to be very successful. I believe that the results below speak for themselves.

2014 Results

How We Achieved These Results

Taking a preemptive approach - acting before a failure occurs rather than reacting after it - prevented failure in the first place.

Common Causes: Most failures are the result of misconfiguration or capacity planning guesswork.

Knowledge & Experience: eCommerce applications are complex and built from many subsystems. In many cases, an eCommerce organization does not have the expert skill-set in each of the subsystems. Having an expert in the room helps to bridge this gap and builds the capabilities of business operations.

Fast Feedback: When product-related issues were identified, we were able to provide the fastest path to protect the business and address concerns in a timely fashion.

To give you a bit more insight on this process I’ve added a section to this post called Stories from the War Room which illustrates a real-life incident and the action that was taken by our on-site engineer to resolve it.

Data is mirrored back into the database in batches. In this way, peak load transactions are buffered so that database traffic does not crash the database back-end.

The In-Memory Compute grid acts as a system of record. A failure in the underlying database can be sustained without affecting online users while the database is restored to a working state.

Using a combination of In-Memory & SSD allows very large In-Memory data sets to be stored at a reasonable cost, while still ensuring fast recovery during failure.
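To make the batching idea concrete, here is a minimal, hypothetical write-behind sketch in Python; the queue, the batch size and the save_batch callback are all illustrative and are not XAP's actual API:

import queue
import threading

class WriteBehindBuffer:
    """Buffers peak-load writes in memory and flushes them to the
    database in batches, so the database never sees the raw spike."""

    def __init__(self, save_batch, batch_size=500):
        self.save_batch = save_batch   # callable that persists a list of records
        self.batch_size = batch_size
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, record):
        # callers return immediately; persistence happens asynchronously
        self.pending.put(record)

    def _drain(self):
        while True:
            batch = [self.pending.get()]
            while len(batch) < self.batch_size and not self.pending.empty():
                batch.append(self.pending.get())
            self.save_batch(batch)     # one bulk write instead of many small ones

The point of the design is that the user-facing write path never waits on the database; only the background drain thread talks to it, and always in bulk.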

Self-Healing Systems recover from failure in real time

Failures are inevitable: Keeping a backup copy in-memory enables zero-downtime systems to service user traffic without interruption, even if something does go wrong.

Systems provisioned for failure handle failure by design.

Automatic failover and provisioning eliminates the need to overprovision (costly) resources in case of failure. Traditionally, it's common for retailers to provision holiday-season resources at 5 times the capacity of their non-holiday traffic infrastructure.

Two Examples

In one case, a Top 100 online retailer used XAP to provide access to its catalog and inventory data and achieved its first zero-downtime holiday season in several years. As a result, this retailer delivered a vastly improved customer experience from previous years (achieving an 18% improvement in customer satisfaction ratings) and generated a 139% increase over 2013 holiday sales.

In another case, a Top 30 US online retailer logged a record-setting peak sales day of $44 million. This was especially notable because that same day the retailer experienced system performance issues caused by an automated hardware failover condition. Fortunately, the retailer’s XAP implementation began automatically relocating application components to standby resources, keeping apps running despite the complications. As a result, consumers continued to shop—and buy—with minimal disruption.

Stories from the War Room

I’ve picked two issues that we identified as our engineers were working on-site with one of our top eCommerce customers. I think these two cases provide useful insight into how a preemptive support strategy and a short feedback loop work:

Issue #1: Sudden slow client response time

The quote below was taken from the direct on-site report:

GC spikes are one of the most common issues we encounter when managing in-memory data clusters. As GC tends to compete for the same CPU resources that serve user transactions, it often leads to overall slowness of the system. Fairly quickly, this slowness can pile up into a huge backlog which can break the system in unexpected areas.

The resolution was to split the cluster into more data containers (GSCs in XAP terminology), as this allows the load to be spread more evenly across the entire cluster. In addition, the overall capacity (memory and CPU) allocated to the cluster was increased to meet the growing capacity demand.

The diagram below provides a view of one of the clusters at the time the issue occurred.

As can be seen, around 23:00 the system started to hit its high CPU mark as a result of GC spikes. The system was gradually rebalanced over a couple of hours without facing any downtime. The preemptive action taken to handle this incident prevented it from piling up and causing a complete system failure.

Issue #2: Connection Issues

In this incident, the increase in concurrent client activity during peak load resulted in a large number of network connections being opened at the same time. One of the nodes in the cluster was misconfigured with a low limit on the number of connections that could be opened simultaneously. The resolution was to kill the faulty node and leverage the self-healing capability of XAP to force an immediate re-route of clients to the backup node while relocating the faulty node to another machine.
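A simple preemptive check could have caught this class of misconfiguration. Here is a minimal sketch using Python's standard resource module to inspect (and raise) the per-process file descriptor limit, which is what caps how many sockets a node can hold open; the 10,000 threshold is an arbitrary illustrative value:

import resource

# each open network connection consumes a file descriptor,
# so a low RLIMIT_NOFILE silently caps concurrent connections
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("fd limits: soft=%d hard=%d" % (soft, hard))

if soft < 10000 and (hard == resource.RLIM_INFINITY or hard >= 10000):
    # raise the soft limit before peak season, up to the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (10000, hard))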

Final Notes

Peak loads tend to stretch a system's behavior in the areas that are least expected, and which are therefore often the hardest to handle. Quite often, peak loads lead to unexpected downtime.

There are many cases in which this sort of peak load performance is known in advance, as is the case with Black Friday and Cyber Monday. Still, many eCommerce sites continue to experience downtime or slowness during such events that lead to huge loss of revenue and reputation.

As a software vendor, we have often found ourselves involved in the early architecture discussion phases, which usually take place as a result of a failure in the previous year. Despite the fact that we are brought in to solve these peak load performance problems, we are still called during a fire drill when those failures occur. Often, the failure was the result of misconfiguration or of a problem in another system that manifested itself as an issue in our product. Obviously the experience of handling fire drills is never pleasant, neither for us nor for our customer, and that is something we wanted to avoid as we approached the 2014 holiday season.

This year we decided, together with our customers, to take a more preemptive approach by putting an engineer on-site to accompany the customer team during the event itself. This resulted in huge success, leading to 100% uptime. Both teams learned much from the experience; the customer learned how to better operate our product and what to look for to ensure that the system is running properly. We learned much about how the customer uses our product and were able to shorten the feedback loop between the customer and our product and engineering teams.

With those lessons in hand I feel that both we and our customers are much more equipped to handle 2015. I can’t wait to write about the lessons learned from Black Friday 2015.

February 03, 2015

While hybrid cloud has been a mainstay discussion in the cloud world for quite some time now (note this post of mine on hybrid cloud from 2011, when CloudStack was still Cloud.com), the reality is that setting up hybrid cloud has proven a fairly complex process. Therefore we've seen only a few real implementations of true hybrid cloud in the wild.

In this post, I wanted to describe the joint work that we've invested together with the VMware cloud team to change this reality and finally make the hybrid cloud story simple and possible.

vCloud Air makes hybrid cloud truly simple and real

While most public clouds started as independent public cloud services and only later added private cloud connectivity, the exact opposite is true for vCloud Air, which was designed primarily as a native extension of the VMware private cloud. In this context, users are able to use the same tools currently in use to manage their local VMware environment in order to manage their public cloud resources as well. This makes the hybrid cloud story significantly simpler, as there is no longer a need to worry about the connectivity between the two sides of the data center, and at the same time we get the cost benefit of using resources on demand on a pay-per-use basis.

The ability to make hybrid cloud so simple fits well with the Cloudify pure-play orchestration vision, and made the integration between the two platforms a perfect fit.

In this post I wanted to spend some time on what that means for VMware and OpenStack users.

Putting vCloud Air, TOSCA and Cloudify together

vCloud Air users needed a simple way to deploy and manage their applications on vCloud Air using DevOps tools, similar to AWS CloudFormation.

As the landscape is still being shaped, and there’s no clear winner yet, it is very important for these users to have tools that will allow them to keep their options open.

TOSCA seemed to be a great fit in such a context, as it provides a standard templating language that doesn’t tie users into a particular platform.

On top of this, TOSCA orchestration also aims to go beyond the installation part of the application and covers all the aspects of the application lifecycle through the addition of workflows and policies that can be used to handle continuous deployment, self-healing, and auto-scaling processes.

On a similar note, many Cloudify users come from the Enterprise and Telco markets. Such users are already heavily invested in VMware. These customers are actively looking for a more agile and cost-effective way to run their data centers. The ability to bring capacity on demand seamlessly into a private cloud makes the entire private/hybrid cloud story truly attainable.

By putting vCloud Air, TOSCA and Cloudify together we get the cost benefit of using on-demand resources for our private data center in a way that doesn't tie us to a specific platform, and at the same time we get a more complete application lifecycle and management solution that enables full automation of our deployment and DevOps processes.

Extending the solution to the rest of the VMware stack

VMware provides three kinds of platforms today: vCloud Air, the vCenter/vSphere environment that serves many enterprise data centers today, and VMware Integrated OpenStack, which provides API compatibility with OpenStack for vSphere/vCenter environments.

Cloudify comes with native TOSCA support and with plugins that allow you to use TOSCA blueprints as a templating language across vCloud Air, vSphere and OpenStack, providing a consistent way to manage applications across all three environments. On top of this, it also makes it simple to mix resources from all three environments in the same blueprint.

Having pure-play orchestration and management that is backed by an industry standard, TOSCA, decouples the way we manage applications from the underlying infrastructure. While there is still a need to rely on specific APIs and features of the underlying infrastructure, we are now at a point where the cost of switching isn't as significant compared with the alternative of binding applications directly to a specific underlying infrastructure.

The benefit for OpenStack users

Many OpenStack users are Enterprise and Telco customers who are known to be heavy users of VMware platforms. This integration makes it possible to mix and match existing VMware environments with OpenStack and vCloud Air, providing greater flexibility to decide which applications and workloads fit best in which environment, while using common and consistent management for applications across environments.

What should we expect next?

Through our partnership with VMware we’re planning to make this integration simpler for vCloud Air users by allowing them to use TOSCA directly through the vCloud Air public cloud service. In addition, the plan is to make the hybrid cloud story between vCloud Air, vSphere and OpenStack as seamless and simple as possible so we will be adding more examples and tighter integration for all three platforms.

December 16, 2014

Docker started as just a software container on top of a Linux operating system, which seemed like a simple optimization over a fat hypervisor.

Its disruptive force, however, comes from the fact that it forces us to rethink many of the layers of the cloud stack - starting from the way we handle configuration management, through the way we handle networking and build systems, and even microservices. Not all of this is directly related to Docker per se, but that's the difference between thinking of Docker as a container and thinking of Docker as a change agent or a movement.

Aside from Docker's traditional analogy to the shipping world and how containerization changed the landscape of the maritime industry, to me a similar analogy is the move from bricks and mortar to glass and metal buildings. You could think of glass and metal as just another way of constructing the same buildings and houses we've always had; however, the introduction of glass and metal design changed the entire landscape and standard of our former city lives. The fact that with this new approach we can now build entire buildings in a fraction of the time, rising to heights exponentially greater than the former construction methods allowed, is much more than optimization; it has led to a complete disruption and a renewed way of thinking about architecture and design principles.

In this way, we can choose to think of Docker as a simple optimization for building a PaaS - instead of using buildpacks for shipping our software into a PaaS, we can now use Docker images instead, right?

The problem with that thinking lies in its root - thinking of Docker and containers as yet another software packaging tool is analogous to thinking that glass and metal are just another form of bricks.

A basic summary of what was discussed: whether Docker can be considered a platform without supporting services, whether orchestration is the missing link, whether Docker is a viable VM replacement, whether PaaS is just a buzzword or actually constitutes anything beyond IaaS with orchestration implemented at various layers, and what the difference really is between abstraction and automation.

So, why is this *such* a heated debate?

To understand why this is such a heated debate, we need to understand the various players in the discussion, as their perspective is very much influenced by whether they're a cloud provider, a PaaS provider, an orchestration provider or a container provider.

Mapping the different players and their approach to this trend.

The cloud providers perspective:

The most interesting perspective, IMO, is that of the cloud providers. Cloud providers like Amazon and Google already provide a PaaS - Elastic Beanstalk and GAE - yet they recently announced new offerings for orchestration and containers as a service that are not tied to their PaaS offerings. Judging by the market reaction, it looks like many users have been quite enthusiastic and in favor of these new offerings.

What can we learn from this?

Cloud providers look at PaaS as yet another tool to drive workloads into their cloud. In the case of AWS, they don't even charge extra for their PaaS beyond the cost of the infrastructure instances it uses. Quoting from the AWS pricing page:

“There is no additional charge for Elastic Beanstalk – you only pay for the underlying AWS resources (e.g. Amazon EC2, Amazon S3) that your application consumes.”

That puts them in a much more pragmatic and unbiased position: they can offer containerization as part of their PaaS offering, or as a new service, as long as it meets their users' demand and thus drives more utilization onto their infrastructure.

The fact that they decided to offer orchestration as an independent service and not as an extension of their PaaS offering is, IMO, the strongest validation that this is probably the approach that best meets users' needs.

PaaS providers

PaaS providers, on the other hand, make a good amount of money selling PaaS platforms and services. It is therefore clear that when they approach this question they are biased, by definition. They view Docker as a threat and as a result try to minimize its real value and position it as a natural evolution of their existing platforms. Their strategy is to declare support for Docker as an underlying container. They also offer the option to use containers as the packaging format for applications, similar to buildpacks. OpenShift from Red Hat has taken another step in this direction and is planning to switch its underlying orchestration engine to Kubernetes.

To me, all this is a fine progression, but the main question that remains open is whether PaaS is indeed the right tool for handling more complex application workloads.

This opened up another interesting question - is there enough value left in PaaS if we can use containers with orchestration (the most obvious being Docker orchestration) as an automation and management tool?

That question sparked an interesting debate, which I found fairly surprising as it seemed to reflect what I view as a bit of a narrow PaaS-centric view held by many of the PaaS providers, who fail to realize that there is more than one approach to managing applications beyond putting a container abstraction on top of our apps.

What is the difference between PaaS (abstraction) and orchestration (automation)?

Both PaaS and orchestration aim to solve the complexity challenge of deploying and managing apps. Having said that, there is a fundamental difference between the two approaches; let me explain.

PaaS:

Takes an abstraction approach. With abstraction we're basically hiding complexity by exposing a simpler interface. That approach also comes with an opinionated architecture, i.e. in order to provide a simple interface, applications need to be built and written in a certain way that fits the assumptions behind the design of the platform.

In the case of PaaS you don’t have much control over many of the operational aspects associated with managing your application, for example the way it handles scaling, high availability, performance, monitoring, logging, updates. There is also a much stronger dependency on the platform provider in the choice of language and stack. Of course, some of these are open source and provide a range of plug-ins that allow some degree of extensibility, but at the end of the day you still have to make sure that this fits with the core design of the platform.

The main advantage with this approach is that as long as the app fits into the PaaS design principles, you do get a simple way to deploy applications without worrying about the operational aspects. It’s also much simpler to guarantee the behavior of your applications once they have been deployed.

Automation/Orchestration:

With automation we're basically taking the same steps that we would have performed manually, and scripting them. By scripting them we achieve a similar outcome, i.e. we can run a complex process such as application deployment in one command. However, the fact that the end result may be similar doesn't make the two approaches the same, as is often argued; let me explain. It's not quite that the end doesn't justify the means, but more to the effect of: the end doesn't necessarily account for the means.

With automation we run a script, and as such we can actually read the script, and quite often understand the underlying steps that will be executed when we run it. As those steps will often follow the same steps that we would do ourselves, it’s also easier to follow up on these steps and retrace them for troubleshooting purposes. All this is fine, but that’s not the main difference. A script is something that can be shared, cloned, modified or rewritten completely so the degree of control and flexibility is significantly higher than with that of a PaaS/abstraction approach.
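To make this concrete, here is a minimal, hypothetical deployment script in Python; the host names and the individual shell steps are all illustrative, but it shows why a script can be read, retraced and modified step by step:

import subprocess

WEB_HOSTS = ["web-1", "web-2"]   # illustrative inventory

def run(host, command):
    # each step is an explicit, traceable shell command
    print("[%s] %s" % (host, command))
    subprocess.check_call(["ssh", host, command])

def deploy():
    for host in WEB_HOSTS:
        run(host, "apt-get install -y nginx")      # step 1: install
        run(host, "cp /tmp/app.conf /etc/nginx/")  # step 2: configure
        run(host, "service nginx restart")         # step 3: restart

if __name__ == "__main__":
    deploy()

Every step mirrors something an operator could type by hand, which is exactly what makes it easy to read, clone, or rewrite.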

That flexibility also comes with a cost. With automation / scripting it’s much harder to guarantee portability and the behavior of an application, as it often relies on many external dependencies that can break at any given point in time. So in the end, we may still end up with too much complexity. (Unfortunately, the usual tradeoff for flexibility is complexity when it comes to technology).

So to put it in @nukemberg's terms: the difference between PaaS and Automation/Orchestration can be summed up as Magic vs. Black Magic.

Adding containers to the rescue

This is where the combination of containers and automation becomes handy. Containers allow us to strike a better balance between the degree of control and simplicity, by reducing the complexity that comes from the number of moving parts and dependencies. The right balance in this approach is to use automation mostly to handle the dependencies between services and tiers of applications and to handle policies such as scaling and failover, and less so to install and configure software on RHEL, Ubuntu, Windows or whatever environment.

Putting PaaS, orchestration/automation and containers together

PaaS, orchestration/automation and containers shouldn't be viewed as alternatives to one another, but rather as complementary stacks, in a very similar way to how Google and Amazon have approached the same challenge. The diagram below is taken from my previous post on Docker vs. PaaS and outlines how the layers are ordered in this new stack.

It is important to note that the PaaS box in the diagram refers to the more traditional PaaS implementations, i.e. Elastic Beanstalk, Heroku, GAE and such. Both Pivotal/CloudFoundry and Red Hat/OpenShift are building new PaaS versions that will expand into orchestration and containers to offer more advanced orchestration and container support.

The diagram above illustrates how I believe the new application deployment platform stack will shape up when we add orchestration/automation and containers to the mix. This layered approach enables users to choose which layer of the stack they want to use based on their specific use case; e.g. they can choose a PaaS for use cases where they just want to deploy simple apps and not worry about how those apps are managed, or use automation/orchestration if they want tighter control over the operational aspects of their apps. Whether this new stack will be packaged under the same platform is less important for the sake of this discussion.

Why is this still disruptive?

So if we can put PaaS, automation/orchestration and containers together why is this still disruptive?

I think the answer is that the combination with containers allows us to remove a fairly big chunk of the complexity of automating application deployments. By doing this, the difference in complexity between using the PaaS/abstraction approach or the automation/orchestration approach to deploy even a simple application has narrowed significantly.

I, therefore, think that given those two options, most users would prefer to use an approach that is simple enough but does leave them with a higher degree of control. Because of all this, I expect that we will see a much wider and broader adoption of containers/orchestration to manage apps, rather than PaaS.

November 16, 2014

In my recent post on OpenStack “Do I Need OpenStack if I Use Docker?” I covered the confusion that has emerged around one area of disruption that Docker has created. In this post, I’d like to segue into another similar topic that has emerged around the same concept where I sense similar disruption and confusion: the use of PaaS in its traditional form vs using a straight Docker approach.

Background

PaaS in its simplest form is a platform that enables developers to develop and deploy web applications into production fast by abstracting many details of the underlying infrastructure, whereas Docker provides a generic container for packaging software stacks in a portable way.

A Traditional PaaS Stack

Many of the traditional PaaS implementations use Docker or LXC as the underlying container to enable multi-tenant applications on a shared infrastructure. Usually a PaaS relies on web containers that are specific to a certain language (Tomcat or Jetty for Java, mod_php for PHP, Rails for Ruby, Django for Python, NodeJS for JavaScript...) to host the user's code, and on Docker or LXC containers as lightweight VMs that hold these web containers and provide the required isolation between them. This architecture lends itself nicely to simple web applications, which are often built out of a web front-end and a database as the backend persistence model. As far as I know, Docker actually emerged from dotCloud's own PaaS and evolved into its current incarnation.

The introduction of Docker as a simple abstraction over containers has turned it into a portable software container that isn't tied to a specific language or framework. It can be used not only as a web container, but also as a more general-purpose software container for all sorts of applications such as microservices, big data analytics, stream processing, as well as legacy applications.

Some PaaS implementations, such as CloudFoundry, have already started to offer support for running Docker containers as one of the software components that are supported by the platform.

Having said that, PaaS provides a fairly opinionated model for the way we should model and package our applications. Oftentimes this packaging is too limiting as we grow into more scalable models, such as microservices.

This causes many users today to question whether there is enough added value in PaaS in order to justify these limitations, when users can simply run their software directly on Docker.

Using Docker as an Alternative to PaaS

Using Docker in combination with orchestration engines such as Kubernetes (in the case of GCE), Heat (in the case of OpenStack), or Cloudify (for multiple clouds such as OpenStack, VMware, and AWS, as well as bare metal) can provide a relatively simple way to package, deploy and manage applications in an automated fashion.

Unlike the traditional PaaS stack, it doesn't force a specific opinionated architecture for deploying apps, although it does lend itself very nicely to both web applications and microservices architectures.

My Take

The main promise of PaaS is in the speed of development and simplicity. Traditional PaaS implementations achieve that goal through extreme abstraction of the underlying infrastructure, as well as by promoting an opinionated architecture, which provides a fairly optimized and consistent deployment experience for applications that fit into that model. The combination of the two ultimately does help achieve this promise, but for a fairly limited set of applications.

Using Docker coupled with orchestration as an alternative provides a more general-purpose platform for achieving a similar goal. Unlike the traditional PaaS approach, the use of Docker with an orchestration engine does expose the user to more knobs and infrastructure details, which makes the experience of developing and deploying simple web apps more complex than with the PaaS alternative. However, it provides the added benefit of flexibility - much like the onset of OpenStack, and open source cloud in general.

The right way to think of PaaS and Docker

In my opinion, the right way to think of the two is in a layered approach in which both the PaaS containers and Docker fit together as outlined in the diagram below.

As seen in the diagram above, PaaS becomes one use case for deploying simple web apps. For more complex application deployments, it makes more sense to use Docker directly. In this context orchestration becomes a central piece for managing both use cases. Orchestration also provides a layer of abstraction between the application components and the underlying infrastructure.

The important realization is that in this context the role of PaaS as we know it becomes more of a niche role, yet still remains a fairly common and important use case. It is also important to note that the two models are not mutually exclusive. An application that runs on a PaaS container could interact with services that are managed directly through a Docker container, just as you would use other cloud services such as database as a service, load balancer as a service or any other on-demand resource.

Another important note is that the combination of Docker and orchestration opens a new path to manage dynamic workloads not just on an IaaS environment, but also directly on bare metal. I’m starting to see more use cases where this option becomes more popular, specifically for managing dev and testing environments where the use of a full blown cloud infrastructure may just be overkill.

Final Note

When we started the journey into the cloud world a few years ago, the underlying cloud infrastructure was very complicated. As a result, the only possible way to simplify the application deployment experience was to take extreme measures in the level of abstraction, as well as to force an opinionated architecture to ensure that the application fits a typical cloud architecture. As the underlying infrastructure and tooling evolve, those extreme measures are no longer needed for all use cases. What we're now seeing is basically a shift: pieces that were previously embedded within the "PaaS box", such as containers and orchestration, are becoming more generic, independent services that have their own right to exist.

The good news in all this, is that as users we are finally beginning to receive the simplicity of PaaS, that was previously limited only to greenfield web applications, for use with a much broader spectrum of applications and architecture styles.

So to answer the question from the opening note - do I need PaaS if I use Docker? The answer is yes. It makes sense to have the two combined, but the role of PaaS as we know it has narrowed significantly.

November 06, 2014

OpenStack is an open source cloud infrastructure that is considered by many as a cost-effective alternative to VMware.

In reality, the transition from VMware to OpenStack isn’t that trivial, which leads most enterprises to take a hybrid cloud approach. This means they run their OpenStack infrastructure alongside their existing VMware infrastructure with an aim to gradually transition workloads into their new OpenStack environment.

Recently, VMware announced their own VMware Integrated OpenStack, which is a VMware-supported OpenStack distribution providing tighter integration between existing VMware environments and OpenStack.

In this post I will discuss three of the options for putting OpenStack and VMware together and weigh what I believe are the pros and cons of each approach.

1. Using OpenStack with VMware Hypervisor plug-in

OpenStack Nova comes with a pluggable architecture for integrating various hypervisors. It supports KVM and VMware as well as Hyper-V.

The first option for integrating VMware into OpenStack is through the ESXDriver for Nova.

In this option the Nova scheduler can spawn VMware ESX VMs through an ESX-enabled node (as opposed to KVM, which runs directly on the Nova compute node).

Pros:

In this approach, we can potentially re-use our VMware image assets and easily import them into our OpenStack environment.

Cons:

1. Limited use of VMware features: The ESXDriver cannot use many of the vSphere platform's advanced capabilities, namely vMotion, high availability, and Dynamic Resource Scheduler (DRS).

2. Limited Portability: There are some features of VMware, such as vMotion, that many enterprises rely on today and that would need to be turned off as we move to OpenStack. As some images were built with the assumption that these unique VMware features exist, it wouldn't be possible to transition them into an OpenStack environment and expect them to work.

In summary, while this option makes sense, the cost and technical limitations make it less popular. In fact, a recent OpenStack survey report indicates a fairly small percentage of users who actually use this feature.

2. Using VMware vSphere with OpenStack

In this approach we utilize the compute resources using vSphere. This will allow us to take advantage of all the VMware features that come with vSphere and overcome the limitations mentioned above.

In this case, the entire vCenter ESX cluster appears as one big hypervisor. The actual allocation of the ESX hosts is done through VMware vCenter and is not exposed to the Nova controller, as outlined in the diagram below:

Similarly, we can plug in the VMware storage and network services to allow an even more complete integration across the compute, network and storage stack, as outlined in the diagram below:

The main advantage of this approach is that it enables VMware users to benefit from both worlds: on one hand they can use OpenStack as an open API, and on the other hand they can utilize their existing VMware infrastructure.

Cons:

The main disadvantage of this approach is that it creates a completely different OpenStack implementation, with some serious differences in implementation and behavior from the original open source version of OpenStack - specifically in the way the compute nodes are managed.

The Elephant in the Room:

One of the main motivations to transition to OpenStack in the first place is to cut costs.

In both options we rely on the VMware stack, and therefore the actual savings are still unknown.

3. Using a Common Management and Orchestration as an abstraction to both VMware and OpenStack

In the third option we will not use any of the OpenStack VMware plug-ins, but instead we will use an orchestration layer as higher level abstraction between OpenStack and our VMware environment.

The orchestration layer provides a common management and deployment infrastructure. In this approach we do not try to force the VMware infrastructure to fit into the OpenStack API; instead, we map the calls behind each element type to either OpenStack or VMware as appropriate. In this way, the application is kept unaware of whether it is running on OpenStack or VMware. Since the calls to each infrastructure are centralized into one driver per environment, the mapping is managed once for all applications. Additionally, there is a default implementation for the built-in types, so in most cases the user only needs to deal with the implementation details of each element type for specific customizations.
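A minimal sketch of the idea, with entirely hypothetical driver classes and a single create_server call mapped per environment (this is not Cloudify's actual plugin API), might look like this:

class OpenStackDriver:
    def create_server(self, name):
        # would call the Nova API here
        print("nova boot %s" % name)

class VMwareDriver:
    def create_server(self, name):
        # would call the vSphere/vCloud API here
        print("vsphere clone-vm %s" % name)

# one driver per environment, written once and reused by every application
DRIVERS = {"openstack": OpenStackDriver(), "vmware": VMwareDriver()}

def deploy(blueprint_nodes, environment):
    driver = DRIVERS[environment]   # the blueprint itself stays environment-agnostic
    for node in blueprint_nodes:
        driver.create_server(node)

deploy(["web", "db"], "openstack")

The blueprint only names abstract nodes; swapping "openstack" for "vmware" retargets the whole deployment without touching the application description.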

Pros:

We are able to utilize all of the features and capabilities of each infrastructure with no limits.

We reduce dependency risk - With this abstraction we’re less vulnerable to a specific infrastructure and we keep our options open to move or add new environments as needed.

Support for vSphere and vCloud - Since we are not limited to the use of a specific API, we can integrate with both the vSphere and vCloud APIs.

Cons

We are shifting the dependency to the management and orchestration layer.

We may lose some of the management and orchestration capabilities that are specific to each of the environments.

Additional customization effort: Since we’re not relying on a common API, we may need to customize the built-in types per environment to fit our specific needs. Having said that, it is important to note that this customization is only done once for all of our applications. Additionally, it is expected that over time the default types will cover most of the use cases. Therefore, the need for customization will be minimized.

My Take

Given that the cloud infrastructure world keeps on changing and evolving very rapidly (not just between OpenStack and VMware), any tight integration approach will have a higher chance of breaking compatibility or being limited to the least common denominator at some point. We also need to be aware that even though in the first two options we remain compatible with the OpenStack API, we still end up with different OpenStack implementations from a behavior perspective.

On top of that, we need to be prepared for new disruptions. This is actually taking place right now, for example, with Docker orchestration that continues to disrupt and challenge the way we handle our compute and network infrastructure.

With this in mind it would be too risky not to keep our options open.

Having an abstraction layer at a higher level of the stack gives us the benefit of being less vulnerable to changes at the lower level infrastructure. It also provides us the flexibility to adopt new infrastructure changes in the future.

Having said all this, the question “Are we minimizing or just shifting the risk?” still lingers.

This is where TOSCA comes in handy. TOSCA provides a standard way to describe our application blueprint. This significantly reduces our dependency on a specific implementation of the orchestration and management layer, and it was one of the main reasons that led us to choose TOSCA when we designed the third generation of our orchestration platform, Cloudify.

November 04, 2014

Docker has broken records in the speed with which it moved from being a disruptive technology to a commodity. That speed of adoption and popularity brings with it lots of confusion.

In this post I wanted to focus on a question that has been gaining popularity, and that I've started to hear more often from users who have just begun using Docker: does it make sense to use OpenStack if they've already chosen Docker?

Before I give my take on this, I wanted to start with a short background for the rationale behind this question.

Background

In its simplest form, Docker provides a container for managing software workloads on shared infrastructure, while keeping them isolated from one another. Virtual machines such as KVM do a similar job by creating a complete operating system stack with all the OS devices (through a hypervisor). Unlike the virtual machine approach, however, Docker relies on a feature of the Linux operating system named LXC (Linux Containers). LXC utilizes the operating system's built-in process isolation features for memory and, to a lesser degree, CPU and networking resources. Docker images do not require a complete boot of a new operating system and, as a result, provide a much lighter alternative for packaging and running applications on shared compute resources. In addition, Docker allows direct access to the device drivers, which makes I/O operations faster than with a hypervisor approach. The latter makes it possible to use Docker directly on bare metal, which often leads people to ask whether the use of a cloud such as OpenStack is really necessary if they're already using Docker.

This performance difference between Docker and a hypervisor such as KVM is backed by a benchmark done by Boden Russell and presented during the recent DockerCon event.

The benchmark is fairly detailed and, as expected, it shows a significant difference between the time it takes to boot a KVM guest and the time it takes to start a Docker container. It also indicates a fairly big difference in memory and CPU utilization between the two, as can be seen in the diagram below.

This difference in performance maps to a proportional difference in density and overall utilization between the two, which in turn can easily translate into a big difference in cost, driven directly by the number of resources needed to run a given workload.

My Take:

This question has nothing specific to do with OpenStack and can be applied similarly to any other cloud infrastructure. The reason it is often brought up in the context of OpenStack, in my opinion, is due to the fact that OpenStack is fairly popular in private cloud environments which is the only environment in which we can even consider a pure Docker alternative.

It's about the hypervisor, stupid:

Many of the performance benchmarks compare Docker with KVM and have little to do with OpenStack. In fact, this specific benchmark ran both KVM images and Docker containers through OpenStack, which shows that the two technologies work nicely together. In that context, most of the utilization arguments become irrelevant if I choose to run OpenStack on top of a Docker-based Nova stack, as illustrated in the diagram below, taken from the OpenStack documentation.

Cloud infrastructure provides a complete data center management solution in which containers or hypervisors, for that matter, are only part of a much bigger system. Cloud infrastructure such as OpenStack includes multi-tenant security and isolation, management and monitoring, storage and networking and more. All of those services are needed for any cloud / data center management and have little dependency on whether Docker or KVM are being used.

Docker isn't (yet) a full-featured VM and has some serious limitations around security and Windows support (as indicated in the following email thread on the Red Hat mailing list), and therefore cannot be considered a complete alternative to KVM just yet. While there's ongoing work to bridge those gaps, it is safe to assume that adding the missing functionality may come at an additional performance cost.

There’s a big difference between raw hypervisor/containerization performance and application performance, as indicated in the following graphs from the benchmark results. A possible explanation is that applications often use caching to reduce the I/O overhead.

If we package Docker containers within a KVM image the difference can become negligible. This architecture often uses hypervisors for managing the cloud compute resources, and an orchestration layer such as Heat, Cloudify, or Kubernetes on top to manage containers within the hypervisor resources.

Conclusion

This brings me to the conclusion that the right way to look at OpenStack, KVM and Docker is as a complementary stack, in which OpenStack plays the role of overall data center management, KVM plays the role of multi-tenant compute resource management, and Docker containers serve as the application deployment package.

In this context, a common model would be to use Docker containers as the application packaging and deployment layer on top of the compute resources managed by the cloud infrastructure.

Having said all of the above, I do see cases, mostly for well-defined workloads, where the use of cloud infrastructure isn't mandatory. For example, if I were to automate a small shop's development and testing environment for DevOps purposes, I would consider using Docker directly on bare metal.

Orchestration can be a great abstraction tool between the two environments

One of the benefits of using an orchestration framework with Docker is that it allows us to switch between OpenStack and bare metal environments at any given point in time; we can choose either option just by pointing our orchestration engine at the target environment of choice. OpenStack Orchestration (Heat) declared support for orchestrating Docker starting from the Icehouse release. Cloudify is an open source TOSCA-based orchestration framework that works on OpenStack and other clouds such as VMware and AWS, as well as bare metal, and recently added Docker orchestration. Google Kubernetes is associated mostly with GCE but can be customized to work with other clouds or environments.

October 13, 2014

In the previous post I outlined what I mean by native-approach to the OpenStack cloud. In a nutshell, a native approach represents out of the box tighter integration with OpenStack, all without being limited only to OpenStack.

In this post I want to share some of the stories behind the design of Cloudify 3.0 and the approach that we have taken to make Cloudify *native* to OpenStack. I would first start by outlining our motivation behind this approach.

OpenStack-only or Not-only-OpenStack?

It is our core belief that for the foreseeable future supporting hybrid cloud is going to be important for many of the OpenStack users for the following reasons:

Transitioning to OpenStack is a long journey - Many enterprises have VMware-based stacks as their core infrastructure. Many of these organizations have started their journey toward OpenStack by creating a dev environment, and then gradually moving more workloads into their OpenStack environment as they get more comfortable with it. This process usually spans months and possibly even years. It became apparent to us that during this time these customers would still be using vSphere and vCloud in parallel to OpenStack. For such users, having a common platform to manage their app deployment across the two environments can smooth the transition and also reduce the risk of getting stuck if one of the environments doesn't meet their needs.

The public cloud market is going to be dominated by non-OpenStack clouds - While OpenStack is on a trajectory to dominate the private cloud market, the public cloud market looks quite different, with AWS as a strong and long-standing leader and other major clouds such as GCE and Azure emerging and hoping to close in. Many organizations wouldn't want to host their entire data center on a private cloud, and would therefore need to take a hybrid cloud approach. In order to support hybrid cloud, we had to include an abstraction layer that allows us to integrate with clouds other than OpenStack.

At the same time we wanted to have a more intimate integration with OpenStack in order to leverage the fact that it is open source and allows for a deeper integration with its core services. Through that we were able to enable smooth and simple integration of Cloudify into OpenStack.

Extending or Rewriting?

Cloudify 2.x already took a step toward deeper integration with OpenStack and we were one of the first to announce our support for OpenStack. Recently, we also added support for OpenStack Neutron.

We realized that if we want to make Cloudify fit natively into the OpenStack infrastructure it is not enough to have integration with the OpenStack components. We also need to ensure that the design behind Cloudify will be consistent with the way other projects and services of OpenStack are implemented. This led us to the conclusion that we wouldn’t be able to be a first class citizen with OpenStack unless we underwent a major redesign of Cloudify.

Going Python?

Our first step in this direction was to move from Java to Python. Moving to Python wasn't as painful for us as you might think, because many of our developers already knew the language. This decision quickly proved itself: not only did it let us develop the new version of Cloudify faster, it also enabled us to join other OpenStack projects and to reuse some of the components and frameworks already used by OpenStack.

Integrating with Core OpenStack Services

Many of the products that declare support for OpenStack mostly integrate with the compute API (Nova), and quite often come with their own security, messaging, configuration, and other such considerations. We felt that this is not enough, and doesn’t fit in with the way other services are built within OpenStack. Hence, we decided to deepen our integration with other services such as Keystone, Neutron, Heat and later with Ceilometer, Mistral and others.

Going TOSCA?

Cloudify 2.x was designed around a proprietary Groovy-based domain specific language (DSL) for defining application topology and configuration. It became apparent that the DSL had become a central piece of the Cloudify architecture, and that maintaining a proprietary DSL would only be a barrier going forward. TOSCA (Topology and Orchestration Specification for Cloud Applications, led by OASIS) defines a standard templating language that was similar to, and richer in concept than, our previous 2.x specification. Because of this, adopting TOSCA seemed like a natural evolution.

At the time we decided to use TOSCA, the specification was provided in fairly complex XML. Our first step in adopting TOSCA was to come up with a simplified YAML-based version that better fits the spirit of OpenStack. After the OpenStack summit in Hong Kong, where we first introduced the idea, it became apparent that members of the OASIS group were already thinking along this line. We decided to join forces, joined the OASIS organization, and started working with the team to incorporate TOSCA as an official part of OpenStack. During the Atlanta summit it was decided to integrate the TOSCA project into the Heat project, and there is now an official definition of the TOSCA 2.0 specification based on YAML. Our goal is to use the TOSCA 2.0 specification as the official templating language for Cloudify, instead of our current TOSCA-like DSL.

Integrating with OpenStack Heat

OpenStack is growing, and with it the services that are included in the core OpenStack project continue to move up the stack. One of these services is OpenStack Heat, which started as the equivalent of AWS CloudFormation and is becoming more of a general-purpose infrastructure orchestration service these days.

We realized that since Heat is closely integrated with OpenStack, it follows the OpenStack release cycles and API, and therefore provides a useful tool for setting up OpenStack infrastructure. The approach we have taken with Heat is an "overlay approach": users can keep using Heat as they do today to set up their infrastructure, and Cloudify integrates with Heat in a way that allows it to discover any resource that was provisioned through Heat, and then add the monitoring, logging and software stack on top of that environment.
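As a rough sketch of what such discovery looks like against the Heat API (using python-heatclient; the endpoint and token values are placeholders, and this is not Cloudify's actual integration code):

from heatclient.client import Client

HEAT_ENDPOINT = 'http://controller:8004/v1/TENANT_ID'   # placeholder
AUTH_TOKEN = 'KEYSTONE_TOKEN'                           # placeholder

heat = Client('1', endpoint=HEAT_ENDPOINT, token=AUTH_TOKEN)

# walk the stacks Heat already provisioned and discover their resources
for stack in heat.stacks.list():
    print("stack:", stack.stack_name, stack.stack_status)
    for res in heat.resources.list(stack.id):
        # e.g. OS::Nova::Server, OS::Neutron::Port, ...
        print("  ", res.resource_type, res.physical_resource_id)

Once the resources are enumerated this way, an overlay orchestrator can attach its own monitoring and software lifecycle on top of them without re-provisioning anything.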

Integrating with Neutron - Network Orchestration

Networking is a core service in any cloud deployment, and here it refers to elements such as security groups, private IPs and floating IPs, as well as routers, DNS, VLANs, load balancers, and the like.

Cloudify 2.7 included basic integration with Neutron that allows users to attach floating IPs and set the availability zone configuration.

With Cloudify 3.0, we included support for all the networking elements and can now create VLANs, security groups and more as part of the application deployment.
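For a flavor of what this kind of network orchestration does under the hood, here is a hedged sketch using python-neutronclient directly; the credentials and names are placeholders, and Cloudify's actual plugin wraps calls along these lines rather than exposing them to the user:

from neutronclient.v2_0 import client

neutron = client.Client(username='demo', password='secret',   # placeholders
                        tenant_name='demo',
                        auth_url='http://controller:5000/v2.0')

# create a security group and open port 80, as a blueprint might declare
sg = neutron.create_security_group(
    {'security_group': {'name': 'web', 'description': 'web tier'}})
neutron.create_security_group_rule(
    {'security_group_rule': {
        'security_group_id': sg['security_group']['id'],
        'direction': 'ingress', 'protocol': 'tcp',
        'port_range_min': 80, 'port_range_max': 80}})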

Design for Scale

Application orchestration often requires intimate and continuous integration with the application so that it can detect failure in sub-seconds and take corrective actions as needed. This often imposes a more complex scalability challenge.

Cloudify 2.x uses a hierarchy of managers, where each manager controls 100+ nodes, as its scaling architecture. One of the things we have experienced over the last year is that there is a growing class of services, such as Big Data and NFV, where a single service can be composed of hundreds and potentially even thousands of instances. The manager-of-managers approach didn't fit these use cases well.

To enable scaling to thousands of nodes per manager, we decided to use a message broker approach based on AMQP, and to separate the provisioning, logging and real-time monitoring tasks into separate services that can scale independently. This also allows us to control the level of intimacy between the Cloudify manager and the application services. For example, we can tune Cloudify to handle only provisioning and logging, and use the real-time information that has already been gathered by the infrastructure. This is especially important in the context of OpenStack, as we expect that a large part of the metrics can be gathered through the infrastructure itself.
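To illustrate the broker-based decoupling, here is a minimal sketch of an agent publishing monitoring events over AMQP with the pika library; the queue name and event fields are made up for the example:

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='monitoring-events')   # illustrative queue name

# agents fire events at the broker and move on; the consumers
# (logging, metrics, provisioning) scale independently on the other side
event = {'node': 'web_server_1', 'metric': 'cpu', 'value': 87.5}
channel.basic_publish(exchange='',
                      routing_key='monitoring-events',
                      body=json.dumps(event))
connection.close()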

Integrating your DevOps toolchain of choice

One thing that has become clear from the various OpenStack statistics based on the annual user survey is that many OpenStack users run a toolchain comprised of tools available through the OpenStack infrastructure as well as tools that are not specific to OpenStack, such as Docker, Chef, Puppet, Ansible, JClouds and others.

Integrating all these tools into a deployment system is a fairly complex task that can take weeks and even months.

We realized that if we want to provide a full lifecycle management of the application, it wouldn’t be right to force a particular stack or toolchain on a user, but rather provide an open pluggable architecture that could enable you to easily integrate your choice of tools into the same deployment.

With Cloudify 2.x we included support for Chef, Puppet and also had a cloud driver that allows us to integrate with various cloud infrastructures, as well as with tools like JClouds.

With Cloudify 3.x we created a more generic plugin architecture that allows users to plug in almost any element of their application, starting from the cloud plugin to configuration management, orchestration, monitoring etc.

To make this simple, we included a Bash plugin that provides a simple framework for integration with external tools, just by pointing the plugin at the relevant Bash script.
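As a rough sketch of what a plugin operation looks like in this model (patterned on Cloudify 3.x's plugin style, but simplified and hypothetical - the script path and logging details are illustrative):

from subprocess import check_call

from cloudify.decorators import operation

@operation
def start(ctx, script_path='/opt/scripts/start.sh', **kwargs):
    # the plugin just shells out to the user's own tooling,
    # so any external tool can be wired into the lifecycle
    ctx.logger.info('running {0} for node {1}'.format(script_path, ctx.node_id))
    check_call(['bash', script_path])

Because the operation body is ordinary Python that shells out, the same pattern can wrap Chef, Puppet, or any other tool the user already has.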

Final Notes

In this post I wanted to share the behind-the-scenes story of how we made Cloudify native to OpenStack, and the rationale and decision process behind it, as I believe it will be relevant to other users considering their OpenStack strategy - especially those still deciding how deeply they should integrate with OpenStack.

The good news is that it is quite possible to integrate natively with OpenStack without limiting yourself only to OpenStack. Having said that, the initial investment required to get there is quite big. I hope that this lesson from our own experience will save others time.

October 08, 2014

In my previous post I described the various forms of popular data management APIs out there and how they're often used. As I noted toward the end of that post, currently each of the APIs is often tied to a specific data model and data store, and serves a different set of use cases. One of the main reasons behind this is that each API represents a specific optimization that requires a fairly different data structure. Common to all of the various techniques, however, is that they were written under the assumption that the disk is the bottleneck. That led to the various point optimizations, architectures, and algorithms that each of the APIs uses to bypass the disk bottleneck.

Memory Based Data Management Is Ready for Flash

Here's the good news: this need not always be the case. Flash devices have become so advanced that they are able to completely remove the disk bottleneck, which makes it possible to explore APIs and data structures that serve them in a completely different way. I would argue that there's no longer a need for point optimizations that are not compatible with one another; instead, we can use a common data structure to serve all of the APIs.

Same Data, Many APIs

So unlike the majority of existing data management solutions, which were designed with the assumption that the disk will be a bottleneck, In-Memory Data Grids are based on RAM and therefore designed with the assumption that the data device is fast. Best of all, moving memory-based solutions onto flash doesn't require any significant change to your current system. To illustrate this, let's look at our solution, XAP, and what it's able to do:

This example shows how one can write the same data as an object, read it as a document, query it through SQL via a JDBC connection, navigate through its subfields as in object graphs, and so on. XAP even allows the same data store to serve both transactional data and analytics, vastly reducing complex data transformation.

Utopia vs. Reality

The ideal system would be one where we could rely on just one common data store that's been outfitted with a set of lightweight services that expose different APIs and semantics for accessing the data, as illustrated in the diagram below. In reality, however, the maturity and cost of flash-based solutions aren't yet ready for such a transformation. A more realistic scenario is for a flash-based device to be used as a data bus or hub that acts as a front-end for various databases and for the more high-performance, latency-sensitive use cases. In order to reduce transformation complexity, however, flash will need to include built-in synchronization with those databases and streaming technologies so that this area of complexity is handled implicitly.

Memory Based Data Management Systems Are More Suitable to Serve as High Speed Data Buses

Many existing databases were designed with the assumption that they serve as the main data store; moving from MySQL to MongoDB, for example, therefore often means a complete rewrite. In-Memory Data Grids, on the other hand, were designed as an extension to the database: they hold the part of the data that is under high contention and serve it from memory to speed up access time and scalability. This approach is complementary, as it affects only the part of the application that needs high-speed access, while the rest of the application continues to work with the same underlying database as if nothing had changed. As a result, most existing Data Grid solutions already include fairly rich data-synchronization plugins, which is why I would argue that they are more suitable to serve as a bus or a hub.
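
To make the complementary nature of this approach concrete, here is a small, store-agnostic sketch of the caching pattern a data grid applies: hot entries are served from memory, misses fall back to the underlying database (read-through), and writes are pushed back to it asynchronously (write-behind). The class and callback names here are illustrative, not an actual Data Grid API:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

// A toy read-through / write-behind front-end over an existing database.
// The "database" is represented by two callbacks; in a real data grid
// these would be pluggable synchronization endpoints.
public class HotDataFront<K, V> {
    private final Map<K, V> inMemory = new ConcurrentHashMap<>();
    private final Function<K, V> dbLoader;      // read-through on a miss
    private final BiConsumer<K, V> dbWriter;    // write-behind target
    private final ExecutorService syncPool = Executors.newSingleThreadExecutor();

    public HotDataFront(Function<K, V> dbLoader, BiConsumer<K, V> dbWriter) {
        this.dbLoader = dbLoader;
        this.dbWriter = dbWriter;
    }

    public V get(K key) {
        // Serve high-contention data from RAM; fall back to the database.
        return inMemory.computeIfAbsent(key, dbLoader);
    }

    public void put(K key, V value) {
        inMemory.put(key, value);                            // fast in-memory write
        syncPool.submit(() -> dbWriter.accept(key, value));  // async DB sync
    }
}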

Example: Using High Speed Data Bus with Storm

I'll leave you with one example of where this approach is effective: Storm, the stream processing framework used by Twitter. Storm supports pluggable external feeds (spouts) while also maintaining the state of processing (Trident state). The following illustration demonstrates how this integration can be used to handle real-time processing of web-page analytics, using Storm with XAP as the high-speed data bus.
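
As a rough sketch of what such a topology looks like in code, the following Trident snippet counts page views per URL. It uses Trident's built-in MemoryMapState as a stand-in for the state backend; an actual XAP integration would plug in its own StateFactory at that point. The spout is assumed to emit tuples with a single "url" field:

import backtype.storm.topology.IRichSpout;
import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.MemoryMapState;

public class PageViewAnalytics {
    // pageViewSpout is assumed to emit tuples with a single "url" field.
    public static TridentState build(IRichSpout pageViewSpout) {
        TridentTopology topology = new TridentTopology();
        // Group the incoming page-view stream by URL and keep a running
        // count in Trident state. MemoryMapState is Trident's built-in
        // in-memory state; a data-grid-backed StateFactory (e.g. one
        // persisting the counts to XAP) would be plugged in here instead.
        return topology.newStream("pageViews", pageViewSpout)
                .groupBy(new Fields("url"))
                .persistentAggregate(new MemoryMapState.Factory(),
                        new Count(), new Fields("count"));
    }
}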

October 06, 2014

It was only a few years ago that nearly everyone relied exclusively on SQL to tackle Big Data needs, but as the demand for speed and scale has increased, so have our options. There are now a number of new data systems, mostly based around NoSQL, each developed to best serve a specific area.
In this post, we'll take a look at seven APIs in particular and explore how these systems can be optimized for maximum speed and memory capabilities.

1. SQL: The term "NoSQL" may suggest that this API is no longer relevant in the new data world, but most NoSQL implementations actually support a major subset of SQL. SQL provides a rich set of query and data-management capabilities and is often the lowest common denominator across many data management systems. For example:

SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London'

2. Document: The document API allows users to write records with different field structures to the same logical table without any need for schema evolution, which is why the document API is often described as "schema-less." It is one of the most popular APIs for web applications, which typically use the JSON data model.
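
As a store-agnostic sketch of the idea in Java, the two "documents" below live in the same logical collection even though they declare different fields (all names and values are illustrative):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocumentApiSketch {
    public static void main(String[] args) {
        // Two entries in the same logical "users" table with different
        // fields -- no schema change is needed to add the second shape.
        Map<String, Object> user1 = new HashMap<>();
        user1.put("name", "Alice");
        user1.put("email", "alice@example.com");

        Map<String, Object> user2 = new HashMap<>();
        user2.put("name", "Bob");
        user2.put("twitter", "@bob");   // a field user1 never declared

        List<Map<String, Object>> users = new ArrayList<>();
        users.add(user1);
        users.add(user2);
        System.out.println(users);
    }
}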

3. Object: The object API allows data to be written and read as native objects of the programming language, and makes it possible to navigate through a record's subfields as in an object graph.

4. Tuple API: Tuples are one of the most common APIs for messaging and stream-processing use cases. This API represents a simple data structure that maps into a flat data object. Queries are often expressed using the same tuple structure that was used to write the data, meaning that the tuple acts as a "mask" indicating which instance type and which matching fields are to be selected.
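
A toy sketch of tuple matching in Java: null fields in the template act as wildcards, so the template selects every tuple whose non-null fields match (the Tuple type here is illustrative, not a specific product API):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TupleMatchSketch {
    // A tuple is a flat, ordered set of fields; null acts as a wildcard.
    record Tuple(String type, String item, String city) {
        boolean matches(Tuple template) {
            return (template.type() == null || template.type().equals(type))
                && (template.item() == null || template.item().equals(item))
                && (template.city() == null || template.city().equals(city));
        }
    }

    public static void main(String[] args) {
        List<Tuple> space = Arrays.asList(
                new Tuple("order", "book", "London"),
                new Tuple("order", "pen", "Paris"));

        // The template ("order", null, "London") selects every order
        // whose city is "London", regardless of the middle field.
        Tuple template = new Tuple("order", null, "London");
        List<Tuple> matched = space.stream()
                .filter(t -> t.matches(template))
                .collect(Collectors.toList());
        System.out.println(matched);
    }
}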

5. Key/Value: Key/Value is the simplest form of data structure. As the name suggests, it consists of a single index per data object: the key. This API is the most popular choice for caching and is often used as the underlying data structure of more advanced data management solutions.
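
In its simplest form, the API boils down to put and get against a single key, as in this minimal Java sketch:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeyValueSketch {
    public static void main(String[] args) {
        // A single index per data object: the key.
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("session:42", "{\"user\":\"alice\"}");  // write
        String session = cache.get("session:42");          // O(1) lookup
        System.out.println(session);
    }
}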

6. Stream Based: This event processing model is best suited to scenarios that require continuous updates. It is a popular API for real-time analytics, which explains why it is becoming increasingly popular in Big Data systems that rely heavily on incremental updates but don't require locking a large set of data.
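
The following minimal Java sketch mimics that model: each incoming event incrementally updates only its own counter, rather than scanning or locking the full data set (the queue-based setup is illustrative):

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class StreamApiSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        Map<String, Long> counts = new ConcurrentHashMap<>();

        // Each event updates only its own counter -- an incremental
        // change, not a scan or lock over the whole data set.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String url = events.take();
                    counts.merge(url, 1L, Long::sum);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        events.put("/home");
        events.put("/home");
        events.put("/pricing");
        Thread.sleep(100);           // let the consumer catch up
        System.out.println(counts);  // e.g. {/home=2, /pricing=1}
    }
}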

7. Map/Reduce: This API is used to perform aggregations on distributed data. The Map/Reduce model breaks an aggregation operation into two or more phases: Map executes the aggregation on each data node, and Reduce takes the sub-aggregations from all of the nodes and combines them into one consolidated result. Operations such as calculating a max or an average are examples of the Map/Reduce model.
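
The sketch below shows the two phases for computing a max and an average: each "node" produces a partial aggregate (the map phase), and the partials are then combined into one result (the reduce phase). Note that the average has to be carried as a (sum, count) pair so that partials can be combined correctly:

import java.util.Arrays;
import java.util.List;

public class MapReduceSketch {
    // Partial aggregate produced by the map phase on each data node.
    record Partial(long sum, long count, long max) {
        static Partial of(List<Long> values) {                // "map"
            long s = 0, m = Long.MIN_VALUE;
            for (long v : values) { s += v; m = Math.max(m, v); }
            return new Partial(s, values.size(), m);
        }
        Partial combine(Partial other) {                      // "reduce"
            return new Partial(sum + other.sum,
                               count + other.count,
                               Math.max(max, other.max));
        }
    }

    public static void main(String[] args) {
        // Each inner list stands in for the data held by one node.
        List<List<Long>> nodes = Arrays.asList(
                Arrays.asList(3L, 9L, 4L),
                Arrays.asList(7L, 1L),
                Arrays.asList(5L, 5L, 5L));

        Partial result = nodes.stream()
                .map(Partial::of)                             // map phase
                .reduce(Partial::combine)                     // reduce phase
                .orElseThrow();

        System.out.println("max = " + result.max()
                + ", average = " + (double) result.sum() / result.count());
    }
}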

There is no "one size fits all" approach when it comes to data management systems. Because most of today's data management systems tie their API to the data model in which the data is stored, we can't write data through one API and read it through another.
This means that if you want to use the same data for a different purpose, you need to maintain a copy of the data for each use case's API and data store. A typical application therefore ends up combining various data management solutions, with complex data flows between them, as illustrated below:

But does it really have to be that complex?

Stay tuned for “The Seven Most Popular APIs in Big Data—Part Two” to find out!