PaaS

October 17, 2012

The core of IaaS services centers around Compute, Storage, and Network; over the past few years, however, the infrastructure has moved up the stack and now includes more application services such as Database as a Service, Hadoop Map/Reduce, and application lifecycle services.

On the other hand, PaaS services offer similar services through the PaaS platform itself, with the main difference being that quite often those services were designed to fit only the applications that run on that particular PaaS.

So rather than the traditional view, where we draw a clear line between the IaaS and PaaS layers, we need to view them differently: the PaaS and IaaS layers overlap each other at the software services layer.

IaaS and PaaS share a similar technology stack

A close look into the OpenStack infrastructure shows that even at the infrastructure level, running the compute, storage, and network services relies on messaging services such as RabbitMQ, databases such as PostgreSQL, load balancers such as Nginx, and other software services. Running these software services for our infrastructure is not much different from running our apps. With both the infrastructure and the application we need to address the scalability and high availability of these services and manage them through our infrastructure.

If that is the case wouldn't it make sense to share the same underlying technology to run our infrastructure and our apps?

Sharing the software services between your IaaS and PaaS

A typical public PaaS platform is built out of two main parts - a self-service portal, and a backend system that runs our apps.

With many of the public PaaS solutions, such as GAE and Heroku, the backend that runs our PaaS is treated as a black box.

Amazon Elastic Beanstalk does things differently in that respect, as it took a bottom-up approach: it uses the same AWS infrastructure to run both Elastic Beanstalk and the applications it manages. This provides a lot of flexibility to its users and, equally important, consistency between the way they run and manage their apps and their infrastructure. VMware took an even bigger step in this regard with CloudFoundry and BOSH, making their entire PaaS backend open source, which gives users even greater flexibility. With Cloudify we took this idea a step further by introducing the concept of recipes into the core of the Cloudify infrastructure, and recently introduced our integration with Chef in a way that enables users to run their Chef cookbooks as part of Cloudify. Chef has already gained popularity as a configuration and automation framework for setting up many OpenStack cloud deployments. Having Chef tightly integrated into the PaaS layer, simply put, makes perfect sense.

Example - Building your own RDS with Cloudify

To illustrate how we can share the same deployment framework between our IaaS and PaaS layers, I'll pick one of these software services to demonstrate the idea. In this particular case, I've chosen a database as a service, which is often a good reference for a service that fits in both our infrastructure and our application. Here is how we can easily set up a database as a service with Cloudify:
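A minimal sketch of what such a service recipe might look like, using Cloudify's Groovy-based recipe DSL. The service name, lifecycle script names, and property values below are illustrative assumptions rather than a definitive recipe:

```groovy
// Hypothetical sketch of a Cloudify service recipe (Groovy DSL) that
// wraps MySQL as a managed database service. Script names are illustrative.
service {
    name "mysql"
    type "DATABASE"
    numInstances 1

    lifecycle {
        // Shell scripts that install, start, and stop the database process;
        // Cloudify invokes these at the corresponding lifecycle stages.
        install "mysql_install.sh"
        start   "mysql_start.sh"
        stop    "mysql_stop.sh"
    }
}
```

Because the recipe only describes the service's lifecycle and SLA, the same definition can back a database used by the infrastructure itself or one consumed by an application.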

Even in the case where we share the same software services through our infrastructure and application, we tend to run them differently.

Quite often infrastructure services will run on bare-metal environments, as we don't want to create dependencies between our virtualization layer and the underlying infrastructure that runs it.

To enable this level of flexibility, we need to abstract the underlying compute layer from our service orchestration layer. In this way, we can use the same recipe and deploy it on either a bare-metal environment or a virtualization layer. (Read: Configuring a Traditional Data Center [BYON])

Final notes

Up until recently, PaaS and IaaS were treated as two separate and closed entities, and the integration between the two was done pretty much in a black-box approach. The fact that both the IaaS and PaaS underlying infrastructures were closed was the primary reason for this.

Now, with the growing adoption of open source IaaS such as OpenStack and CloudStack, and similarly of PaaS such as CloudFoundry and Cloudify, we have the opportunity to change this paradigm. There isn't much of a reason why we shouldn't be able to share the same orchestration and configuration layers between our IaaS and PaaS layers. Amazon Elastic Beanstalk is a good reference for why this makes a lot of sense.

Special thanks

Special thanks to Boris Renski who was the main inspiration behind this post.

An Amazon data center in Ashburn, Virginia suffered a power outage at 9:45 p.m. PDT on Thursday, causing some websites using AWS cloud technology to go offline. High-profile websites like Heroku, Pinterest, Quora and HootSuite saw downtime, as well as many smaller sites.

In this post I'd like to briefly address the lessons from this experience, and more importantly, focus on the lessons from this sort of failure and their effect on public PaaS offerings, such as Heroku.

Lesson from the Heroku Outage: Choose the Right PaaS for the Job

One of the promises of PaaS is productivity. PaaS providers like Heroku claim to increase productivity by abstracting the user from the details of the underlying infrastructure, and even go so far as to claim that PaaS makes application operations redundant.

Lesson 1: Choose the Right PaaS for the Job

The main lesson from this outage is that relying on the PaaS provider to handle all your operations isn't always a safe bet. When we move to PaaS we still need to understand how the provider runs its disaster recovery, high-availability, and scaling procedures. A Heroku-like PaaS also forces you into a lowest-common-denominator approach to dealing with continuous availability and scalability. In reality, however, there are many tradeoffs between scalability, performance, and high availability. The best fit between those tradeoffs tends to be application specific, so compromising on a lowest common denominator could be less productive and more costly at the end of the day.

Which brings me to the last point in this section -- PaaS was meant to provide higher productivity for running our apps on the cloud by abstracting the details of how we run our application (the operations) from the application developer. The black-box approach of many of the public PaaS offerings, such as Heroku, is often an extreme measure in this regard. There is often a close coupling between what the application does and the way we run it. A new class of open PaaS platforms such as Cloudify, CloudFoundry, and OpenShift offers a different, open source alternative that gives you more control over the underlying PaaS platform. Cloudify takes it even further, providing an open recipe model that integrates with the likes of Chef, enabling you to easily customize and control your operations without affecting developer productivity.

Lesson 2: Database Availability Must Address Datacenter Failure

The other area that Heroku, and to be honest, most other PaaS offerings don't adequately address is database high availability, which is obviously a tough area -- specifically, in the event of a data center or availability zone failure, as in the present case. To deal with database availability, it is necessary to ensure real-time synchronization of the database across sites. The example at the bottom of this post refers to a specific way this can be done between two MySQL instances running on Amazon and Rackspace with Cloudify.

General Lessons:

Lesson 3: Coping with Failure, Avoiding a Single Point of Failure

The general lesson from this and previous failures is actually not new. To be fair, this lesson is not specific to AWS or to any cloud service. Failures are inevitable, and often happen when and where we least expect them to. Instead of trying to prevent failure from happening we should design our systems to cope with failure. The method of dealing with failures is also not that new -- use redundancy, don't rely on a single point of failure (including a data center or even a data center provider), automate the fail-over process, and so on.

Haven't Learned from Past Lessons?

The question that comes out of this experience, IMO, is not necessarily how to deal with failures (those lessons are as old as the mainframe or even older), but rather -- why are we failing to implement the lessons? Assuming that the people running these systems are among the best in the industry makes this question even more interesting. Here is my take on it:

We give up responsibility when we move to the cloud: When we move our operations to the cloud, we often assume that we're outsourcing our data center operation completely, including our disaster recovery procedures. The truth is that when we move to the cloud we're only outsourcing the infrastructure, not our operations, and the responsibility for how to use this infrastructure remains ours.

Complexity: The current DR processes and tools were designed for a pre-cloud world, and do not work well in a dynamic environment, such as the cloud. Many of the tools that are provided by the cloud vendors (Amazon in this specific case) are still fairly complex to use.

Implementing Past Lessons in a Cloud World

The first point is easy -- we need to assume full responsibility for our applications' disaster recovery procedures, in the cloud world just as if we were running our own data center. The hard part in the cloud world is that we often have less visibility, control, and knowledge of the infrastructure, which affects our ability to protect our applications -- and each sub-component of our application -- from failure. On the other hand, the cloud enables us to spawn new instances easily in various data center locations, a.k.a. Availability Zones.

And so, most failures can be addressed by moving from the failed system to a completely different system regardless of the root cause of the failure. Therefore, the first lesson is that in the cloud world it is easier to implement disaster recovery plans, by moving our application traffic to a completely different redundant system in a snap, rather than trying to protect every component of our application from a failure. If we're willing to tolerate a short window of downtime, we can even use an on-demand backup site rather than pay the consistent cost and overhead of maintaining a hot backup site.

Which brings me to the next point: What do we need to build such a solution?

A consistent redundant environment that is ready to take over in case of failure needs to include the following elements:

Workload Migration: Specifically, the ability to clone your application environment and configuration in a consistent way across sites, on demand.

Data Synchronization: The ability to maintain a real-time copy of the data between two sites.

Network Connectivity: Enabling the flow of network traffic between two sites.

Which leads to the second challenge: complexity. Here, I'll use an example of a simple web app and show how we can easily create two sites on demand. I'll even go so far as to set this environment up on two separate clouds, to show how we can ensure an even higher degree of redundancy by running our application across two different cloud providers.

A Step by Step Example: Fail-Over from AWS to Rackspace

In this example, we picked Amazon and Rackspace as the two target sites. The same solution would also work between two availability zones or data centers in Amazon. We've also tried the same example with a combination of HP Cloud Services and a flavor of a private cloud.

The example demonstrates a very simple web application with a global load balancer (Rackspace) and a web application (Pet Clinic) with Tomcat as the web front end and MySQL as the database.

The goals that we set for ourselves were seamlessness and no change to the target application or database. We achieved this by plugging the replication service into the existing instances of MySQL. The replication service listened to the MySQL events and replicated every change to its peer MySQL instance. Cloudify enabled us to clone the same application in both Amazon and Rackspace while maintaining a consistent configuration setup as well as consistent scaling and fail-over SLAs. Cloudify does this by abstracting all the information through portable recipe definitions. Cloudify wraps the application instances with its management and control services based on the definition provided in the recipe. This enabled us to clone the environment as well as add elastic scaling without changing the target application (Pet Clinic in this case).

The DevOps PaaS Infusion Meetup this Tuesday, June 19th at 6:30 PM - Netflix and Opscode will be joining GigaSpaces at this meetup in NY, where one of the highlights will be a panel on lessons learned from the recent AWS outage. Register here.

August 13, 2012

In a recent article, Steve Wozniak, who co-founded Apple with the late Steve Jobs, predicted "horrible problems" in the coming years as cloud-based computing takes hold.

"I really worry about everything going to the cloud," he said. "I think it's going to be horrendous. I think there are going to be a lot of horrible problems in the next five years. ... With the cloud, you don't own anything. You already signed it away."

When I first read the title I thought, Wozniak sounds like Larry Ellison two years ago, when he pitched that the cloud was hype, before making a 180-degree turn to acknowledge that Oracle wished to be a cloud vendor too.

Reading it more carefully, I realized the title is just misleading. Wozniak actually touches on something that I hear more often as the cloud hype cycle moves from the Peak of Inflated Expectations into the Trough of Disillusionment.

Wozniak echoes an important lesson that, IMO, is a major part of the reason many of the companies that moved to the cloud have experienced lots of outages during the past months. I addressed several of these aspects in a recent blog post: Lessons from the Heroku/Amazon Outage.

When we move our operations to the cloud, we often assume that we're outsourcing our data center operation completely, including our disaster recovery procedures. The truth is that when we move to the cloud we're only outsourcing the infrastructure, not our operations, and the responsibility for how to use this infrastructure remains ours.

Choosing better tradeoffs between productivity and control

For companies today, the main reason we chose to move to the cloud in the first place was to gain better agility and productivity. But in starting this cloud journey, we found that we had to give up some measure of control to achieve that agility and productivity.

August 02, 2012

DevOps and PaaS represent two different paradigms for delivering applications to the cloud.

DevOps - DevOps takes an automation approach: with DevOps we basically script the process of installing, configuring, and deploying the application stack.

PaaS - PaaS takes an abstraction approach by abstracting the details of the cloud infrastructure from the developers.

DevOps tends to cater to operations people who want to maintain flexibility and control over the infrastructure, while PaaS caters more to developers who just want to write code and deliver it without worrying too much about the details of the underlying infrastructure.

For a while DevOps and PaaS had been considered two competing paradigms as noted in Tom Mornini’s (Engine Yard) post DevOps is DOA:

"DevOps essentially requires everyone involved in the development, deployment, and management of software to change the way they work, to cross-train for multiple functions, and to collaborate with people they’ve all too often viewed as adversaries. It creates new processes and methodologies and, in many cases, uses new tools to get the job done. In addition, it often requires a small army of consultants to implement and manage the transition. That is why the “promise” and “potential” of DevOps has yet to be realized by more than a handful of real-world companies.

...The appeal of the DevOps model fades even further when you consider the fact that there is a better option! You can achieve the intended benefits of DevOps without any of the turmoil by taking advantage of Platform-as-a-Service offerings, or PaaS."

Putting DevOps and PaaS together

The truth is that it's hard to choose between the two paradigms. DevOps provides a high degree of control and flexibility, but this flexibility comes at the cost of complexity and reduced productivity, and often requires a culture change, as Tom Mornini pointed out. PaaS, on the other hand, provides an extremely simple deployment environment, but it also forces you to fit into the PaaS environment, including the language of choice, middleware stack, and cloud environment.

The best of both worlds - the control of DevOps and the productivity of PaaS

Do we really need to choose between the two? We can use DevOps as the foundation of the PaaS infrastructure. In this way, the operations guys can choose their stack of choice, package their specific blueprint, pick their cloud of choice and package all that as a pre-baked environment to the developers who in turn can be relieved of all of the details as they would be with any other PaaS. In this way we get the control of DevOps and productivity of PaaS.

At this point I'll refer to Cloudify as an implementation reference of that concept to illustrate my point in this regard.

Putting DevOps and PaaS together with Cloudify

Cloudify is an Open PaaS Stack that brings DevOps and PaaS together. Here is how:

The control of DevOps

Application Blueprint - Cloudify allows you to add your own application blueprint to the platform through recipes. Recipes are at the heart of the Cloudify architecture. A recipe in Cloudify is a Groovy-based DSL in which you define your application stack, including installation, configuration, monitoring, scaling, and fail-over, and tell Cloudify what, how, and where to deploy your app.

Integration with DevOps tools such as Chef - Cloudify allows you to call Chef cookbooks from within a Cloudify recipe. This gives Cloudify users a way to leverage the wide variety of existing Chef cookbooks and the Chef community, and to decorate a cookbook with a Cloudify recipe if needed, to add scaling, fail-over automation, and so on.

Operational Task Automation - Custom Commands allow you to attach common operational tasks (update-app in the case of a web app, run-snapshot in the case of a database, and more) to a service recipe, and to automate the post-deployment operational procedures of an application alongside its lifecycle.
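A sketch of how these pieces might fit together in a single service recipe. This is a hypothetical example: the script names, the way the Chef cookbook is invoked, and the custom command names are assumptions for illustration, not verbatim Cloudify documentation:

```groovy
// Hypothetical Cloudify service recipe sketch combining the ideas above:
// the install step delegates to a script that runs a Chef cookbook, and
// an operational task is exposed as a custom command.
service {
    name "mysql"
    type "DATABASE"
    numInstances 1

    lifecycle {
        // A helper script (assumed name) that invokes the Chef cookbook
        // responsible for installing and configuring MySQL.
        install "install_with_chef.groovy"
        start   "mysql_start.sh"
    }

    // Post-deployment operational tasks attached to the service; these can
    // be invoked on a running service instance from the Cloudify shell.
    customCommands ([
        "run-snapshot" : "mysql_snapshot.groovy"
    ])
}
```

The point of the sketch is that the recipe carries both the blueprint (what to install and how) and the operational procedures (what to run after deployment) in one place.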

The productivity of PaaS

Developers can run one of the existing recipes as they would with any other PaaS without going through any of the steps above. Unlike other Platforms-as-a-Service, they can also customize the existing recipes or pick custom recipes that cover their specific environment and application blueprint. All this without losing that same productivity. Developers can also get better visibility into the underlying infrastructure, and have a better experience when troubleshooting. They can also use Cloudify as an automation tool as part of their continuous deployment environment.

As Netflix's Adrian Cockcroft put it: "We have a small number of DevOps engineers embedded in the development organization who are building and extending automation for our PaaS, and we have hundreds of developers using NoOps to get their code and datastores deployed in our PaaS."

An interesting lesson from the Netflix experience is that an important part of running an online service is the ability to remove any friction from developing and managing such a service. Moving a large part of the operational responsibility to the developers makes it even more important that the development platform (PaaS) come with operations tools pre-baked into the developer environment. At the same time, that same environment needs to also cater to a dedicated (but significantly smaller) operational team. This makes the idea of putting DevOps and PaaS together even more critical to reaching business agility and operational efficiency. Netflix built its own PaaS environment to address such a need. Cloudify provides a similar solution for those who want to build their own Netflix-style environment without the hassle of doing it themselves.

June 08, 2012

It's a common belief that the cloud is good for green-field apps. There are many reasons for this, in particular the fact that the cloud forces a different kind of thinking about how to run apps. Native cloud apps were designed to scale elastically, they were designed with complete automation in mind, and so forth. Most existing apps (a.k.a. brown-field apps) were written in a pre-cloud world and therefore don't support these attributes. Adding support for these attributes could require a significant investment. In some cases, this investment could be so big that it would make more sense to go through a complete rewrite.

In this post I want to challenge this common belief. Over the past few years I have found that many stateful applications running on the cloud don't support all those attributes, elasticity in particular. One of the better-known examples of this is MySQL and its Amazon cloud offering, RDS, which I'll use throughout this post to illustrate my point.

Amazon RDS as an example of migrating a brown-field application

MySQL was written in a pre-cloud world and therefore fits into the definition of a brown-field app. As with many brown-field apps, it wasn't designed to be elastic or to scale out, and yet it is one of the more common and popular services on the cloud. To me, this means that there are probably other attributes that matter even more when we consider our choice of application in the cloud. Amazon RDS is the cloud-enabled version of MySQL. It can serve as a good example to find what those other attributes could be.

The Amazon RDS Approach for Cloud Enablement of MySQL

Amazon's approach to cloud enablement of MySQL includes the following elements:

Pre-tuned MySQL -- A pre-configured version of MySQL that was tuned to run well on the AWS instances.

There are a couple of lessons that we can learn from the Amazon RDS experience:

There is value in migrating existing applications to the cloud. Or in other words, the cloud shouldn't be considered only for green field applications.

Automation of the installation and deployment processes is the "low hanging fruit" for migrating application to the cloud.

Elasticity -- quite often when we think of elasticity we think of scaling out, i.e. adding more capacity by launching more machines. As MySQL wasn't designed for this model, the RDS approach to elasticity was to automate the move from a smaller to a bigger machine. Apparently this model has proven useful even though it involves downtime and is limited to scale-up.

Baby steps approach: When Amazon first released RDS it included only basic automation, and it was only somewhere down the road that Amazon kept enhancing the service with more automation. This baby-steps approach was key to speeding up the migration of MySQL.

RDS vs SimpleDB

When AWS was first introduced it started with SimpleDB as its cloud database service. SimpleDB fits into the "green field" application approach to cloud. The approach with SimpleDB was to trade features and functionality for simplicity, elasticity and scalability. The popularity of RDS over SimpleDB is another interesting lesson -- trading features for scalability is definitely a simpler approach to get things running on the cloud. However, there is a considerable number of users and applications that would prefer to trade some degree of scalability for features. This also teaches us that the "green field" approach isn't necessarily the right approach. Migrating existing services has lots of value, especially if they are already in wide use.

Applying These Lessons to Existing Applications

The main takeaways for migrating your existing app to the cloud can be summed up as follows:

Use the baby steps approach

Start with simple automation and monitoring of the installation and deployment process

Add automation of post-deployment processes such as fail-over and scaling as a secondary step

Use a pragmatic approach to elasticity. There are areas of your app, often the stateful parts, where the simplest way to apply elasticity is to automate the move from a smaller to a bigger machine. There are other areas, such as the web tier, where full elasticity can be achieved by spawning new instances and automating the discovery and configuration of how these instances group together.

DevOps & PaaS Can Be a Great Tool to Implement These Steps

There are many ways we could implement the baby-steps approach, especially if we look at it as a one-time operation. If we need to move more than one application, we can easily find ourselves repeating the same effort over and over again for each application. DevOps tools such as Chef provide a good generic tool for scripting your installation and configuration processes. New open PaaS platforms, such as CloudFoundry, Cloudify, and OpenShift, address the entire application deployment life cycle, including the automation of scaling, fail-over, and monitoring. An integration between DevOps and PaaS, such as the integration between Cloudify and Chef, is even better suited to this challenge, as it gives you the best of both worlds.

Final Words

According to a recent HP survey of 940 IT respondents, a majority of businesses are planning to move their mission-critical apps to the cloud over the next two to five years. A recent Cisco survey of 1,300 respondents shows that today only 5 percent of respondents are able to migrate meaningful parts of their applications to the cloud. According to this survey, what makes migration to the cloud even more difficult is the lack of information about the process.

Many new cloud users are lost in a sea of hype-driven desire to move to cloud computing, without many proven best practices and metrics.

I believe that the lessons from Amazon RDS could serve as a good reference on what that process could be. If you are considering moving more than one application into the cloud, you should consider the use of DevOps & PaaS as the means to deliver those steps.

May 23, 2012

I'm very excited to announce our first OpenStack Israel event on Wednesday, the 30th of May, in Petach Tikva, in collaboration with IGT Cloud and Rackspace. Since Avner Algom and I started to work on the event a few months ago, I have been positively surprised by the amount of local innovation and work already taking place here in Israel in this space. As I write this post, it looks like we're getting close to 200 registrations, and all of this has been achieved for the most part by word of mouth, a clear indication of the growing interest here.

We’ve got a nice lineup of presenters covering different parts of the stack:

Shlomo Swidler will be leading two panel discussions. In the first, we'll review the shift toward open source in the cloud and its effect on the cloud industry; in the second, we'll try to predict some future directions and trends we expect for OpenStack.

I would also like to take this opportunity and welcome our distinguished guests from Rackspace, IBM and StackOps:

During the event we've reserved a slot for lightning talks - if you have an interesting story, you'll have the chance to share it during the event.

Anyway, this event is going to be the kickoff for a series of OpenStack meetups planned throughout the year. We'd appreciate any help in this regard, so if you have a suggestion for a meetup, this is your chance.

I would like to finish up with special thanks to Mark Collier and Lauren Sell from Rackspace, who have been very instrumental and supportive throughout this process. We wouldn't have been able to pull this off without their close help.

It's a fact that virtualization is not a requirement when creating cloud computing services, but it is helpful to those who manage the service. Indeed, Google is able to provide a multitenant cloud computing platform without virtualization; there are other examples as well.

Internap, SoftLayer, Rackspace, Liquid Web, and New Servers (also known as BareMetalCloud.com) also provide access to the bare metal. You can count on more providers to join the fray as cloud computing users continue to demand that their managed hosting environments work like their native environments.

Performance Matters

The main reason for the rise of bare-metal clouds is that virtualization often comes with a cost. There is a large class of applications where the virtualization overhead is unacceptable, as Carl Brooks, a cloud analyst at Tier 1, a division of 451 Research, points out here:

By having dedicated servers that are not virtualized, and therefore do not have a hypervisor layer, users can experience an uptick in speed, Brooks says, in some cases as much as 10% depending on the application. This can be an attractive option for high-performance compute needs, advanced Web 2.0 developers, or applications that require a large amount of database resources. Basically, any situation "where the performance matters the most."

Come to think of it, the main thing that brought us to virtualization in the first place was the ability to create new servers on demand, as opposed to waiting days or weeks for dedicated servers. Now, with the availability of bare-metal clouds, it is possible to get the same level of flexibility without the virtualization overhead, and with more control and flexibility over the specific hardware configuration and setup. This makes bare-metal clouds much more attractive than in the past.

Bare-Metal PaaS

Now that bare-metal clouds have become more popular, it only makes sense to have bare-metal PaaS to support them. Bare-metal PaaS provides the ability to manage an elastic application workload without relying on virtualization, providing the option to support high-performance workloads with the simplicity of PaaS.

This is particularly interesting for Big Data and any data-intensive application that is heavy on I/O. Quite often, running these types of applications on top of virtualization isn't even considered a viable option. With bare-metal PaaS we can easily serve I/O-intensive workloads without compromising on performance or latency.

A Bare-Metal PaaS Example with Cloudify

Cloudify is an open PaaS stack. With Cloudify, we look at cloud infrastructure as just a bunch of compute resources, each with an operating system, an IP address, and an SSH port. We use a generic cloud driver that abstracts the way those machines are allocated and created on each different cloud infrastructure. In that context we treat a virtual machine as just a generic container for an operating system. Cloudify manages applications at the process level: it manages the application workload by provisioning the application processes across machines.

With that, it only made sense to look at bare-metal clouds as any other virtualized cloud. So we ended up writing a bare-metal cloud driver, which is referred to as the Bring Your Own Node (BYON) cloud driver.

A BYON cloud is set up simply by specifying a list or range of IP addresses as our cloud pool. Cloudify then uses this pool to provision the application and its services. Below is a simple example of what this configuration looks like.
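A minimal sketch of such a configuration, in the spirit of Cloudify's Groovy-based cloud driver configuration. The node IDs, IP addresses, and the exact property names are illustrative assumptions:

```groovy
// Hypothetical sketch of a BYON cloud driver configuration: the cloud
// "pool" is just a static list of pre-provisioned machines identified
// by IP address, which Cloudify then treats like any other cloud.
cloud {
    name = "byon"

    custom ([
        "nodesList" : ([
            ([ "id" : "byon-mgmt",   "ip" : "192.168.9.10" ]),
            ([ "id" : "byon-node-1", "ip" : "192.168.9.11" ]),
            ([ "id" : "byon-node-2", "ip" : "192.168.9.12" ])
        ])
    ])
}
```

Once the pool is defined, deploying a service or application against a bare-metal environment looks the same as deploying it against a virtualized cloud; only the cloud configuration changes.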

Keeping Available the Choice Between Bare-Metal and Virtualized Cloud

In reality, there are cases where the use of virtualized cloud instances makes sense, and as already discussed here, there are also cases where using a bare-metal cloud would be the right option. Ideally it would be best to abstract the application from those choices in order to easily move between the two environments, or even combine them. For example, imagine the case of a typical web application. It is very likely that we would want to run our web traffic workload on virtualized instances and our database on bare-metal instances.

In the case of Cloudify, keeping the details of the specific cloud target in the Cloud Driver enables us to achieve just that. We can define our application as a recipe, which remains unchanged across cloud choices. The choice of whether to use a bare-metal or virtualized cloud is left to the deployment stage, where all that is needed is to point Cloudify at the target cloud driver, and that's it.
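As a sketch of what this looks like in practice (the `petclinic` recipe name here is just an example), only the bootstrap target changes between the two environments:

```
# bare-metal deployment through the BYON cloud driver
cloudify@default> bootstrap-cloud byon
cloudify@default> install-application petclinic

# the same, unchanged recipe on a virtualized cloud such as OpenStack
cloudify@default> bootstrap-cloud openstack
cloudify@default> install-application petclinic
```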

May 03, 2012

Cloud is evolving at an incredible pace, changing almost every aspect of our industry. It’s only natural then, that with this change also comes a continuous evolution of how we classify and categorize the various components of the cloud stack.

In earlier days, it was enough to draw the lines between IaaS, PaaS, and SaaS, and while this categorization is still very handy, we are beginning to see the lines between these categories start to blur. On the one hand, we see more IaaS providers starting to move up the stack, and on the other hand, we are seeing new categories such as DevOps gaining popularity and bridging the gaps between the IaaS and the PaaS layers.

I was recently asked to draw a map of how I see the current cloud stack. This ended up being an interesting exercise. To make things simpler I decided to focus on the IaaS/PaaS piece, both because this is my area of expertise, and also because IMO this is where many interesting developments are taking place. I ended up with the diagram below, which I thought would be worth sharing in order to get feedback and different perspectives.

It’s important to note, that this diagram is not based on any scientific scoring of different products in the market. I'll leave that part to the analysts. The specific products I chose to use are only there to illustrate and support my analysis of the components in the stack and where they map to within each category.

What's Driving the Cloud Stack?

The two main (and often conflicting) driving forces behind the evolution of the cloud stack are Productivity and Control. As the cloud started to emerge, it was productivity that took center stage, often at the cost of giving up almost completely on control. As the industry matured, we started to see an evolution that allows various degrees of tradeoff between these two driving forces. A good indicator of this is the rise of DevOps and private clouds. We can also see the open source movement as yet another major step in that direction; the most notable example is OpenStack, along with the recent Citrix announcement about moving CloudStack into the Apache Software Foundation.

We can also see a similar movement happening at the PaaS layer, where VMware launched Cloud Foundry both as a self-service PaaS for development productivity and as an open source project to provide users with more control. Cloudify makes similar, yet more explicit, moves in that direction by bringing DevOps and PaaS together, in addition to being open source.

Mapping the PaaS Stack

Geva Perry wrote a good summary mapping the current Java PaaS players in his post Listing the Java PaaS Public and Private Clouds. Geva chose to classify the different players based on their private/public cloud position and language support (Java in this particular case). Obviously the correlation between public/private cloud and productivity/control is fairly close, so I can see where he was coming from. While this makes a lot of sense, I believe it is not accurate enough, as there are solutions that work on both private and public clouds and are not limited to just Java, Cloudify included. So is there a better way to classify the PaaS providers?

Here is my suggestion:

I would first distinguish those who focus on delivering PaaS through a self-service portal. I'll refer to them as PaaS service providers: they assume full control over the underlying infrastructure and own the entire operation of your application. This is where I would put GAE, Heroku, etc. Next are those who give you the tools to build your own PaaS on private or public clouds. I'll refer to them as PaaS stack providers; this is where Cloudify and, I believe, Cloud Foundry and Stackato also fit in.

The second axis of classification would be the level of openness and control, i.e. open source licensing, language support, cloud environment support, DevOps support, and so on.

In any case, we could easily get into a complex matrix of features for even more fine-grained classification, but that is far beyond the scope of this post, which merely draws a rough map rather than providing a complete competitive analysis.

How would you categorize the Cloud/PaaS Stack?

As I noted earlier, this is just my own personal view on the subject. I hope that this will spark an interesting debate in which we may either come out smarter, or possibly even more confused. :)

Edd touched briefly on the role of PaaS for delivering Big Data applications in the cloud:

Beyond IaaS, several cloud services provide application layer support for big data work. Sometimes referred to as managed solutions, or platform as a service (PaaS), these services remove the need to configure or scale things such as databases or MapReduce, reducing your workload and maintenance burden. Additionally, PaaS providers can realize great efficiencies by hosting at the application level, and pass those savings on to the customer.

Even though Edd’s article covers all the different forms of running Big Data on private and public clouds, the article focuses mainly on the public cloud offering from Amazon, Microsoft and Google.

In this post, I wanted to cover more specifically how I see the evolution of cloud application platforms (PaaS) to support Big Data. I’ll refer specifically to Cloudify which was designed primarily to support Big Data applications.

Big Data in the cloud using Cloudify

Background

Most of the PaaS solutions out there started by focusing on simple web application deployments in Ruby, Java and Node.js. Unlike other PaaS solutions, when we designed Cloudify we picked Big Data as its primary target, and started by supporting popular NoSQL clusters such as Cassandra and MongoDB, as well as providing the equivalent of Amazon RDS through recipes for MySQL. Our goal was to make Big Data deployments a first-class citizen within Cloudify. To this end, when you download Cloudify you'll notice that ALL of our examples come with pre-integrated Big Data deployments.

There are a couple of reasons that brought us to that decision:

Managing large data clusters is a core expertise at GigaSpaces

Most people know GigaSpaces for our In-Memory Data Grid solution known as XAP (eXtreme Application Platform). Over the past 10 years, as our customer deployments grew substantially, we realized that developing strong automation and cluster management is as critical as handling data consistency, performance, and latency in our data-grid product. In a large cluster, if something breaks it is practically impossible to handle that failure through manual procedures. For that reason, we developed a lot of IP around automating data cluster deployment, which resulted in a uniquely self-managed data cluster.

Cloudify is a natural evolution of GigaSpaces Data Cluster

When we built Cloudify, it made a lot of sense to take the IP we had developed for managing the GigaSpaces cluster and generalize it so it would fit any other framework. In this way, we could leverage years of experience and development in this area, and gain a significant head start.

Big Data applications are complex

Big Data applications tend to be fairly complex, which makes them an ideal candidate for the sort of automation and management that Cloudify can offer.

Big Data applications have a lot in common with XAP applications

Both need automation of data, failover and recovery, both fit into large cluster deployments, and both share similar partitioning and other clustering architecture.

What makes Big Data platform different than any other application

Most existing orchestration systems were designed to handle stateless processes. Moving data is a completely different ballgame, as you need to think about:

Primary and backup dependencies

Availability, i.e. moving data without losing it

Moving processes to the data rather than the other way around

Data replication within and across sites

Automating any of these processes through general orchestration tooling like Chef or RightScale can become a fairly involved and complex process, with lots of pitfalls in handling edge scenarios, for example the handshake process that is often involved in automating a data-node failure, including split-brain scenarios.

In Cloudify, we were able to carve a lot of that logic away from the user. For example, Cloudify will automatically ensure that primaries and backups won't run on the same node, or in the same data center in a disaster recovery setup. You don't need to do anything but tag your cluster nodes with a zone tag.
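To illustrate, zone tagging can be thought of as grouping the node pool by rack or data center. The fragment below is hypothetical shorthand; the exact property names depend on the cloud driver in use and may differ from the actual configuration:

```groovy
// Illustrative only: tagging node groups with zones so Cloudify can keep
// primaries and backups on separate racks or data centers. Property names
// here are hypothetical and may differ per cloud driver.
custom ([
    "nodesList" : ([
        ([ "id" : "dc1-node-{0}", "host-range" : "192.168.1.1-192.168.1.10", "zone" : "dc1" ]),
        ([ "id" : "dc2-node-{0}", "host-range" : "192.168.2.1-192.168.2.10", "zone" : "dc2" ])
    ])
])
```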

Managing Big Data applications != Managing Big Data storage

Managing data clusters is one thing. Being able to process the data is yet another challenge we need to think about when dealing with application platforms, as I noted in one of my earlier posts.

The main challenge is that quite often the management of the data-processing logic is built on completely different scaling, availability and monitoring tools than the ones used for managing our Big Data deployment. It turns out that this siloed thinking leads to a whole set of complexities, starting with the inconsistency of having multiple managers, each detecting failure or scaling events in a different way, and quite often conflicting with one another. Having lots of moving parts is yet another challenge that makes the entire deployment pretty much a complete mess.

Over the next few years, we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large-scale location-aware mobile, social and sensor use.

Since Cloudify is also part of XAP, it already comes with built-in support for streaming Big Data processing. This means that building your own Facebook- or Twitter-like real-time analytics can be as simple as writing small scripts that handle the analytics counters. All the rest, i.e. scalability, availability, automation, cloud portability, management and monitoring, is covered by Cloudify, as noted in this and this use case.

Examples for Big Data applications running on Cloudify

In the list below, I tried to put together a few references and examples to make it easier for you to get started. The first reference points to a simpler scenario that allows you to use Cloudify to deploy your Big Data database as a service. The other three references are full application-stack deployments that include the data-processing and web tiers of the application managed together with the Big Data database.

Running a Big Data 'Database as a Service'

Cloudify comes with built-in recipes for Cassandra and MongoDB, as well as Solr (a popular search engine), which makes it easy to deploy these database clusters on your local machine, data center, or private/public cloud through a single command. In this way, you can use Cloudify to automate the database.
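For example, spinning up a Cassandra cluster from the built-in recipe comes down to a single CLI command, sketched here as a Cloudify shell session:

```
# bootstrap a cloud first (or "bootstrap-localcloud" for a local test), then:
cloudify@default> install-service cassandra
```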

Spring Travel application with Cassandra

Demonstrating the deployment of a Java-based travel application with Cassandra as the database

The example includes recipes that provision the Cassandra database, create a schema, load the data, then spawn a Tomcat container, automatically injecting the reference to Cassandra, and finally show the custom management and monitoring of the application, all through a single command. A recent videocast showing how the travel application works on the HP OpenStack cloud is available here.

Pet Clinic example with MongoDB

The Pet Clinic example does pretty much the same thing only using a sharded MongoDB cluster.

Twitter Real Time analytics example for Big Data

The Twitter example shows how you can attach real-time, stream-based processing to handle live Twitter feeds, and how you can manage both the stream-processing cluster and the Big Data cluster using Cloudify and run them on any cloud. The entire source code for this example is available on GitHub.

Give it a try

To try out any of these examples you'll need to download the latest or stable Cloudify build. Cloudify comes with the first three examples as part of the distribution, under the recipes or examples directory. To try them out, simply follow the steps in the Cloudify Quick Start Guide.