Cloud computing with Linux

Cloud computing platforms and applications

M. Tim Jones
Published on September 10, 2008 / Updated: February 11, 2009

IBM and Amazon Web Services

Cloud computing provides a way to develop applications in a virtual environment,
where computing capacity, bandwidth, storage, security and reliability aren't
issues—you don't need to install the software on your own system. In a
virtual computing environment, you can develop, deploy, and manage applications,
paying only for the time and capacity you use, while scaling up or down to
accommodate changing needs or business requirements.

You can't read a technical Web site these days without some mention of so-called
cloud computing. Cloud computing is really nothing more than
the provisioning of computing resources (computers and storage) as a service. Along
with that comes the ability to dynamically scale the service to additional computers
and storage in a simple and transparent way. All this is similar to the ideas behind
utility computing, in which computing resources were viewed as a
metered service, as is the case for more traditional utilities (such as electricity
or water). What's different is not the goal behind these ideas but the existing
technologies that have come together to make them a reality.

One of the most important ideas behind cloud computing is scalability, and the key
technology that makes that possible is virtualization. Virtualization allows better
use of a server by aggregating multiple operating systems and applications on a
single shared computer. Virtualization also permits online migration so that if a
server becomes overloaded, an instance of an operating system (and its applications)
can be migrated to a new, less cluttered server.
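
To make the migration idea concrete, here is a minimal sketch in Python of the kind of decision a virtualization manager might make. The Host class, the rebalance function, the 80 percent threshold, and the load figures are all assumptions made for illustration. A real hypervisor must also copy memory and device state while the guest keeps running, which this toy ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server with a fixed CPU capacity (arbitrary units)."""
    name: str
    capacity: float
    vms: dict = field(default_factory=dict)  # VM name -> CPU demand

    @property
    def load(self) -> float:
        return sum(self.vms.values())

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def rebalance(hosts, threshold=0.8):
    """Move VMs off hosts above `threshold` onto the least-utilized host.

    Returns a list of (vm, source, destination) tuples describing the moves.
    """
    moves = []
    for src in hosts:
        while src.utilization > threshold and src.vms:
            # Assumption: try the smallest VM first, as the cheapest to move.
            vm = min(src.vms, key=src.vms.get)
            demand = src.vms[vm]
            candidates = [
                h for h in hosts
                if h is not src and (h.load + demand) / h.capacity <= threshold
            ]
            if not candidates:
                break  # nowhere to place it; leave the host as it is
            dst = min(candidates, key=lambda h: h.utilization)
            del src.vms[vm]
            dst.vms[vm] = demand
            moves.append((vm, src.name, dst.name))
    return moves


# Hypothetical data center: one overloaded server, one nearly idle.
hosts = [
    Host("server-a", 100, {"web": 50, "db": 40, "cache": 10}),
    Host("server-b", 100, {"batch": 20}),
]
moves = rebalance(hosts)
print(moves)
```

Picking the smallest VM first is only one possible policy; a production scheduler would likely weigh memory footprint and network cost as well.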

From an external view, cloud computing is simply the migration of computing and
storage outside an enterprise and into the cloud. The user defines the resource
requirements (such as computing and wide area network, or WAN, bandwidth needs), and
the cloud provider virtually assembles these components within its infrastructure,
as shown in Figure 1.

Figure 1. Cloud computing migrates resources within the Internet

But why would you willingly relinquish control over your resources and allow them to
virtually exist in the cloud? There are many reasons, but two that I believe are
most important are cost and scalability. The goal of cloud computing is to make
these resources less expensive than what you can provide for and manage yourself.
Along with this reduction in cost comes greater flexibility and scaling. A cloud
computing provider can easily scale your virtual environment for greater bandwidth
or computing resources with the provider's virtual infrastructure.

The green advantage to cloud computing is the ability to virtualize and share
resources among different applications for better server utilization. Figure 2 shows
an example. Here, three independent platforms existed for different applications,
each running on its own server. In the cloud, servers can be shared (virtualized)
for operating systems and applications to better use the servers, resulting in fewer
servers. Fewer servers means less required space (minimizing the data center
footprint) and less power for cooling (minimizing the carbon footprint).

Figure 2. Virtualization and resource use
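
The consolidation described above can be sketched as a small packing problem. The example below uses first-fit decreasing, a common heuristic that I am choosing for illustration; the article does not say how any provider actually places workloads. The function name, the workload names, and the demand figures (as a percentage of one server) are hypothetical.

```python
def consolidate(workloads, server_capacity):
    """Pack workloads onto servers using first-fit decreasing.

    `workloads` maps a workload name to its resource demand. The result is a
    list of servers, each a dict of the workloads placed on it.
    """
    servers = []
    by_demand = sorted(workloads.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in by_demand:
        if demand > server_capacity:
            raise ValueError(f"{name} does not fit on any single server")
        for server in servers:
            if sum(server.values()) + demand <= server_capacity:
                server[name] = demand
                break
        else:
            # No existing server has room, so bring another one online.
            servers.append({name: demand})
    return servers


# Three applications that each used to occupy a dedicated server.
workloads = {"app-1": 60, "app-2": 50, "app-3": 30}
placement = consolidate(workloads, server_capacity=100)
print(len(workloads), "servers before,", len(placement), "after")
```

With these made-up demands, the three applications fit on two shared servers instead of three dedicated ones.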

But there are trade-offs, and cloud computing is not without its warts. This article
explores some of these issues later. First, let's dig deeper into cloud computing
to explore what it's all about.

Anatomy of cloud computing

As you peer inside the cloud, you find that it's actually not just a single service
but a collection of services, as shown in Figure 3. These layers define the level of
service provided.

Figure 3. The layers of cloud computing

Let's start at the lowest level of service provided, which is the infrastructure
(Infrastructure-as-a-Service, or IaaS). IaaS is the leasing of
infrastructure (computing resources and storage) as a service. This means not only
virtualized computers with guaranteed processing power but reserved bandwidth for
storage and Internet access. In essence, it's the capability of leasing a computer
or data center with specific quality-of-service constraints that has the ability to
execute an arbitrary operating system and software.

The value of cloud computing

Besides reducing the cost of managing computing resources, cloud computing offers
other advantages. For example, when you separate yourself from your
resources by the Internet, it doesn't really matter where those resources
reside. They could be, for example, in a climate that offers ambient (natural)
cooling and therefore minimizes energy usage.

Moving up the stack, the next level of service is the platform
(Platform-as-a-Service, or PaaS). PaaS is similar to IaaS but includes
operating systems and required services that focus on a particular application. For
example, a PaaS in addition to virtualized servers and storage provides a particular
operating system and application set (typically, as a virtual machine, or VM, file,
such as VMware's .vmdk format) along with access to necessary services such as a
MySQL database or other, specialized local resources. In other words, PaaS is IaaS
with a custom software stack for the given application.

Finally, at the top of Figure 3 is the simplest service that can
be provided: the application. This layer is called Software-as-a-Service
(SaaS), and it is the model of deploying software from a centralized system to run
on a local computer (or remotely from the cloud). As a metered service, SaaS allows
you to lease an application and pay only for the time used.

That's the 30,000-foot view of cloud computing. This view ignores some of the other
aspects of the cloud, such as data-Storage-as-a-Service (dSaaS), which
provides storage as a metered service in which the consumer is billed based on used
capacity (the amount of storage used) and utilization (bandwidth requirements for
the storage). Cloud services have also emerged, which provide internal mechanisms
for interoperability as well as external application program interfaces (APIs), such
as Web services.
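
As a rough illustration of metered storage billing, the sketch below adds a capacity charge to a utilization (bandwidth) charge. The rates, the class and function names, and the monthly billing period are assumptions; actual providers publish their own rate cards and often use tiered pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRates:
    """Hypothetical prices, not taken from any real provider."""
    per_gb_month: float        # charge for capacity held
    per_gb_transferred: float  # charge for bandwidth consumed


def monthly_storage_bill(gb_stored, gb_transferred, rates):
    """Return the capacity charge plus the bandwidth charge for one month."""
    if gb_stored < 0 or gb_transferred < 0:
        raise ValueError("usage cannot be negative")
    capacity_charge = gb_stored * rates.per_gb_month
    transfer_charge = gb_transferred * rates.per_gb_transferred
    return round(capacity_charge + transfer_charge, 2)


rates = StorageRates(per_gb_month=0.15, per_gb_transferred=0.10)
bill = monthly_storage_bill(gb_stored=200, gb_transferred=50, rates=rates)
print(f"${bill:.2f}")
```

The point of the metered model is visible in the signature: the bill depends only on what was stored and moved, not on the size of the hardware behind it.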

The cloud computing landscape

In recent months, there's been an explosion of investment into cloud computing and
related infrastructure. This massive investment indicates that there is demand for
virtualization of resources inside the cloud. The past year has seen many new
services, some of which are shown in Figure 4.

Figure 4. Cloud computing layers with offerings

This is by no means an exhaustive list of offerings, as it changes quite frequently.
However, it does provide an overview of some of the offerings and how they are
differentiated. Links to some of the offerings are included in Related topics later in this article.

Linux and open source in the cloud

Let's now explore how Linux and the open source community contribute to the world of
cloud computing. As you might have guessed, Linux and open source technologies play
a huge role.

Software-as-a-Service

SaaS is the ability to access software over the Internet as a service. An early
approach to SaaS was the Application Service Provider (ASP). ASPs provide
subscriptions to software that is hosted or delivered over the Internet. The ASP
delivers the software and charges fees based on its use. In this way, you don't
purchase the software but simply lease it on an as-needed basis.

Example SaaS

An interesting example of traditional versus SaaS applications is the application
life cycle management tool from SoftwarePlanner.com. The company offers its
tool using the traditional model, where customers host the application suite
within their enterprise, or as SaaS, where the company hosts the application suite
and makes it available over the Internet.

Another perspective on SaaS is the use of software over the Internet that executes
remotely. This software can be in the form of services used by a local application
(defined as Web services) or a remote application observed through a Web
browser. One example of a remote application service is Google Apps, which provides
several enterprise applications through a standard Web browser. Remotely executing
applications commonly rely on an application server to expose needed services. An
application server is a software framework that exposes APIs for
software services (such as transaction management or database access). Examples
include Red Hat JBoss Application Server, Apache Geronimo, and IBM®
WebSphere® Application Server. Many other application servers exist, and an
extensive list is included in Related topics.

Another recent example of SaaS is Google's Chrome browser. The browser is an ideal
environment as a new desktop through which applications can be delivered (either
locally or remotely) in addition to the traditional Web browsing experience. (For
more information, see Related topics.)

Platform-as-a-Service

PaaS can be described as an entire virtualized platform that includes one or more
servers (virtualized over the set of physical servers), operating systems, and
specific applications (such as Apache and MySQL for Web-based applications). In some
cases, these platforms can be predefined and selected; in others, you can provide a
VM image that contains all the necessary user-specific applications.

One interesting example of a PaaS is Google App Engine. App Engine is a service that
allows you to deploy your Web applications on Google's very scalable architecture.
App Engine provides a sandbox for your Python application, which can then be
reached over the Internet (support for additional languages is planned for the
future). App Engine provides Python APIs for persistently storing and managing data
(using the Google Query Language, or GQL) in addition to support for authenticating
users, manipulating images, and sending e-mail. The sandbox in which the Web
application runs restricts access to the underlying operating system. Although App
Engine limits the functionality available to your application, it supports the
construction of useful Web services. Check out Related
topics for more information.
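
To give a feel for the kind of small Python Web application a PaaS hosts, here is a minimal sketch written against the generic WSGI interface from the Python standard library. It does not use App Engine's own APIs, and how a particular platform loads such an application is configuration not shown here. The names application and call, and the response text, are hypothetical.

```python
def application(environ, start_response):
    """A minimal WSGI application: a single callable that handles requests."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"Hello from the cloud\n"
    else:
        status, body = "404 Not Found", b"Not found\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(app, path):
    """Invoke a WSGI app directly, as a server would, and capture the reply."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call(application, "/"))
```

The application touches nothing but the request it is handed, which is what makes it easy for a platform to run inside a sandbox. To try it locally, the standard library's wsgiref.simple_server.make_server("localhost", 8080, application).serve_forever() serves the same callable.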

Note: Deploying applications in App Engine is free within certain
bandwidth and storage constraints. To build production Web sites with App Engine,
usage fees are assessed.

Another example of a PaaS is 10gen, which is both a cloud platform and a downloadable
open source package for creating your own private cloud. 10gen offers a software
stack with functionality similar to App Engine, with certain differences. With
10gen, you can develop applications in Python as well as the
JavaScript and Ruby programming languages. The platform also uses the sandbox
concept to isolate applications and provide a reliable environment over a large
number of computers (built, of course, on Linux) using their own application server.

Infrastructure-as-a-Service

IaaS is the delivery of computer infrastructure as a service. This layer differs from
PaaS in that the virtual hardware is provided without a software stack. Instead, the
consumer provides a VM image that is invoked on one or more virtualized servers.
IaaS is the rawest form of computing as a service (outside of access to the physical
infrastructure). The best-known commercial IaaS offering is Amazon Elastic
Compute Cloud (EC2). In EC2, you can specify a particular VM (operating system and
application set), and then deploy your applications on it or provide your own VM
image to execute on the servers. You're then billed simply for compute time,
storage, and network bandwidth.

The Eucalyptus project (Elastic Utility Computing Architecture for Linking Your
Programs To Useful Systems) is an open source implementation of Amazon EC2 that is
interface-compatible with the commercial service. Like EC2, Eucalyptus relies on
Linux with Xen for operating system virtualization. Eucalyptus was developed at the
University of California, Santa Barbara, for the purpose of cloud computing
research. You can download it from the university's Web site (see Related topics), or you can experiment with it via
the Eucalyptus Public Cloud with certain restrictions.

Another EC2 style of IaaS is the Enomalism cloud computing platform. Enomalism is an
open source project that provides a cloud computing framework with functionality
similar to EC2. Enomalism is based on Linux, with support for both Xen and the
Kernel-based Virtual Machine (KVM). But unlike other pure IaaS solutions, Enomalism
provides a software stack based on the TurboGears Web application framework and
Python.

Other cloud developments

In addition to the developments already discussed, several other Linux-based open
source packages are useful in cloud environments. Hadoop is an open source
Java™ software framework similar to PaaS but focused on manipulating large
data sets over a set of networked servers (inspired by Google MapReduce, which
enables parallel processing of large data sets). As such, it finds use in Web search
and advertising applications—in particular, at Yahoo! Hadoop also provides
several sub-projects, mimicking Google applications. For example, HBase provides
Google BigTable database-like functionality, and the Hadoop Distributed File System
(HDFS) provides similar functionality to Google File System (GFS).
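
The MapReduce model that inspired Hadoop can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example; it is not Hadoop's API, and the function names and sample input are assumptions. In a real cluster, each phase would run in parallel across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the input splits."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


splits = ["the cloud scales", "the cloud meters usage"]
counts = word_count(splits)
print(counts)
```

Because each call to map_phase sees only its own split and each call to reduce_phase sees only one key, the work can be spread over many machines without the pieces needing to coordinate.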

Issues and challenges

The issues of cloud computing are clear, with privacy and security being two of
the most important. Privacy concerns can be mitigated with encryption, but due
diligence is still required when selecting a cloud computing service. E-Commerce
was also viewed in a skeptical light when the Web started to grow, yet trillions of
dollars' worth of e-Commerce transactions now occur worldwide each year. Cloud
computing will benefit from the same technologies (such as Secure Sockets Layer, or
SSL) that make the Web safe today.

Going further

The cloud computing rush has just begun, and so has the open source development on
Linux that will drive it. Given the massive investment being made in cloud
computing, it's clear that a shift is occurring back to centralized data centers. It
will be interesting to see the new technologies and architectures that are around
the corner.

Related topics

Wikipedia gives a great comparison of application servers that includes both open
source and proprietary solutions. You'll find standard Java 2 Platform, Enterprise
Edition, application servers and even functional programming-based application
servers such as the Haskell-oriented HApps.

The most notable IaaS solution is Amazon EC2, but open source solutions also exist.
IaaS provides a virtualized hardware infrastructure ready for VM execution.

Hadoop is a software stack that allows you to process large amounts of data in a
scalable and efficient way. It provides a programming-based platform along with a
distributed file system and applications.