Introduction to OpenStack

What is OpenStack?

You've probably heard of OpenStack. It's that cloud software that's getting a lot of attention
from big names in the IT industry and major users like CERN, Comcast and PayPal. However,
did you know that it's more than that? It's also the fastest growing open source community
in the world, and a very interesting collaboration among technology vendors and users.

OpenStack is really truly open. If you want to test it out, you can stop reading this
article (come back after - we'll still be here!), visit
http://status.openstack.org/release and get a real-time list of features that are under
development. Select one, follow it to the code review system, and give your comments!
This is an example of Open Development, just one of the "Four Opens" on which OpenStack was founded
(https://wiki.openstack.org/wiki/Open).
Of course, you likely already know that OpenStack is fully released under the open source
Apache 2.0 license - no bits reserved. But did you also know about the principle of "Open Design"?

The software is released on a six-month cycle, and at the start of each cycle we host a
Design Summit for the contributors and users to gather and plan out the roadmap for the next
release. The Design Summit has become part of an increasingly large conference that also hosts
workshops for newcomers, inspiring keynotes and some fairly amazing user stories. However,
somewhere tucked away are rooms with chairs arranged in semicircular layouts ensconcing
dozens of developers engaged in robust discussion, taking notes on a collaborative document
displayed on a projector. This is where the roadmap of OpenStack is determined, pathways
for implementing features are agreed upon, and people volunteer to make them a reality. This
is the Open Design process, and we welcome your participation!

OpenStack may have started with just two organizations, NASA and Rackspace, but now there
are hundreds, and with each additional member the community grows stronger and more cohesive.
Every morning, OpenStack developers wake to a torrent of email from overnight discussions in
other time zones; the momentum is best described as 'intense'.

Overall, the standout feature of OpenStack is this strong community. It's extremely diverse,
spanning very different technical backgrounds (from Python developers to packagers to
translators) and philosophical backgrounds (from free software evangelists to hardcore
capitalists). It's also very widespread, with the OpenStack Foundation claiming members from
130 countries as of October 2013. So, dear reader, chances are there is a place for you.

Becoming a Contributor

Over a twelve-month period, OpenStack typically has more than a thousand software
developers contributing patches. Despite this, we always need more!

OpenStack uses Launchpad for bugs and GitHub for code hosting, but neither of these is the
place to contribute code. That's right: no GitHub pull requests are accepted. Don't panic
yet - this is for good reason: all patches to OpenStack go through an extensive code review
and testing process (https://wiki.openstack.org/wiki/Gerrit_Workflow).

Every code change in OpenStack is seen by at least three people (the owner and two of the
core reviewers for the project), but often many more, and test cases and PEP8 compliance
are required. Patches are also run through the continuous integration system, which
effectively builds a new cloud for every code change submitted - ensuring that each change
interacts as expected with all other parts of OpenStack. As a result of this quest for
quality, many have found that contributing has improved their Python coding skills.
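
To give a flavor of what reviewers expect, here is a small, hypothetical example of the
kind of PEP8-compliant unit test that accompanies a change. Real projects build their tests
on testtools and their own base classes, and the function under test here is invented
purely for illustration:

```python
# Hypothetical example of a PEP8-compliant unit test of the style expected
# to accompany an OpenStack patch. Real projects typically subclass their
# own base test classes (built on testtools) rather than unittest directly,
# and allocate_fixed_ip() is invented purely for illustration.
import unittest


def allocate_fixed_ip(pool):
    """Return the first free address from a pool, or None if exhausted."""
    return pool.pop(0) if pool else None


class AllocateFixedIpTestCase(unittest.TestCase):

    def test_returns_first_free_address(self):
        pool = ['10.0.0.2', '10.0.0.3']
        self.assertEqual('10.0.0.2', allocate_fixed_ip(pool))

    def test_returns_none_when_pool_is_exhausted(self):
        self.assertIsNone(allocate_fixed_ip([]))


if __name__ == '__main__':
    unittest.main()
```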

Of course, not everyone is a Python developer. However, not to worry - there's still
a place for you to work with us to change the face of cloud computing.

Documentation

In many cases, the easiest way to become a contributor to OpenStack
is to participate in the documentation efforts. It requires no coding, just a
willingness to read and understand the systems that you’re writing about. Because the
documentation is treated like code, you will be learning the mechanics necessary to make
contributions to OpenStack itself by helping with documentation. Visit
https://wiki.openstack.org/wiki/Documentation/HowTo
to find out more.

Asking Questions

http://ask.openstack.org is a StackOverflow-style
board for questions about OpenStack. Feel free to use it to ask yours, and if you have the
ability - stick around and try to answer someone else’s. Or at least vote on the ones that look good.

Evangelism

Think OpenStack is pretty cool? Help us out by telling your friends; we'd really
appreciate it. You can find some materials to help at http://openstack.org/marketing
or join the marketing mailing list to find out some cool events to attend.

User Groups

Navigating the Ecosystem: Where does your OpenStack Journey begin?

One of the unique aspects of OpenStack as an open source project is that there are many
different levels at which you can begin to engage with it - you don't have to do everything yourself.

Start with public clouds: you don't even need to have an OpenStack installation to begin
using it. Today, you can swipe your credit card at eNovance, HP, Rackspace and others and
just start migrating your applications.

Though, of course, for many the enticing part of OpenStack is building their own private
cloud, and there are several ways to do that. Perhaps the simplest of all is an
appliance-style solution: you purchase an appliance, unbox it, plug in the power and the
network, and it just is an OpenStack cloud.

However, hardware choice is important for many
applications, so if that applies to you - consider that there are several software
distributions available. You can of course get enterprise-supported OpenStack from
Canonical, Red Hat and SUSE, but take a look also at some of the specialized distributions,
such as those from Rackspace, Piston, SwiftStack or Cloudscaling.

If you want someone
to help guide you through the decisions from the hardware up to your applications, perhaps
adding in a few features or integrating components along the way, consider contacting one
of the system integrators with OpenStack experience like Mirantis or Metacloud.

To derive the most from the flexibility of the OpenStack framework, you may elect to build
a 'DIY' solution, in which case we strongly recommend getting a copy of the OpenStack
Operations Guide
(http://docs.openstack.org/ops), which discusses
many of the decisions you will face along the way. There's also a new OpenStack Security
guide (http://docs.openstack.org/sec/) that
is an invaluable reference for hardening your installation.

DIYing your OpenStack Cloud

If after careful analysis, you've decided to construct OpenStack yourself from the ground
up, there are a number of areas to consider.

Storage

One of the most fundamental
underpinnings of a cloud platform is the storage on which it runs.

In general, when
you select storage back-ends, ask the following questions:

Do my users need block storage?

Do my users need object storage?

Do I need to support live migration?

Should my persistent storage drives be contained in my compute nodes, or should
I use external storage?

What platter count can I achieve? Do more spindles result in better I/O, even accounting
for the network access needed to reach them? (See the back-of-the-envelope sketch after this list.)

Which choice results in the best cost-performance ratio for the scenario I'm aiming for?

How do I manage the
storage operationally?

How redundant and distributed is the storage? What
happens if a storage node fails? To what extent can it mitigate my data-loss disaster scenarios?

Which plugin do I use for block storage?
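
To illustrate the spindle question, here is a back-of-the-envelope sketch. Every figure in
it is an illustrative assumption, not a measurement - substitute numbers from your own
hardware:

```python
# Back-of-the-envelope sketch for the spindle question above. Every figure
# here is an illustrative assumption, not a measurement.
spindles = 24              # disks behind the storage back-end
iops_per_spindle = 75      # rough random IOPS for a 7,200 RPM drive
disk_latency_ms = 8.0      # average seek plus rotational latency
network_rtt_ms = 0.5       # extra round trip to reach external storage

aggregate_iops = spindles * iops_per_spindle
remote_penalty = 100.0 * network_rtt_ms / disk_latency_ms

print("Aggregate random IOPS: about %d" % aggregate_iops)
print("Per-op latency: %.1f ms local, %.1f ms over the network (+%.0f%%)"
      % (disk_latency_ms, disk_latency_ms + network_rtt_ms, remote_penalty))
```

With numbers like these, the network round trip is small compared to the disk's own
latency, so for spinning disks the spindle count tends to dominate.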

For many new clouds, object storage and persistent/block storage are great features that
users want. However, with OpenStack, you're not forced to use either if you prefer a
simpler deployment.

Many parts of OpenStack are pluggable, and one of the best examples is Block Storage,
which you can configure to use storage from a long list of back-ends (Coraid, EMC,
GlusterFS, Hitachi, HP, IBM, LVM, NetApp, Nexenta, NFS, RBD, Scality, SolidFire,
Windows Server, Zadara).
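
Whichever back-end you configure, your users consume it through the same Block Storage
API. As a rough, hypothetical sketch (the credentials and endpoint are made up, and the
exact client arguments vary between releases), requesting a volume with python-cinderclient
looks something like this:

```python
# A rough, hypothetical sketch: credentials and endpoint are made up, and
# exact client arguments vary between releases. Whichever vendor driver the
# operator has configured, tenants request volumes the same way.
from cinderclient import client

cinder = client.Client('1',          # volume API version
                       'demo',       # username (assumed)
                       'secret',     # password (assumed)
                       'demo',       # tenant/project (assumed)
                       'http://controller:5000/v2.0')  # Keystone (assumed)

# Ask for a 10 GB volume; the configured back-end (LVM, RBD, NetApp, ...)
# services the request transparently.
volume = cinder.volumes.create(size=10)
print("%s %s" % (volume.id, volume.status))
```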

Network

If this is the first time you
are deploying a cloud infrastructure in your organization, after reading this section, your
first conversations should be with your networking team. Network usage in a running cloud
is vastly different from traditional network deployments, and has the potential to be
disruptive at both a connectivity and a policy level.

For example, you must plan the number of IP addresses you need for both your guest
instances and your management infrastructure. Additionally, you must research and discuss
cloud network connectivity through proxy servers and firewalls.
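
As a purely illustrative sketch of that planning exercise - every count below is an
assumption you would replace with your own projections - the arithmetic looks like this:

```python
# Purely illustrative address-planning arithmetic; every count here is an
# assumption to replace with your own projections.
import math

compute_nodes = 40
instances_per_node = 30
floating_ip_ratio = 0.25     # fraction of instances needing a public IP
management_hosts = 50        # controllers, storage nodes, switches, IPMI...

fixed_ips = compute_nodes * instances_per_node
floating_ips = int(math.ceil(fixed_ips * floating_ip_ratio))

for label, count in [("fixed (tenant) IPs", fixed_ips),
                     ("floating (public) IPs", floating_ips),
                     ("management IPs", management_hosts)]:
    # Smallest subnet that fits the hosts plus network, broadcast and
    # gateway addresses.
    prefix = 32 - int(math.ceil(math.log(count + 3, 2)))
    print("%-22s %5d -> at least a /%d" % (label, count, prefix))
```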

One of the first choices you need to make is between the "legacy" nova-network and
OpenStack Networking (aka "neutron"). Nova-network is a much simpler way to deploy a
network, but it does not have the full software-defined networking features of Neutron,
and it is slated to be deprecated after 12-18 months.

Object Storage is very 'chatty' among the servers hosting data - even a small cluster
generates megabytes per second of traffic, which is predominantly "Do you have the
object?"/"Yes, I have the object". Of course, if the answer to that question is negative
or times out, replication of the object begins.

Consider the scenario where an entire
server fails, and 24 TB of data needs to be transferred "immediately" to remain at three
copies - this can put significant load on the network.

Another oft-forgotten fact is that when a new file is uploaded, the proxy server must
write out as many streams as there are replicas, multiplying the network traffic. For a
3-replica cluster, 10Gbps in means 30Gbps out. Combining this with the high bandwidth
demands of replication described above is what results in the recommendation that your
private network have significantly higher bandwidth than your public network needs.
Oh, and OpenStack Object Storage communicates internally with unencrypted, unauthenticated
rsync for performance - you do want the private network to be private.
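
To put rough numbers on the two scenarios above (the link speed, data size and replica
count are illustrative assumptions):

```python
# Back-of-the-envelope numbers for the two scenarios above. Link speeds,
# data sizes and replica count are illustrative assumptions.
TB_IN_BITS = 8e12            # bits in one (decimal) terabyte

# 1. Re-replicating a failed 24 TB storage node over the private network.
failed_node_tb = 24
private_link_gbps = 10
rebuild_hours = (failed_node_tb * TB_IN_BITS
                 / (private_link_gbps * 1e9) / 3600)
print("Rebuilding %d TB over a %d Gbps link takes roughly %.1f hours"
      % (failed_node_tb, private_link_gbps, rebuild_hours))

# 2. Write amplification at the proxy: each incoming object stream becomes
#    one outgoing stream per replica.
replicas = 3
public_in_gbps = 10
private_out_gbps = replicas * public_in_gbps
print("%d Gbps of uploads becomes %d Gbps toward the storage nodes"
      % (public_in_gbps, private_out_gbps))
```

And those five or so hours of rebuild assume the private network is carrying nothing else
at the time.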

The remaining point on bandwidth is the public-facing portion. The swift-proxy service is
stateless, which means you can easily add more proxies and use HTTP load-balancing methods
to share bandwidth and availability across them.

More proxies means more bandwidth, if your storage can keep up.

"Cloud Controller"

To achieve maximum scalability via a
shared-nothing/distributed-everything architecture, OpenStack does not have the
concept of a "cloud controller". Indeed, one of the biggest decisions deployers face is
exactly how to segregate out all of the "central" services - such as the API endpoints,
schedulers, database servers and the message queue.

For best results, you need to acquire some metrics on how the cloud will be used - though
of course, with a proper automated configuration management system, it will be possible to
scale as operational experience is gained. Key questions to answer include:

How many
instances will run at once?

How many compute nodes will run at once?

How many users will access the API?

How many users will access the dashboard?

How many nova-api services do you run at once for your cloud?

How long does a
single instance run?

Does your authentication system also need to verify against external sources?

Whereas the sizing of compute node hardware depends mainly on the types of virtual
machines that will run on it, sizing the "central service" machines can be more difficult.
Contrast two clouds, each running 1,000 virtual machines: one is used mainly for
long-running websites, while in the other the average instance lifetime is more akin to an
hour. With so much churn in the latter, it will certainly need heftier API, database and
message queue services.
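
A quick, illustrative calculation shows just how different the churn is (the average
lifetimes are assumptions chosen to match the example above):

```python
# Illustrative churn arithmetic for two hypothetical 1,000-VM clouds; the
# average lifetimes are assumptions chosen to match the example above.
running_instances = 1000

for label, avg_lifetime_hours in [("long-running websites", 30 * 24),
                                  ("hour-long instances", 1)]:
    # In steady state, instances are created (and torn down) at a rate of
    # roughly running_instances / average_lifetime.
    builds_per_hour = running_instances / float(avg_lifetime_hours)
    print("%-22s ~%7.1f build requests per hour" % (label, builds_per_hour))
```

Every one of those builds touches the API, the scheduler, the database and the message
queue, which is why the second cloud needs so much more central-service capacity.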

Scaling

Given "scalability" is a key word in OpenStack's mission, it's no surprise that there
are several methods dedicating to assisting the expansion of your cloud by segregating
it - in addition to the natural horizontal scaling of all components.

The first two
are aimed at very large - multi-site - deployments. Compute cells are designed to allow
running the cloud in a distributed fashion without having to use more complicated
technologies, or being invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called cells. Cells are configured in a tree. The top-level cell
("API cell") has a host that runs the API service, but no hypervisors. Each child cell
runs all of the other typical services found in a regular installation, except for the
API service. Each cell has its own message queue and database service, and also runs the
cell service — which manages the communication between the API cell and child cells.

This allows a single API server to be used to control access to multiple cloud
installations. Introducing a second level of scheduling (the cell selection), in addition
to the regular nova-scheduler selection of hosts, provides greater flexibility to control
where virtual machines are run.

Contrast this with regions. Regions have a separate
API endpoint per installation, allowing for a more discrete separation. Users wishing to
run instances across sites have to explicitly select a region. However, the additional
complexity of running a new service is not required.

Alternately, you can use
availability zones, host aggregates, or both to partition a compute deployment.

Availability zones enable you to arrange OpenStack Compute hosts into logical groups,
and they provide a form of physical isolation and redundancy from other availability
zones, such as by using a separate power supply or network equipment.

You define the
availability zone in which a specified Compute host resides locally on each server.
An availability zone is commonly used to identify a set of servers that have a common
attribute. For instance, if some of the racks in your data center are on a separate
power source, you can put servers in those racks in their own availability zone. Availability
zones can also help separate different classes of hardware.

When users provision
resources, they can specify from which availability zone they would like their instance
to be built. This allows cloud consumers to ensure that their application resources are
spread across disparate machines to achieve high availability in the event of hardware
failure.
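
For example, a user can request a zone at boot time. Here is a hypothetical sketch using
python-novaclient - the credentials, image, flavor and zone names are made up, and exact
client arguments vary between releases (the nova command line offers the same option as
--availability-zone):

```python
# Hypothetical sketch with python-novaclient: boot an instance into a
# specific availability zone. Credentials, names and the zone are all made
# up, and exact client arguments vary between releases.
from novaclient import client

nova = client.Client('2', 'demo', 'secret', 'demo',
                     'http://controller:5000/v2.0')

server = nova.servers.create(
    name='web01',
    image=nova.images.find(name='ubuntu-12.04'),   # assumed image name
    flavor=nova.flavors.find(name='m1.small'),
    availability_zone='rack-a',                     # land in this zone
)
print("%s %s" % (server.id, server.status))
```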

Host aggregates, on the other hand, enable you to partition OpenStack Compute
deployments into logical groups for load balancing and instance distribution. You can use
host aggregates to further partition an availability zone. For example, you might use host
aggregates to partition an availability zone into groups of hosts that either share common
resources, such as storage and network, or have a special property, such as trusted
computing hardware.

A common use of host aggregates is to provide information for
use with the compute scheduler. For example, you might use a host aggregate to group a
set of hosts that share specific images.
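
As a hypothetical operator-side sketch with python-novaclient (host names, credentials and
metadata keys are made up, and matching flavor extra specs against aggregate metadata
relies on the appropriate aggregate scheduler filter being enabled):

```python
# Hypothetical operator-side sketch with python-novaclient. Host names,
# credentials and metadata keys are made up; matching flavor extra specs
# against aggregate metadata relies on the aggregate scheduler filter
# being enabled in nova's scheduler configuration.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://controller:5000/v2.0')

# Create an aggregate that doubles as the "rack-a" availability zone and
# add a couple of hosts to it.
agg = nova.aggregates.create('rack-a-ssd', availability_zone='rack-a')
for host in ('compute01', 'compute02'):
    nova.aggregates.add_host(agg, host)

# Tag the aggregate: these hosts have fast local storage.
nova.aggregates.set_metadata(agg, {'ssd': 'true'})

# Create a flavor whose instances the scheduler will place only on hosts
# in an aggregate carrying the matching metadata.
flavor = nova.flavors.create('m1.ssd', ram=4096, vcpus=2, disk=40)
flavor.set_keys({'ssd': 'true'})
```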

In summary

If you've looked at the storage options and determined which types of storage you need
and how they will be implemented, planned the network carefully (taking into account the
different ways to deploy it and how it will be managed), acquired metrics to design your
cloud controller, and considered how to scale your cluster, then you are probably now an
OpenStack expert. In which case, we'd encourage you to share your findings with the
community!
