
I have been working on OpenStack full time for 5 years. During that time I have seen a meteoric rise in OpenStack in terms of adoption and feature parity with OpenStack’s largest competitors: AWS and VMWare. I am running for the OpenStack board of directors as an individual candidate to improve the competitive outcomes for OpenStack.

In my technical work I have been involved in the founding of OpenStack Heat, which was an effort to produce parity with AWS CloudFormation, while also blazing a trail on the incubation track for OpenStack. Without Heat and the excellent engineering team which implemented the first incubation project, OpenStack may have taken longer to become as large and vibrant as it is today.

After Heat, I turned my attention to solving what I perceived as a technical gap in OpenStack: at that time, OpenStack had no functional interoperability with Kubernetes. This led to my involvement in contributing technically to the OpenStack Magnum project by writing a majority of the initial implementation and recruiting a talented core team from across the OpenStack community. Magnum provides technical interoperability by running Kubernetes on top of OpenStack in an OpenStack-native way.

Finally, I started the OpenStack Kolla project to solve OpenStack’s last significant pain point. Operational expenses quantified as the warm blooded people responsible for operating an OpenStack cloud were greater than other competition in the IaaS platform marketplace. OpenStack prior to Kolla required very large teams to maintain. With a Kolla-deployed OpenStack, this problem no longer exists. I passed the baton as the Project Team Lead in November 2016 to a solid leadership team I had developed in the 3 years Kolla was in development.

One thing we tried early on with Kolla was deploying OpenStack on top of Kubernetes 0.9.7; in fact, that was the original goal of Kolla. At the time, Kubernetes was in its early stages of development and could not serve the complex use case scenarios OpenStack presented. Instead, we went through numerous iterations and eventually settled on Ansible as the basis for Kolla. Kolla works so well because Ansible is such a great dependency choice and the Kolla community stuck with the implementation until we reached critical mass and success.

In March 2016, the core reviewer team of Kolla had made a determination that Kubernetes was mature enough to revisit. As PTL at the time, I didn’t have attention to dedicate full time to growing the community around an OpenStack-on-Kubernetes deployment tool. The core reviewer team started the work with my encouragement. Today, the kolla-kubernetes core reviewer team has implemented a fully functional helm-based microservices layer. We have proven that OpenStack will run on top of Kubernetes, and there now are real, production deployments of the kolla-kubernetes deliverable of the OpenStack Kolla project.

The main factor that makes OpenStack great is the large, diverse set of community members working on the overall system using the Open Source methodology. The properties that make VMWare and AWS great, from the point of view of those respective vendors, are vendor lock-in and lower operational expenses. The teams I’ve led in OpenStack have solved various aspects of the lock-in problem, and now with Kolla, OpenStack has much lower operational expenses than the competition (AWS and VMWare).

There is no going back in life; instead there is only forward. To this end, the forward-looking strategy I believe is most crucial for OpenStack is competing against these closed source behemoths in cooperation with other Open Source Foundations.

As a result of my belief in this strategy, I am running for the OpenStack board of directors as an individual board member to drive Foundation Interoperability. The first foundation I have a strong desire to build bridges with is the CNCF. CNCF hosts Kubernetes, Helm, and other open source projects which, I personally believe, working in harmony with OpenStack will lead to the best possible outcome against our closed source competition.

I believe without question that collaboration and foundation interoperability are key to the success of both OpenStack and CNCF. Without excellent working relationships between these two foundations, the worst possible outcome may occur: competition between Open Source Foundations. The world of Open Source should stand together rather than apart. When you vote for the board of directors membership, vote for candidates that will deliver the best outcomes for OpenStack.

After having worked on developing Kolla, an immutable deployment tool using Ansible to deploy Docker containers containing OpenStack, since September 2014, I have come to the clear conclusion that there is some confusion about what precisely immutability is and how best to achieve it.

What is this immutable infrastructure thing I keep hearing about? Well, first, two definitions from Google:

im·mu·ta·ble

i(m)ˈmyo͞odəb(ə)l/

adjective

unchanging over time or unable to be changed

and

in·fra·struc·ture

ˈinfrəˌstrək(t)SHər/

noun

the basic physical and organizational structures and facilities (e.g., buildings, roads, and power supplies) needed for the operation of a society or enterprise.

First I’ll dissect immutable. A running container consists at a high level of two things. It consists of a complete filesystem and running application first and foremost. More importantly, and this is where everyone gets hung up around immutability, it includes the application’s configuration options. Some hard-core computer science nerds may think that the only way to achieve immutability is to pass the configuration options through the environment so that, from container instantiation until container destruction, the configuration options always remain consistent. This is not the only solution.

Why is immutability important? The reason immutability is desirable is to turn any stateless imperative system into a declarative system. In an imperative system, there are many steps required to achieve a successful (or failed) outcome. By wrapping an imperative system in a container where the configuration never changes, that imperative (read: more complex) system has been turned into a declarative system. A declarative system has two outcomes (success or fail) and is always deterministic, meaning it always will have the same outcome after instantiation. I use always with a grain of salt. A cosmic ray could blow up system RAM, the hard disk could fail in some way, a kernel bug could trigger an oops on the call path, or an asteroid could hit the Phoenix datacenters! Let’s just assume for a moment we throw out these failure scenarios and look to the positive side of things, which is that the infrastructure on which our immutability engine runs will never fail!

Now I’ll dissect infrastructure. Infrastructure is all of the software that goes into making up a system. In the case of Kolla and OpenStack, very little of OpenStack is actually stateless, but immutability still comes to the rescue in many cases. For one, no administrator can muck around with the config options of a running system and crater the environment and have no idea what went wrong. In a properly designed infrastructure, the administrator will configure all options in one place and that configuration will be distributed through the system causing all of the config-option related infrastructure to fail, or all of it to succeed. The container infrastructure of Kolla includes 89 containers, many of which require some form of state, and most of which depend on a database which can result in non-declarative behavior.

Immutability sounds pretty hot huh? The problem is configuring software via environment variables is a huge pain in the ass. Just looking at the reference implementation of docker registry v2.0, significant complexity goes into reading the environment variables without actually altering the contents of the virtual disk. This is really the gold standard for an immutable infrastructure component, but is not the only way to solve the problem.
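To make the environment-variable approach concrete, here is a minimal sketch in Python. It is not the docker registry's actual code. It only illustrates the idea of overlaying values from the process environment onto in-memory defaults so that nothing on disk is rewritten. The `KOLLA_` prefix, the option names, and the `load_config` helper are all hypothetical.

```python
import os

# Hypothetical defaults that would normally ship inside the image.
DEFAULTS = {"bind_host": "0.0.0.0", "bind_port": "5000", "debug": "false"}


def load_config(environ=None, prefix="KOLLA_"):
    """Return the defaults overridden by <prefix><OPTION> environment variables.

    The result lives only in memory; no file in the container is modified.
    """
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        option = name[len(prefix):].lower()
        if option in config:  # ignore variables that match no known option
            config[option] = value
    return config


if __name__ == "__main__":
    print(load_config({"KOLLA_DEBUG": "true", "HOME": "/root"}))
```

With three options this is easy to follow. With hundreds of options, every one of them needs a variable name, a default, and parsing, which is where the pain comes from.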

Remember, we are after a pragmatic declarative system (the why of immutability), not some gold standard where absolutely nothing in the filesystem changes. While completely unchanged filesystem contents meet the definition of immutability, the spirit of immutability can be met in different ways.

During Kolla development we have tried pretty much every method to solve this problem. I will enumerate the solutions:

1. Encode the configuration into the build of the container: This method delivers immutability similar to docker registry, meaning that nothing on the disk changes, ever. The problem with this approach is that any configuration change requires a pokey container rebuild and causes deployment (the config options come from the deployment system) and image building to be mixed, violating separation of concerns.

2. Encode some environment variables with important information and use crudini to set the on-disk configuration in /etc/service. This delivers near-immutability but trades off complete customization. I say near-immutability because the crudini operation would have to be deterministic for immutability to be preserved, which is hard to guarantee. Encoding the thousands of config options that make up the big tent is hard to manage, and if we did that we would want oslo.config to read the config from the environment, not the filesystem. The result is that only *some* options, the critical ones, end up being added to the environment, limiting configurability.

3. Create the configuration file that the OpenStack service runs against outside the container. Originally I highly disliked this idea, but I think it was kfox1111 who came to the rescue and suggested “what if you just configure the container one time?” It took me a few days to process that, but what that means is that after the container starts, it runs code which takes the host bind-mounted configuration file and configures the container one time. After the container is configured, no further alterations of the configuration are permitted without a redeploy from a central location, meaning arbitrary administrator tinkering won’t damage the deployment. Does this deliver immutability? Absolutely. From container instantiation (which finishes when the configuration options are locked into place) to container destruction, the contents of the disk never change. Immutability is preserved with zero tradeoff: no pokey build on deploy (which can take several hours with v2 registry), separation of concerns is maintained, and most importantly Operators maintain complete customization over their environment.

4. Encode the configuration file generated by the deployment tool into a JSON blob which sets the environment or configuration files appropriately. Then use crudini to set the config options on each boot. This would work but it’s not very declarative because of the crudini interaction. It was our first attempt, but we found other options to be more viable.
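The "configure the container one time" idea from the third technique can be sketched roughly as follows. This is not Kolla's actual start script. The `configure_once` helper and every path in it are hypothetical, and it assumes the deployment tool has bind-mounted a generated config file somewhere inside the container.

```python
import os
import shutil
import tempfile


def configure_once(source, target):
    """Copy the bind-mounted config into place only on the first start.

    Returns True if the config was installed, False if it already existed.
    """
    if os.path.exists(target):
        return False  # already configured; later edits to source are ignored
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(source, target)
    os.chmod(target, 0o440)  # read-only, to discourage in-place tinkering
    return True


if __name__ == "__main__":
    # Demonstrate with throwaway paths instead of a real container filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mounted", "service.conf")
        dst = os.path.join(tmp, "etc", "service", "service.conf")
        os.makedirs(os.path.dirname(src))
        with open(src, "w") as handle:
            handle.write("[DEFAULT]\ndebug = false\n")
        print(configure_once(src, dst))  # first start installs the file
        print(configure_once(src, dst))  # a restart leaves it untouched
```

The point of the early return is that changing the mounted file later has no effect on a container that has already been configured; a new configuration only arrives with a redeploy.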

With Kolla we started with #2, briefly tried #4, and finished with technique #3. I’m interested to hear in the comments of this blog post how other people achieved immutability in their infrastructure components without using the onerous environment variable to pass in hundreds of configuration options.

It would be interesting if docker added some type of immutable file loading that was built in and only read the external file(s) (and installed it) the first time the container was run. Alas there is no such thing.

The next step in immutable infrastructure is ensuring a security breakout of the process has limited ability to modify the filesystem. For example, if external software were running as root and could somehow modify files at a whim, it could modify /etc/sudoers and easily escalate privileges to root inside the container. Then there would be a real problem! This same problem can happen on bare metal, but containers insert an extra layer to break through so they actually increase security compared to bare metal. We solved this problem in Kolla by running the containers as regular users and limiting their scope to modifying only files owned by the process’s UID/GID. While it would be possible for some minimal damage to be done as a non-root user, at least the immutability of the files not controlled by the process would be preserved.
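Why a regular user cannot touch a file like /etc/sudoers comes down to the classic Unix permission bits. The sketch below is a simplified model of that check, not anything taken from Kolla. It ignores root's override, ACLs, capabilities, and read-only mounts, and the `may_write` helper and the UID/GID values are made up for illustration.

```python
import stat


def may_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified Unix permission-bit check for write access.

    Ignores root's override, ACLs, capabilities, and read-only mounts.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    # /etc/sudoers is conventionally owned by root:root with mode 0440.
    print(may_write(0o440, 0, 0, uid=1000, gids={1000}))
    # A file owned by the (hypothetical) service user itself, mode 0640.
    print(may_write(0o640, 1000, 1000, uid=1000, gids={1000}))
```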

I’ve written about what immutability is, and a little about why you would want immutability. Besides the warping of a process from imperative to declarative, there are benefits that trickle from that reality.

An operator has to try hard (using docker exec) to modify the contents of the filesystem. Immutability protects vendors from cowboy coders like myself – since someone that goes around and mucks with the internals of a container is unlikely to call technical support.

A technical support agent doesn’t have to guess what software has been installed on the system which may cause conflicts with that vendor’s software. An immutable deployment target should only have immutable software on it.

Upgrades and downgrades work flawlessly since the full state of the system including its configuration is recorded in the containers.

Immutable infrastructure will change the world. We are just in the beginning phase of the conversion to immutability. Companies are kicking the tires on their favorite immutability engine (mine is Docker). The immutability software is a bit green. Still, I feel completely comfortable deploying OpenStack in n-way active H/A mode using Kolla with Docker as our immutability engine. Kolla doesn’t use any complex features of Docker; everything we use has been in use for a year or more in the field.

I hope folks find this blog post helpful in their journey towards implementing an immutable datacenter. That is the next big thing in computing, and it will take years to achieve.

Kolla is an opinionated OpenStack deployment system unless the operator has opinions! Kolla is completely customizable but comes with consumable out of the box defaults for use with Ansible deployment. To understand the Kolla community’s philosophy towards deployment, please read our customize deployment documentation.

Kolla includes the following features:

AIO and multinode deployment using Ansible with n-way active high availability.

The following services can be deployed via Ansible in 12 to 15 minutes with 3 node high availability:

ceph for glance, nova, cinder

cinder (only ceph is implemented as a backend at this time)

glance

haproxy

heat

horizon

ironic (tech preview)

keystone

mariadb with galera replication

memcached

murano

neutron

nova

rabbitmq

swift

Kolla’s implementation is stable and the core reviewers feel Kolla is ready for evaluation by operators and third party projects. We strongly encourage people to evaluate the included Ansible deployment tooling and are keen for additional feedback.

In the Kolla project, we were heavily using host bind mounts to share filesystem data with different containers. A host bind mount is an operation where a host directory, such as /var/lib/mysql is mounted directly into the container at some specific location.

Starting the kollaglue/centos-rdo-mariadb-app container with such a bind mount pulls the image and mounts /var/lib/mysql from the host into the container at the same location. All containers started with this bind mount then share the host’s /var/lib/mysql.

Through months of trial and error, we found bind mounting host directories to be highly suboptimal.

Containers exhibit three magic properties.

Containers are declarative in nature. A container either starts or fails to start, and should do so consistently. Even though containers typically run imperative code, the imperative nature is abstracted behind a declarative model. So it is possible that an imperative change in how the container starts could remove this spectacular property. If the service relies on a database, or data stored on the filesystem, the system becomes non-deterministic. Determinism is a major advantage of declarative programming.

Containers are immutable. The contents, once created, cannot be modified except by the container software itself. It is almost like composing an entire distribution, including compilers and library runtimes, as one binary to be run.

Containers should be idempotent. A container should be able to be re-run consistently without failing if it started correctly the first time.

Using a host bind mount weakens or destroys the three magic properties of containers. Docker, Inc. was intuitively aware this was a problem, so they implemented docker data volume containers. A docker data container is a container that is started once and creates a docker volume. A docker volume is permanent persistent storage created by the VOLUME operation in a Dockerfile or the --volume command. Once the data container is created, its docker volume is always available to other docker containers using the --volumes-from operation.

The following operation starts a data container based upon the centos image, creates a data volume called /var/lib/mysql, and finally runs /bin/true which exits quickly:

docker run -d --name=mariadb_data --volume=/var/lib/mysql centos true

Next the container ID must be retrieved to start the application container:

When using data volume containers, all the correct permissions are sorted out by docker automatically. Data is shared between containers. Most importantly it is more difficult to modify the container’s volume contents from outside the container. All of these benefits help preserve the declarative, immutable, and idempotent properties of containers.
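For anyone scripting this, the two docker invocations can be assembled as argument lists in Python. This is a sketch under assumptions rather than Kolla's tooling. The helper names are hypothetical, and the application container is referenced by the data container's name through `--volumes-from`, which docker accepts in place of a container ID.

```python
import shlex


def data_container_cmd(name, volume, image="centos"):
    """argv that creates a data volume container and exits via `true`."""
    return ["docker", "run", "-d", "--name=" + name,
            "--volume=" + volume, image, "true"]


def app_container_cmd(image, volumes_from):
    """argv that starts an application container sharing that volume."""
    return ["docker", "run", "-d", "--volumes-from=" + volumes_from, image]


if __name__ == "__main__":
    # Print the commands instead of running them, so no docker daemon is needed.
    print(shlex.join(data_container_cmd("mariadb_data", "/var/lib/mysql")))
    print(shlex.join(app_container_cmd("kollaglue/centos-rdo-mariadb-app",
                                       "mariadb_data")))
```

Passing the lists to `subprocess.run` would execute them; keeping them as lists avoids shell quoting problems.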

We also use data containers for nova-compute in Kolla. We still continue to use bind mounts in some circumstances. For example, nova-api needs to run modprobe to load kernel modules. To support that we allow bind mounting of /var/lib/modules:/var/lib/modules with the :ro (read only) flag.

We also continue to have some container-writeable bind mounts. The nova-libvirt container requires /sys/fs/cgroups:/sys/fs/cgroups to be bind mounted. Some types of super privileged containers cannot get away from bind mounts, but most of the Kolla system now runs without them.

OpenStack Ironic is a bare metal as a service deployment tool. Fedora Atomic is a µOS consisting of a very minimal installation of Linux, kernel.org, Kubernetes and Docker. Kubernetes is an endpoint manager and container scheduler, while Docker is a container manager. The basic premise of Fedora Atomic using Ironic is to present a lightweight launching mechanism for OpenStack.

The first step in launching Atomic is to make Ironic operational. I used devstack for my deployment. The Ironic developer documentation is actually quite good for a recently integrated OpenStack project. I followed the instructions for devstack. I used pxe+ssh, rather than agent+ssh. The pxe+ssh driver virtualizes bare-metal deployment for testing purposes, so only one machine is needed. The machine should have 16GB+ of RAM. I find 16GB a bit tight, however.

I found it necessary to hack devstack a bit to get Ironic to operate. The root cause of the issue is that libvirt can’t write the console log to the home directory as specified in the localrc. To solve the problem I just hacked devstack to write the log files to /tmp. I am sure there is a more elegant way to solve this problem.

It took me two days to sort out the project in this blog post, and during the process, I learned a whole lot about how Ironic operates by code inspection and debugging. I couldn’t find much documentation about the deployment process so I thought I’d share a nugget of information about the deployment process:

Nova contacts Ironic to allocate an Ironic node providing the image to boot

Ironic pulls the image from glance and stores it on the local hard disk

Ironic changes the PXEboot configuration to point to the user’s actual desired ramdisk and kernel

The deploy node reboots into SEABIOS again

The node boots the proper ramdisk and kernel, which load the disk image that was written via iSCSI

Fedora Atomic does not ship images that are suitable for use with the Ironic model. Specifically what is needed is a LiveOS image, a ramdisk, and a kernel. The LiveOS image that Fedora Cloud does ship is not the Atomic version. Clearly it is early days for Atomic and I expect these requirements will be met as time passes.

But I wanted to deploy Atomic now on Ironic, so I sorted out making a PXE-bootable Atomic Live OS image.

The Atomic cloud image has /dev/sda1 containing the contents of the /boot directory. The /dev/sda2 partition contains a LVM partition. There is a logical volume called atomicos/root which contains the root filesystem.

Building the Fedora Atomic images for Ironic is as simple as extracting the ramdisk and kernel from /dev/sda1 and extracting /dev/sda2 into an image for Ironic to dd to the iSCSI target. A bit complicating is that the fstab must have the /boot entry removed. Determining how to do this was a bit of a challenge, but I wrote a script to automate the Ironic image generation process.
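The fstab step is the easiest one to show in isolation. The following is a minimal sketch of dropping the /boot entry from fstab text; it is not the script mentioned above, and the `drop_mount` helper and the sample fstab contents are hypothetical.

```python
def drop_mount(fstab_text, mount_point="/boot"):
    """Return fstab_text without the entries mounted at mount_point."""
    kept = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Comments and blank lines are never treated as entries.
        is_entry = bool(fields) and not fields[0].startswith("#")
        if is_entry and len(fields) >= 2 and fields[1] == mount_point:
            continue  # skip the entry for the partition the image lacks
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical fstab contents, for illustration only.
SAMPLE = (
    "# /etc/fstab\n"
    "/dev/mapper/atomicos-root / xfs defaults 0 0\n"
    "UUID=1234 /boot ext4 defaults 1 2\n"
)

if __name__ == "__main__":
    print(drop_mount(SAMPLE), end="")
```

Matching on the second whitespace-separated field (the mount point) avoids removing lines that merely mention /boot in a comment or a device name.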

The first step is to test that Ironic actually installs via devstack using the above localrc:

[sdake@bigiron devstack]$ ./stack.sh
bunch of output from devstack omitted
Keystone is serving at http://192.168.1.124:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.124

Next, take a look at the default image list which should look something like:

In this case, we want to boot the UEC image. Ironic expects the kernel_id and ramdisk_id properties to be attached to the image; these are the UUIDs of cirros-0.3.2-x86_64-uec-kernel and cirros-0.3.2-x86_64-uec-ramdisk, respectively.

Next we configure Ironic’s PXE boot config options and restart the ironic conductor in devstack. To restart Ironic conductor use screen -r, find the appropriate conductor screen, press CTRL-C, up arrow, ENTER. This will reload the configuration.

I found determining how to create the images from the Fedora Atomic Cloud images a bit tedious. The diskimage builder tool would likely make this easier, if it supported RPM-ostree and Atomic.

Ironic needs some work to allow the pxe options to override the “root” initrd parameter. Ideally a glance image property would be allowed to be specified to override and extend the boot options. I’ve filed an Ironic blueprint for such an improvement.

Turbocharging DevStack

I wanted to turbocharge my development cycle of OpenStack running on Fedora 18 so I could be waiting on my brain rather than waiting on my workstation. I decided to purchase two modern solid state drives (SSDs) and run them in RAID 0. I chose two Intel S3500 160 GB Enterprise grade SSDs. My second choice was the Samsung 840 Pro, which may have been a bit faster, but perhaps not as reliable.

Since OpenStack and DevStack mostly use /var and /opt for their work, I decided to replace only /var and /opt. RAID 0 has lower availability than a single drive, so if an SSD fails I lose only those directories and am less likely to lose my home directory, which may contain some work in progress.

The Baseline HP Z820

For a baseline my system is a Hewlett Packard Z820 workstation (model #B2C08UT#ABA) that I purchased from Provantage in January 2013. Most of the computer is a beast, sporting an 8 core Intel Xeon E5-2670 @ 2.60GHz running with Hyperthreading for 16 total CPUs, Intel C602 chipset, and 16 GB Quad Channel DDR3 ECC Unbuffered RAM.

Many people over the past year have asked me how exactly to use CloudInit while the Heat developers have implemented OpenStack Heat. Since CloudInit is the default virtual machine bootstrapping system on Debian, Fedora, Red Hat Enterprise Linux, Ubuntu and likely more distros, we decided to start with CloudInit as our base bootstrapping system. I’ll present a code walk-through of how we use CloudInit inside OpenStack Heat.

Reading the CloudInit documentation is helpful, but it lacks programming examples of how to develop software to inject data into virtual machines using CloudInit. The OpenStack Heat project implements injection in Python for CloudInit-enabled virtual machines. Injection occurs by passing information to the virtual machine that is decoded by CloudInit.

IaaS platforms require a method for users to pass data into the virtual machine. OpenStack provides a metadata server which is co-located with the rest of the OpenStack infrastructure. When the virtual machine is booted, it can then make an HTTP request to a specific URI and retrieve the user data passed to the instance during instance creation.
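As a rough sketch of that HTTP request, the snippet below fetches user data from the EC2-compatible metadata path that OpenStack serves at the link-local address 169.254.169.254. The helper names are hypothetical, and `fetch_user_data` only works when run inside an instance that can reach the metadata server.

```python
from urllib.request import urlopen

# Link-local address used by the EC2-compatible metadata service.
METADATA_BASE = "http://169.254.169.254"


def user_data_url(base=METADATA_BASE, version="latest"):
    """Build the URI from which an instance reads its user data."""
    return "{}/{}/user-data".format(base.rstrip("/"), version)


def fetch_user_data(timeout=5.0):
    """Return the raw user data bytes; only reachable from inside an instance."""
    with urlopen(user_data_url(), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(user_data_url())
```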

CloudInit’s job is to contact the metadata server and bootstrap the virtual machine with desired configurations. In OpenStack Heat, we do this with three specific files.

This file directs CloudInit to turn off SELinux, install ssh keys for the user ec2-user, setup the locale, hostname, ssh, timezone, modify /etc/hosts with correct information and output the results of all cloud-init data to /var/log/cloud-init-output.log

There are many cloud config modules which provide different functionality. Unfortunately they are not well documented, so the source must be read to understand their behavior. For a list of cloud config modules, check the upstream repo.

The part-handler.py file is executed by CloudInit to separate the UserData provided by the MetaData server in OpenStack. CloudInit executes handle_part() for each part of a multi-part mime message which CloudInit doesn’t know how to decode. This is how OpenStack Heat passes unique information for each virtual machine to assist in the orchestration process. The first ctype is always set to __begin__, which triggers handle_part() to create the directory /var/lib/heat-cfntools.

The OpenStack Heat instance launch code uses the mime subtype x-cfninitdata. OpenStack Heat passes several files via this mime subtype, each of which is decoded and stored in /var/lib/heat-cfntools.
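On the sending side, a multi-part mime message of this kind can be assembled with Python's standard email package. This is a minimal sketch rather than Heat's launch code. The `build_user_data` helper, the part contents, and the filenames are illustrative assumptions.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(parts):
    """Pack (content, subtype, filename) tuples into one mime message."""
    message = MIMEMultipart()
    for content, subtype, filename in parts:
        part = MIMEText(content, _subtype=subtype)
        # The filename travels in the Content-Disposition header.
        part.add_header("Content-Disposition", "attachment", filename=filename)
        message.attach(part)
    return message.as_string()


if __name__ == "__main__":
    blob = build_user_data([
        ("#!/bin/sh\necho hello\n", "x-shellscript", "startup"),
        ('{"key": "value"}', "x-cfninitdata", "cfn-init-data"),
    ])
    # Parse the blob back to show the type and filename of each part.
    for part in message_from_string(blob).walk():
        if not part.is_multipart():
            print(part.get_content_type(), part.get_filename())
```

The resulting string is what would be handed over as user data at instance creation; CloudInit then dispatches each part by its content type.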

part-handler.py writes the contents of each x-cfninitdata mime subpart to /var/lib/heat-cfntools on the instance

CloudInit executes part-handler.py with __end__

CloudInit executes the configuration operations defined by the config file

CloudInit runs the x-shellscript blob which in this case is loguserdata.py

loguserdata.py logs the output of /var/lib/heat-cfn/cfnuserdata which is the initialization script set in the OpenStack Heat templates
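The handler side of those steps can be sketched as a small CloudInit part handler. This is a simplified illustration and not the part-handler.py shipped with Heat. It relies on the CloudInit convention of a `#part-handler` first line plus `list_types()` and `handle_part()` functions. The extra `target_dir` keyword is an assumption added here so the code can be tried outside an instance; CloudInit itself passes only the first four arguments.

```python
#part-handler
import os

# Directory named in the walk-through above.
TARGET_DIR = "/var/lib/heat-cfntools"


def list_types():
    """Tell CloudInit which mime types this handler wants to receive."""
    return ["text/x-cfninitdata"]


def handle_part(data, ctype, filename, payload, target_dir=TARGET_DIR):
    """Called with __begin__, then once per matching part, then __end__."""
    if ctype == "__begin__":
        os.makedirs(target_dir, mode=0o700, exist_ok=True)
        return
    if ctype == "__end__":
        return
    if ctype == "text/x-cfninitdata":
        # basename() guards against a filename that tries to escape target_dir.
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, "w") as handle:
            handle.write(payload)
```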

This code walk-through will help developers understand how OpenStack Heat integrates with CloudInit and provide a better understanding of how to use CloudInit in your own Python applications if you roll your own bootstrapping process.

Over the last year, Angus Salkeld and I have been developing an IaaS high availability service called Pacemaker Cloud. We learned that the problem we were really solving was orchestration. Another dev group was also looking at this problem inside Red Hat from the launching side. We decided to take two weeks off from our existing work and see if we could join together to create a proof of concept implementation from scratch of AWS CloudFormation for OpenStack. The result of that work was a proof of concept project which provided launching of a WordPress template, as had been done in our previous project.

The developers decided to take another couple weeks to determine if we could get a more functional system that would handle composite virtual machines. Today, we released that version, our second iteration of the Heat API. Since we have many more developers, and a project that exceeded our previous functionality of Pacemaker Cloud, the Heat Development Community has decided to cease work on our previous orchestration projects and focus our efforts on Heat.

A bit about Heat: The Heat API implements the AWS CloudFormation API. This API provides a REST interface for creating composite VMs called Stacks from template files. The goal of the software is to be able to accurately launch AWS CloudFormation Stacks on OpenStack. We will also enable good quality high availability based upon the technologies we created in Pacemaker Cloud, including escalation.

Given that C was a poor choice of implementation language for making REST based cloud services, Heat is implemented in Python which is fantastic for REST services. The Heat API also follows OpenStack design principles. Our initial design after our POC shows the basics of our architecture and our quickstart guide can be used with our second iteration release.

A mailing list is available for developer and user discussion. We track milestones and issues using github’s issue tracker. Things are moving fast – come join our project on github or chat with the devs on #heat on freenode!

A few short weeks after Corosync 1.0.0 was released, the developers huddled for our future planning of Corosync 2.0.0. The major focus of that meeting was “Corosync as implemented is too complicated”. We had threads, semaphores, mutexes, an entire protocol, plugins, a bunch of unused services, a backwards compatibility layer, and multiple cryptographic engines.

Going for us, we did have a world class group communication system implementation (if not a little complicated) developed by a large community of developers, battle hardened by thousands of field deployments, tested by tens of thousands of community members.

As a result of that meeting, we decided to keep the good and throw out the bad, as we did between the openais and corosync transitions. Gone are threads. Gone are compatibility layers. Gone are plugins. Gone are unsupported encryption engines. Gone is a bunch of other user-invisible junk that was crudding up the code base.

Shortly after Corosync 2.0.0 development was started, Angus Salkeld had the great idea of taking the infrastructure in corosync (IPC, Logging, Timers, Poll loop, shared memory, etc) and putting that into a new project called libqb. The objective of this work was obvious: To create a world-class infrastructure library specifically focused on the needs of cluster developers with a great built-in make-check test suite.

This helped us reach even closer to our goals of simplification. As we pushed the infrastructure out of base Corosync, we could focus more on protocols/APIs. You would be surprised to find that implementing the infrastructure took about as much effort as the rest of the system (APIs and Totem).

All of this herculean effort wouldn’t be possible without our developer and user community. I’d especially like to acknowledge Jan Friesse in his leadership role of helping to coordinate the upstream release process and drive the upstream feature set to 2.0.0 resolution. Angus Salkeld was invaluable in his huge libqb effort which occurred on time and with great quality. Finally I want to thank Fabio Di Nitto for beating various parts of the Corosync code base into submission and his special role in designing the votequorum API. There are many other contributors, including developers and testers, whom I won’t mention individually but would also like to thank for their improvements to the code base.

Great job devs!! Now it’s up to the users of Corosync to tell us if we delivered on the objective we set out with 18 months ago: making Corosync 2.0 faster, simpler, smaller, and most importantly higher quality.

The software can be downloaded from Corosync’s Website. Corosync 2.0, as well as the rest of the improved community developed cluster stack will show up in distros as they refresh their stacks.

Pádraig Brady will be providing a live demonstration of Pacemaker Cloud integrated with OpenStack at FOSDEM.

What is pacemaker cloud?

Pacemaker Cloud is a high scale high availability system for virtual machine and cloud environments. Pacemaker Cloud uses the techniques of fault detection, fault isolation, recovery and notification to provide a full high availability solution tailored to cloud environments.

Pacemaker Cloud combines multiple virtual machines (called assemblies) into one application group (called a deployable). The deployable is then managed to maintain an active and running state in the face of failures. Recovery escalation is used to recover from repetitive failures and drive the deployable to a known good working state.
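To give a feel for recovery escalation, here is a toy model in Python. It is not Pacemaker Cloud's implementation. The `EscalationPolicy` class, the failure threshold, the time window, and the action names are all assumptions chosen for illustration: an assembly is restarted on each failure, and if it fails too often within a window, recovery escalates to the whole deployable.

```python
from collections import deque


class EscalationPolicy:
    """Escalate when an assembly fails too often inside a time window."""

    def __init__(self, max_failures=3, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()

    def on_failure(self, now):
        """Record a failure at time `now` and return the recovery action."""
        self._failures.append(now)
        # Forget failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._failures.clear()  # start counting afresh after escalating
            return "restart_deployable"
        return "restart_assembly"


if __name__ == "__main__":
    policy = EscalationPolicy()
    for timestamp in (0.0, 10.0, 20.0):
        print(timestamp, policy.on_failure(timestamp))
```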