Introducing Linch-Pin: Hybrid cloud provisioning using Ansible

Wed, Oct 19, 2016

Background

For more than six months now, I've been working at Red Hat, mostly on Continuous Integration projects and the like. Recently, I joined a new team called Continuous Infrastructure and started working on automating use cases around Project Atomic and OpenShift.

As part of that project, we have an internal tool called ci-factory. Among its components is a provisioner that works with OpenStack, Beaker, etc. However, it doesn't support a broader set of clouds and infrastructure, such as Amazon AWS, Google Compute Engine, and Libvirt. Additionally, ci-factory provided some tooling for Ansible dynamic inventories. However, the configurations that generated these inventories were not very flexible, and were sometimes outright incorrect.

Enter Provisioner 2.0 (Linch-Pin)

Beginning in June this year, our team started creating a new tool. Led by developer extraordinaire Samvaran Rallabandi (we call him SK), we now have Linch-Pin.

The concept of Linch-Pin was to retool the basic ci-factory provisioner into something much more flexible. While its provisioning was important, ci-factory was written in a mix of Python and Bash scripts. Provisioner 2.0 is written completely in Ansible. Because Ansible is excellent at both configuration management and orchestration of systems, it can be used to both provision and configure systems. Ansible can handle complex cluster configurations (e.g., openshift-ansible) as well as much simpler tasks, like adding users to a system idempotently.

An additional consideration for Provisioner 2.0 was leveraging existing Ansible cloud modules, which reduces the amount of code that needs to be written. So far, this has proven very valuable and has made development time much shorter overall. There are, however, certain modules, Libvirt for example, that seem poorly implemented in Ansible. Thus, an updated module will need to be written.

Lastly, Provisioner 2.0 should exist upstream. Linch-Pin has been an upstream project since the first working code was created. This was done to encourage contribution from inside and outside of Red Hat. We believe that many projects will be able to take advantage of Linch-Pin and contribute back to the project as a whole. Many other upstream and downstream projects have expressed interest in Linch-Pin just from a basic demonstration.

Linch-Pin Architecture

Linch-Pin has some basic components that make it work. First we'll cover the file structure, then dive into the core bits:

provision/
├── group_vars/
├── hosts
├── roles/
└── site.yml

At one point in time, Linch-Pin was going to have both provision and configure playbooks. The provision components became Linch-Pin, while the configure components became external repositories of useful components which leverage Linch-Pin in some way. A couple of examples are the CentOS PaaS SIG's paas-sig-ci project and the cinch project by Red Hat Quality Engineering.

The roles path is the meat of Linch-Pin. All of the power that makes hybrid cloud provisioning possible exists here. Ansible defines things called playbooks to drive these roles. The site.yml file is the playbook itself, which starts the execution.

Other paths exist:

├── outputs/
├── filter_plugins/
├── InventoryFilters/
└── library/

We will cover these components later on in this post, or later in the series.

Topologies

A topology definition helps describe complex cloud infrastructure. The topology definition is written in YAML. Before Linch-Pin provisions anything, the topology must be validated against a predefined schema. Custom schemas can be created to change the way the topology works if desired. However, a default schema is already defined for simplicity.

After validation, the topology is used to provision resources. A topology is broken into its resource components, and each is delegated to the appropriate resource provisioner. This is generally done asynchronously, meaning that nodes on different cloud providers can be provisioned at the same time using appropriate credentials. Assuming a successful provisioning event per provider, the resource provisioner(s) will return appropriate response data to the topology provisioner.
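The delegation step can be pictured with a small Python sketch. This is only an illustration of the idea, not how Linch-Pin implements it (Linch-Pin does this in Ansible). The provisioner functions, the RESOURCE_PROVISIONERS table, and the example topology are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def provision_openstack(res_defs):
    # Hypothetical stand-in: a real provisioner would call the cloud API here.
    return {"provider": "openstack", "nodes": sum(d["count"] for d in res_defs)}


def provision_aws(res_defs):
    # Hypothetical stand-in for a second cloud provider.
    return {"provider": "aws", "nodes": sum(d["count"] for d in res_defs)}


# Maps a res_group_type to the function that handles it (illustrative only).
RESOURCE_PROVISIONERS = {
    "openstack": provision_openstack,
    "aws": provision_aws,
}


def provision_topology(topology):
    """Hand each resource group to its provisioner and collect the responses."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(
                RESOURCE_PROVISIONERS[group["res_group_type"]],
                group["res_defs"],
            )
            for group in topology["resource_groups"]
        ]
        # result() waits for that provider to finish and re-raises its errors.
        return [future.result() for future in futures]


example = {
    "topology_name": "hybrid_example",
    "resource_groups": [
        {"res_group_type": "openstack", "res_defs": [{"count": 3}]},
        {"res_group_type": "aws", "res_defs": [{"count": 2}]},
    ],
}
responses = provision_topology(example)
```

Each resource group is submitted at once, so the providers work in parallel, and the responses come back in the same order as the groups in the topology.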

As mentioned above, credentials may be required for some cloud providers. This is handled by the credentials manager. The credentials details are stored in the topology definition as a reference to the vault/location of said credentials. The resource provisioner uses these to authenticate to the appropriate cloud provider as necessary.

This topology describes a single set of instances, which will run on the local ci-osp site. There will be 3 nodes, given network devices on the 'atomic-e2e-jenkins' network. Each instance will be of type 'm1.small'. Keep in mind, the openstack server must actually be configured and accept the above options. This is out of scope of this post, however.

Topology terminology explained:

topology_name: a reference point which Linch-Pin uses to track the nodes, networks, and other resources throughout provisioning and teardown

resource_groups: a set of resource definitions. One or many groups of nodes, storage, etc.

res_group_type: a predefined set of group types (e.g., openstack, aws, gcloud, libvirt, etc.)

res_defs: A definition of a resource with its component attributes (flavor, image, count, region, etc.)

As mentioned above, the topology describes the resources needed. When Linch-Pin is invoked, it reads this file and creates the described systems. More information about topologies and their structure can be found in the Linch-Pin documentation. More complex examples can be found in the Linch-Pin GitHub repository.
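To make the terminology concrete, here is a rough Python rendering of a three-node topology like the one described above, along with a helper that counts nodes per group type. The top-level terms come from the list above; the exact nesting, the placement of assoc_creds, and the "networks" key are assumptions made for illustration, so consult the Linch-Pin documentation for the real layout.

```python
# Assumed layout: key names beyond those defined above are illustrative.
topology = {
    "topology_name": "openstack_3node_cluster",
    "resource_groups": [
        {
            "res_group_type": "openstack",
            "assoc_creds": "openstack_creds",  # placement assumed
            "res_defs": [
                {
                    "res_type": "os_server",
                    "flavor": "m1.small",
                    "count": 3,
                    "networks": ["atomic-e2e-jenkins"],  # key name assumed
                }
            ],
        }
    ],
}


def count_nodes(topology):
    """Return the number of requested nodes per res_group_type."""
    totals = {}
    for group in topology["resource_groups"]:
        group_type = group["res_group_type"]
        for res_def in group["res_defs"]:
            # A definition without an explicit count is treated as one node.
            totals[group_type] = totals.get(group_type, 0) + res_def.get("count", 1)
    return totals


node_totals = count_nodes(topology)
```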

Provisioning

To provision the openstack-3node-cluster.yml, Linch-Pin currently uses the ansible-playbook command. There are many options available that can be passed as --extra-vars, but here, we only show two: state and topology. Simply calling the provision playbook will provision resources:

The diagram shows this process: the topology definition is provided to the provisioner, which then provisions the requested resources. Once provisioned, all cloud data is gathered and stored.

A great many things happen when this playbook is run. Let's have a look at the process in a bit more detail.

Determining Defaults

Linch-Pin first discovers the needed configuration, either from the --extra-vars as shown above, or from the linchpin_config.yml. This determines the schema, topology, and some paths for outputs and the like (covered later in this post).
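A minimal Python sketch of that lookup follows. It assumes values passed on the command line win over those in linchpin_config.yml, which matches how Ansible generally treats --extra-vars but is not confirmed here for Linch-Pin specifically. The file names and paths in the example are hypothetical.

```python
def resolve_settings(config, extra_vars):
    """Start from the config file values, then apply any extra-vars on top."""
    settings = dict(config)
    for key, value in extra_vars.items():
        if value is not None:
            settings[key] = value
    return settings


# Hypothetical values standing in for linchpin_config.yml contents.
config_file = {
    "schema": "schemas/default_schema.json",
    "topology": "topologies/default.yml",
    "outputfolder_path": "/tmp/outputs",
}

# Hypothetical values standing in for --extra-vars.
command_line = {"topology": "topologies/openstack-3node-cluster.yml", "state": "present"}

settings = resolve_settings(config_file, command_line)
```

Anything not given on the command line, such as the schema and the output folder here, falls back to the config file.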

Schema Check

Once everything is configured properly, a schema check is performed. This process is used to ensure the topology file matches up with the defined constraints. For example, there are specific resource types (res_type), like os_server, aws_ec2, and gcloud_gce. Others, like libvirt, beaker, and docker are not yet implemented. The schema check ensures that further processing only occurs for resource types that are currently supported.
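The resource type portion of that check can be sketched in a few lines of Python. This is a plain-Python stand-in rather than Linch-Pin's actual schema mechanism; the function name and the error message format are made up for the example.

```python
# Resource types named above as currently supported.
SUPPORTED_RES_TYPES = {"os_server", "aws_ec2", "gcloud_gce"}


def check_res_types(topology):
    """Return a list of error strings; an empty list means the check passed."""
    errors = []
    for group in topology.get("resource_groups", []):
        for res_def in group.get("res_defs", []):
            res_type = res_def.get("res_type")
            if res_type not in SUPPORTED_RES_TYPES:
                errors.append("unsupported res_type: %r" % (res_type,))
    return errors


good = {"resource_groups": [{"res_defs": [{"res_type": "os_server"}]}]}
bad = {"resource_groups": [{"res_defs": [{"res_type": "libvirt"}]}]}

good_errors = check_res_types(good)
bad_errors = check_res_types(bad)
```

Collecting every error before stopping lets a user fix a topology in one pass instead of rerunning after each failure.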

Provisioning Nodes

Once the schema check passes, the topology is provisioned with the cloud provider(s). In the example, there is only one, openstack, but there could be several clouds provisioned at once. The provisioner plugin is called for each cloud provider, and credentials are passed along as needed. If all is successful, nodes will be provisioned according to the topology definition.

Determining Credentials

It may not have been clear above, but when provisioning nodes from certain cloud providers, credentials are required. In the topology definition file, there is one line that indicates credentials:

assoc_creds: "openstack_creds"

However, this doesn't really tell us anything. It turns out, each provisioner plugin has an ansible role. Each role contains some variables for determining how to connect to the cloud provider. For instance, with openstack, the roles/openstack/vars/openstack_creds.yml relates to our topology definition:

The openstack credentials currently come from environment variables. Simply export these variables to the shell and they will be picked up properly. From this point, openstack will grant access according to its policies. Environment variables are used this way, where appropriate, for all cloud providers.
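As a rough illustration of reading credentials from the environment, the Python sketch below collects a set of variables and fails early if any are missing. The names are the conventional OpenStack client variables; which ones Linch-Pin's openstack role actually reads is an assumption here.

```python
import os

# Conventional OpenStack client variable names (assumed, not confirmed for Linch-Pin).
REQUIRED_VARS = ("OS_AUTH_URL", "OS_USERNAME", "OS_PASSWORD", "OS_PROJECT_NAME")


def load_openstack_creds(environ=None):
    """Collect credentials from the environment, failing if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Usage with an explicit mapping; call load_openstack_creds() to use the real shell.
fake_env = {
    "OS_AUTH_URL": "https://openstack.example.com:5000/v3",
    "OS_USERNAME": "demo",
    "OS_PASSWORD": "secret",
    "OS_PROJECT_NAME": "demo-project",
}
creds = load_openstack_creds(fake_env)
```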

Teardown requires knowledge of the outputs, as mentioned previously. Outputs are tracked by a few variables in the linchpin_config.yml. Specifically, the outputfolder_path provides the location of the output, and the filename is based upon the topology_name.

Consider the following: outputfolder_path=/tmp/outputs, and topology_name=openstack_3node_cluster. From this, the output of provisioning would reside at /tmp/outputs/openstack_3node_cluster.yml.
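The same derivation, written as a tiny Python helper. The function name is hypothetical; the two inputs are the variables described above.

```python
from pathlib import PurePosixPath


def output_file(outputfolder_path, topology_name):
    """Build the output file path from the folder and the topology name."""
    return str(PurePosixPath(outputfolder_path) / (topology_name + ".yml"))


path = output_file("/tmp/outputs", "openstack_3node_cluster")
```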

Because the linchpin_config.yml contains the path to the output file, that file is parsed and used to tear down the resources. In this case, the single openstack node listed and its networking resources are torn down. If there were more nodes, more data would exist. Similarly, if there were additional clouds, the data would be populated for the appropriate output fields.

Conclusion

Finally! We've made it through the introduction to Linch-Pin. As you can see, running Linch-Pin is pretty easy, but there's a lot to it internally. Now it's time to use the provisioner, so go ahead and try it out.