Like this:

This time I would like to talk about OpenStack unit tests we were working on at Grid Dynamics.

First and furthermost, writing unit tests for OpenStack is a tar pit: if you have started, you cannot resist. The task is really difficult by itself; moreover, creating tests for such quickly developed project is like a life on a volcano.

Our task seemed to be quite trivial. We have a set of projects with different test coverage. We have to cover them at least for 80%. Python coverage utility is at our service.

I have introduced the following infrastructure. We publish our patches on Gerrit that replicates its repositories to another server for easy recovering. Jenkins monitors Gerrit and recalculates coverage after every patch publication. Coverage for each project is gathered to table.

I hoped that coverage calculation is not a big deal, but I was too optimistic. Imagine that we simply use run_tests.sh script that is available for almost all the projects. Than we will face some problems.

One does not simply use run_tests.sh. It installs a fresh virtual environment for Python packages using corresponding pip-requires and test-requires files. This environment lacks some necessary packages (e.g., iso8601) and its eventlet is incompatible with Python 2.6 that we use in CentOS. So, I had to prepare a virtual environment with a custom script. So does OpenStack Jenkins.

It usually takes a lot of time to download and compile all Python dependencies. It was too long even for a Gentoo user, so I cached the environment using a common env for all the packages.

pip is not a reliable utility. During package installation, it is used to erasing all package data except for metainformation. The last point means that the package is considered to be installed while its files are gone, so, you have to uninstall it and install again.

Well, now we have a shiny virtual environment. It’s time to run tests and calculate coverage. Currently, OpenStack unit tests for Nova, Glance, Quantum, and Keystone are slow and run about 10-20 minutes on our testing server. There is a good idea to migrate for nosetests to testr: this will allow to run the tests in parallel (the thing is that nosetests parallelization is incompatible with eventlet/greenlet widely used in OpenStack). Nova already uses testr, but updating other packages is a laborious task. Some tests depend on each other and cannot be run in parallel, here is an example of one such problem I’ve detected and fixed: https://bugs.launchpad.net/quantum/+bug/1125951.

I have used incremental testing to increase the speed. I cache .coverage file that accumulates statistics for previous testing and run only those tests that were added or updated. The latter are determined using git diff --name-only command. Also, incremental testing solves another problem: summarize coverage for all commits for given project. These commits are siblings on commit tree, so, you cannot checkout to the latest one and run all published tests that are not accepted yet.

OpenStack includes the Oslo project – a set of python libraries shared by other projects. Unfortunately, Oslo is not a true library now: its code it copied with a special script to other projects. Thus our coverage statistics would be incorrect if we include uncovered Oslo code to common report. So, I had to implement a blueprint to fix this issue.

All preparations are done, now we are ready for testing!

Our testing process differs from one used in OpenStack Jenkins. It happens that upstream tests fail just because they were not run during verification, but sometimes they pass somehow in OpenStack jobs and fail in my environment. Here is an example: Quantum’s test_policy passes in https://review.openstack.org/#/c/21091/, but they should fail because one necessary configuration parameter is not imported properly (my fix: https://review.openstack.org/#/c/21205/).

So, now OpenStack community has a new team that looks at unit tests from slightly different point of view detecting problems and fixing them.

Like this:

OpenStack client packages have a long history. It begins in November 2009 when Rackspace Cloud Servers package started. It provided a Python API (the cloudservers module) and a command-line script (cloudservers). Initially, the script was just a stub, but it became a useful CLI utility able to launch, stop, and resize virtual machines.

cloudservers package introduced a library architecture that is used till now. All entities can be split into five groups.

Resources, e.g., a flavor, a server, or an image. Technically, a resource is a Python object and its class is a descendant of the Resource class.

Managers – they provide operations on resources, for example, “list all flavors” or “delete an image”. So, we have a flavor manager, an image manager, and so on. As you may guess, manager’s classes are descendants of the Manager class.

HTTP client provides a convenient interface for managers that send HTTP requests to the server. The HTTP client is also responsible for authentication process that changed a lot after introducing a new Keystone service, so, the newer HTTP clients are a bit more complicated.

Exceptions are normal Python exceptions raised by HTTP client for HTTP error codes. This is a more or less rich hierarchy with exceptions such as Unauthorized, BadRequest, or NotFound.

Client (not to be confused with HTTP client!) puts HTTP client and various managers together (using class composition: HTTP client and managers are members of a client). As a user, you create a client and can immediately perform any API calls:

# this is the client
client = Client(USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
# client.flavors is a manager
all_flavors = client.flavors.list()
# and all_flavors is a list of resources
print all_flavors

The oldest of currently alive clients (novaclient) was born in January 2011 as a fork of Rackspace Cloud Servers package. Since that time, cloudservers library and CLI script were renamed to novaclient. They support new Nova API that was growing all the time, but these two main functions (a Python API and a command line client) remain unchanged.

About a year later, a new OpenStack client project called keystoneclient was started. It was a flesh of novaclient’s flesh with almost the same architecture with a small difference: client was a child class of HTTP client thus using inheritance, not composition. And, of course, keystoneclient has its own managers and resources (tenant, user, etc.).

Al lot of code required in the new client package was already written in novaclient (the base Resource and Manager and HTTP client). But this code was copied, not imported in keystoneclient. On the one hand, it made these packages independent: you haven’t to install novaclient if you would like to use keystoneclient. On the other hand, the story of duplicated classes diverged and they gained different features available in one package and absent in another.

glanceclient used the same copying approach with the same benefits and pitfalls. However, quantumclient and swiftclient are completely different and I won’t discuss them here.

So, what do we have now?

Keystone server provides tokens with limited time to live. So, that’s natural to obtain an “Unauthorized” error after a series of successful calls. Nova and Keystone clients handle this situation correctly: they do one call to obtain a new fresh token and repeat the fallen query. glanceclient just raises an exception.

Keystone server supports two ways of authentication on a tenant: with user name and password and with an unscoped token. As a response to successful authentication, it returns a scoped token and a catalog of all OpenStack service endpoints (nova, glance, keystone, swift etc.). So, keystoneclient supports both authentication ways while novaclient handles only authentication with user name and password. glanceclient is even less prompt: it requires a scoped token and a Glance server endpoint. It knows nothing about clever Keystone service and you have to do the dirty job. By the way, glanceclient’s shell uses keystoneclient to issue this initial call to Keystone.

All client constructors use different parameters. For example, the thing that is called password in keystoneclient is an api_key in novaclient for historical reasons: it was called apikey (without underscore!) in cloudservices three years ago.

clients have not only different constructors but also diverse behavior: keystoneclient authenticates immediately when you create the client object while novaclient does it lazily during first API call.

Often you would like to make calls to different services. A dashboard or a common command line tool usually requests tenant list from Keystone, image list from Glance, and sends a “launch an instance” command to Nova. With current clients it’s different to share the same token and service endpoint catalog. A simple ways could be just to use a common HTTP client object, but it’s impossible because of incompatible architectures in different client packages.

To solve these problems, we could move the common code to a separate library that would be imported in all three clients. The common library would contain:

the base Resource class;

the base Manager class;

a rich Exceptions hierarchy;

a feature rich HTTP client that supports all ways of authentication, handles outdated token faults, and saves the whole service catalog returned by Keystone;

the base client class that contains an instance of HTTP client as a member: this way, several clients (e.g., a client for Nova and a client for Keystone) can share the same HTTP client.

I developed a sample implementation of this library and called it python-openstackclient-base. The library is used in Altai Private Cloud, a project of Grid Dynamics. python-openstackclient-base is easy to use:

Now I’m going to put this library to oslo-incubator project and use it in all three clients. When oslo-incubator will be mature, it will be imported in OpenStack projects as I want, but now its code will be just copied literally to other projects. However, it’s also quite satisfiable since it will reach all the goals mentioned above.

Like this:

OpenStack is very good at launching virtual machines – that’s its purpose, isn’t it? But usually you want to monitor state of you machines somehow, and there are many reasonable ways.

You can test daemons running on the machine, e.g., check up open ports or poll known services. Of course, this approach means that you know exactly what services should be running – and this is the most precise way to test system health.

You can ask hypervisor if the machine is ok. That’s a very dirty check since hypervisor will likely report that VM is active while its operating system kernel can encounter problems.

A compromise settlement may be pinging the machine. It’s a general solution since a lot of VMs respond to ping normally. Sure, VM can ignore ping or its daemons can have problems while host is responding to ping, but this solution is far easier to implement then check each machine according to an individual plan.

Let’s concentrate on the last two approaches. I would like to launch a machine and check it.

fping API is simple and straightforward. We can ask to check all instances or a single one. In fact, we have two API calls.

GET /os-fping/<uuid> – check a single instance.

GET /os-fping?[all_tenants=1]&[include=uuid[,uuid...][&exclude=...] – check all VMs in the current project. If all_tenants is requested, data for all projects is returned (by default, this option is allowed only for admins). include and exclude are parameters specifying VM masks. These parameters are mutually exclusive and exclude is ignored if include is specified. Consider that VM list is VM_all, then if include list is set, the only VM_all * VM_to_include (set intersection) will be tested – thus we can check several instances in a single API call. If exclude list is provided, VM_all -
VM_to_exclude (set difference) will be polled – thus we can skip testing for instances that are not supposed to respond to ping.

fping increases I/O load on nova-api node, so, by default, fping API is limited to 12 calls in an hour (nevertheless it’s a single or several instances poll).

Like this:

A new version of Altai Private Cloud for Developers 1.0.2 was released.

The new release is devoted to cleaning package dependencies. Also, a bunch of bugfixes was made, primarily to user interface. Let’s see what’s new in Altai 1.0.2 from maintainer’s point of view.

In previous releases, we had this motto: “Take basic CentOS/RHEL, take our source RPMs, and you will be able to build the whole Altai and install it”.
Altai RPMs (both source and binary) were grouped in two repositories: “main” and “deps”. “deps” were packages rebuilt from their third-party source RPMs without changes. All other packages went to “main”, including customized third-party software (like nginx with uploading module) and Altai proper packages like Focus web UI.
Since we built both “main” and “deps” packages, we signed them with Grid Dynamics signature.

This model had one pitfall: we had to maintain plenty of well-known packages that were not included into basic CentOS/RHEL, such as Rabbit MQ or Erlang. That made our repositories really tremendous: 500 MiB, 100 MiB for “main” and 400 for “deps”! Imagine how wasteful is add these tons of unchanged third-party packages to every release. That’s why we tried the following solution in the previous release (1.0.1): include a chain of repositories so almost all unchanged packaged are downloaded from 1.0.0 release and 1.0.1 repository contains only packages to upgrade. As it was shown in this article, YUM can handle thousands of repositories simultaneously without performance problems. So, the repository chain approach significantly saves space for newer releases, but it leads to some maintaining problems.

For example, imagine if one package should be downgraded in the next release. We can call yum downgrade package-name in Altai installer, but how could we guarantee that this packages will not be updated later accidentally by user in a yum update procedure?

A more complex problem is that it’s difficult to determine a list of all packages that belong to given release if they are spread between lots of repositories. Even more, build a new release repository being the next in the repo chain is a nontrivial task.

Fortunately, if you decide to use EPEL packages, you’ll say farewell to all these obstacles. First, the repository becomes significantly smaller just because now you haven’t to rebuild heaps of packages. Now we have only 160 MiB of binary packages. Second, with a small repository you haven’t to use cunning repository chain – everything becomes transparent and easy to support.

It’s worth to say that it using EPEL packages isn’t as simple as it seems to be. Some important Python libraries are installed in such places that you would have to patch your programs or they wouldn’t find their dependencies. We decided to reject these libraries and package them ourselves. Luckily, the most of EPEL packages were able to be used in Altai without complications.

As far as we reviewed all Altai packages, we chosen another repository layout. Let’s briefly describe it.

centos6: these packages are maintained and developed by Grid Dynamics team. This group contains customized OpenStack and a lot of home-grown packages signed by Altai team. Sources of these packages are available at GitHub.

deps: these packages are not a subject of Grid Dynamics development. This category includes the following subdirectories.

As you see, we still provide sources of all packages we’ve built as it’s appropriate for an open source project.

As it was mentioned above, we keep Altai sources in git. There are two steps between a git repository and a binary RPM. First, a source RPM must be built from a git repo. Second, a binary RPM is built from a source one.

Each step is a not-trivial operation. A source RPM must contain all information required for package build, including source tarball, spec file, and possibly patches that should be applied to unpacked tarball before build. ALT Linux team even developed a powerful toolkit called GEAR (Get Every Archive from git package Repository). GEAR contains tens of individual CLI programs for different purposes, including composing a source RPM from a git repository and importing a tarball to git. We used GEAR in previous releases, but the only feature we needed was git-to-source RPM conversion. Even more, almost every conversion was trivial because we develop software keeping in mind that they will be packaged to RPMs. GEAR, on its turn, allows to maintain third-party software that is under active or slow development and need to be patched before packaging.

Obviously, multifunctional GEAR led to boilerplate configuration files. That’s why we simplified git-to-source RPM conversion as in our case it could be done with a small and clear script. And there is now need to write GEAR rules file: it’s sufficient to just place a spec file to git repository.

Frankly speaking, the second step (source-to-binary RPM conversion) is trickier than the first, but, fortunately, there is a ready solution – mock tool used in Fedora and EPEL. mock prepares a clean and safe build chroot environment for build operation. We have already used mock for previous releases and we haven’t ceased to take its advantages.

So, Altai 1.0.2 is easy to develop, maintain, and support and in the same time more foolproof.

Like this:

I am working on a build system that makes easy to control several connected git repositories forming one project. This system is being written in bash and uses lots of rarely used git and bash features.

Often I have to iterate over a table usually generated by git, e.g., to see the changes between a commit and its parent, I run:

Share this:

Like this:

We use Openstack at Grid Dynamics for more than a year. It is the basis of our private infrastructure originally named Cloud For Grid Dynamics (C4GD) and now known as Altai. C4GD provided cheap and fast VM management for our developers’ needs with reliable support. We were using the Diablo release and were happy with it.

On 5 April 2012 a new shiny Essex was released not without Grid Dynamics initiative (you can even find me in the list of contributors). I was challenged to investigate and prepare migration scripts for our cloud.

I started from my old scripts for installing Diablo and began writing a set of tools that make easy both migration and installation from scratch for different releases. You can see the result of my work at Github. These scripts work with OpenStack packaged to RPMs at Grid Dynamics.

Like this:

When you develop a software system, you can choose any solution between two extreme approaches.

Build and maintain all your dependencies.

Rely on external repositories and build only your specific packages.

Having chosen the first approach, you can be sure that your users will use your great tuned packages of carefully chosen versions that are doomed to work properly. And when you see a problem in a dependency, you freely patch it and… congratulations, now you are a happy maintainer of a zoo of numerous packages containing software written in several languages!

The second way is clear: you build a dozen of your own packages and publish a relatively small repository. And when your third-party dependencies become unavailable, it will be a user’s problem.

Developing Altai, we started from the first solution: a user installs basic RHEL/CentOS and just adds our repository. Nowadays, we are moving to the second…