Thu, 14 Aug 2014

Welcome to the third in my set of posts covering discussion topics at the nova juno mid-cycle meetup. The series might never end to be honest.

This post will cover the progress of the ironic nova driver. This driver is interesting as an example of a large contribution to the nova code base for a couple of reasons -- its an official OpenStack project instead of a vendor driver, which means we should already have well aligned goals. The driver has been written entirely using our development process, so its already been reviewed to OpenStack standards, instead of being a large code dump from a separate development process. Finally, its forced us to think through what merging a non-trivial code contribution should look like, and I think that formula will be useful for later similar efforts, the Docker driver for example.

One of the sticking points with getting the ironic driver landed is exactly how upgrade for baremetal driver users will work. The nova team has been unwilling to just remove the baremetal driver, as we know that it has been deployed by at least a few OpenStack users -- the largest deployment I am aware of is over 1,000 machines. Now, this unfortunate because the baremetal driver was always intended to be experimental. I think what we've learnt from this is that any driver which merges into the nova code base has to be supported for a reasonable period of time -- nova isn't the right place for experiments. Now that we have the stackforge driver model I don't think that's too terrible, because people can iterate quickly in stackforge, and when they have something stable and supportable they can merge it into nova. This gives us the best of both worlds, while providing a strong signal to deployers about what the nova team is willing to support for long periods of time.

The solution we came up with for upgrades from baremetal to ironic is that the deployer will upgrade to juno, and then run a script which converts their baremetal nodes to ironic nodes. This script is "off line" in the sense that we do not expect new baremetal nodes to be launchable during this process, nor after it is completed. All further launches would be via the ironic driver.

These nodes that are upgraded to ironic will exist in a degraded state. We are not requiring ironic to support their full set of functionality on these nodes, just the bare minimum that baremetal did, which is listing instances, rebooting them, and deleting them. Launch is excluded for the reasoning described above.

We have also asked the ironic team to help us provide a baremetal API extension which knows how to talk to ironic, but this was identified as a need fairly late in the cycle and I expect it to be a request for a feature freeze exception when the time comes.

The current plan is to remove the baremetal driver in the Kilo release.

Previously in this post I alluded to the review mechanism we're using for the ironic driver. What does that actually look like? Well, what we've done is ask the ironic team to propose the driver as a series of smallish (500 line) changes. These changes are broken up by functionality, for example the code to boot an instance might be in one of these changes. However, because of the complexity of splitting existing code up, we're not requiring a tempest pass on each step in the chain of reviews. We're instead only requiring this for the final member in the chain. This means that we're not compromising our CI requirements, while maximizing the readability of what would otherwise be a very large review. To stop the reviews from merging before we're comfortable with them, there's a marker review at the beginning of the chain which is currently -2'ed. When all the code is ready to go, I remove the -2 and approve that first review and they should all merge together.