Doing quality assurance for something as large as an operating system presents a few problems, similar to those for any software project, just on a larger scale:

writing a set of comprehensive automated tests (and having a means to analyze and baseline results)

ensuring those tests are maintained and executed frequently

knowing what tests to execute when making a change to the OS

In Solaris, we have several dedicated test teams who execute our automated tests periodically (both on nightly builds and on biweekly milestone builds) as well as on demand, and bugs are filed when problems occur. Each team tends to focus on a different area of the OS. We test changes from one consolidation at a time, before doing wider-area testing with changes from all consolidations built into a single OS image.

However, only finding problems after the relevant change has integrated to the source tree is often too late. It’s one thing to cause a problem with an application, but an entirely different thing if your change causes the system to panic – no more testing can be done on those bits, and the breakage you introduced impacts everybody.

To try to reduce the chance of that happening, we try to build quality into our development processes, making the break-fix cycle as short as possible. To get a sense of where potential gaps were, I spent time looking at all of the testing that we do for Solaris, and documented it in a wiki page, ordered chronologically.

I won’t repeat the entire thing here, but thought it might be interesting to at least show you the headings and subheadings. Some of these refer to specific teams that perform testing; others simply indicate the type of testing performed. The list now includes some of the test improvements I added, which I’ll talk about later.

Before you push

You

Your Desktop/VM/cloud instances/LRT

The build

Your test teams

DIY-PIT and QSE-PIT

Selftest Performance testing

Project PIT runs

AK Testing

After you push

Incremental builds

Incremental boots

Incremental unit-tests

Periodic Live media boots

Nightly builds

Running the nightly bits on the build machines

Nightly WOS builds

Nightly gate tests

Nightly ON-PIT tests

Bi-weekly ON-PIT tests

After the WOS is built

Solaris RE tests

Solaris SST

SWFVT

ZFSSA Systems Test

Conformance testing

Performance testing

Jurassic, Compute Ranch build machines, shared group build machines

Platinum beta customers

Earlier releases

SRU testing

(A note on the terminology here: “WOS” stands for “Wad Of Stuff” – it’s the biweekly Solaris image that’s constructed by bundling together all of the latest software from every consolidation into a single image that can be freshly installed, or upgraded to.

“PIT” stands for “Pre-Integration Test”, typically meaning testing performed on changes pushed to each consolidation’s source tree, but not yet built into a WOS image.)

Running the bits you build

I’ve talked before about the ON culture of running the bits you develop, so won’t repeat myself here, except to say that the gate machine, the gate build machines, and all developers are expected to run at least biweekly, if not nightly bits. As engineers, we tend to be a curious lot, and enjoy experimenting with shiny new software – it’s amazing (and a little worrying) to discover bugs that the test suites don’t. As we find such gaps in test suites, we file bugs against them so that the test suites continually improve.

Building the bits you run

Building Solaris itself turns out to be a good stress test for the operating system, invoking thousands of processes and putting a reasonable load on the system, more so if it’s a shared build machine.

The build itself also does a significant amount of work verifying that the software it’s producing is correct: apart from the obvious tools that run during the build, like lint and Parfait (an internal static analysis tool), there are a host of other checks that verify the binaries that are produced.

Indeed, to maximise the chances of finding errors during the build, we compile the source tree with two separate compilers (currently Oracle Developer Studio and gcc), discarding the binaries produced by the “shadow compiler”. As the different compilers produce different warnings, sometimes one will report errors that the other misses, which can be an aid to developers.
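To illustrate why shadow compilation helps, here’s a toy sketch of the underlying idea: each compiler flags a partially overlapping set of diagnostics, so building with both widens coverage. The warning strings below are invented examples, not real Studio or gcc output.

```python
def combined_diagnostics(primary, shadow):
    """Return (all diagnostics, diagnostics only the shadow compiler found)."""
    primary, shadow = set(primary), set(shadow)
    shadow_only = shadow - primary          # what the primary compiler missed
    return primary | shadow, shadow_only

# Hypothetical diagnostics from each compiler for the same source file.
studio = {"foo.c:10: implicit conversion loses precision"}
gcc = {"foo.c:10: implicit conversion loses precision",
       "foo.c:22: variable 'n' set but not used"}

all_warnings, extra = combined_diagnostics(studio, gcc)
```

Here the shadow compiler catches an unused-variable warning the primary missed, which is exactly the benefit the dual-compiler build is after.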

The problem with catching problems early

As much as possible, we emphasise pre-integration testing to find problems early. The flip side is that not every engineer has access to some of our larger hardware configurations, and the test labs containing them are a finite resource.

Another problem is that, even with access to those large systems, how do you know which tests ought to be executed? Since lab time is limited and some tests can take a long time to complete, we simply can’t run all the tests before every integration – if we did, we’d never be able to make changes effectively.

Tests for Solaris were commonly developed and maintained by separate teams of test engineers, rather than by the developers owning their own tests (this wasn’t the rule, of course – some developers modified those test suites directly).

In some cases where engineers did write their own tests, the test code was often stored in their home directories – they’d know to execute the tests the next time they changed their code, but nobody else would know the tests existed, and breakage could occur.

The build also lacked any way to describe which tests ought to be executed when a given piece of software changed, so it became a question of “received wisdom” and experience to determine what testing needed to be performed for any given change.

Continuous integration in ON

For some time (5+ years before my time, as far as I can tell), the ON gate has had a simple incremental build facility. As each commit happened to the source tree, some custom code, driven by cron and procmail, would select one of four previously built workspaces, pull the latest changes, and kick off a build.
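The scheduling part of that facility can be sketched roughly as follows: each new commit grabs the next of four pre-built workspaces in round-robin order, pulls the latest changes, and rebuilds incrementally. The workspace paths and the build step here are hypothetical stand-ins for the real cron/procmail machinery.

```python
import itertools

# Four previously built workspaces, reused in rotation so each
# incremental build starts from mostly up-to-date objects.
WORKSPACES = ["/ws/incr0", "/ws/incr1", "/ws/incr2", "/ws/incr3"]
_next_ws = itertools.cycle(WORKSPACES)

def incremental_build(changeset):
    """Assign the next workspace to a changeset and (notionally) rebuild."""
    ws = next(_next_ws)
    # The real setup would pull the changeset into ws and run the build here.
    return ws

assigned = [incremental_build(cs) for cs in ["c1", "c2", "c3", "c4", "c5"]]
```

With four workspaces in rotation, a fifth commit simply wraps around to the first workspace again.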

This typically found build-time problems quickly, so we’d be able either to back out changes that didn’t build, or to get in touch with the responsible developer to arrange a fix before too many engineers were impacted. But the binaries produced by those incremental builds were simply discarded, which seemed like a lost opportunity to me.

Even worse, from time to time, we’d get integrations that built fine, but actually failed to boot on specific systems due to inadequate testing!

So, to modernize our build infrastructure and plug this gap, I started looking into using Jenkins not only to periodically build the source tree, but also to update a series of kernel zones with those changes and make sure that we were capable of booting the resulting OS.

That was a pretty simple change, and I was pleased with how it turned out. Once it had been in place for a few months, I started to wonder what else we could do with those newly booted zones.
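The boot-verification step itself boils down to “update the zone, boot it, and wait until it actually answers”. Here’s a minimal, generic sketch of that last part, assuming we treat something listening on the zone’s ssh port as evidence of a successful boot; host, port, and timeouts are illustrative, and the real pipeline drives Solaris kernel zones rather than arbitrary hosts.

```python
import socket
import time

def wait_for_boot(host, port=22, timeout=60.0, interval=1.0):
    """Poll until something answers on (host, port), or give up.

    Returns True if a TCP connection succeeded before the timeout,
    False otherwise.  A successful connect is taken as "booted".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True        # something is listening: treat as booted
        except OSError:
            time.sleep(interval)   # not up yet; try again shortly
    return False
```

A failed `wait_for_boot` is exactly the signal the pipeline needs to flag an integration that built fine but produced an unbootable OS.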

Developer notifications and unit testing

I’ve mentioned already that in a large engineering organisation, it’s difficult to know what to test, and being dependent on a separate test group to implement your test suite can be frustrating. Of course, there can be advantages in that separation of duty – having a different pair of eyes looking at changes and writing tests can find problems that a developer would otherwise be blind to.

Given our experience with the IPS consolidation, and its use of unit tests, one of the Solaris development teams working in ON decided to take a similar route, wanting to add their tests to the ON source tree directly.

Rather than deal with the chaos of multiple teams following suit, I felt it was time to formalize how tests were added to the source tree, and to write a simple unit-test framework allowing those tests to be discovered and executed, as well as a way to advertise other specific testing and development advice that could be relevant when we detect modifications to a given source file.

Obviously there were some limits to what we could do here – some tests require specific hardware or software configurations, and so wouldn’t be appropriate for a set of build-time tests; other tests are too long-running to really be considered “unit tests”.

Other tests may require elevated privileges, or may attempt to modify the test machine during execution, so it can be tricky to determine when to write a separate test suite, vs. when to enroll in the gate unit-test framework.

As part of this work, I modified our “hg pbchk” command (part of our Mercurial extension that performs basic pre-putback verification on changesets about to be integrated to the source tree, essentially ensuring the integration paperwork is correct).

The pbchk command now loads all of the test descriptor files found in the workspace, reports if tests are associated with the sources being modified, and will print specific developer notifications that ought to be emitted when a given source file changes.

I think of it as a “robotic CRT advocate”, pointing out testing that ought to run prior to integration. (The CRT, or “Change Review Team”, is a group of senior engineers who must pre-approve each and every putback to the ON source tree; they see the results of ‘hg pbchk’ during their review, and verify that testing was completed.)

Over time, that test framework is getting more and more use, and we now have tests that are easy to run, implemented as a simple build(1) task. Here are the ones we have today:

Of course, this is still a small fraction of the overall number of tests that run on the OS, but my hope is that we will continue to extend these unit tests over time. From my past experience as a test developer on the ZFS Test team, the easier you make tests to execute, the more likely a developer is actually going to run them!

In conjunction with the more comprehensive Jenkins pipelines we have recently finished work on, this framework has been well received, and has found problems before customers do – which continues to make me very happy.