Tuesday, November 8, 2016

Almost all of Collabora's customers use the Linux kernel on their
products. Often they will use the exact code as delivered by the SBC
vendors and we'll work with them on other parts of their software
stack. But it's becoming increasingly common for our customers to adapt
the kernel sources to the specific needs of their particular products.

A very big problem most of them have is that the kernel version they
are based on isn't getting security updates any more because it's
already several years old. The reason companies ship such old kernels
is that their trees have been modified so heavily compared to the
upstream versions that rebasing them on top of newer mainline releases
is so expensive that it is very hard to budget and plan for.

To avoid that, we always recommend that our customers stay close to
their upstreams, which implies rebasing often on top of new releases
(typically LTS releases, which receive long-term support). For that
work to be budgeted, the delta between mainline and downstream sources
needs to be of a manageable size, which is why we recommend
contributing back any changes that aren't strictly specific to their
products.

But even for those few companies that already have processes in place
for upstreaming their changes and are rebasing regularly on top of new
LTS releases, keeping up with mainline can substantially disrupt their
production schedules. This is in part because the new mainline release
will contain new bugs, and further bugs will appear in the downstream
changes as they get applied to the new version.

Those companies that are already keeping close to their upstreams
typically have advanced QA infrastructure that will detect those bugs
long before production, but a long stabilization phase after every
rebase can significantly slow product development.

To improve this situation and encourage more companies to keep their
efforts close to upstream, we at Collabora have been working for a few
years already on continuous integration of FOSS components across a
diverse array of hardware. The initial work was sponsored by Bosch for
one of their automotive projects, and since the start of 2016 Google
has been sponsoring work on continuous integration of the mainline
kernel.

One of the major efforts to continuously integrate the mainline Linux
kernel codebase is kernelci.org, which builds several configurations of
different trees and submits boot jobs to several labs around the world,
collating the results. This is already of great help in detecting, at a
very early stage, any changes that either break the builds or prevent a
specific piece of hardware from completing the boot stage.

Though kernelci.org can easily detect when an update to a source code
repository has introduced a bug, such updates can contain several dozen
new commits, and without knowing which specific commit introduced the
bug, we cannot identify culprits to notify of the problem. This means
that either someone needs to monitor the dashboard for problems, or
email notifications are sent to the owners of the repositories, who
then have to manually look for suspicious commits before getting in
contact with their authors.

To address this limitation, Google has asked us to look into improving
the existing code for automatic bisection so that it can be run as soon
as a regression is detected and the possible culprits are notified
right away, without any manual intervention.
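To illustrate why bisection makes this practical, here is a minimal
sketch in Python of the underlying binary search. It is not the
kernelci.org code: the function name, the commit labels and the
boot_fails() check are all hypothetical, and it assumes a linear list
of commits ordered oldest to newest in which every commit after the
culprit also fails. Real kernel history contains merges, which tools
such as git bisect handle.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, that the last one is
    known to be bad, and that every commit after the culprit is bad too.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1  # hi always points at a known-bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the culprit is mid or something earlier
        else:
            lo = mid + 1    # the culprit is somewhere after mid
    return commits[lo]


# Hypothetical update of 48 commits in which "c31" broke the boot.
commits = [f"c{i}" for i in range(48)]
tested = []


def boot_fails(commit: str) -> bool:
    """Stand-in for building the commit and submitting a boot job."""
    tested.append(commit)
    return int(commit[1:]) >= 31


culprit = first_bad_commit(commits, boot_fails)
print(culprit, "found after", len(tested), "boot tests")
```

Each step halves the range of suspects, so the number of builds and
boot jobs grows with the logarithm of the number of commits rather than
with the number of commits itself.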

Another area in which kernelci.org is currently lacking is test
coverage. Build and boot regressions are very annoying for developers
because they negatively impact everybody who works on the affected
configurations and hardware, but regressions in peripheral support or
in other subsystems that aren't critically involved during boot can
still make rebases much costlier.

At Collabora we have had a strong interest in having the DRM subsystem
under continuous integration, and some time ago we started an R&D
project to make the test suite in IGT generically useful for all DRM
drivers. IGT started out being i915-specific, but as most of the tests
exercise the generic DRM ABI, they could just as well test other
drivers with a moderate amount of effort. Early in 2016 Google started
sponsoring this work, and as of today submitters of new drivers are
using it to validate their code.

Another related effort has been the addition to DRM of a generic ABI for retrieving CRCs of frames from different components in the graphics pipeline, so two frames can be compared when we know that they should match. And another one is adding support to IGT for the Chamelium board, which can simulate several display connections and hotplug events.
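
The idea behind comparing frames by CRC can be sketched in a few lines
of Python. This is only an illustration under loud assumptions: it
computes a CRC-32 in software with zlib over made-up pixel buffers,
whereas the DRM ABI mentioned above retrieves CRCs from the graphics
pipeline itself, and neither that ABI nor the CRC algorithm of any
particular hardware is shown here.

```python
import zlib


def frame_crc(pixels: bytes) -> int:
    """Checksum of a frame's raw pixel data (software stand-in)."""
    return zlib.crc32(pixels)


# Hypothetical 64x48 frame filled with one colour, 4 bytes per pixel.
width, height = 64, 48
reference = bytes([0x20, 0x40, 0x80, 0xFF]) * (width * height)

# A second frame that should match, and one with a single flipped bit.
captured = bytes(reference)
corrupted = bytearray(reference)
corrupted[100] ^= 0x01

frames_match = frame_crc(reference) == frame_crc(captured)
corruption_detected = frame_crc(reference) != frame_crc(bytes(corrupted))
print(frames_match, corruption_detected)
```

Comparing two small checksums instead of two full frames is what makes
this kind of check cheap enough to run on every test.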

A side effect of having continuous integration of changes in mainline
is that when downstreams are sending back changes to reduce their
delta, the risk of introducing regressions is much smaller and their
contributions can be accepted faster and with less effort.

We believe that improved QA of FOSS components will expand the base of
companies that can benefit from involvement in upstream development,
and we are very excited by the changes that this will bring to the
industry. If you are an engineer who cares about QA and FOSS, and would
like to work with us on projects such as kernelci.org, LAVA, IGT and
Chamelium, get in touch!