Syndication

My last update about ongoing development of botch, the bootstrap/build ordering
tool chain, was four months ago and about several incremental updates.
This post will be of similar nature. The most interesting news is probably the
additional data that bootstrap.debian.net now provides. This is listed in
the next section. All subsequent sections then list the changes under the hood
that made the additions to bootstrap.debian.net possible.

bootstrap.debian.net

The bootstrap.debian.net service used to have botch as a git submodule
but now runs botch from its Debian package. This at least proves that the botch
Debian package is mature enough to do useful stuff with it. In addition to the
bootstrapping results by architecture, bootstrap.debian.net now also hosts the
following additional services:

History of graph size shows how the dependency graph developed over time for a normal self-contained repository plus for both minimizing strategies, updated every five days

Crossbuild order modifies the metadata of a repository so that the crossbuild dependency situation can be analyzed and then outputs an overview page as it is done for every architecture on the main page

Further improvements concern how dependency cycles are now presented in the
html overviews. While before, vertices in a cycle where separated by commas as
if they were simple package lists, vertices are now connected by unicode
arrows. Dashed arrows indicate build dependencies while solid arrows indicate
builds-from relationships. For what it's worth, installation set vertices now
contain their installation set in their title attribute.

Debian package

Botch has long depended on features of an unreleased version of dose3 which
in turn depended on an unrelease version of libcudf. Both projects have
recently made new releases so that I was now able to drop the dose3 git
submodule and rely on the host system's dose3 version instead. This also made
it possible to create a Debian package of botch which currently sits at Debian
mentors. Writing the package also finally made me create a usable
install target in the Makefile as well as adding stubs for the manpages of
the 44 applications that botch currently ships. The actual content of these
manpages still has to be written. The only documentation botch currently ships
in the botch-doc package is an offline version of the wiki on gitorious.
The new page ExamplesGraphs even includes pictures.

Cross

By default, botch analyzes the native bootstrapping phase. That is, assume that
the initial set of Essential:yes and build-essential packages magically
exists and find out how to bootstrap the rest from there through native
compilation. But part of the bootstrapping problem is also to create the set of
Essential:yes and build-essential packages from nothing via cross
compilation. Botch is unable to analyze the cross phase because too many
packages cannot satisfy their crossbuild dependencies due to multiarch
conflicts. This problem is only about the dependency metadata and not about
whether a given source package actually crosscompiles fine in practice.

Helmut Grohne has done great work with rebootstrap which is regularly
run by jenkins.debian.net. He convinced me that we need an overview of
what packages are blocking the analysis of the cross case and that it was
useful to have a crossbuild order even if that was a fake order just to have a
rough overview of the current situation in Debian Sid.

I wrote a couple of scripts which would run dose-builddebcheck on a
repository, analyze which packages fail to satisfy their crossbuild
dependencies and why, fix those cases by adjusting package metadata accordingly
and repeat until all relevant source packages satisfy their crossbuild
dependencies. The result of this can then be used to identify the packages that
need to be modified as well as to generate a crossbuild order.

The fixes to the metadata are done in an automatic fashion and do not
necessarily reflect the real fix that would solve the problem. Nevertheless, I
ended up agreeing that it is better to have a slightly wrong overview than no
overview at all.

Minimizing the dependency graph size

Installation sets in the dependency graph are calculated independent from each
other. If two binary packages provide A, then dependencies on A in
different installation sets might choose different binary packages as providers
of A. The same holds for disjunctive dependencies. If a package depends on A
| C and another package depends on C | A then there is no coordination to
choose C so to minimize the overall amount of vertices in the graph. I
implemented two methods to minimize the impact of cases where the dependency
solver has multiple options to satisfy a dependency through Provides and
dependency disjunctions.

The first method is inspired by Helmut Grohne. An algorithm goes through all
disjunctive binary dependencies and removes all virtual packages, leaving only
real packages. Of the remaining real packages, the first one is selected. For
build dependencies, the algorithm drops all but the first package in every
disjunction. This is also what sbuild does. Unfortunately this solution
produces an unsatisfiable dependency situation in most cases. This is because
oftentimes it is necessary to select the virtual disjunctive dependency because
of a conflict relationship introduced by another package.

The second method involves aspcud, a cudf solver which can optimize a
solution by a criteria. This solution is based on an idea by Pietro Abate who
implemented the basis for this idea back in 2012. In contrast to a usual cudf
problem, binary packages now also depend on the source packages they build
from. If we now ask aspcud to find an installation set for one of the base
source packages (I chose src:build-essential) then it will return an
installation set that includes source packages. As an optimization criteria the
number of source packages in the installation set is minimized. This solution
would be flawless if there were no conflicts between binary packages. Due to
conflicts not all binary packages that must be coinstallable for this strategy
to work can be coinstalled. The quick and dirty solution is to remove all
conflicts before passing the cudf universe to aspcud. But this also means
that the solution does sometimes not work in practice.

Test cases

Botch now finally has a test target in its Makefile. The test target
tests two code paths of the native.sh script and the cross.sh script.
Running these two scripts covers testing most parts of botch. Given that I did
lots of refactoring in the past weeks, the test cases greatly helped to assure
that I didnt break anything in the process.

I also added autopkgtests to the Debian packaging which test the same
things as the test target but naturally run the installed version of botch
instead. The autopkgtests were a great help in weeding out some lasts bugs
which made botch depend on being executed from its source directory.

Python 3

Reading the suggestions in the Debian python policy I evaluated the
possibility to use Python 3 for the Python scripts in botch. While I was at it
I added transparent decompression for gzip, bz2 and xz based on the file magic,
replaced python-apt with python-debian because of bug#748922 and added
argparse argument parsing to all scripts.

Unfortunately I had to find out that Python 3 support does not yet seem to be
possible for botch for the following reasons:

no soap module for Python 3 in Debian (needed for bts access)

hash randomization is turned on by default in Python 3 and therefore the graph output of networkx is not deterministic anymore (bug#749710)

Thus I settled for changing the code such that it would be compatible with
Python 2 as well as with Python 3. Because of the changed string handling and
sys.stdout properties in Python 3 this proved to be tricky. On the other
hand this showed me bugs in my code where I was wrongly relying on
deterministic dictionary key traversal.