On Jun 14, 2005, at 9:56 PM, Martin Wille wrote:
> The running time depends heavily on what changes have been made (if
> you run the tests incrementally).
>
> A clean run (or a run after something changed in, say, Boost.Config)
> takes 1.5 days here (or even longer if something goes wrong). If there
> are no changes then a run takes only three hours.
>
> Since the running time isn't predictable, I don't think we can set up
> the test runs such that the results are available at a specified time
> of day.

Right, the OSL testers are only predictable because they aren't
incremental and test only 1-2 compilers each.

> The toolsets which take most of the time are intel-8.1 (because the
> compiler is slow) and gcc-2.95 (because >50% of the tests get
> recompiled every run). My plan currently is to drop support for
> gcc-2.95 (and its STLport variant) after release 1.33. I'm also
> considering dropping gcc 3.2.3. It isn't considered release relevant
> for this release, and it will hardly be for the next.

In truth, I'm actually hoping that we can add compilers/platforms after
the release. Once we have a clean slate (= no unresolved issues), it's
easier for us to keep it at a clean slate now that we get daily
feedback on our changes. If someone expresses an interest in getting
toolset X to work, we add it as a release compiler and clean up the
mess.

GCC 3.2.3 should be marked supported: its results differ from GCC 3.3.6
by only 2 failures, the compiler does very well overall, and it is used
by many Linux distributions.

As for GCC 2.95.3... I never know what to do about that compiler. I've
heard that it's still used by lots of people, but I haven't seen any
evidence of that myself.

> Since you made gcc-3.3.6 and gcc 3.4.4 release relevant instead of gcc
> 3.3.5 and 3.4.3, I can (and will) drop the latter two toolsets.

Okay.

> If you have spare resources then you could run intel tests if that
> compiler is supported by intel for the Linux distribution you use.
> There's no hassle involved, license-wise, in installing the Intel
> compiler for testing Boost. Intel doesn't support the distribution I
> use, and making Intel's install script work involves manual work for
> every update, which is quite a hassle (e.g. it involves installing a
> fake RPM database). If you could run the intel tests instead of me then
> this would make my life easier and improve my testing throughput.

Our Linux boxes run Gentoo, which is unfortunately not a supported
distribution. But, I'll check with our sysadmin; he might have some
tricks up his sleeve to make things run more smoothly, and we still
have one or two Linux systems that could also be doing nightly testing.

>
>> My point is simple: More testing is good, but predictable, up-to-date
>> results are better.
>
> I fully agree. However, I'd like to add: we need redundancy, too.

Yes, I agree. For instance, there's a Spirit test that passes on
gcc-3_3-darwin (single processor) that fails on gcc-3_3-darwin (dual
processor).

> I'd like to remind you of the suggestion I made some time ago: let's
> have two result sets for each runner; one "committed" result set which
> won't get changed anymore and one "building" result set. The "building"
> set would get updated after every single run for a toolset. Once all
> toolsets are run, the "building" set becomes "committed" and a new
> empty "building" set is created. This improves turnaround times by
> allowing intermediate results to be displayed much earlier than the
> complete set. If something breaks badly then the problem becomes
> apparent quickly, can get fixed quickly, and the "building" set can be
> reset before tests have been run for all toolsets.

Interesting. I don't understand the testing system well enough to know
how this would be achieved.
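
To check that I'm reading the proposal correctly, here is a rough
Python sketch of the bookkeeping only. It is not how the existing
regression scripts work; the class name, method names, and the shape of
the per-toolset results are all made up for illustration.

```python
class RunnerResults:
    """Hypothetical model of one runner's "committed" and "building" sets."""

    def __init__(self, toolsets):
        self.toolsets = list(toolsets)  # toolsets this runner cycles through
        self.committed = {}             # last complete cycle; not changed anymore
        self.building = {}              # cycle in progress, updated per toolset

    def record(self, toolset, results):
        """Store one toolset's results in the "building" set."""
        self.building[toolset] = results
        # Once all toolsets have been run, "building" becomes "committed"
        # and a new empty "building" set is created.
        if all(t in self.building for t in self.toolsets):
            self.committed = self.building
            self.building = {}

    def reset(self):
        """Throw away the cycle in progress, e.g. after something broke badly."""
        self.building = {}


runner = RunnerResults(["gcc-3.3.6", "intel-8.1"])
runner.record("gcc-3.3.6", {"spirit": "pass"})   # visible right away in "building"
runner.record("intel-8.1", {"spirit": "fail"})   # cycle complete, gets committed
```

The open question for me is where this state would live and how the
reports would pick up the "building" set between runs.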

> 1. Add the "unsupported" information to the tests themselves, e.g. by
> making them print "unsupported" (we could even add information about
> what is unsupported: "unsupported(compiler)", "unsupported(bzlib)").
> This would spare us some markup and the information provided would be
> more detailed than what the manual markup currently gives us (e.g.
> Spirit is marked unusable for gcc-2.95. Some parts, though, would work
> on that compiler.)

GCC does this by placing comments in the test files, e.g., "{ xfail
i686-*-* }". Granted, their tests tend to be very different from ours.
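
For the variant where the test itself prints the marker, the reporting
side could be as small as the Python sketch below. The marker syntax
("unsupported" alone on a line, optionally followed by a reason in
parentheses) is taken from your examples; the function name and the
pass/fail fallback on the exit code are my assumptions, not anything
the current tools do.

```python
import re

# Assumed marker: a line reading "unsupported" or "unsupported(reason)".
_MARKER = re.compile(r"^unsupported(?:\((?P<reason>[^)]*)\))?\s*$", re.MULTILINE)


def classify(output, exit_code):
    """Return (status, reason) for one test run.

    status is "unsupported" if the test printed the marker, otherwise
    "pass" or "fail" depending on the exit code. reason is the text in
    parentheses, or None when no reason was given.
    """
    match = _MARKER.search(output)
    if match:
        return ("unsupported", match.group("reason"))
    return ("pass" if exit_code == 0 else "fail", None)


status, reason = classify("unsupported(bzlib)\n", 0)
```

A test could then report partial support per test case instead of the
whole library being marked unusable for a toolset.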

> 2. Add another step to build procedure. That step would make the
> information from Boost.Config available to bjam. This could be done by
> writing a C++ program which writes a jamfile which gets included later.
> This would enable a library author to turn the tests for, say, wide
> character sets off when they aren't well supported by the environment.

There's also the explicit-failures-markup, which contains the
"unusable" information used by the reports. If bjam could grok that,
we'd get the same results. In some ways that's easier, because one
could write some XSLT to transform explicit-failures-markup into
something bjam could read and use.
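
To make the idea concrete, here is a sketch of that transformation in
Python rather than XSLT, only because it is easier to show in a mail.
The element names (library, mark-unusable, toolset) are how I remember
the markup file and should be checked against the real one; the sample
data and the UNUSABLE_<library> variable naming are invented for the
example.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed shape of explicit-failures-markup.
SAMPLE = """\
<explicit-failures-markup>
  <library name="spirit">
    <mark-unusable>
      <toolset name="gcc-2.95"/>
      <toolset name="gcc-2.95-stlport"/>
    </mark-unusable>
  </library>
  <library name="example_lib">
    <mark-unusable>
      <toolset name="example-toolset"/>
    </mark-unusable>
  </library>
</explicit-failures-markup>
"""


def unusable_to_jam(xml_text):
    """Emit one jam variable assignment per library with unusable toolsets."""
    root = ET.fromstring(xml_text)
    lines = []
    for lib in root.iter("library"):
        toolsets = [t.get("name") for t in lib.findall("./mark-unusable/toolset")]
        if toolsets:
            # Hypothetical naming scheme: UNUSABLE_<library> = <toolsets> ;
            lines.append("UNUSABLE_%s = %s ;" % (lib.get("name"), " ".join(toolsets)))
    return "\n".join(lines)


jam_text = unusable_to_jam(SAMPLE)
```

A Jamfile could include the generated file and skip a library's tests
when the current toolset appears in that library's list.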