More subunit needs

Additional pain points

Zope’s test runner runs things that are not tests, but which users want to know about – ‘layers’. At the moment these are reported as individual tests, but this is problematic in a couple of ways. Firstly, the same ‘test’ runs on multiple backend runners, so timing and stats get more complex. Secondly, if a layer fails to setup or teardown, tools like testrepository that have watched the stream will think a test failed, and on the next run try to explicitly run that ‘test’ – but that test doesn’t really exist, so it won’t run [unless an actual test that needs the layer is being run].

Openstack uses python coverage to gather coverage statistics during test runs. Each worker running tests needs to gather and return such statistics. The current subunit protocol has no space to hand this around, without it pretending to be a test [see a pattern here?]. And that has the same negative side effect – test runners like testrepository will try to run that ‘test’. While testrepository doesn’t want to know about coverage itself, it would be nice to be able to pass everything around and have a local hook handle the aggregation of that data.

The way TAP is reflected into subunit today is to mangle each tap ‘test’ into a subunit ‘test’, but for full benefits subunit tests have a higher bar – they are individually addressable and runnable. So a TAP test script is much more equivalent to a subunit test. A similar concept is landing in Python’s unittest soon – ‘subtests’ – which will give very lightweight additional assertions within a larger test concept. Many C test runners that emit individual tests as simple assertions have this property as well – there may be 5 or 10 executables each with dozens of assertions, but only the executables are individually addressable – there is no way to run just one assertion from an executable as a ‘test’. It would be nice to avoid the friction that currently exists when dealing with that situation.

Minimum requirements to support these

Layers can be supported via timestamped stdout output, or fake tests. Neither is compelling, as the former requires special casing in subunit processors to data mine it, and the latter confuses test runners. A way to record something that is structured like a test (has an id – the layer, an outcome – in progress / ok / failed, and attachment data for showing failure details) but isn’t a test would allow the data to flow around without causing confusion in the system.

TAP support could change to just show the entire output as progress on one test and then fail or not at the end. This would result in a cognitive mismatch for folk from the TAP world, as TAP runners report each assertion as a ‘test’, and this would be hidden from subunit. Having a way to record something that is associated with an actual test, and has a name, status, attachment content for the TAP comments field – that would let subunit processors report both the addressable tests (each TAP script) and the individual items, but know that only the overall scripts are runnable.

Python subtests could use a unique test for each subtest, but that has the same issue has layers. Python will ensure a top level test errors if a subtest errors, so strictly speaking we probably don’t need an associated-with concept, but we do need to be able to say that a test-like thing happened that isn’t actually addressable.

Coverage information could be about a single test, or even a subtest, or it could be about the entire work undertaken by the test process. I don’t think we need a single standardised format for Coverage data (though that might be an excellent project for someone to undertake). It is also possible to overthink things🙂. We have the idea of arbitrary attachments for tests. Perhaps arbitrary attachments outside of test scope would be better than specifying stdout/stderr as specific things. On the other hand stdout and stderr are well known things.

Proposal version 2

A packetised length prefixed binary protocol, with each packet containing a small signature, length, routing code, a binary timestamp in UTC, a set of UTF8 tags (active only, no negative tags), a content tag – one of (estimate + number, stdin, stdout, stderr,file, test), test-id, runnable, test-status (one of exists/inprogress/xfail/xsuccess/success/fail/skip), an attachment name, mime type, a last-block marker and a block of bytes.

The std/stdout/stderr content tags are gone, replaced with file. The names stdin,stdout,stderr can be placed in the attachment name field to signal those well known files, and any other files that the test process wants to hand over can be simply embedded. Processors that don’t expect them can just pass them on.

Runnable is a boolean, indicating whether this packet is describing a test that can be executed deliberately (vs an individual TAP assertion, Python sub-test etc). This permits describing things like zope layers which are top level test-like things (they start, stop and can error) though they cannot be run.. and it doesn’t explicitly model the setup/teardown aspect that they have. Should we do that?

Testid is for identifying tests. With the runnable flag to indicate whether a test really is a test, subtests can just be namespaced by the generator – reporters can choose whether to be naive and report every ‘test’, or whether to use simple string prefix-with-non-character-seperator to infer child elements.