Testing/Requirements

This page collects and categorizes information about goals and requirements for the common W3C test-suite framework for browser testing. [SAZ: afaik, the scope should not be limited to browser alone][JS: ATAG 2.0 and UAAG 2.0 will need to test authoring tools and media players in addition to browsers]

Requirements

Requirements for the testing framework

The testing framework must be intended for Candidate Recommendation and post Candidate Recommendation phases

The test suite should be suitable for evaluating whether the spec is implementable, but it should also be usable to promote interoperability.

This includes:

testing of precise technical requirements such as parsing and validity rules

testing of technical requirements that can only be tested in the context of other requirements.

testing of more general requirements for specification conformance that cannot be evaluated simply with unit tests.

[WAI comment: clarify that this is of general value -- just wording issue with "must be"]

The testing framework must support simple and complex tests

It should be possible to run unit tests (e.g. testing the value of an attribute) as well as complex tests (e.g. acid or stress tests).

The testing framework should be intended for user agent conformance testing

It may not be an immediate goal to perform user agent conformance testing, but the creation of a test harness naturally meets many of the requirements for this, and there is likely to be interest in using the test harness for this purpose.

The testing framework should help improve interoperability

While a W3C goal is to test specification conformance, more important to the community may be interoperability testing. Knowing which user agents produce what results for a given test, regardless of specification requirements related to that test, allows identification of areas of generally consistent and generally inconsistent user agent behaviour.

There should not be an assumption of one-to-one relationship between elements at the various layers. A given test case may require several test files. A given test file may be used by several test cases. A given test execution may be repeated by different users and results stored separately.

The testing framework must equally support test case metadata defined in test files and externally

To improve reuse of test files, test case metadata should be stored separately from test files when possible. Metadata stored within test files could also potentially introduce side effects on the test outcome.

Notwithstanding the above, the harness must allow test case metadata to be included in test files, as that can facilitate automation in various ways (authoring, review, execution).

The testing framework must be explicit about the test license

Contributors and users of the system must be clear about the license applied to content submitted to the repository.

The testing framework must be able to serve test cases over the Web

The testing framework must use a decentralized version control system for test files and test cases

W3C uses Mercurial.

[WAI comment: seems overly restrictive as a core requirement; some W3C WGs use other systems]

The testing framework must include a test runner

See below for requirements for the test runner.

The testing framework must provide a mechanism for test case review

See below for requirements for the test case review mechanism.

The testing framework must provide a user-friendly tool to ease test suite management

See below for requirements for the test suite management system.

[WAI comment: assuming that accessibility of W3C systems is a given anyway]

The testing framework must provide a reporting tool

See below for requirements for the reporting tool.

The testing framework must provide "coverage" information

In order to know which areas of a spec are well tested, and hence to have a sense of (an upper bound on) the completeness of a test suite as well as of the areas where it would be most profitable to direct new testing effort, it would be beneficial to produce an annotated version of the spec that associates each testable assertion in the spec with a link to one or more test cases for that assertion.

See below for requirements for spec annotation.

The testing framework must allow for direct contributions from external individuals or entities

The public at large should be able to submit test files, test cases, as well as test results.

Requirements for the Web test server

The Web test server must be able to run server-side scripts

The exact list of languages that the Web test server must support remains to be specified. PHP and Python should be available.

XMLHttpRequest, CORS, EventSource, HTML5, Widgets WARP, and WCAG will all need a setup like this.

Note: We no longer support PHP on w3c-test.org. There was a built-in review process for the PHP code in the Mercurial repository, but it is no longer relevant since the test suite has been converted to being self-hosting in Python.
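As an illustration of the kind of server cooperation these tests need (a sketch only, not the actual w3c-test.org setup), a server-side test resource in Python could be a small WSGI application that echoes the request method and reflects the caller's origin, the sort of behaviour XMLHttpRequest and CORS tests rely on:

```python
# Hypothetical sketch of a Python server-side test resource (WSGI).
# It echoes the request method and reflects the caller's Origin header
# so that cross-origin test pages are allowed to read the response.

def echo_app(environ, start_response):
    body = environ["REQUEST_METHOD"].encode("ascii")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # Reflecting the origin grants CORS access to the requesting page.
        ("Access-Control-Allow-Origin", environ.get("HTTP_ORIGIN", "*")),
    ]
    start_response("200 OK", headers)
    return [body]

# To serve it locally one could use the standard library, e.g.:
#   wsgiref.simple_server.make_server("", 8000, echo_app).serve_forever()
```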

The Web test server should pull out content from test case repository automatically

Test cases submitted to the test case repository should appear automatically on the Web server, except for test cases that make use of server-side scripting, which should first be approved for security reasons.

The Web test server must be available through HTTPS

Requirements for the test runner

The test runner is responsible for running a series of tests and gathering results for all of them.

[WAI comment: Requirements that begin "The test runner must..." seem to be requirements that it be possible to create test runners for that requirement. However, not all test runners may need to meet all of these requirements. Therefore suggest language like "It must be possible for test runners to...". We made this change the first time we encountered it but haven't done it for all of them yet.]

The test runner must support multiple test methods (including self-describing, reftest, and script)

The following test methods are considered.

Self describing

aka human or manual tests.

This is the most basic level. A file (or more) is displayed and a human indicates if the test is passed or failed. Ideally, we should avoid these types of tests as much as possible, since they require a human operator.
Some folks want to have a comment field as well.

[WAI comment: s/A file (or more) is displayed and a human indicates if the test is passed or failed/A human is provided with one or more test files and a corresponding test procedure (which may be included as part of the test files), and is asked to indicate if the test passes or fails.]

Plain text output

This is equivalent to doing saveAsText on two files and comparing the output.

[WAI comment: a little unclear what is meant]

Reftest

Two pages are displayed and the rendered pages are compared for differences.

For comparison, we might be able to use HTML5 Canvas, or an extension to get screenshots. The worst-case scenario is to have a human compare the rendered pages.

The test runner must be able to load tests automatically based on manifest files

The test runner must be able to order test cases smartly

Purely automated tests should be grouped together to avoid a situation where the user is solicited at random points during the run. This may be done when creating manifest files.
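A minimal sketch of such grouping, assuming each manifest entry carries an `automated` flag (a hypothetical field name, not a defined manifest format):

```python
# Hypothetical manifest entries: each test declares whether it is automated.
manifest = [
    {"id": "t1", "automated": False},
    {"id": "t2", "automated": True},
    {"id": "t3", "automated": False},
    {"id": "t4", "automated": True},
]

def order_tests(entries):
    """Group automated tests first, so a human tester is only needed
    for one contiguous stretch of the run rather than at random points."""
    # sorted() is stable, so relative order within each group is preserved.
    return sorted(entries, key=lambda e: not e["automated"])

print([e["id"] for e in order_tests(manifest)])  # → ['t2', 't4', 't1', 't3']
```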

The test runner must allow for tests to be run in random order and repetitively

The goal is to detect failure under certain conditions

The test runner must allow for complete and partial execution of tests

Selection of subset can be based on the metadata describing the test; for instance, to select all tests that apply to a certain feature, element, or other aspect of the test.

It must be possible to create test runners that work on various platforms

Test runners should be available that work on the main operating systems (e.g. Windows, MacOS, Ubuntu), in most user agents, and on various types of terminals (e.g. desktop, mobile).

Some environments might require specific development work. For instance, on mobile devices, test suites might need to be split or packaged differently beyond a certain size to cope with the limitations of the platform.

This requirement might be met by providing different test runners for different environments.

The test runner must provide some way to output collected results

This might take the form of a raw text file, XML, JSON, or internal database storage.

The test runner must allow for automatic and manual gathering of context information

This context information includes the browser versions, the OS platform, as well as relevant configuration settings and assistive technology if applicable.

The test runner must include context information in collected results

Result records must be complete with information about the test case, the tester, the revision if applicable, the user agent, etc.
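A sketch of what such a complete result record could look like, serialized to JSON; the field names here are illustrative, not a defined format:

```python
import json
from datetime import datetime, timezone

def make_result_record(test_id, outcome, context, tester=None, comment=None):
    """Bundle a test outcome with the context it was collected in, so a
    result is meaningful on its own. Field names are illustrative only."""
    return {
        "test": test_id,
        "outcome": outcome,          # e.g. "pass", "fail", "unknown"
        "context": context,          # browser, OS, settings, assistive tech...
        "tester": tester,
        "comment": comment,
        "collected": datetime.now(timezone.utc).isoformat(),
    }

record = make_result_record(
    "css21/c414-flt-000",
    "pass",
    {"user_agent": "Browser 4.3", "os": "Ubuntu", "assistive_tech": None},
)
print(json.dumps(record, indent=2))
```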

The test runner must support positive and negative testing

It must be possible to define positive tests of specification requirements.

It must be possible to define negative tests that actively test failure to meet specification requirements or test error handling behaviour.

The test runner must support testing of time based information

This requirement is needed for SVG animation and HTML video, for instance.

The test runner must allow a test to report its result automatically

Some hook must be available so that automated tests can report their results without human intervention.
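The hook could be as simple as a callback the runner hands to each automated test. The sketch below is illustrative and does not reflect any particular harness's API:

```python
class Runner:
    """Minimal sketch of a runner that collects results via a hook.
    Automated tests call report() themselves; for a manual test, the
    runner would call report() with a human-supplied verdict instead."""
    def __init__(self):
        self.results = {}

    def report(self, test_id, outcome):
        self.results[test_id] = outcome

def automated_test(report):
    # A self-reporting test: it computes its own verdict and reports it.
    report("string-trim-001", "pass" if " a ".strip() == "a" else "fail")

runner = Runner()
automated_test(runner.report)
print(runner.results)  # → {'string-trim-001': 'pass'}
```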

The test runner must allow humans to report on manual test outcome

There should be some pass/fail/unknown submission procedure available for manual tests.

The test runner must allow reftests to be run by humans

Even if reftests can be automated, the test runner should provide a way for humans to report on a reftest, possibly switching between test view and reference view several times per second and asking if the user sees flickering.

Automatic running of reftests requires browser-specific code and is explicitly out of scope.
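When screenshots are available, the comparison step itself is simple; the sketch below compares two pixel buffers, while obtaining the pixels (canvas, extension, OS screenshot) is precisely the browser-specific part that is out of scope:

```python
def reftest_match(pixels_a, pixels_b):
    """Compare two rendered pages pixel by pixel. Each argument is a
    sequence of (r, g, b) tuples; how the pixels were captured is
    deliberately left out, as that part is browser-specific."""
    return len(pixels_a) == len(pixels_b) and all(
        a == b for a, b in zip(pixels_a, pixels_b)
    )

page = [(255, 255, 255), (0, 0, 0)]
reference = [(255, 255, 255), (0, 0, 0)]
print(reftest_match(page, reference))                      # → True
print(reftest_match(page, [(255, 255, 255), (1, 0, 0)]))   # → False
```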

The test runner should allow for humans to comment on a test outcome

The runner should allow a text comment field for human evaluator notes (e.g. test conditions, failure notes) on an individual test result, which can then be included in the reporting. For example, an evaluator might write: "the authoring tool implements this SC with a button that automatically sends the content being edited to the XXX Checker accessibility checking service".

The test runner must allow tests to be composed of smaller tests

This would allow one action to be repeated several times within the same test, for instance to detect failure under certain conditions.

The test runner must be usable by external entities and individuals

Note though that some test suites may need specific conditions to run.

Requirements for the test case review mechanism

The test case review mechanism must enable review without putting a Working Group on the critical path for every single test

[WAI comment: we may also want to pursue public review and rating systems (though there are several concerns, including critical mass to make the system useful, avoiding spam, and avoiding disruptive or bogus entries)]

The test case review mechanism must provide an easy way to submit a test

In particular, this should not be restricted to named reviewers or people with W3C accounts.

The test case review mechanism should integrate with Mercurial

The distributed version control system should be used as much as possible.

Requirements for the test suite management system

The test suite management system must scale to a large number of tests

There may be more than 100,000 test cases per specification.

The test suite management system must track the state of test cases

Test cases may be:

under review

approved

rejected

The test suite management system should allow association of a test case with issues, action items or mailing-list threads

Integration with W3C tracker tool?

The test suite management system should allow stable dated release of test suites

Test suite revisions will be used in particular to link back collected results to the appropriate versions of a test suite and to create snapshots when needed (e.g. for an implementation report).

Requirements for the reporting tool

The reporting tool must be able to produce a machine-readable report

The actual format needs to be specified. It could be XML or non-XML. The Evaluation and Report Language (EARL), for instance, provides a machine-readable format for expressing test results in RDF with an XML serialization.

The output should be reusable by other applications. It should also be usable to answer questions such as:

Is feature X supported on Browser 4.3?

What does Browser 4.3 support?

The reporting tool should be able to produce an agglomerated report

Multiple test results may be available for a given test case. The reporting tool should be able to combine them and report the most likely test outcome.
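One simple way to combine multiple results for a test case is a majority vote; this is a sketch only, and real aggregation might weight results by context or by who submitted them:

```python
from collections import Counter

def most_likely_outcome(results):
    """Combine several collected outcomes for one test case into the
    most frequent one. Ties are broken by first occurrence, since
    Counter preserves insertion order among equal counts."""
    return Counter(results).most_common(1)[0][0]

print(most_likely_outcome(["pass", "pass", "fail"]))  # → pass
```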

The reporting tool should support authoritative result

When multiple test results for a given test case exist, there must be a mechanism to compare results and determine an authoritative result. Setting the authoritative result must be limited to privileged users.

Requirements for the spec annotation tool

[WAI comment: it is important to further explain what the "spec annotation tool" is. Also, one should not assume that spec annotation is the only method for identifying testable statements from the spec.]

The spec annotation tool must map each test case onto a part of the spec

In turn, this creates a requirement on the metadata test cases must define. The definition of "part" is up to the spec under test: it may be a whole section, or a single conformance statement.

Requirements for test cases and test files

Test cases must not depend on the test runner

A test may be able to generate its result automatically (such as Script test) or not (such as Self describing test). If it is automatic, it is the responsibility of the test to report its result to the test runner above it using some hook. Otherwise, it is the responsibility of the test runner to gather the result from an alternate source (such as a human).

Test cases should be designed for multiple purpose

Test files and test cases should be designed as neutrally as possible so they can be repurposed. Multiple Working Groups may have reasons to re-use test files and should not be forced to create redundant versions. Even within a specification, a given test file may be used to test multiple things.

Test cases must have a unique ID

Test cases (and test files) must have a unique ID. A URI may be sufficient for test files. The ID should not be expected to contain metadata about the test in its lexical form, although as a convenience many IDs may have some structure.

The targeted granularity may vary depending on the specification. For some specification, it may be enough to link back to the section that contains the conformance statement. For other specifications, a more precise link to the actual conformance statement may be needed.

[WAI comment: this relates to the spec annotation and this relationship should be explicit and clearly explained]

Note a test case may apply to more than one specification.

[WAI comment: it is mainly test files rather than test cases that may apply to more than one specification]

Test cases may apply to the same conformance statement as other test cases

There may be more than one test case per conformance statement.

Test files may depend on other test files

Test files consisting of a single file (singleton test files) are preferred for simplicity and portability, but it must be possible for test files to have dependencies on external resources such as images, scripts, etc.

Test files may depend on shared resources

It must be possible for resources, such as images, scripts, etc., to be shared by multiple test files. The test file repository structure must accommodate actual "test files" as well as resources that are not themselves considered test files.

Test files may generate test files

Some of the test files may be generators for a collection of test files and test cases created e.g. by varying a single parameter.

[WAI comment: this may interfere with the requirement for unique and constant identifiers for test cases]
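A generator might expand one template by varying a single parameter; as a sketch of how to address the identifier concern raised in the comment above, generated IDs can be derived deterministically from the varied value (the ID scheme here is purely illustrative):

```python
def generate_cases(template_id, values):
    """Expand one template into a family of test cases by varying a
    single parameter. IDs are derived deterministically from the value,
    so regeneration yields the same identifiers every time, which keeps
    generated test case IDs stable across runs."""
    return [{"id": f"{template_id}-{v}", "param": v} for v in values]

cases = generate_cases("margin-width", [0, 10, 50])
print([c["id"] for c in cases])
# → ['margin-width-0', 'margin-width-10', 'margin-width-50']
```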

Watir and FireWatir: Watir allows one to automate tests by driving browsers the same way people do. It clicks links, fills in forms, presses buttons. Watir also checks results, such as whether expected text appears on the page. It does not seem to provide screenshot facilities, unfortunately. No support for Safari on Windows?

Browserscope is a community-driven project for profiling web browsers. The goals are to foster innovation by tracking browser functionality and to be a resource for web developers.