Proceedings Paper

In this paper we discuss three questions concerning the use of reference data sets and reference results in black-box tests for validating assessment software: (1) how to generate the data and results, (2) how to represent solutions in a stable way, and (3) how to compare test results with reference results. We describe a general method for generating data and results that goes some way towards addressing all three questions, and we illustrate the concepts with least-squares form assessment and theodolite triangulation.