On Tue, 22 Mar 2005, Andrew D. Fant wrote:
> My personal complaint is that there aren't enough good standard
> test/validation suites out there for cluster building. Some libraries
> like Atlas include them, but they are also tied to that specific
> package. It would be really great if as a community we could do
> something like the Linux test project oriented towards cluster-building
> and scientific computing. Something that I can run when my boss wants
> "proof" that upgrading a library didn't completely rejigger the
> numerical stability of the results. I know that the stock answer here
> is that we ought to generate our own regression tests based on our own
> particular application set, but I think it would be a boon for a more
> generic framework and solution to evolve. If nothing else, it would
> offer a basis for heterogeneous systems in a grid environment to trust
> each other's results without necessarily requiring full application
> cross-validation. It might be a pipe dream, but I like it 8-)
Hmmm. OK, how's this. Just supposing that I finish building xmlbenchd
before an infinite amount of time elapses (I've once again gotten mired
in teaching and haven't had time to work on it for a week+). Suppose
xmlbenchd can run any given program inside a fairly standard timing
wrapper (probably a perl script for maximum portability and ease of
use). Suppose that the perl script, which will certainly contain the
command line for the application, runs it so that (for fixed random
number seeds where appropriate) it produces some sort of fixed output.
Then it would be trivial to add a segment to at LEAST diff the output
with the expected output, and not a horrible amount of work to actually
compute a chisq difference between the two. I can easily introduce xml
tags for returning a validation score on the actual result (or even a
set of such scores) because extensibility IS useful during the time a
new thing is being invented (sorry, Don:-) if not beyond. This would
permit the best of both worlds:
a) I expect to assemble a set of macro-level applications to function
somewhat like the spec suite does today but without the "corporation"
baggage, for distribution WITH the package. At that point I will
actually solicit this list for good candidates for primary inclusion.
This set can actually be quite large, permitting users to preselect at
configuration time the ones to run for their particular site. For the
ones that are selected, I will go ahead and add the validation test
when I wrap them up in the timing script.
b) Users who want to wrap their OWN application set up for automated
benchmarking inside the provided template script will then be able to
follow fairly simple instructions and (presuming that they know enough
perl to be able to parse their application's output file(s)) validate as
well as time to their heart's content.
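A minimal sketch of the wrapper idea described above: time one run of a command, compare its output with the expected output, and report both an exact-match flag and a chisq score as XML. It is written in Python rather than the perl the post has in mind, and it is not xmlbenchd's actual design. The function names (chisq, run_and_validate), the tag names (benchmark, command, time, exact_match, chisq), the assumption that the application prints whitespace-separated numbers on stdout, and the uniform sigma are all hypothetical choices made for illustration.

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET


def chisq(observed, expected, sigma=1.0):
    """Sum of squared differences between two sequences, scaled by sigma."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected outputs differ in length")
    return sum(((o - e) / sigma) ** 2 for o, e in zip(observed, expected))


def run_and_validate(cmd, expected, sigma=1.0):
    """Time one run of cmd and score its numeric stdout against expected.

    Assumes (hypothetically) that the application writes whitespace-separated
    numbers to stdout; a real wrapper would parse each application's own format.
    """
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start

    observed = [float(tok) for tok in proc.stdout.split()]

    # Tag names below are invented for this sketch, not taken from xmlbenchd.
    result = ET.Element("benchmark")
    ET.SubElement(result, "command").text = " ".join(cmd)
    ET.SubElement(result, "time", units="seconds").text = f"{elapsed:.6f}"
    ET.SubElement(result, "exact_match").text = str(observed == expected).lower()
    ET.SubElement(result, "chisq").text = f"{chisq(observed, expected, sigma):.6g}"
    return result


if __name__ == "__main__":
    # Stand-in "application": a one-liner that prints three numbers.
    cmd = [sys.executable, "-c", "print(1.0, 2.0, 3.5)"]
    report = run_and_validate(cmd, expected=[1.0, 2.0, 3.0])
    print(ET.tostring(report, encoding="unicode"))
```

The exact-match flag plays the role of the diff, while the chisq score says how far off a non-matching result is, which matters when a library upgrade changes only the last few digits.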
This may not be sufficient for all users -- I'm probably not going to
write a core loop that would permit a sweep of an input parameter in the
command line, for example, and to test e.g. special function calls in
the GSL that change algorithms at certain breakpoints, that kind of thing
is really necessary. However, folks with more advanced needs will
presumably be more advanced programmers and the perl to add such a sweep
and generate a more complex validation isn't terribly challenging.
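For a sense of scale, a parameter sweep of the kind mentioned above might look roughly like the following sketch. It is again in Python rather than perl, and every name in it (sweep, the "{x}" placeholder convention, the tol parameter) is hypothetical. The stand-in application evaluates Python's math.erf, not a GSL routine; a real test would call the program under test and compare against trusted reference values.

```python
import math
import subprocess
import sys


def sweep(cmd_template, values, reference, tol=1e-12):
    """Run the command once per value and check each result against reference(x).

    Each argument in cmd_template may contain the placeholder "{x}", which is
    replaced by the current sweep value. The application is assumed to print
    its result as the first token on stdout.
    """
    rows = []
    for x in values:
        cmd = [arg.format(x=x) for arg in cmd_template]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        got = float(out.split()[0])
        want = reference(x)
        # Relative tolerance, falling back to absolute near zero.
        ok = abs(got - want) <= tol * max(1.0, abs(want))
        rows.append((x, got, want, ok))
    return rows


if __name__ == "__main__":
    # Stand-in application: prints erf(x) for the x given on its command line.
    app = [sys.executable, "-c",
           "import math, sys; print(repr(math.erf(float(sys.argv[1]))))",
           "{x}"]
    for x, got, want, ok in sweep(app, [0.0, 0.5, 1.0, 2.0, 4.0], math.erf):
        print(f"x={x:<4} got={got!r} want={want!r} {'ok' if ok else 'MISMATCH'}")
```

Choosing sweep values that straddle the points where an implementation switches algorithms is what makes this kind of test useful for special functions.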
Would that do?
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu