Comments

We propose to make the current output of go test -bench the defined format for recording all Go benchmark data. Having a defined format allows benchmark measurement programs and benchmark analysis programs to interoperate while evolving independently.

This comment has been minimized.

Why have the key start with a lowercase letter rather than an uppercase one? Wouldn't having it start with an uppercase letter make it easier to write go code that unmarshals the configuration (for example, not needing struct tags like in json)?

This comment has been minimized.

@cespare, in the general case dropping that restriction would make the job of the parser much harder, if not impossible. Since we're proposing that benchmarks can emit custom metrics, that would require the parser to know how to conform arbitrary scalings of arbitrary units. It would also require that we specify much more precisely the syntax of units. Currently we've specified the unit syntax enough to make "best effort" rescaling possible in benchmark processors, but even if a processor doesn't understand a unit, it can at least expect it to be spelled consistently so it can match up results.

This comment has been minimized.

This comment has been minimized.

This proposal adds one new kind of line: configuration. What do you think about also standardizing a column header line format? Right now the format doesn't provide any mechanism for carrying the context around with it, so a line like this

Now, given that lines other than configuration lines and benchmark result lines are ignored, it's possible for someone to make up their own format for column headers. But might it be better to settle on a single convention from the start?

(A slightly different convention would be to decide on a well-known configuration key (columns?) which specifies the column names.)

This comment has been minimized.

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

The first field is the benchmark name, which must begin with Benchmark and is typically followed by a capital letter, as in BenchmarkReverseString. Tools displaying benchmark data conventionally omit the Benchmark prefix.

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

The same benchmark name can appear on multiple result lines, indicating that the benchmark was run multiple times.

This is a mismatch with running benchmarks over multiple packages right now. Using tip:

The main known issue with the current go test -bench is that we'd like to emit finer-grained detail about runs, for linearity testing and more robust statistics. This proposal allows that by simply printing more result lines.

Link to issue 10669.

We anticipate that the new x/perf subrepo will include a library for loading benchmark data from files

This comment has been minimized.

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

Sure.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

I suggest to require all lines beginning with a # character to be ignored.

This comment has been minimized.

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

Yes.

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

Interesting corner case, thanks. Actually the go command allows plain Benchmark. The tool will have to rename it to something. The go command does require that if something follows "Benchmark", that something must not be a lower case letter: "Benchmarked" is not a benchmark.

If this proposal were adopted, however, we could add to the output a config line to disambiguate,

I expect deprecated in favor of a new parser in x/perf (see #14304). The code in package parse is very tied to benchcmp's view of the world (unsurprising since it was basically an mv from cmd/benchcmp). Worse, benchcmp's view of the world - which I accept full blame for - is fundamentally flawed, since it doesn't understand the idea of running a benchmark multiple times and therefore cannot do any statistical validity checks. A general parsing library should really provide something closer to the raw input, leaving analysis to the code invoking the parser. For benchstat, I started by using package parse and trying to recook the results into the form I wanted, but I found that it was easier to parse the raw data from scratch.

For that matter, benchstat will move to x/perf, but probably, due to the statistical failings, benchcmp will be left behind, deprecated, and eventually removed. Not for a while though, and somewhat off-topic for the current proposal.

This comment has been minimized.

I suggest to require all lines beginning with a # character to be ignored.

Good catch. I made the rule a bit stricter than you proposed by requiring that the first character satisfy unicode.IsLower(). If there's a compelling case to weaken this, we can do so in the future.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

There are useful units with zero slashes, such as binary size in a compiler benchmark and median GC utilization (though you could argue these are "per iteration"). There are also useful units with multiple slashes, such as bytes/CPU/sec. You could represent this as bytes/(CPU*sec), but this goes one step deeper into the rabbit hole of specifying unit syntax, which we're trying to avoid at least for now. I've been using some of these more complex units in the GC benchmark suite I've been putting together (see metrics.go).

This comment has been minimized.

@cespare, interesting. I think I understand. This is the difference between dimensions and units: one tells you what's being measured (e.g., "time"), while the other relates a value to a specific physical quantity (e.g., "minutes", and "seconds").

I've actually been fighting with this in a general unit parser/renormalizer I wrote for benchstat, since benchstat wants to show a dimension for its own column headers, and values with nicely (and differently) scaled units in the rows. Specifying it explicitly feels almost unnecessary because you can usually derive the dimension from the unit, but you're right that some of the units used by go test are quite vague. Arguably, these units are bad: any two numbers in the same or conformable units should be comparable, but "B/op" could many any number of different things that aren't necessary comparable. It might be possible to always derive reasonable dimensions from "better" units, such as "bytes allocated/op".

So, another solution is to say that we know what ns/op, MB/s, B/op, and allocs/op mean, but any custom metrics have to have "better" units.

This comment has been minimized.

This comment has been minimized.

I just encountered one annoying problem with this format. If there are no tests (such as in the go1 benchmarks), the testing package prints "testing: warning: no tests to run", which this format interprets as a configuration line with key "testing" and value "warning: no tests to run".

This comment has been minimized.

@bradfitz, to be clear, the currently proposed JSON format doesn't capture most of what's interesting about this proposal (custom metrics and benchmark metadata), and according to #2981 (comment), it seems these things are out of scope for the current JSON proposal.

Not that there's any harm in closing this issue. We're already using this format. But it's not a dup of #2981 since it covers different ground.

This comment has been minimized.

I agree—not a dup. There's a lot happening here beyond mere encoding. And it's not clear to me how the JSON proposal handles the header lines discussed here, which are not tied to specific benchmark runs. Re-opening.

This comment has been minimized.

Then I'm marking this as "Proposal-Accepted" so it doesn't come up in future proposal review searches. This is affected accepted because you're all using it anyway. I still think there's value in a standard format like JSON, but that discussion can continue in #2981.

The benchmark format proposal was written in parallel with the
sub-benchmarks support and the two didn't quite agree on the format of
benchmark names.
This commit makes two changes to the benchmark format document so that
it agrees with the practice and implementation of sub-benchmarks in
the testing package:
* It replaces key:value in the benchmark name with key=value, which is
the format suggested by the sub-benchmarks documentation.
* It removes the special rule about reporting gomaxprocs as a special
key=value pair. This was never implemented in the testing package,
so it still emits a -N suffix for the gomaxprocs value. Technically
this syntax is ambiguous, but it's what we're doing.
Updates golang/go#14313.
Change-Id: I67cbd735de7ab145bc7c3a93b5dadbe05120929c
Reviewed-on: https://go-review.googlesource.com/33570
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>