Prevarication, Damn Lies, And Benchmarks

Many, many moons ago, I was director of PC Labs at PC Magazine. I worked there with Joe Desposito, Electronic Design’s editor-in-chief. During our time there, we worked on the first PC and local-area network (LAN) benchmarks.

PC Magazine is only available online these days, but it was one of the biggest computer print publications in its heyday. It was also the leader in benchmarking. One of the many challenges we always faced was that writers and editors wanted a single number to highlight in an article instead of the more detailed reports that the benchmarks could provide.

A single number is easy to talk about and compare. Discussions about details like repeatability, significant digits, and other important topics often fell by the wayside given the limited amount of space in the print publication. Modem-based bulletin-board systems offered a way to provide the underlying data, but this was well before the Internet.

Embedded Benchmarking

I recently spoke with Markus Levy, president of the Embedded Microprocessor Benchmark Consortium (EEMBC), about the Consortium’s new CoreMark. EEMBC’s bread and butter isn’t CoreMark but more extensive and targeted embedded benchmarks.

These benchmarks are often developed for specific product areas such as automotive applications or office automation. Access to most of the benchmarks is through membership in the consortium.

There are various rules and regulations for generating and using EEMBC benchmarks and results. Results can be used for internal evaluation, and EEMBC can certify results. Like the old PC Magazine benchmarks, it is possible to run EEMBC benchmarks on your own hardware.

Unlike the PC Magazine benchmarks, the EEMBC benchmarks are source code and designed to provide a range of information. They can be tweaked, but only from a tester’s perspective for analysis purposes. Certified results are for unmodified benchmarks.

The CoreMark deviates from EEMBC’s typical benchmark suite in two areas. First, it is available for free. That includes the source code. Second, in these days of multicore chips, CoreMark targets single cores. While it can run on a multicore chip, it reports, not surprisingly, a linear speedup, since it is designed to exercise a single core.

Additionally, the CoreMark hits the low end of the spectrum. It is suitable even for 8-bit platforms, whereas many of EEMBC’s benchmarks are for higher-end systems.

Besting Benchmarks

The one thing I don’t like about CoreMark is that it presents a single number. It actually generates more information by performing a mix of tests, and it is really this information that would be most useful to a developer.

The challenge at PC Magazine, or for any benchmark developer, is to put together tests that are both meaningful and accurate. Repeatability is not necessarily a guarantee that a benchmark is generating anything but numbers. Likewise, benchmarks must change to match the hardware and software they are testing.

Sometimes vendors try to build hardware or software to optimize benchmark results. We ran into this issue with some video adapter vendors. Some drivers even checked to see if they were running benchmarks. They would then appear to run amazingly fast.

Perkin-Elmer, a minicomputer vendor from decades ago, published benchmark results in which its Fortran compiler eliminated dead code, in this case the benchmark itself, so it ran in zero time. The results highlighted a good compiler but said nothing about the minicomputer it was running on.

CoreMark, like almost any benchmark, needs to take this type of thing into account. The quality of a compiler will affect results, so comparing compiler A on chip A with compiler B on chip B may provide significantly different results than using compilers C and D.

Comparing PCs was a bit easier since the hardware was x86 compatible, allowing the same benchmark code to run on different platforms. Embedded developers rarely have that option, although benchmarks can provide valuable insight when they’re used to evaluate hardware designed for upward migration.

CoreMark looks to be a good match for developers who approach it with an open mind and plan to analyze all the results, not just the final number.

So are benchmarks like CoreMark useful? Yes, but with the usual caveats. It will be interesting to see whether the target audience of programmers and engineers makes for better benchmark judges than PC Magazine’s consumers did. Drop me a note telling me about your experience with CoreMark.