Benchmarking Linux

In preparation for the Ultimate Linux Box
print article, I've been doing some work with benchmarks. As it
turns out, Linux Journal Chief Geek Dan Wilder
has been working with benchmarks as well. I thought that instead of
re-inventing the wheel, we should combine efforts and give folks
insight into what we're contributing to the process.

Benchmarking something like the Ultimate Linux Box covers a
whole lot of ground. Disk I/O, CPU speed, graphics capability and
even how loud the computer is figure into benchmarking. What Dan
was working on was mostly server-based. I'm adding a graphics test,
a threaded I/O test and, although it isn't software-based, an
ambient sound test.

The first test is actually something that's been around
informally since Linus was a kid in Helsinki: How fast can we
compile the kernel? This is a fairly CPU-intensive test but
requires enough I/O to give you a good idea of how fast a box runs
under a development-type load, which is something you'd be doing
with a workstation. It's also a time-honored smoke test amongst us
in the old guard; if a box can stand compiling the kernel in a loop
overnight, it's usually good to go. There are more strenuous system
exercisers, but this is the easiest to set up; if it's Linux, it by
definition comes with kernel source. Of course, source or even a
compiler isn't always installed on a production system, but you're
not going to run this in production, now, are you? We're using
Linux 2.4.20 and a canned .config file for right now; I intend to
add multiprocessor support to the compile before running the final
benchmark.
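
As a rough illustration of the compile test, here is a minimal Python sketch that times a build command over several runs. It is not the suite's actual glue code: the function name `time_command`, the source path and the run count are all assumptions made for the example.

```python
import subprocess
import time


def time_command(cmd, runs=1, cwd=None):
    """Run cmd `runs` times; return the wall-clock seconds taken by each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError when the command fails,
        # so a box that cannot finish the build stops the loop at once.
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical usage (the path is an assumption; a 2.4 tree also wants
# `make dep` run once beforehand, with the canned .config in place):
#
#   times = time_command(["make", "bzImage"], runs=3,
#                        cwd="/usr/src/linux-2.4.20")
#   print(min(times), max(times))
```

For a meaningful comparison between runs you would also clean the tree before each build, which this sketch leaves out. For the overnight smoke test, raise `runs` and let it go.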

Next up is bonnie++ 1.03a. Bonnie++ is a good basic benchmark
aimed at small-to-medium-sized I/O, using semaphores for
interprocess communication. It measures sequential input,
sequential output and random seeks. It makes an effort to defeat
the OS's attempts to cache data in order to get at the raw
filesystem I/O speed.

At this point, I'm adding in tiobench. This is another
filesystem benchmark with three major differences from bonnie++:
it's threaded, as many modern programs are, instead of using
semaphores; it runs with a larger file size, approaching 2GB as opposed to
200MB; and it does not attempt to defeat OS caching outright,
although I generally keep it from coming up with ridiculous numbers
by limiting available RAM to 1GB. tiobench is a good large-system
benchmark as opposed to bonnie++, which would give you a better
idea of what a mail or news server that handles small files all the
time would do. Not everyone runs the same kinds of things on their
Linux boxes, so I felt we should present both benchmarks and, in so
doing, explore a full range of performance.

The fourth benchmark is PostgreSQL's regression tests.
Instead of relying on having PostgreSQL installed on the machine,
we simply include the source in the tarball, as we do for the
kernel, bonnie++ and tiobench tests. We then compile PostgreSQL
from scratch, include the regression suite and simply run the
database tests right there from the source tree. Although the
regression tests run on relatively small database files, they should
give you some indication of how the system performs on large
files. As Dan said, the best benchmark for a
given task is the task itself or as close a simulation as you can
come up with. What we're trying to do here is get you in the
ballpark, not put the pitch over the outside corner. We're using
PostgreSQL 7.3.2, which is not quite bleeding edge but currently is
only one minor version down.

Now for test five, graphics. Currently I'm doing something
similar to the classic Quake III: Arena tests
the Windows testers love, only with a GPL game,
Chromium. This little shooter does a nice job
of putting the frame rate up in the corner of your screen, and it
also logs it to stderr. This makes it easy enough to use Perl (or
awk/grep/sed, if you're old school like me) to grab the fps ratings
and find the average. But it's a somewhat ungainly monster to put
in a source tarball. I'm thinking we may replace it with something
IBM wrote called viewperf, which
is designed to be a benchmark out of the box. I want to do some
more research, however, on the topics of licensing and required
libraries. Viewperf is, as are most Open Source projects, a work in
progress.
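
For the curious, the fps-averaging step might look something like the Python sketch below. The log format is an assumption: I'm supposing lines such as `fps = 59.8` in the captured stderr, which may not match what the game actually prints, so adjust the pattern to suit. The names `FPS_PATTERN` and `average_fps` are made up for the example.

```python
import re

# Assumed format: "fps = 59.8" or "FPS: 60" somewhere in the line.
FPS_PATTERN = re.compile(r"fps\s*[=:]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def average_fps(lines):
    """Return the mean of all fps readings found in lines, or None if none."""
    readings = []
    for line in lines:
        match = FPS_PATTERN.search(line)
        if match:
            readings.append(float(match.group(1)))
    if not readings:
        return None
    return sum(readings) / len(readings)


# Hypothetical usage, after redirecting the game's stderr to a file:
#
#   with open("chromium-stderr.log") as log:
#       print(average_fps(log))
```

Lines that carry no fps reading are skipped, so the game's other chatter on stderr doesn't need to be filtered out first.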

Test six involves no software at all, unless you count what's
on the microprocessor in the sound meter. This is a fairly simple
ambient noise check. We're currently using a CheckMate CM100 sound
pressure level meter from Galaxy Audio. We set it to dBA fast mode
and simply measure the sound coming from the machine in a
relatively quiet (<40dB) room. We measure at 10" from the center
of the front of the case and likewise from the back, where it's
usually the noisiest. We then measure 24" from the center of the
top front edge of the case, which is about where the operator's ear
would be, assuming the case is in its usual on-the-floor position
(for a tower case). We take an average level, generating the
averages with the industry-standard Mark IV Eyeball. If the machine
has temperature sensitive fans, we measure both a warmed-up idle
machine (on for 15 minutes or so, so the temperature has
stabilized) and a machine under full load (like, say, the kernel
compile loop), again allowing time for the temperature and noise to
stabilize. We added this test after getting initial feedback on the
ULB Case Study article concerning the tendency of high-end machines
to sound like the proverbial jet engine.
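
If you would rather crunch numbers than trust the eyeball, keep in mind that decibels are logarithmic, so a plain arithmetic mean of the readings understates the louder moments. The sketch below shows both a simple mean and an energy-based mean. It assumes you have jotted down a handful of meter readings by hand; the function names are made up for the example, and this is not how we produced our figures.

```python
import math


def arithmetic_mean_db(readings):
    """Plain average of the dB values as written down."""
    return sum(readings) / len(readings)


def energy_mean_db(readings):
    """Average on an energy basis: convert each dB value to relative
    power, average those, then convert back to dB."""
    powers = [10 ** (level / 10) for level in readings]
    return 10 * math.log10(sum(powers) / len(powers))


# Hypothetical readings taken 24" from the top front edge of the case:
#
#   samples = [46.0, 47.5, 46.5, 52.0]
#   print(arithmetic_mean_db(samples), energy_mean_db(samples))
```

When the readings sit within a decibel or two of each other, the two figures barely differ. They part company when a fan spins up partway through the measurement.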

The idea here is for tests one through five to be distributed in one
large tarball, with a top-level Makefile that compiles, configures
and runs the entire suite--during which time you could run test six
for the machine under load. Ideally the entire suite would be
OSI-compliant. The kernel, obviously, as well as bonnie++ and
tiobench are GPL; Chromium is under the Artistic License and
PostgreSQL is released BSD-style. At press time I don't have the
terms for viewperf, which may affect our decision to use it. The
glue code that runs the suite is, of course, GPL. Feedback on the
tests we use is welcome. Once the entire suite of
benchmark tests is glued together, we'll let you know where to
grab it and exactly what other software you'll need to run it. We
will include as much of the sources for what you need as we can,
but some things, such as the PAM and readline development
libraries, really should be the system versions. On the other hand,
including specific versions of source for the major programs the
benchmark will run ensures the integrity of the numbers we
produce.

I'm also going to be looking for a name for this monster. Dan
has a working name for it, but I think we'll want something that
looks respectable when we put the numbers in front of some C-level
executive who doesn't grok the sense of humor inherent in Linux
and its ancestors. Feedback, again, is welcome, but keep in mind
that our target audience will be non-geek as well as geek.

Watch this space for more news as we continue to develop this
project.

Glenn Stone is a Red Hat
Certified Engineer, sysadmin, technical writer, cover model and
general Linux flunkie. He has been hand-building computers for fun
and profit since 1999, and he is a happy denizen of the Pacific
Northwest.

Comments

Benchmarking is of particular interest to me. I'm in the process of figuring out how to benchmark various methods of MPEG4 encoding in software, on various Linux kernels and processors. I'm also trying to develop bandwidth usage tests that will show how much bandwidth the same MPEG4 streams actually use across a LAN or WAN. Hopefully, once this test platform is developed, it will test various open-source implementations of MPEG4 and other video encoding schemes, and determine the resources (computational and communication) needed to support various encoding/streaming semantics. Finally, we want to test a variety of network protocols for optimizing video delivery in various QoS and differentiated service schemes. Ideas are most welcome: (james@buffer.net)

Now this is a very interesting article.
As for graphics cards, 3dlabs (www.3dlabs.com) has Linux support for their vp10 graphics cards, so maybe it would be a good idea to keep those cards in mind when you build the ULB.
Jesper Christensen