The first step in building a beowulf is studying the task you wish
to use the beowulf to speed[2.2]. It really shouldn't be that surprising
that the intended function dictates the optimal design, but newbies
joining the beowulf list almost invariably get it wrong and begin by asking
``What hardware should I buy?'', a question that more or less answers itself
as the last step of this protocol. The One True Secret to building a successful
beowulf, as recited over and over again on the beowulf list by virtually
every ``expert'' on the list[2.3], is to study your problem and code long
and hard before shopping for hardware and putting together a plan
for your beowulf.

This ``secret'' is not intended to minimize the importance of
understanding the node and network hardware. Indeed, a large fraction
of this book is devoted to helping you understand hardware performance
issues so you can make sane, informed, cost-beneficial choices.
However, it is impossible to estimate how hardware will perform on your
code without studying your code, preferably by running it
on the hardware you are considering for your nodes. In later chapters,
concrete examples of code will be given that run at very different
relative speeds on a selection of the currently available
hardware[2.4].

The word ``study'' is used quite deliberately. It means that you should,
if at all possible, rely on measurements and prototyping more than
back-of-the-envelope estimates[2.5]. Measurements are far more valuable
than any theoretical estimate, however well-informed. A small prototype
can save you from all sorts of terrible mistakes, and when a
``successful'' prototype is finally built, it can often be scaled up
to the final size required[2.6].

The ``study your code'' formula above brings to mind a vision of a
pocket-protector-loaded geek poring over line after line of program text
on green and white lineprinter paper in a dark smoky room with a can of
Jolt cola in one hand and a programmer's reference in the other. At
least to Old Guys like me. However, this is not at all what I meant. I
actually meant for you to visualize a pocket-protector-loaded geek poring
over line after line of program text in a smoke-free modern linux
programming environment with, minimally, X (and a whole bunch of window
panels and desktops), gcc and friends, a debugger or two, emacsoid
editors, and/or a ddd-like integrated program environment, with a can of
Jolt cola in one hand and the keyboard in the other. Real programmer
geeks don't need a hardcopy language reference. That was what should
have given it away.

The point is that to quantitatively study your code you have to
get serious with some of the software development tools that you
may well have largely ignored before. I should also point out that even
beyond just studying your code, you have to study your task.
Even if you have implemented your task in a perfectly straightforward
piece of code that a lobotomized lunatic could read and understand, it
may be poorly organized to run in a parallel environment. On the other
hand, some horribly convoluted rearrangement of the code that you'd
never in a million years write in a single-threaded environment (and
that a non-lobotomized certified genius might have difficulty
understanding) may be just great in a parallel computing environment.

I will now and henceforth assume that you know nothing about
parallel code design or parallel task execution. Since I (truly) don't
know that much more than nothing, I'm going to try to teach you what
little I know, and where to learn more. Accept the fact that if you
have a ``big'' project in mind, you will have to learn more. I
mean it. Real Parallel Algorithms are the purview of Real Computer
Scientists (where I am a ``Sears'' computer scientist at best[2.7])
and you'll need
to find a book by a real computer scientist or two to learn about them.
A number of such books are listed in the Bibliography and indicated in
the text in context. Alternatively, you can hire a real computer
scientist, if you can get approval from your fire marshal and the local
board of health[2.8].

Once you have a linux workstation set up to do the requisite study you
can either design a program from scratch to be parallel (a great idea
when possible) or, more likely, take an existing serial program and
start to parallelize it. To parallelize the program and to inform the
beowulf design process, you must begin by identifying how much of the
time spent in a serial implementation of the task goes to work that
could be done in parallel and how much goes to work that must be
done serially. If the linux workstation you are
working on is at all ``like'' what you think you might need for a node
(after reading through this whole book) so much the better.
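
To make the idea of ``how much time goes where'' concrete, here is a
minimal sketch of timing the two kinds of work separately. It is written
in Python purely for brevity and is not the gcc/gprof procedure this book
uses later. Everything named in it (serial_setup, parallelizable_work,
measure_fractions) is a hypothetical stand-in for pieces of your own
task, and a real code rarely splits this cleanly.

```python
import time


def serial_setup(n):
    # Hypothetical stand-in for work that must be done in order:
    # each step needs the result of the previous step.
    x = 0.0
    for i in range(n):
        x = 0.5 * x + i
    return x


def parallelizable_work(n):
    # Hypothetical stand-in for independent work: every term could,
    # in principle, be computed on a different node.
    return sum(i * i for i in range(n))


def measure_fractions(n_serial, n_parallel):
    # Time each kind of work with a wall-clock timer and report the
    # fraction of the total that could be done in parallel.
    t0 = time.perf_counter()
    serial_setup(n_serial)
    t1 = time.perf_counter()
    parallelizable_work(n_parallel)
    t2 = time.perf_counter()
    t_serial = t1 - t0
    t_parallel = t2 - t1
    return t_serial, t_parallel, t_parallel / (t_serial + t_parallel)


if __name__ == "__main__":
    ts, tp, frac = measure_fractions(100_000, 1_000_000)
    print(f"serial: {ts:.4f} s, parallelizable: {tp:.4f} s, "
          f"parallelizable fraction: {frac:.2f}")
```

The numbers it prints depend entirely on the machine you run it on,
which is rather the point: run it (or better, your real code) on
something resembling the nodes you are considering.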

In all likelihood you have no idea how long it takes for your (or any)
computer to do any of the work in your task. Neither do I. So we
must find out. This is accomplished by profiling your task. The
way to profile a simple serial task using GNU tools (gcc and gprof) is
illustrated in detail in a chapter below.

I'M WRITING RIGHT HERE - THE REST OF THIS CHAPTER IS IN TOTAL FLUX...

Task profiling is
covered in a chapter below.

Use Amdahl's Law (covered in a chapter of its own) to determine
whether or not there is any point in proceeding further. If not,
quit. Your task runs optimally on a single processor, and all you
get to choose is which of the various single processors to buy. This
book can still help (although that isn't its primary purpose): check
out the chapter on ``node hardware'' to learn about benchmarking
numerical performance.
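
As a rough illustration of the ``is there any point in proceeding''
check, the sketch below uses the common textbook form of Amdahl's Law,
speedup = (Ts + Tp)/(Ts + Tp/N), where Ts is the time spent on serial
work, Tp the time spent on parallelizable work, and N the number of
nodes. This is an assumption on my part about notation (the chapter on
Amdahl's Law may write it differently), it ignores communication costs
entirely, and the function names are hypothetical.

```python
def amdahl_speedup(t_serial, t_parallel, n_nodes):
    # Idealized speedup on n_nodes: the serial part takes just as long,
    # the parallel part is divided evenly, and communication is free.
    return (t_serial + t_parallel) / (t_serial + t_parallel / n_nodes)


def max_speedup(t_serial, t_parallel):
    # Upper bound as the number of nodes grows without limit: only the
    # serial time remains.
    if t_serial == 0:
        return float("inf")
    return (t_serial + t_parallel) / t_serial


if __name__ == "__main__":
    # Example: 10 seconds of serial work, 90 seconds parallelizable.
    for n in (1, 2, 4, 16, 64):
        print(n, round(amdahl_speedup(10.0, 90.0, n), 2))
    print("upper bound:", max_speedup(10.0, 90.0))
```

With 10 seconds of serial work and 90 seconds of parallelizable work, no
number of nodes can deliver more than a factor of 10, and 16 nodes give
about 6.4. If your measured serial time dominates, the bound sits close
to 1 and this is where you quit.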