My rule of thumb: if a function doesn't make code any easier to read, it's not worth writing.

At one point in my life (not very long ago ;)), I was tempted to create lots of little 2-line and 3-line functions (probably coming from an OO background in school, where I was taught to make even simple variable accesses into object methods...).

It's clear in this case that the overhead involved in calling a function (mostly involving pushing variables to perl's stack and popping them back off) makes the code run roughly 3 times more slowly.
Generally, I try to avoid calls to "small" functions in inner loops whenever performance is anything of an issue; if I'm only reusing those few lines of code once or twice in a program, it's just not worth it to create a function for it, in my opinion.

To mangle Mark Twain: "There are four kinds of lies: Lies, Damn Lies, Statistics, and Benchmarks".

This is a good case for learning to use the Benchmark module. I found that creating an anonymous sub ref was slightly faster than a full sub, and that the efficiency gain from a bare block (although still outstanding for this simple function) shrank as the sample size grew. If you are sensitive to milliseconds of difference, or have extremely complex algorithms, you should measure them in situ to determine whether a bare block is better than a subroutine.
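A minimal sketch of such a measurement, assuming a trivial workload: the names add_one, run_named, run_anon and run_inline are made up for illustration and are not the code that was originally timed. Numbers will vary by machine and perl version, so treat the output as a hint, not a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical one-line workload, as a named sub and as an anonymous sub ref.
sub add_one { return $_[0] + 1 }
my $add_one_ref = sub { return $_[0] + 1 };

# Inner loop that calls the named sub.
sub run_named {
    my ($n) = @_;
    my $x = 0;
    $x = add_one($x) for 1 .. $n;
    return $x;
}

# Inner loop that calls through the sub ref.
sub run_anon {
    my ($n) = @_;
    my $x = 0;
    $x = $add_one_ref->($x) for 1 .. $n;
    return $x;
}

# Inner loop with the same work pasted in as a bare block.
sub run_inline {
    my ($n) = @_;
    my $x = 0;
    for (1 .. $n) {
        {
            $x = $x + 1;
        }
    }
    return $x;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    named  => sub { run_named(1000) },
    anon   => sub { run_anon(1000) },
    inline => sub { run_inline(1000) },
});
```

All three variants return the same total, so only the calling convention differs between them.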

To me, the main routine should be not unlike an outline. Some works, like the standard five paragraph essay, don't really need this. For reference books or novels, on the other hand, it can be an essential aid to the writer.
You can always paste the text of a subroutine into other parts of the script for production code where milliseconds count. This will assist the compiler in streamlining the code (although repeated functions will make the whole executable larger and perhaps increase compile time if you are compiling for each script load). But when you are designing the code, it would make sense to abstract most of your larger blocks, just like you would abstract chapters, sections, and paragraph sets when writing.

I'm relatively new to perl (OK, a complete newbie), but after looking at the same program written in C, I got very different results:
330ms and 350ms user time. Is there a way in perl to force inlining of functions? Or maybe this just points out a place for improvements in the compiler...
(which, I understand, due to the interpreted nature of perl, is necessarily minimal, but would this be a quick-fix job?)

Did you test the performance of the perl programs (vs. equivalent C code) on your machine? Chances are relatively decent that your machine is better than my clunky 133 MHz Pentium.

Fact correction: Perl is a compiled language, actually (well, as compiled as Java, anyhow). The perl program works by taking in your source file, and compiling it in several stages.

Stage 1: The compile phase. In this phase, Perl converts your program into a data structure called a "parse tree". If the compiler sees a BEGIN block or any "use" or "no" declarations, it quickly hands those off for evaluation by the interpreter.

Stage 2: Code generation (optional). If you're running one of the compiler backend modules (such as B::Bytecode, B::C, or B::CC), the compiler then outputs Perl bytecodes (much like a Java .class file) or a standalone chunk of (very odd-looking) C code. These code generators are all highly experimental at present.

Stage 3: Parse-tree reconstruction. If you did stage 2, stage 3 remakes the parse tree out of the Perl bytecodes or C opcodes. This speeds up execution, because running the bytecodes as-is would be a performance hit.

Stage 4: The execution phase. The interpreter takes the parse tree and executes it.
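The hand-off in stage 1 is easy to see for yourself. This is only a small sketch (the @order array is an illustrative name): the BEGIN block appears second in the file, but it runs while the file is still being compiled, before either ordinary statement.

```perl
use strict;
use warnings;

our @order;   # records the order in which the pieces actually run

push @order, 'run time: first statement';

# Executed as soon as the compiler has finished parsing this block.
BEGIN { push @order, 'compile time: BEGIN block' }

push @order, 'run time: second statement';

print "$_\n" for @order;
# prints:
#   compile time: BEGIN block
#   run time: first statement
#   run time: second statement
```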

This is Perl compilation in a nutshell... read Chapter 18 of the 3rd edition of Programming Perl for a more in-depth analysis.
For many tasks (especially simple ones such as these) Perl will be slower than C, because C is basically a more-portable form of assembly language, and assembly language (once actually assembled) works with raw hardware, and is hence about as fast as you can get.

Another difference between Perl and a native C app that may affect performance is the fact that Perl has its own stack (actually, it has several stacks) as opposed to a C program, which is likely to just use the system stack.

I add my voice to this: there should be a way to inline
functions and method calls.

Granted, you can use the pre-processor (perl -P)
to inline functions, but this does not work for method calls.
This is really bad when designing OO Perl, where I find
myself using straight hash access ($o->{field}) instead
of accessors ($o->field) for some often-called methods
(or writing painful and risky kludges),
which makes maintenance much harder.
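To make the trade-off concrete, here is a minimal sketch with a made-up class (Widget and its single "field" attribute are hypothetical). Both reads return the same value: the accessor pays for a method call each time, while the hash access ties the caller to the object's internal layout.

```perl
use strict;
use warnings;

# Hypothetical class with a single attribute called "field".
package Widget;

sub new {
    my ($class, %args) = @_;
    return bless { field => $args{field} }, $class;
}

# Accessor: every read costs a method lookup plus a sub call.
sub field { return $_[0]{field} }

package main;

my $o = Widget->new(field => 42);

my $via_accessor = $o->field;     # goes through the method
my $via_hash     = $o->{field};   # reaches into the object directly

print "accessor: $via_accessor, hash: $via_hash\n";
# prints: accessor: 42, hash: 42
```

If Widget ever stores its data differently, only the accessor version keeps working unchanged, which is exactly the maintenance cost described above.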

I am actually very surprised this is not even a Perl 6
RFC. I would think that this is a simple way (and quite easy to
implement) to enhance the speed or maintainability of OO Perl programs.

Well, since it'd probably be just an attribute on a subroutine, and we already have those, there's nothing really to RFC. {grin}

Ideally, the Perl compiler would just recognize when a subroutine qualifies as
an open-coded version, and "do the right thing". I'd prefer that. Just like how
certain subroutines are recognized as "constant" now, and pre-evaluated.
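For anyone who hasn't seen the "constant" case, a small sketch (ANSWER, DEBUG and double_answer are illustrative names): a sub with an empty prototype and a constant body is a candidate for inlining at compile time, and "use constant" builds that kind of sub for you. B::Deparse prints the sub roughly as perl compiled it, so you can check whether the call was replaced by its value; the exact text varies between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Empty prototype plus a constant body: perl may inline the value at compile time.
sub ANSWER () { 42 }

# "use constant" creates the same kind of sub.
use constant DEBUG => 0;

sub double_answer {
    print "debugging\n" if DEBUG;   # dead code when DEBUG is a false constant
    return ANSWER * 2;
}

print double_answer(), "\n";   # prints 84

# Show the sub as perl sees it after compilation (output format varies by version).
print B::Deparse->new->coderef2text(\&double_answer), "\n";
```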