When I first learned big-O notation, I had similar woes. However I eventually realized that the idea behind big-O was to show how a particular technique would scale, (and thus, also which parts of a techinique needed to be written with speed-sensitivity). This is in contrast to providing the answer to which of algorithm is faster.

We could find some nice complex examples on PM (for instance ones involving hashes that I have generally just said "a little slower"), but your example is very simple, yet complex enough to show most of the following.

In your example you lament that two algorithms with the same big-O notation produced fairly different speeds. As I mentioned before, big-O notation is not meant to compare different algorithms, in terms of their absolute speed. Rather it is supposed to say something about how they scale.

For instance, since both algorithms have N*log N time, they should both scale in the same way. That means, if you double the number of items, the speed should be (2N*log 2N)/(N*log N) = 2 * (log(n) 2 + 1) (that is for large N, it will take closer to 2 times longer (sorry for the heavy use of math, also log(n) 2 is "log base N of 2")), or 2 about 2 times longer to execute. It also means that the ratio of their speed differences will be constant, so a 15% speed difference will stay at 15% no matter how large a dataset.

This in itself tells us two things. The first is that we don't have to use a really large value of N (only large enough to cancel any "noise") to know how much our speed ups or downs will cost for larger N's. Also it means that the only thing that we need worry to work on is the cost of our "constant-op", in this case, that code between the {} for the sort. All of this can be important in more complex situations and even in this one!

Say we hadn't heard of the idea of Schwartzian Transform. We could use big-O to give us a hint on what to look at in order to come to the same technique:

Since big-O told us we have to make the stuff in the {} faster, and since it's only one comparison, we don't really have much choice but to try to remove those function calls to pack. Well how can we do that? We have to have them or we wouldn't be comparing properly. Aha! What if we already had them?? Then the {} wouldn't have to do it, and we should be faster!

Well, at least, we hope so, because in fact our algorithm isn't really N*log N, it's 1 + N*log N. That is, we are paying a constant cost to send and prepare the data to our {} code.

Luckily we just have to send the {} a pointer, not copy the whole list or some other fancy stuff, which would change the 1 into a N or worse!

But opps, now that we have made {} short by removing the packs, we now have change a simple pointer hand off into "other fancy stuff". That is a N operational transform! (Well, N*M where M is the length of an item, but since that's constant....).

So now our total cost is N+N*log N, slightly "worse" than the previous code, according to big-O. But as we have said, big-O doesn't say how fast an algorithm will be against another, but rather how they scale.

What is hidden here is the hidden costs, the original code was: N*(x+y)*log N*(x+y) , and the new code is: (N*x + N*y*log N*y . We just move the cost X (the pack) out, leaving the cost Y (which is the cmp) still with in. X is much larger than Y, and since N*log N grows faster than N, we can see that by moving X out, we are saving more by paying for it only N times instead of N*log N times.

Confusing? Sorry, but that's why big-O doesn't try to take these sorts of things into account. Instead you (more likely) just say that both the new and old code was N*log N, since this is the part of both code's big-O that grows the fastes. Then you can say that the newer code is faster, because the cost for each N is smaller (no pack). You must append this statement with the words "for large N", just to be safe though :) .

Ok, so now you see why we don't worry about using big-O to tell which algorithm is faster (though you can use it as a tool to find out, using more detailed information). You can also now know you can use big-O notation to find where to concentrate your efforts when optimizing code.

Now for the finale, we'll look at why the second code is faster than the first. The first thing we can do is, by using big-O, notice that since both have the same big-O notation, it must be that the code inside the {} is costing more than the others.

Now, the first thing to notice is that they both pay the same cost to translate $a and $b into a list of four numbers. However they store the result differently. One creates arrays @a and @b, the other sends the results directly to a function.

This is one place for a speed increase, the first code can be made faster by saying:
my $aref = [$a=~/(\d+)\.(\d+)\.(\d+)\.(\d+)/];
Instead of allocating a new array and copying the contents, we now only allocate a reference and point to the already allocated anonymous array.

The next place to look is at the cmp vs. the <=> and ||'s. Since both create all of the list at once (using the regex) the short circuit logic of both the cmp (between bytes) and the <=> and ||'s is only keeping you from having to make at most four more comparisons (and no data processing). So we can first conclude that the savings for this part will probably be small. Looking further we see that cmp is a direct perl statement, whereas <=> and || are several combined, the cmp is probably faster in this case (but again, not by much).

The next thing is very subtle, you are using <=> in the first code, which forces strings to be converted to numbers before the are compared. This would mean that <=> vs a cmp would probably be slower (of course, simply moving to cmp would make the code incorrect). Of course these string conversions may or may not be equal to the cost of the pack() operation (not the cost of the call though).

The last thing to look at is what is left, the array lookups, and the pack() call. The pack call has to be costly, but it's operation itself is fast (and probably written in C). Array lookups are pretty much as fast as you can get, so the first code wins with this.

The conclusion is, you have to decide if memory allocation and deallocation is going to cost more than two function calls. I would personally go for the memory allocation being slower. This is because at worse, we have to ask the system itself for the memory, which can be very costly, even if it isn't likely. At best we have to ask perl for the memory, which is probably at least half the cost of a built in function like pack(). Also we then have to deallocate that same memory (with approximately the same cost).

Now to test out some modifications:

... or not. My heavily load SMP 2.4.0-test kernel machine mixed with the Bechmark module seems to equal, not good system times (which may be part of my problems with random crashes) I just got back negative times after waiting about a minute on a benchmark that was only supposed to run for 1 cpu second :P . Anyway just take my word for it! lol