Apple's demise

Who says Apple is Doomed? I DO!

In the News

Cupertino -- we have a problem

05/06/06
Apple is now using good CPUs to run
their machines, and apparently in the latest Mac OS X they are now
accelerating 2D graphics more fully (bringing them on par with where PCs were
12 years ago.) So everything ought to be hunky-dory, right? Well, not quite.
It appears that the system developers at Apple are so isolated from reality
that they have completely missed important OS architecture trends.

Jasjeet Sekhon has run some benchmarks of the statistical programming language
R, and things do not look very good for the Mac. Click on the link below
for more information:

Declaring victory in defeat

01/17/06
So at this past MacWorld Expo, Jobs unveiled the first x86-based Mac product,
which basically should be shipping right now. Apparently this is early
relative to their publicly announced roadmap or something (the Mac rumor
retards were thinking that Apple was going to become a television vendor --
I'm not even going to guess at their reasoning for that.) Jobs also
showed some iPod software, a simple webpage authoring app, etc. The
point is that he ran the demos on the x86 Mac. Leo Laporte from the "This Week
in Tech" podcast made the important point that all the applications running on
the x86-based Mac seemed to run ridiculously faster than on even his
dual PowerPC G5 tower. Now here's the thing -- on Windows it's extremely hard
to notice CPU upgrade performance from running mere desktop applications (you
usually can only notice when running video games, compressing video, or
similar kinds of applications). The reason is that the CPU has basically
been delivering "essentially infinite" performance relative to desktop
application needs on x86 for a long, long time now. But for a Mac person this
is probably the first time they have ever seen this kind of performance or
anything like it. The deluded Macaholics don't have the functioning neurons
necessary to fill in the rest of the reasoning themselves, so let me just fill
them in: the PowerPC is, and has always been, pure silicon diarrhea. You were
being LIED TO, repeatedly and over a long period of time. They were
very, very bad CPUs. Whatever. But like the typical battered wife with no
self-esteem, the Mac faithful will go back to Jobs, forgive his lies, and
just scoop up whatever he says from now on.

Continuing with the "The Week In Tech" podcast (TWiT#38) which discussed the
MacWorld Expo, John C. Dvorak put out the following prediction: "Within 2
years, Apple will switch to AMD". I'll take even money on the other side of
that bet. The AMD processors are currently objectively faster than Intel's
-- so that's the source of his reasoning. But what he's missing is that that's
precisely why Apple won't do it. Let me explain. Intel is in
desperate trouble with their CPUs. AMD is basically faster, cheaper, and at
least as reliable as Intel across the board. In fact Intel has been losing a
noticable amount of marketshare to AMD as a result.

So one way for Intel to deal with this is to seek to expand their market. But
what would be the point if they just end up ceding a large portion of that
share to AMD? Apple did not just buy a CPU from Intel. They bought a
motherboard design, a chipset and (drum roll please) a compiler.
Apple is concerned with making a complete system, and probably wants a more
consistent roadmap than IBM gave them. Intel is looking for another
exclusive partner (besides Dell) to fend off AMD. While the details of the
deal are obviously a closed-door affair, it's fairly obvious to me that Intel
and Apple probably worked out an exclusive deal.

Checkmate!

06/11/05 That's it -- Apple's given up on
the PowerPC. And for those of you lamenting its inevitable passing (if we
ignore the Xbox 360) let me assure you -- it was never a very good processor
to begin with.

The theories being bandied about as to why
are amusing to say the least, but I think for once in his life,
Steve Jobs chose not to lie. Or at least not to disguise it much. It's simple,
folks -- IBM and Freescale/Motorola failed to deliver. And it's not just
performance per watt (liquid cooling, anyone?), it's performance
and power. At the keynote of the recent Apple developer
conference Jobs made sure the dig at IBM's failure to deliver did not go
unnoticed, even if he avoided mentioning them by name. No 3.0Ghz (after it
was promised) and no G5 in the PowerBooks. All this at a time when Mac
sales apparently are showing slight signs of picking up (indeed all sorts of
people are coming out of the woodwork and buying Macs!). So much for thinking
different.

Now, of course, we can discuss the real
reason they went with Intel. First note that they did not go with
x86, but rather with Intel; that is to say, they are
not going with AMD. Why is this? Simple: their problem is
(1) delivery of (2) high-performance and (3) low-power
microprocessors. IBM failed on #2 and #3. AMD rules on #2 and #3, but they
own only one fab, so they can't provide the guarantees Apple is
looking for on #1 (though AMD has had very consistent delivery for
the past 7 years, this is based on a smaller marketshare, and follows a horrible
delivery record for the K5s and initial K6s.) Intel is not so good on #3, and
lags a bit on #2, however, in both cases, the problems are not as extreme as
they are with IBM -- and of course they have more than proven their ability
to deal with #1. And Intel comes with a secret bonus -- they make their own
compiler, which kicks the crap out of gcc and basically other compilers in
general. It's the same reason Microsoft went with Intel for the original Xbox
CPU, and not with AMD. This part is way over the heads of your typical Mac
weenies. Choosing AMD would seem more in-line with Apple's "Think Different"
mantra -- but it would still be a risk for Apple, which is the primary thing
they are trying to avoid right now.

Another problem Apple has had to contend with is
that IBM/Motorola did not face direct competition, even with each other, for
Apple's business. They always found themselves conveniently trading off the
market segments. By not directly competing, there was never any market ->
product pressure which Apple could leverage to ensure they had the best
possible processors. Even if Apple commits to an Intel-only solution, they
benefit from competition from AMD, since Intel has to deal with that
competitive pressure no matter what (because Intel sells its processors to
vendors other than Apple). See how that works? Marketplace
competition leading to a better product? After they destroyed Exponential,
there simply was no competition for the Mac CPU. Basically, at long
last, they got it.

I have a confession to make. I never thought this
would happen. Not because I thought it was the wrong thing for Apple -- au
contraire, they should have done this a long time ago. But it's precisely
because it was the right thing for them to do that I never thought
they would do it. Apple's slogan has always been to "Think Different" -- a
slogan they used to hide behind to avoid direct technical comparisons with
cheaper, faster, more compatible, and more scalable PCs. And now they are
stuck. They can't win any rigged benchmarks anymore -- in fact, they are going
to start losing them all, indisputably, to AMD-based machines. Worse yet, they
have a legacy PowerPC -> x86 transition period that they are going to have
to deal with. Their "Rosetta" story sounds nice, but my suspicion is that
this is going to be a lot more painful than they are suggesting.

So what did I miss in my thinking? Simple: the
internal pressure for Apple to switch must have been enormous. And Apple
being such a closed company, that was something hard to gauge from the
outside. This is why so few serious Apple pundits saw this move. Yet a few
people like Dvorak did see it. It's not hard to see how a few leaks happened;
Microsoft, Adobe, ATI, nVidia, and a few other key 3rd party software
developers clearly have known this was going to happen for at least 6 months
-- the fact that Darwin was always compilable on x86 has been known in the
developer community for even longer. I had also assumed that Steve Jobs'
reality distortion field extended to the inside of Apple.

Here are a few comments for the stoners who can't
read this situation properly:

Apple will not just switch to AMD at the drop of a hat.
Clearly, to milk this for what it's worth, Apple needs
compiler assistance from Intel (something Intel can definitely
deliver on), and Intel would be reluctant to provide it without at least some
sort of commitment from them. Furthermore, AMD's architecture includes a
built-in NorthBridge and uses the point-to-point HyperTransport as its bus
protocol, while Intel has an external NorthBridge and a shared bus
architecture -- so AMD and Intel motherboard designs are not even close to
each other from a hardware design point of view.

Why was CELL not chosen? Simple -- CELL is NOT a microprocessor.
It's a coprocessor extension architecture that is meant to be grafted onto
a host processor (such as a PowerPC.) Its major benefit is for systems that
don't already include coprocessors for graphics (i.e., something like a game
console) or something with specific DSP-like needs (like a software based
modem, or soundcard -- again something suitable for a game console.) Using
it for computing needs, such as might be used in Photoshop is possible, but
requires a complete recoding which Adobe might not feel like doing on Apple's
schedule (now that they make most of their revenues from PC-based sales.)
Also, where does CELL sit on ramping clock rates, power consumption and
reliable manufacturing? Too much of a risk for a so far undelivered
product.

This cannot be related to the Xbox 360 deal. IBM would not have any trouble
ramping up production to sell both to Apple and Microsoft. Also keep in
mind that IBM will not be trying to sell constantly improving generations
of PPC CPUs to Microsoft -- once the architecture is locked in, that's it,
they can sell it for years past its technological obsolescence. That was
kind of the point. The PowerPCs that Apple is interested in would not
follow that line of design/manufacturing.

Alternative View on iTunes

Some more G5 dinging

08/20/03 Apple has continued to issue its false
advertising ... so I'll continue to ding them. Apple has also claimed that
they are the first to ship a 64bit desktop machine. While the Alpha PC
claim is a bit of an oddity, the Opteron based BOXX that shipped on June 4th
of this year is not. The guys at digitalvideoediting.com called them
on it:

When any organization or individual makes public claims using SPEC benchmark results, SPEC requires that the following guidelines be observed:

Reference is made to the SPEC trademark. Such reference may be included in a notes section with other trademark references (see http://www.spec.org/spec/trademarks.html for all SPEC trademarks and service marks).

The SPEC web site (http://www.spec.org) or a suitable sub page is noted as the source for more information.

If competitive comparisons are made the following rules apply:

the results compared must utilize SPEC metrics and be compliant with that SPEC benchmark's run and reporting rules,

the basis for comparison must be stated,

the source of the competitive data must be stated, and the licensee (tester) must be identified or be clearly identifiable from the source,

the date competitive data was retrieved must be stated,

all data used in comparisons must be publicly available (from SPEC or elsewhere)

Comparisons with or between non-compliant test results can only be made within academic or research documents where the deviations from the rules for any non-compliant results have been disclosed.

Please see the specific Fair Use rules for each group (GPC, HPG, and OSG) for any additional and specific requirements

When Steve Jobs made his original G5 presentation, he mentioned and showed
graphs with SPEC, SPECfp, SPECint and SPECrate. But he did not establish a
basis for comparison. He simply asserted the results. This is a violation of
rule 3-2. He also made no mention of Veritest during his presentation (it
was later discovered through an asterisk on Apple's website), which is a
violation of rule 3-3.

Whether or not there is a violation of rule 4 is unknown. Since Veritest has
not submitted the results to Spec, we cannot say for sure whether they would
survive scrutiny. (In prior discussions with Spec committee members, I have
been told that results which appear hostile towards a vendor would be rejected
-- Veritest's results for the Dell machine look woefully unacceptable, and
remember that since Apple was making the comparison with the results on the
Dell, those figures must be in compliance as well.)

Notice that Dell does not have SPECrate results available for the non-Xeon
machines. It's not because they are afraid of the results, but rather because
those machines are not sold in multiprocessor configurations. Hyperthreading
is not meant to be measured as providing the performance of an additional
processor (especially since there isn't an extra processor being used
...) Hyperthreading provides a virtual extra processor whose purpose
is to slightly increase performance by essentially rebalancing
multithreaded applications (so stalls while waiting for resources can be
mitigated.) I.e., you are not supposed to run SPECrate on non-Xeon machines,
since it doesn't make any sense.

Incorrect analysis on osnews.com

07/28/03 The guys over at OSNews usually do a
lot of good work by creating articles from a perspective outside of the
narrow view of Windows. But unfortunately, in the absence of a mechanism for
deep analysis, they have found themselves susceptible to Macaholic
psychobabble. Their article entitled Analysis: x86 Vs PPC is the one causing
me distress. I'm just going to go straight into debunking mode.

Both the Athlon and Pentium 4 use longer pipelines (long and thin)
with simple stages [...]

The Athlon pipeline is actually very wide. The Athlon is capable of
simultaneously decoding three loads, three stores and three ALU operations
per clock (it can actually only issue two of the memory operations per
clock, thus limiting it to about 5 operations per clock.) While the Pentium
4 is certainly thinner than the Athlon (limited to 3 "micro-ops" per clock),
it's not fair to call that thin. For example, the PPC 970 and Pentium 4 have
the same integer execution bandwidth (2 ops per clock.)

"Each component of a computer system contributes delay to the system
If you make a single component of the system infinitely fast... ...system
throughput will still exhibit the combined delays of the other components."
[...] The reason for the lack of scaling is the fact that memory performance
has not scaled with the CPU so the CPU is sitting doing nothing for much of
it's time

This is a general observation that has been pointed out by folks like Dr. John
D. McCalpin; however, to use this argument against the Pentium 4 is
completely unfounded: in combination with Rambus memory (which is very
expensive, but nonetheless rather amazing in terms of raw bandwidth), the
Pentium 4 has plenty of memory bandwidth at its disposal.
Another thing to notice is that although DDR SDRAM is technically slower than
RDRAM, benchmarks of real-world applications show that there simply isn't a
significant difference between the two. I.e., it is fair to say that Intel
and AMD (since they have managed to keep up with Intel's performance in
general) have made appropriate trade-offs in their respective architectures
to make sure that they aren't completely bottlenecked by memory bandwidth.

Since AMD began competing effectively with Intel in the late 1990s
both Intel and AMD have been aggressively developing new faster x86 CPUs.
This has lead them to becoming competitive with and sometimes even exceeding
the performance of RISC CPUs (If you believe the benchmarks, see below).
However RISC vendors are now becoming aware of this threat and are responding
by making faster CPUs.

The premise is correct; the conclusion is utter nonsense. There is no more
Alpha -- HP will be phasing it out. In addition, the PA-RISC will be phased
out in favor of Itanium (a non-RISC VLIW architecture from Intel.) MIPS is
dead. UltraSparc has tail-ender performance -- ironically, the only reasonable
modern processor that it soundly beats is the Motorola G4. HAL/Fujitsu
hasn't really proven anything recently with their Sparc clone. Motorola is a
complete joke. There's only one credible RISC vendor left who can challenge
x86 on the desktop, and that's IBM with its PPC 970 and Power 4 processors.

Both x86 and PowerPC have added extensions to support Vector
instructions. x86 started with MMX, MMX2 then SSE and SSE2. These have 8 128
bit registers but operations cannot generally be executed at the same time as
floating point instructions.

There is no MMX2. Floating point and SSE/SSE2 can be executed
simultaneously with no restrictions, and on the Athlon, the FPU's reordering
mechanism is so powerful that it can actually take MMX and FPU instructions
which are separated in software only and execute them simultaneously
anyway (it's possible the Pentium 4 can also do this; however, I am not as
familiar with its pipeline limitations). That said, MMX covers integer
vector operations (best for memory-sensitive audio or video processing),
which generally are not overlapped with floating point operations. This is
just misrepresentation.

However the x86 floating point unit is notoriously weak [...]

What?!?! As compared to what? x86 floating point has been consistently
crushing PPC's on floating point performance for nearly half a decade
now. The x86 floating point uses a convoluted instruction architecture, but
modern x86 micro-architectures have been up to the task of working around
these complications. Other RISC processors (with the exception of the PPCs
from Motorola) have beaten x86s on Spec CPU FP mostly because of over the
top (and very expensive) multi-bank memory architectures (that's correct
-- Spec FP is a fairly memory limited benchmark). As Intel and AMD have
worked hard on their memory infrastructures, and have now both introduced
multi-bank memory architectures of their own, they have both been dominating
Spec FP lately (Intel more so, as they have concentrated on memory bandwidth
somewhat more.)

Decoding the x86 instruction set is not going to be a simple
operation, especially if you want to do it fast. How for instance does a CPU
know where the next instruction is if the instructions are different lengths?
It could be found by decoding the first instruction and getting it's length
but this takes time and imposes a performance bottleneck. It could of course
be done in parallel, guess where the instructions might be and get all
possibilities, once the first is decoded you pick the right one and drop the
incorrect ones.

Don't quit your day job, dude -- both Intel and AMD use cached
predecoding mechanisms. I.e., the first time they see
instructions they decode them (slowly) and remember enough about the
instructions so that the next time they decode them they can do so in
parallel with very high performance. The Athlon stores and uses predecoded
instruction boundary and branch bits, while the Pentium 4 uses a full trace
cache (i.e., the instructions are decoded into fixed-width micro-ops which are
cached instead of raw x86 opcodes.)

Once you have the instructions in simpler "RISC like" format they
should run just as fast - or should they? Remember that the x86 only has 8
registers, this makes life complicated for the execution core in an x86 CPU.
x86 execution cores use the same techniques as RISC CPUs but the limited
number of registers will prove problematic.

The x86 => macro-op/micro-op mechanisms translate straight to rename
registers. I.e., false dependencies created by reuse of the 8 registers are
automatically removed. For example, the Athlon has 88 internal floating point
registers, with 36 destinations. In general, this gives the Athlon macro-ops
the equivalent freedom of 36 registers. Modern x86s can internally unroll
loops from multiple iterations and will create, essentially, "cloned
registers" to make it execute as if there are more registers being used.

This narrow point also ignores the fact that x86s have far more advanced
addressing modes than most RISC processors. These allow x86 instructions to
fetch memory operands in the same x86 instruction as an ALU operation.
Because of the sophisticated, fully pipelined, out-of-order mechanisms
inside the x86 architectures, this allows arbitrary access to the L1 cache
for read-only parameters with virtually no penalty. This dramatically reduces
the need for registers versus comparable RISC instruction sets.

However when the x86 includes this kind of hardware the 8 registers
becomes a problem. In order to perform OOO execution, program flow has to be
tracked ahead to find instructions which can be executed differently from
their normal order without messing up the logic of the program. In x86 this
means the 8 registers may need to be renamed many times and this requires
complex tracking logic. RISC wins out here again because of it's larger
number of registers. Less renaming will be necessary because of the larger
number of registers so less hardware is required to do register usage
tracking.

As complicated as it is, it is present and accounted for in both leading x86
architectures. In fact, the Athlon uses an always rename
policy with an implicit naming algorithm which imposes absolutely no overhead.
The Pentium 4 uses a single rename stage, but otherwise has comparable rename
capabilities (Intel unifies both FP and Integer rename registers, while AMD
splits them.)

Let us also note that the PPC 970 has very heavy renaming mechanisms
as well. This is very necessary because of its two-cycle minimum latency on
integer instructions -- the CPU needs to find any parallelism it can when the
compiler cannot. I cannot remember the details, but my recollection is that
its renaming capabilities are more comparable to the x86s' (as opposed to
the G4, which has pathetic rename capabilities). So there's nothing to win
here: renaming costs the x86s essentially nothing, and the PPC 970 doesn't
win anyway, since it is forced to use a similar mechanism itself.

AMDs x86-64 instruction set extensions give the architecture
additional registers and an additional addressing mode but at the same time
remove some of the older modes and instructions. This should simplify things
a bit and increase performance but the compatibility with the x86 instruction
set will still hold back it's potential performance.

The author has not established any basis for such a comment. The reality is
that the Opteron is simply an amazing architecture that does not appear to be
held back by anything.

The ace's FLOPS test is NOT a comprehensive test of compilers
that is comparable to Spec CPU. FLOPS is a benchmark which measures floating
point performance, whose simplicity gives the compiler the best opportunity
to map the algorithms to the best possible sequence of instructions for the
CPU. The test is too narrow to give a complete assessment of a compiler's
capabilities. It was originally used to test Intel's claim that "compilers
in the future would demonstrate the Pentium 4's superior floating point
performance".

Numerous other benchmarks on the web which do purport to
be more comprehensive measures of general compiler performance indicate that
Intel's compiler is truly superior to any other x86 compiler by a fairly
robust margin (typically between 10 to 25% on average versus the next best
compiler.)

"Intel's chips perform disproportionately well on SPEC's tests
because Intel has optimised its compiler for such tests"[13] - Peter
Glaskowsky, editor-in-chief of Microprocessor Report.

This is a very old comment based on old versions of the Intel compiler. The
reality is that Intel has invested quite heavily in their compiler technology
for the purpose of building truly world class compilers, plain and simple. As
a result, Intel will do very well on Spec benchmarks, and just about any other
benchmark which can be recompiled with their compiler because their compiler
is just an excellent piece of engineering.

For other comments about Apple's recent benchmark claims with the release of
the G5, just read the story below, and the one below that.

G5/PPC970 redux

07/18/03 Ok, now that the Apple people
themselves have gotten into trying to FUD the anti-FUD going on, it looks like
I need to get into this myself.
Here is the official Veritest report that was even
further distorted by Apple in their false presentation of the G5 as the
fastest system.

Veritest was formerly the infamous ZD Labs,
who are well known for their broken and long since discredited CPUMark (whose
results come substantially from one single x86 instruction) and FPUMark
(whose results have more to do with branch prediction performance than
FPU performance) and many other useless benchmarks (all of which have
superior contemporary replacements). When the Athlon CPU was first released,
dozens of independent websites and review organizations were able to quickly
and prominently determine that it had a far superior floating point
unit to the contemporary Pentium. ZD Labs was one of the few
organizations that was unable to determine this. In a bit of back and forth
I had myself with one of their developers, challenging them on FPUMark, they
gave very unsatisfactory "we stand by what we did" kinds of answers, and they
did not concede that FPUMark was a very bad benchmark from its very inception
(they instead said it was inappropriate as a modern benchmark,
whereas I claim it always has been a bad benchmark.)

If one were to consider the reputation of the
actors involved, I would say that Apple and Veritest belong in the same group:
the one with absolutely no credibility whatsoever on claims of performance.

Let's start with the basic facts about the Spec
CPU claims:

Veritest was hired by Apple to run the Spec CPU benchmarks for Apple.
Dell did not participate in any way (and was likely unaware of the test
taking place); however, Apple was in a very tight loop, and worked with
Veritest on compiler switches and other system modifications. Notice in the
Testing Methodology section of their test report that they clearly did a major
amount of tweaking on the Apple system (using the CHUD utility to change the
caching policy, setting their prefetch mechanisms, etc.), while doing very
little to the Dell system (besides turning off X, which would have little to
no impact.)

Veritest's conclusions do not support the claim that Apple's box was the
fastest, as it still lost on the Spec CPU Int test. Note that these results
are far lower than the officially endorsed Spec committee results.

Although the G5 won on Spec FP, these results show the largest
discrepancy from the official Spec-endorsed vendor results (up to 90%(!) lower
than what Dell measured.) I.e., the test where reality has been distorted
the most is the result that Apple emphasizes the most.

We have the additional problem of the rather unusual NAGWare
Fortran 90 compiler being used. For a comprehensive analysis of Fortran
compilers you can go to the
Polyhedron Software website. They show that NAGWare's compiler is in
fact among the worst Fortran compilers available for x86s, with the Intel
compiler being nearly 50% faster on most tests (Lahey's compiler is also
significantly faster, so it's not just one miraculous Intel compiler
we're talking about here).

The compiler used for the test is gcc 3.3. This goes against the intent
of the Spec rules, which is to pick the most suitable compiler for your
system. Veritest's claim is that picking this one compiler for both systems
would make both platforms even, but even this claim is false.

Veritest installed a single threaded speed tweaked malloc library for
the Apple system while not doing the same for the Dell system. This
makes a significant difference, as all the official x86 Spec results use
the MicroQuill SmartHeap tool to do this same thing for the official
results. (It only takes a minute or so to see this discrepancy in the
Veritest report, and similarly one can see the use of MicroQuill in the
x86 Spec CPU Int reports -- certainly you couldn't miss this if you've
spent a week scrutinizing it, as some Mac apologists have claimed.) If
the heap accelerator tool they were using couldn't be used with the x86
version of gcc, then they could have used MicroQuill's SmartHeap itself which does
work with Red Hat Linux for the PC.

On the G5 they used the -freorder-blocks-and-partition
flag, which is used for feedback-directed optimization. For some reason
this flag was not used on the Dell system.

The OS used for the Dell in the Veritest report is Red Hat Linux. However,
they used Mac OS X for the Apple G5! (There is a PowerPC port of Linux.)
While this may seem acceptable, or irrelevant, tests in the past have shown
that Linux's memory management induces additional overhead for very large
applications versus something more common, like Windows 2000 or Windows XP.
We know that in the last year Linus Torvalds himself has been involved in
trying to fix these problems; however, I don't know how those fixes have
panned out. A look at the Polyhedron website given above shows that there is
still something in Linux that is holding it back somehow.

If I can ever stop laughing ...

Possible censorship at AppleInsider.com

07/01/03 According to my server logs, there
was a link from the forums at appleinsider.com back to this page that
got posted there very recently. Not exactly a slashdotting, but noticeable
nevertheless. The link:

http://forums.appleinsider.com/showthread.php?s=&threadid=26240

Appears to have been removed (very atypical of the other forum links to this
page, which happen from time to time.) So it was extremely short-lived if
anything. Just out of curiosity, I wonder if anyone knows what that thread
was about. Is it a case of the Apple fanatics censoring their own message
boards in an effort to keep their community from the truth?

Apple releases the G5

06/24/03 At one of the typical Mac shows,
Apple introduced the new PowerMac G5. It's based on the IBM PowerPC 970, it's
64-bit enabled, and runs between 1.6Ghz and 2.0Ghz. Steve Jobs announced that
it was the world's fastest desktop computer and the first to be 64-bit
enabled. The second claim is disputable (in the mid 90s DEC created something
called the "Alpha PC", which just used a slower version of their 64-bit Alpha
processor and sold an ill-received PC based on it. Also AMD's Opteron, which
was released some time ago, is currently only being sold as a server, but
obviously one could use it for a desktop, if one so desired.) But the first
claim is an absolute lie.

Now of course, if you've been to this page before, you know it's very typical
for me to call out Apple/Jobs when they overtly lie like this. And yes, I
could easily pick his statements apart; I could go to the raw data and show
that his claims are a complete crock, as usual.

But someone beat me to it, and did quite an excellent job. So let me just
provide the link here:

Now of course, I should point out that these new G5 Macs are clearly far
superior to the Motorola crap that they were using up until this point.
Far, far, far superior. In fact it puts them within shooting distance of x86
performance. But many of us who watch the microprocessor industry have been
anticipating this for more than a year. We also anticipated the
performance level, and knew very well that Athlons, Opterons and P4s would
still outperform the PPC 970 once it launched. The only surprise was the
2.0Ghz launch clock rate (we all thought it would be 1.8Ghz), but it's not
enough for the G5 to catch the x86 vendors.

So how can it be that this clean "RISC" based PowerPC, manufactured by IBM
using the world's best process technology, is unable to beat the pants off of
the x86 vendors? (And how could I and others have known this would be the
case beforehand?) The G5 is a wide-issue, deep-pipeline, deeply
dynamic/speculative out-of-order execution based CPU just like the x86s. But
rather bizarrely, they designed in an interesting quirk: all instructions
execute with a minimum of 2 clocks of latency. Compare this to the Athlon,
whose minimum is 1 cycle (and the Athlon does an excellent job of getting
most non-memory oriented instructions down to this), and the P4, whose
minimum is 0.5 cycles (though few instructions get below 1 cycle in real
life.) The reason for IBM to do this is to give the instruction scheduling
mechanisms enough time to scan their rather large 200+ entry instruction
buffers. The problem, however, is that these 2-cycle bubbles will show up in
your general software's bottom line. IBM's design trade-off puts enormous
pressure on the compiler to find parallelism in the code to cover these
2-cycle delays (the CPU can still start instructions on every clock, but it
has to find independent instructions to pull this off.) The Spec CPU
benchmark is a notoriously over-analyzed code base in which this parallelism
is easy to find in many of the sub-tests. But not *all* of the sub-tests.
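To make the cost concrete, here is a minimal C sketch (my own illustration,
not from any Apple or IBM material) of why a 2-cycle latency floor leans so
hard on the compiler: a naive reduction is one long dependency chain that eats
the full latency on every step, while an unrolled version with independent
accumulators gives the scheduler parallel work to overlap.

```c
/* Serial version: every add depends on the previous one, so on a CPU
 * with a 2-cycle minimum latency each iteration stalls for the full
 * latency -- there is no parallelism for the scheduler to find. */
double sum_serial(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled version: two independent accumulator chains. The CPU can
 * overlap them and hide the 2-cycle latency -- but only because the
 * compiler (or programmer) exposed the parallelism explicitly. */
double sum_unrolled(const double *a, int n) {
    double s0 = 0.0, s1 = 0.0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    for (; i < n; i++)   /* leftover element if n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions compute the same sum; the difference is purely in how much
instruction-level parallelism they hand to the hardware.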

This trade-off that IBM has made would have made sense if they used it to
reach dramatically higher clock rates (since it does relax the instruction
scheduling stage, which you would assume was done for the purpose of
decreasing the *overall* cycle time), but as we can see they only came out
with 2.0GHz. AMD is at 2.2GHz in their aging Athlon architecture, while
Intel is just about to release a 3.2GHz part (though their architecture is
admittedly geared almost exclusively towards maximizing clock rate.) Remember
that IBM has the world's state-of-the-art manufacturing process technology, so
they don't have any excuse for not being able to keep up with Intel and/or
AMD. Somewhere along the way IBM seems to have ended up making trade-offs in
their architecture that cut them short of being equal to the x86 guys. It's
impossible to know exactly what they did without more information (the kind of
information that an outsider just can't get access to) but it likely has a lot
to do with the chip's Power4 heritage. Just like Motorola was really only
selling their PPCs to Apple as an augmentation to selling them as embedded
processors, IBM's core motivation has been to sell their high margin Power4
workstations. It could just be a simple matter of the 970 being intrinsically
a far more conservatively designed CPU (which server and workstation people
care more about) that's being tweaked just a little bit in order to also sell
it to Apple as a high performance desktop chip.

MacNET faces reality

Mac vs. PC III: Mac Slaughtered Again

11/19/02 A typical headline from me, right?
Well, it's not mine. The site www.digitialvideoeditting.com
has been running benchmarks comparing the Mac and PC on, you guessed it,
digital video editing operations. This is where Mac users have been clinging
to a claim of some kind of advantage. In their test, apparently the PC beats
the stuffing out of the Mac. Well, d'uh.

One for the archives

In particular, note the date on this story. Now
"the register" is not exactly the most reliable source of information, but
this appeared from various other "sources" as yet another carrot to dangle in
front of Mac fans, and as something to repeat to all the PC naysayers around
them. Athlons are currently shipping at well over 2GHz right now (they are
using a model numbering scheme and I'm too lazy to look up the exact number
right now) and the Motorola G5? Nowhere in sight.

And I'll bet dollars to donuts that the G5 will
never ship. Ever. IBM's PPC
970 (which will barely bring Apple back to a level that almost seems
competitive -- it clocks at 1.6GHz, but more important are its architectural
features, which will allow it to have non-laughable Spec CPU performance
numbers) has been announced, and will definitely ship next year. Basically
this means that unless Motorola is keeping everything totally quiet for some
reason, they will not have a competitive offering for Apple. And as IBM
cranks the clock rate, Motorola will find that their inability to keep up
with Moore's law simply puts them in no position to be making desktop CPUs at
all.

But of course, this is actually fatal for Apple.
The whole premise behind using the PowerPC architecture was that IBM and
Motorola would *compete* with each other to make CPUs for Apple. This was
supposed to ensure that Apple would always have a competitive CPU in their
systems. Well, we see what happened when IBM decided to basically ignore the
desktop PPC market for several years -- Motorola became lethargic, and
essentially held a default monopoly position for PPC shipments to Apple.
Intel really wasn't their competition, since Apple couldn't switch to them,
and Motorola just didn't give a rat's ass about high performance CPUs. What
this means, of course, is that they don't have the expertise (at least not
anymore.) IBM suddenly shows up with a CPU which is roughly somewhere back
on the Moore's Law curve (they can do this because IBM has always been able
to attract talented computer designers, and has a serious commitment to
leading edge technological research), and we can see very visibly just what a
sham the Motorola CPUs have been for the past 3 years. So is the competition
between IBM and Motorola supposed to heat up again? Nope. I think Motorola
is going to bow out of this one. They are happy to sell PPCs just in the
embedded space, and are unwilling to invest in the technology required to
make a serious desktop processor. But this just means that IBM will
eventually become lethargic and slow, just as Motorola did before it.

More condemning benchmarks

09/03/02
Gateway commissioned eTesting Labs to perform head-to-head benchmark testing
between their "Profile 4" PC and a comparable iMac (flat panel, home
machines.) The results are as expected.

Parody of Apple switch ad

Apple fakes Xserve benchmarks

06/15/02
A discussion on comp.arch has indicated that Apple's xserve
benchmarking is suspicious. What a surprise!:

> Rupert Pigott wrote:
> > "JF Mezei" wrote in message
> > news:3D2DEF2A.77468C7C@videotron.ca...
> > > re: benchmarks
> > >
> > > http://www.apple.com/xserve/performance.html
> > >
> > > has some "refreshing" benchmarks because they compare real applications,
> > > not some SPEC benchmarks.
> >
> > However I don't really get much of a picture of what's going on from
> > their "BLAST" benchmark which appears to compare two different code
> > bases. Note the codebase they used for the non-Apple platforms was
> > optimised for searches on word length "11", and if you look at the
> > graph you will note that the benchmark supports that. Meanwhile the
> > Apple version seems to scale pretty linearly with word length. Clearly
> > these codebases are very different. SPEC at least tries to compare
> > apples with apples (haha). Something that also stands out like a
> > sore thumb in these results is that Apple compared DDR G4s vs
> > SDRAM P4s... HMMMMMMMMMMMMMMMM.
> >
> > Again the Xinet benchmark seems to run into a wall, but this time
> > it's the Apple stuff which hits the wall, and bizarrely the Dell
> > box seems to trail off slightly... RIPs are quite complex pieces
> > of software which also stress the I/O subsystem (in particular
> > file write performance). If you don't have enough RAM they can
> > kick the shit out of your VM too. I also note that the link to
> > the benchmark details is convieniently broken. On inspection of
> > the xinet site itself I note that the configuration info available
> > for the systems is inadequate.
> >
> > As for Webench, fuck knows, but SPEC have web benchmarks too, you
> > should go look at them. Maybe Apple didn't like the scores they
> > got with SPEC's web benchmark suite and so decided to find a more
> > favourable one. :)
> >
> > The Bonnie benchmark was conducted in a dubious manner too. Again
> > we have inadequate configuration information available so we are
> > unable to tell if they pulled a fast one. Linux supports lots of
> > filesystems out of the box, plus you have the sync or async
> > mounting options. I don't think they adequately explained their
> > choice of filesize either, 2Gb looks like one of those "special
> > numbers" to me.
> >
> > > This is not only CPU, but also NETWORK as well as file server stuff. To
> > > me, these tests are far more meaningful than those SPECmarks because
> > > they really compare total systems for various types of workloads.
> >
> > It's also complete shite because they don't compare the same
> > codebases and they don't provide the source so you can verify this.
> > If that wasn't bad enough they don't give you enough configuration
> > info either. Sorry, but I'll take SPEC's benchmarks over Apple's
> > any day of the week. What's more SPEC do more than CPU/Memory/Compiler
> > benchmarks too. :)
> >
> > Cheers,
> > Rupert
>
> Apple has provided AGBLAST source code [1]. I believe that these benchmark
> results do indeed compare two different codebases, which isn't a very
> meaningful comparison in my opinion.
>
> First, AGBLAST includes a patch which vectorizes lookup table generation.
> Lookup table generation does not consume much cpu time to begin with, so
> it's not surprising that commenting out the altivec code did not show me a
> measurable slowdown on the G4 I tested.
>
> Secondly, a stride optimization to an io loop is responsible for the
> scaling seen for large word sizes. This optimization is portable but does
> not appear to have been included when they ran it on other platforms.
>
> Thirdly, using blastn with a word size of 40 is neither a realistic nor
> reccomended use of the blastn algorithm; one should use megablast for such
> searches.
>
> I was unable to repeat the results from Apple's press releases [2] [3].
>
> [1] http://developer.apple.com/hardware/ve/acgresearch.html
> [2] http://www.apple.com/pr/library/2002/feb/07blast.html
> [3] http://www.genomeweb.com/articles/view-article.asp?Article=200213094535
>
> -George Coulouris
> firstname at lastname dot org

High end iMacs running Mac OS X noticeably slower than eMachines

04/19/02
According to a
Wired story, Mac OS X is biting at the ankles of Windows in terms of
performance. Desktop graphics rendering is considered a "solved problem" in
the PC world, to the point where people don't even bother benchmarking it any
more. However, in head to head tests against the new iMacs running Mac OS X,
the PC was noticeably faster. Even the grandmaster of speedy web browsing
technology, Opera, is unable to make the sluggish OS perform well.

PC vs Mac benchmark collection

Spec 2K performance numbers for Apple

03/02/02 The German computer magazine c't has
managed to run the Spec CPU 2000 benchmark on the latest Apple iMac. For
whatever reason, they ran the latest and greatest 1GHz iMac (the fastest CPU
available for the iMac at this time) against the ancient and crusty 1GHz P-!!!
(machines are currently available from AMD running at 1.66GHz, and from Intel
running at 2.2GHz, both of which far outperform the old 1GHz P-!!!) The P-!!!
wins, but that's not really important. The G4 results were obtained and are
306 SpecInt base and 187 SpecFP base.

It's a little dated (Oct. 2000) and I certainly wish I had seen this earlier
so that I could have refuted it earlier. But what the heck, I can't resist:

What is a supercomputer?
Many definitions of supercomputer have come and gone. Some favorites of the
past are (a) any computer costing more than ten million dollars, (b) any
computer whose performance is limited by input/output (I/O) rather than by the
CPU, (c) any computer that is "only one" generation behind what you
really need, and so on. I prefer the definition that a supercomputer performs
"above normal computing," in the sense that a superstar is -- whether
in sports or in the media -- somehow "above" the other stars.

Actually none of these are anything close to what I or most other people think
of as a supercomputer. A supercomputer is quite simply a machine built
primarily for computational performance. PC's and Mac's highest concern is
always making the box affordable for consumers -- thus being cheap to
build is the primary design criterion for desktop range PCs. Even workstations
are at the very least built with high volume production goals within bounded
price limits. Supercomputers are usually built to be optimized for
performance per volume of area, or performance per single CPU, or performance
per single operating system; not performance per dollar, or performance per
watt, or anything like that. The G4 is not a supercomputer (neither is a
Pentium, or an Athlon for that matter.)

For example, Federal Express uses (or at least used to use) a Cray
supercomputer to track all its parcel routing. The reason they picked a Cray
was the tremendous volume of routing computation that was required, with a
necessarily less than 24 hour turnaround time on those computations. Sandia
National Labs has used an 8000+ processor Pentium Pro super-cluster for
nuclear explosion and weather simulation. The reason they chose this was the
sheer total gigaflops that could be delivered by Intel by clustering up so
many of them.

It should also be pointed out that while both Intel and AMD based supercluster
machines are listed among the top500
super computers in the world right now, none use the Motorola Power PC (though
some use the IBM Power3; this is a specially modified core that is not
suitable for something like an iMac.)

... Because almost all modern personal computers perform in the
tens or hundreds of megaflops (more details later on such specifications),

At the time this was written the K6-500, Athlon 500, and Pentium III 500 were
all capable of nearly 2 gigaflops of sustained execution. However, these are
all single precision computations (using SIMD, like AltiVec uses.) The 64
(actually 80) bit FPU of the Athlon was only capable of a measly 1 gigaflop
(the P-III could do 0.5 gigaflops). The G4's 64-bit FPU (the G4 doesn't have
an 80-bit FPU) is comparable to the P-III's clock for clock.
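These peak figures are just clock-rate-times-throughput arithmetic. Here is a
trivial C sketch of the calculation (the flops-per-cycle values I plug in
below are my own reading of the vendor documentation of the era, so treat
them as approximate):

```c
/* Theoretical peak MFLOPS = clock (MHz) x flops retired per cycle.
 * These are absolute maxima, never sustained on real memory-bound code. */
int peak_mflops(int clock_mhz, int flops_per_cycle) {
    return clock_mhz * flops_per_cycle;
}
```

Plugging in the numbers: a 500MHz part doing roughly 4 single-precision flops
per cycle via SIMD gives peak_mflops(500, 4) = 2000 MFLOPS, an FPU issuing an
add and a multiply per cycle gives peak_mflops(500, 2) = 1000, and a
one-flop-per-cycle FPU gives peak_mflops(500, 1) = 500 -- matching the ~2,
~1 and ~0.5 gigaflop figures above.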

What this means is that you can do things at a few gigaflops that you
cannot conveniently do at 100 megaflops. That is, you can do things in a
convenient time span or sitting, or most important, you can do in real time
certain tasks that were heretofore undoable in real time on PCs. In this
sense, the Power Mac G4 is a personal supercomputer: obviously personal (the
latest ergonomic and efficient designs for G4 products amplify this) and, due
to their ability to calculate in the gigaflops region, clearly above the pack
of competing PCs.

Laugh! The first PC to run at 1 gigaflop was the AMD K6-2 at 350MHz, which
was available in 1998. PC's were able to decode DVDs in real time in
software (at about 400MHz or so) long before it was possible to do so on a G4
(the software wasn't available until long after the G4 500 was out, and I
don't know how slow a G4 processor can be and still reasonably play DVDs.)

... and yet these modern PCs cannot solve certain problems sufficiently
rapidly, it is reasonable to define the supercomputing region, for today, as
the gigaflops region. ...

Of course, the state-of-the-art supercomputers of today reach into the
hundreds of gigaflops, yet surprisingly, they are typically built from a host
of gigaflop-level devices. A system like the Cray T3E ranges from about 7
gigaflops to the teraflop (1,000 gigaflops) region, depending on the number of
individual processors. One way to envision the power of a desktop Power Mac G4
is that the low-end Cray T3E at 7.2 gigaflops has the equivalent speed of two
G4 processors, each calculating at 3.6 gigaflops (an actual achieved benchmark
for certain calculations). By the way, the advertised U.S. list price for the
bottom-end Cray T3E is $630,000
([sic] http://www.cray.com/products/systems/t3e/index.html;
October 11, 2000). On this basis, one can conclude that the Power Mac G4 is
“between” the supercomputers of the 1980s and the very highest-end
supercomputers of year 2000, and in fact is about as fast as the low-end
versions of the latter, most modern supercomputers.

Oh god. Straight from the mouth of Steve Jobs. Look, nobody would buy a Cray
T3E because of its ability to execute 7.2 gigaflops. Computers like that have
enormous memory bandwidth. I.e., they can execute 7.2 gigaflops *ALL THE
TIME*. PC's and Mac's can only reach into the gigaflop range of execution by
solving very specific problems that are not memory limited. In any event it
shows you the extent to which Apple is trying to twist the definition of a
supercomputer. The in-core gigaflops of the CPU alone are not the only
measure of a supercomputer. Supercomputers are about real world
computations. For some industrial applications 7.2 gigaflops may be
sufficient, so long as it is always guaranteed, which commodity CPUs do not
and cannot offer.

V-Factor (Pentium III) = about 1 to 1.5

This result can be obtained from publicly available performance figures on
Intel's website (www.intel.com). For example, using Intel's signal-processing
library code, a length-1024 complex FFT on a 600-megahertz Pentium III runs at
about 850 megaflops (giving a V-Factor of about 1.4), while Intel’s 6-by-6
fast matrix multiply performs at about 800 megaflops ( V-Factor about 1.3).
(Note that one might be able to best these Pentium III figures; I cite here
just one of the publicly available code library results. These results do
exploit the Pentium III processor’s own MMX/SSE vector machinery, and I
caution the reader that these rules of thumb are approximations.)

As is typically the case with Apple, they chose numbers which are convenient
for their purposes. A straight-forward reading of the SSE or 3DNow!
specifications would show that this so-called "V-Factor" is as high as 4 (the
hardware can sustain up to 4 results per clock), and if you are below 2 while
using one of these SIMD instruction sets, then you are doing something wrong.
So 2 to 4 would be a more representative range. If Intel is showing worse
results for some tests, it's because they are probably beating some other
(more relevant) competitor with a "good enough" solution that they couldn't
or didn't vectorize.
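The "up to 4 results per clock" figure comes straight from the register
width: a single packed SSE instruction operates on four single-precision
floats at once. A minimal sketch using Intel's SSE intrinsics (x86 only;
this is my own illustration of the mechanism, not Intel's library code):

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Multiply four pairs of floats with one packed SSE operation.
 * A single MULPS instruction retires four single-precision results,
 * which is where the theoretical 4x "V-Factor" comes from. */
void mul4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);             /* load 4 unaligned floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));  /* 4 multiplies at once */
}
```

Sustaining that rate in practice, of course, requires keeping the vector unit
fed from memory, which is exactly what real code often fails to do.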

Cryptography and "big arithmetic"
Again on the subject of vectorized integer processing, there is the burgeoning world of
cryptography. It is well known these days that cryptography involves large numbers, like prime
numbers. It is not an exaggeration to say that most people have used prime numbers, at least
indirectly, in web activities such as mail and credit card transactions. For example, when you
order garden tools over the web, your credit card transaction may well involve a pair of prime
numbers somewhere in the ordering chain.

There is also a whole field of inquiry, called computational number theory, that is a wonderful
academic field, attracting mathematicians, engineers, and hobbyists of all ages. Whether the
interest is academic or commercial, it turns out that the G4 is quite good at this “big arithmetic.”
Here is an example: When timed against any of the public arithmetic packages for Pentium
(there are a dozen or so good packages), the G4 beats every one of them at sheer multiplication.
In case you are familiar with timing for big arithmetic, a 500-megahertz G4 will multiply
two numbers each of 256 bits, for a 512-bit product, in well under a microsecond. And here is
a truly impressive ratio: In doing 1024-bit cryptography, as many crypto packages now do, the
G4 is eight times faster than the scalar G3. That huge speedup ratio is due to many advantages
of the vector machinery.

But there is more to this happy tale: The G4 can perform arbitrary vector shifts of long registers,
and these too figure into cryptography and computational number theory. There are
some macro vector operations that run even 15 times faster than their scalar counterparts,
because of the vector shifts. In such an extreme case of vector speedup, it is not just the vector
architecture but also the unfortunate hoops through which the scalar programmer must
sometimes jump — hoops that can happily evaporate in some G4 implementations. (Technically,
arbitrary shifting on scalar engines must typically use a logical-or-and-mask procedure on
singleton words, which is a painful process.)

This is quite simply an outright lie. It is exactly the opposite -- for big
number arithmetic the G4 is an unusually slow processor, in particular with
the big multiplications and shifts. Just take a look at the GMP benchmarks
for a comparison (the performance scales with clock rate, and the code cannot
be improved using AltiVec -- thus G3 and G4 performance is identical per
clock.) The P6 beats it, but the Athlon really hands it its head.
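For reference, the "logical-or-and-mask procedure" the whitepaper sneers at
is just the standard limb-by-limb shift that every scalar bignum library
uses. A rough C sketch (my own toy version, not GMP's actual code):

```c
#include <stdint.h>

/* Shift an n-limb little-endian bignum left by s bits (0 < s < 32).
 * Each result limb combines bits from two adjacent source limbs --
 * the shift-and-OR dance the scalar programmer is stuck with.
 * Returns the bits shifted out of the top limb. */
uint32_t shl_limbs(uint32_t *x, int n, int s) {
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t next = x[i] >> (32 - s); /* bits spilling into limb i+1 */
        x[i] = (x[i] << s) | carry;       /* OR in bits from limb i-1 */
        carry = next;
    }
    return carry;
}
```

It's a few shifts and ORs per word -- hardly the "painful process" being
described, and it runs at essentially the same speed per clock on a G3, a G4,
or any x86, which is exactly what the GMP numbers show.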

Apple threatens users with "Look and Feel" suits

06/16/01 People who are writing themes and
theme editing tools for Mac OS X are getting cease and desist letters from
Apple's lawyers. Ooh, nice one Apple -- piss off the *developers*. That'll
sure endear the programming community to Mac OS X.
Thanks to "Mac haxor"
Peter Perlsø for the links.

Reaction to OS X from the Apple faithful

06/04/01 It seems that OS X is being less
well received than Apple might have hoped. An article on macworld gives a
long list of Mac-faithful complaints about the new OS. I've also listed a
somewhat even-handed review of OS X by "NK Guy". Thanks to "Mac haxor"
Peter Perlsø for the links.

Benchmarks, benchmarks, and more benchmarks

12/22/00 While looking around the net for
other benchmarks I stumbled across the NBench benchmark. It takes a subset of
the ByteMark tests (presumably avoiding some of the obviously "rigged for
certain compilers" sub-tests, like the bitfield test) and runs the tests on
Linux/Unix workstations using the provided gcc compiler.

More Benchmarks

12/21/00 A German website has benchmarked
the infamous LaTeX technical formatting language on a number of processors
and found the G3 processors at about half the performance of comparably
clocked x86 processors. Given their MHz disadvantage these days, that would
make them probably about one quarter the performance (comparing the fastest
G4's to the fastest Athlon or Pentium !!!/4). (The Sparc results are pretty
hilarious as well.)

BeOS revisited

12/03/00 This is not really news, but I had
not seen it before. www.macspeedzone.com appears to have some inside
information about the NEXT vs Be vs Copland battle that ultimately led to
Apple buying NEXT for some $400 million.

The title is almost certainly an exaggeration. It seems doubtful to me that
Apple would have truly gone with Windows NT (and doomed their platform to be
a blatantly obvious second-class citizen, by virtue of less support from MS
and publicly losing every performance benchmark in sight.)

The reason I bring this up is just to comment on how ridiculously this was
handled by Amelio. Obviously Steve completely played Amelio for the fool that
he is. But there was more to it than that. Amelio let Apple's desperate
situation lead them to make some truly bad errors in judgement:

NEXT cost $400 million. Be was asking for $200 million; Apple was only
willing to go as high as $120 million. Was NEXT really worth more than 3x
what Be was worth? There was probably a reason that the NEXT OS was not
selling very well.

Did Be know that they were competing with NEXT
OS? Had Gassée known the stakes, and what their competition was, perhaps he
could have convinced his backers to accept a lower price. Apple may have been
under time pressure, but for a decision as momentous as this, that
artificial Jan. 7, 1997 deadline could have been pushed or ignored. What
Apple needed to do was to make sure they put Gassée in his place, and force
him to accept a fair price and not blow them off when they demanded an
updated demo. And of course they needed to do that with NEXT as well. Then
they could have made a more rational decision.

Being a portable, realtime, pervasively multi-threaded,
multiprocessor-capable OS is real technology. Using a non-standard language
made up just for your OS (Objective C) and concatenating buzzwords
(Web-Objects) is not. I wonder if Apple had the technical know-how to
properly evaluate these two technologies.