Judging by details revealed in a chip conference agenda, the clock frequency race isn't over yet. IBM's Power6 processor will be able to exceed 5 gigahertz in a high-performance mode, and the second-generation Cell Broadband Engine processor from IBM, Sony and Toshiba will run at 6GHz, according to the program for the International Solid State Circuits Conference that begins February 11 in San Francisco.

Low-end microcontrollers are liable to be 8- or even 4-bit CPUs, which pre-date the RISC / CISC debate and as such don't really count as either.

By your logic, all the CISC architectures that precipitated the RISC design don't count as either because they pre-date the RISC / CISC debate! Classic 8-bit microcontrollers like the Zilog Z80 and the Motorola 6800 are most definitely CISC chips. The only one I can think of that doesn't really count as either is the PIC, and then only because it's in some ways RISC-y (single-cycle fixed-length instructions), and in some ways CISC-y (accumulator-based register model with memory operands).

If you move up into 32-bit embedded controllers, CISC ISA based processors are pretty much nowhere to be seen; that space is dominated by ARM, which is a very RISC ISA.

Those aren't really "micro" controllers as such --- I lumped them into my embedded category.

That is probably true for Core 2, probably not in the case of POWER6, and definitely not in the case of Cell.

I wasn't implying that you could do an x86 Cell properly. You really wouldn't, because you really don't want to do an in-order x86. Of course, for the PPE's role in the chip, I don't think an in-order anything is a particularly good idea.

POWER6 is rumoured to be quite a bit simpler than POWER5 for out-of-order execution; this is more likely to hurt x86 performance than PowerPC.

It is doubtful that POWER6 has a simpler OOO core than, say, the Pentium Pro. IBM is claiming 2x the performance of POWER5, and at 4-5 GHz, POWER6 will have to retain comparable IPC to POWER5 to meet that goal. That level of OOO is likely enough to make up for any deficiencies of x86. Sure, you'll have to make the pipeline a couple of stages longer to decode x86 efficiently, but that's not going to change your performance drastically.
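To put rough numbers on the IPC argument, here is a back-of-the-envelope sketch using the simple model performance = IPC x clock. The 2.2 GHz POWER5 baseline clock is an assumption picked for illustration, not a figure from this thread, and `required_ipc_ratio` is just a hypothetical helper name.

```python
def required_ipc_ratio(target_speedup, old_clock_ghz, new_clock_ghz):
    """Return new_IPC / old_IPC needed to reach target_speedup,
    assuming performance = IPC * clock and nothing else changes."""
    return target_speedup * old_clock_ghz / new_clock_ghz


# ASSUMPTION: illustrative baseline clock for POWER5, not taken from the thread.
POWER5_CLOCK_GHZ = 2.2

# The 2x target and the 4-5 GHz range are the figures quoted in the post above.
for new_clock_ghz in (4.0, 5.0):
    ratio = required_ipc_ratio(2.0, POWER5_CLOCK_GHZ, new_clock_ghz)
    print(f"{new_clock_ghz:.1f} GHz -> needs {ratio:.2f}x the baseline IPC")
```

Under that assumed baseline, the required IPC lands somewhere around parity with POWER5 across the 4-5 GHz range, which is the "comparable IPC" point made above. A different baseline clock shifts the numbers but not the shape of the argument.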

In the case of Cell, its power comes from the SPEs. These are very much RISC designs and are highly dependent on their ISA; making them decode and execute something like x86 code would just plain hurt.

Yep. Entertainingly, PowerPC is apparently not RISC enough for the SPEs (which is another reason why this RISC versus CISC thing is so silly to talk about).

They had to build a whole new processor when they made the 970. POWER6 is designed to be scalable, so building a cut-down, cooler version is pretty much a case of putting the same chip in a smaller box.

Every engineering design is a point in the design space. That point is decided via numerous trade-offs which are made to achieve a particular final result based on particular given specifications. You can't move a design to a radically different point and still expect it to perform as well as another design that's targeted for that specific point.

POWER6 has a specific design point: 100W+ TDP, 32MB+ external L3, 75GB/sec memory bus. It is designed to that specification. The circuits are designed for high clock speed, not low power consumption. The large L3 cache puts a lower burden on the OOO core to cover memory latency, allowing it to be simpler. The huge memory bandwidth influences the design of the prefetch algorithms. Core 2 is designed to a different point: 35W TDP, no external cache, 10GB/sec memory bus. Its circuits are designed for low power consumption over ultimate clock speed, it has a deeper OOO core to cover memory latency, and it has to be more judicious about its prefetching.

POWER6 is simply not going to scale down to Core 2's design point while performing competitively with Core 2. That's just not how things work. What Apple rightly realized was that the design point they needed was precisely the one Intel was targeting with their processors. They could get a chip that was actually designed for the tasks they needed, instead of having to use a chip that was drastically scaled up or down to fit their market, with sub-optimal results.

1. The Power6 does seem to have been designed to allow a lot of configurability.

"From the start, IBM has designed the POWER6 systems to be extremely configurable. The intra-node busses, which normally operate on 8 bytes/cycle can be chopped down to 2 bytes/cycle for low-end systems, and the inter-node busses can also operate at 4 bytes/cycle. Similarly, the two integrated memory controllers can both operate at half-width, and one of them can be removed entirely. The external L3 caches are optional, and are available either in the MCM, or in an external configuration. ..."

Now I realise that does not mean it would scale to a consumer-level notebook, but it is interesting.
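For a sense of proportion on the quoted bus widths, here is a small sketch using bandwidth = bytes per cycle x bus clock. The byte widths are the ones from the quote; the 2.0 GHz bus clock is purely a placeholder assumption, and the function names are hypothetical.

```python
def bus_bandwidth_gb_s(bytes_per_cycle, bus_clock_ghz):
    """Peak bandwidth in GB/s for a bus moving bytes_per_cycle each cycle."""
    return bytes_per_cycle * bus_clock_ghz


def width_fraction(bytes_per_cycle, full_bytes_per_cycle):
    """Fraction of the full-width bus that a reduced configuration keeps."""
    return bytes_per_cycle / full_bytes_per_cycle


# ASSUMPTION: placeholder bus clock, not a published POWER6 figure.
BUS_CLOCK_GHZ = 2.0
FULL_WIDTH_BYTES = 8  # normal intra-node width from the quote

configs = [
    ("intra-node, full width", 8),
    ("inter-node, reduced", 4),
    ("intra-node, low-end", 2),
]

for name, width in configs:
    fraction = width_fraction(width, FULL_WIDTH_BYTES)
    bandwidth = bus_bandwidth_gb_s(width, BUS_CLOCK_GHZ)
    print(f"{name}: {fraction:.0%} of full width, {bandwidth:.1f} GB/s at the assumed clock")
```

Whatever the real bus clock turns out to be, the ratios hold: the low-end intra-node setting keeps a quarter of the full-width bandwidth.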

2. It appears that Power6 will be better at out-of-order execution thanks to a change to the configuration of pipeline stages:

"The basic pipeline for the POWER6 is the same number of stages as the POWER5, but they have been rebalanced across the different phases. Most significantly, dependent ALU operations now can execute back to back, eliminating a vexing kludge in the original POWER4/5 architecture. This makes the out-of-order scheduling easier, and is probably the reason that the instruction issue/dispatch phase uses 2 cycles in the POWER6 (compared to 4 in the POWER5)."

It reveals some details that I haven't seen published before, specifically the fact that the core isn't really any narrower than Power5+: Power6 has 2 integer units, 2 FPUs, one branch unit, presumably 2 load-store units because of the dual-ported data cache, and is 7-issue over two threads and 5-issue on one thread.

In response to your point, you're right that Power6 seems very scalable for IBM's server line. It looks like it's going to go from blade systems all the way up to very large servers. However, the thing to keep in mind is that even the cut-down configuration of Power6 puts it in the high-end Opteron/Xeon range from a system architecture point of view. A half-width memory bus on one controller is still in quad-channel FB-DIMM territory, and even a quarter-width elastic I/O bus is still in HyperTransport territory. And of course the core is still huge, with 4MB of L2 per core, and on-chip L3 directories, etc. Such a system is pushing it even for a hypothetical $5000-range PowerMac, much less a $1500 iMac.