Introduction

"What you have seen is a public demonstration of 4 GHz silicon straight off our manufacturing line. We have positive indications to be able to take Netburst to the 10 GHz space."

"While architectural enhancements are important, Intel intends to continue its lead in raw speed. Otellini demonstrated a new high-frequency mark for processors, running a Pentium 4 processor at 4.7 GHz."

The first assertion was made at IDF Spring 2002, and the second press release was broadcasted after Fall IDF 2002. Fast forward to the beginning of 2004, and we read in the Prescott presentation: "2005-2010: the era of thread level parallelism and multi-core CPU technology. " What happened to "the 10 GHz space"?

The presentation of the new 6xx Prescott even states that Intel is now committed to " Adding value beyond GHz". This sounds like Intel is not interested in clock speeds anymore, let alone 10 GHz CPUs.

Already, the hype is spreading: Dual core CPUs offer a much smoother computing experience; processing power will increase quickly from about 5 Gigaflops to 50 gigaflops and so on. It is almost like higher clock speeds and extracting more ILP (Instruction Level parallelism), which has been researched for decades now, are not important anymore.

At the same time, we are hearing that "Netburst is dead, Tejas is cancelled and AMD's next-generation K9 project is pushed back." Designs built for high clock speeds and IPC (Instructions per Clock) are no longer highly regarded as heroes, but black sheep. They are held responsible for all the sins of the CPU world: exploding power dissipation, diminishing performance increases and exorbitant investments in state of the art fabs to produce these high clock speed chips. A Prescott or Athlon 64 CPU in your system is out of fashion. If you want to be trendy, get a quad core P-m, also known as Whitefield [2], made in India.

To the point

I am exaggerating, of course. A good friend of mine, Chris Rijk, said: "PR departments having no 'middle gears': they either hype something to great lengths, or not at all." Trying to understand what is really going on is the purpose of this article. We are going to take a critical look at what the future CPU architectures have to offer. Is the traditional approach of increasing IPC and clock speed to get better performance doomed? Does multi-core technology overcome the hurdles that were too high for the single-core CPUs? Are multi-core CPUs the best solution for all markets? Will multi-core CPUs make a difference in the desktop and workstation market?

In this first instalment, we explore the problems that the current CPU architectures face. The intention is to evaluate whether the solution proposed by Intel and other manufactures is a long-term solution, one that really solves those problems. We will also investigate one CPU in particular, the Intel Prescott. So, basically there are 4 chapters in this article that will discuss:

The problems that CPU architects face today: Wire Delay, Power and the Memory wall. Chapter 1 - The brakes on CPU power

The reason why Intel and others propose dual core as a solution to these problems. Chapter 2 - Why single core CPUs are no longer "cool"

Whether or not these problems can be solved without dual core. Chapter 3 - Containing the epidemic problems

Although Intel is undeniably the industry leader in the CPU market, this doesn't always mean that the solutions proposed are the right ones. For example, remember MMX, which was a technology that should have turned the (x86-based) PC into a multimedia monster. In hindsight, the critics were right. MMX was little more than a marketing stunt to make people upgrade.

The first implementation of hyperthreading on Intel's Foster Xeon (Willamette Xeon) was turned off by default by all OEMs. And hyperpipelined CPUs with 30+ stages turned out to be an impressive, but pretty bad idea.

In other words, not all hypes have turned out to be beneficial for the customer. Millions of customers are still waiting for the rich content on the Internet that is enabled by and runs so much faster on the Netburst architecture...

Post Your Comment

65 Comments

Ain't no way you can get those repeaters out of there - that's already the optimum solution for driving the large load (interconnect). It probably equalizes the stage effort required (you can work out the math and find that for multi-stage logic, the optimal config is that each stage has the exact same effort level). Eg. instead of driving an interconnect with a "unit" inverter, it might be more feasible to drive it with a chain of them, each with different fan in/out. Repeater insertion is tricky and (as far as I know) can't readily be automated.

Interconnects are getting to tbe point where traversal of a die diagonally can take multiple clock cycles. Some folks are suggesting that a pipelined approach could be extended to interconnects, esp. clock trees. But the most fun problem (for me at least :P) is the handling of inductance extraction - how in the h*ll do you model it accurately? High-speed digital design == Analog design. Long live analog / mixed-signal VLSI designers :PReply

[quote]Well-written multicore-aware code should have the number of cores as a _variable_, so you just set it to 1 on a uniprocessor platform.[/quote]

Sometimes parallel algorithms aren't very good for serial execution. In these cases, you may actually have one algorithm for multiple processors and another algorithm for a single processor.

[quote]So, if Intel were to use less repeaters the heat output could be lowered significantly. [/quote]

Well... I'm sure the Intel engineers didn't just up-and-say one day, "Hey, I know something cool to do... let's put some more repeaters into the core." I'm sure there's a reason for them being in there. It would probably take a bit of redesign to get the repeaters out. (I'm pretty sure this is what you meant, but I just wanted to clarify that stuff like repeaters aren't just put into a CPU for no reason. Things like repeaters are put in because there wasn't a more viable solution to some signalling problem that's there.)Reply

So, the reason for the Prescott's shortcomings is the use of too many repeaters as shown in the image of the Itanium 2. If I remember correctly, the article said that the repeaters were using too much power as well. So, if Intel were to use less repeaters the heat output could be lowered significantly.Reply

Nice article and pretty easy to understand as well. I'm happy to hear that there may still be hope for controlling the power leakage, because without it I just can't see anybody getting beyond 65 nm, since even 65 nm will, without improvements, leak almost 3 times as much power as 90nm does now.

Anxiously waiting for E0 A64 to see what AMD has managed to cook up.Reply

There are plenty of multi-threaded apps out there. I am not sure pure single threaded apps exist any more outside of "Hello World" and some old Cobol/FORTRAN ports that are on floppy.

Quake and UT have been multi-threaded for a while. Quake was multi-threaded when I had a dual Pentium pro. There were even benchmarks. The benefits seen with hyper-threading also show that many apps are multi-threaded. The performance gain was negligible due to the graphics drivers and OpenGL/DirectX not being thread optimized. I am sure that has been worked out by now.

Multi-threading is not all about making use of multiple CPUs. There are many conditions where a program would be stopped dead in its tracks waiting for a response from some outside program or hardware device. You can solve this with events, multi-process, multi-threading, call-backs, etc. Goal wise, they are related. In the Winders world, threading is the method of choice.

I really can't believe there are still arguments going on about programs not being multi-threaded. This is not that much of an issue any more. Even if your apps is not threaded, the OS is and it can run on one CPU while your app runs on the other. Or if you have 2 apps, then they can run on different CPUs.

With all that said, I agree with the thought that creating performance for all applications is better served using a faster single core CPU than dual CPUs. I think this way because when you have a unit of work to be done (even with multiple threads), it is more likely to be done quicker with a single CPU that is capable of the same computing power as 2 CPUs. I single unit of work will ultimately be smaller than a thread in all cases. The smallest is the instruction set.

Now...with that said, if the limiting factor is technology and they cannot obtain the equivalent performance of a dual core with a single core, then it makes since to go dual core to obtain it, especially with the power leakage. I like the thinking behind dual core on a laptop, but am skeptical about the part that says turning the CPU off and on rapidly to keep it cool and efficient. It will probably work if it isn't turned on and off too quickly, but heat spreads pretty quickly. You wouldn't even get past POST without a heat-sink and that silicon insulator keeps everything pretty cozy.Reply

It's pretty much impossible to get a "newbie" explanation of CPU architectures without a least a basic understanding of how CPUs work. Rand's suggestions were quite good, you should start there if you're overwhelmed by Johan's explanations IceWindius. It also wouldn't hurt to start with Anand's CPU articles from last year. Reply

"I wish someone like Arstechinca would make something really built ground up like CPU's for morons so I could start understanding this stuff better."

You may want to read parts 1-5 of "The Secrets of High Performance CPUs"
http://www.aceshardware.com/list.jsp?id=4A bit outdayed now, as it was written in 99' if I recall correctly but it's still broadly relevant and a nice series of articles if your looking to get a better understanding of microprocessors without being drowned in the technical side of things.

ArsTechnica also has some good articles with a newbie friendly slant.

There are some excellent articles at RealWorldTech as well, but their definitely written for engineers rather then the average person.
Unfortunately most of the more noteable books like those by Hennessy & Patterson assume you've already some knowledge of computer architectures.Reply

#46, Well-written multicore-aware code should have the number of cores as a _variable_, so you just set it to 1 on a uniprocessor platform. I also think there already exists a multithreaded version of one of the big engines (Quake, UT?) that apparently does not lose any performance on a single core either.

But I agree with the main thrust of your post, which is "Buy AMD".Reply

Not to belittle dual core development and I know there are a lot of people who run technical programs that will benefit from dual core on this site, but when I spend a small fortune on a pc, the primary driver is being able to play the most advanced games in the world. Unfortunately, I don't feel multi-threaded game code is going to get written for a longggggg time (what's the point of reducing potential customers?). How long till a very large percentage of users have dual cores? End of 2006 at the very earliest? So it's really a just a theoretical interest till then for me...Reply