AMD’s 65nm GPU: why the future of GPUs is smaller, faster, cooler

Current leading-edge GPUs are made on manufacturing processes that are well …

Even though it's one of the most complex and cutting-edge integrated circuits in any PC or laptop, the GPU always lags the microprocessor by a few generations in process technology. For instance, the current king of GPU performance, NVIDIA's GeForce 8800 (built around the G80 GPU), is made on TSMC's venerable 90nm process, a node that Intel first introduced way back in early 2004 with the Prescott version of the Pentium 4.

This 90nm feature size is one reason why the G80 is so large and power-hungry, and it's also why the power-sensitive HPC cluster market is going to think twice before putting G80-based coprocessor boards in its systems. Of course, I suspect that if NVIDIA does launch a separate HPC-oriented line of G80-based boards (i.e., more memory and no discrete display chip), the GPUs used may well be fabbed on a smaller process. But how much smaller is impossible to say in advance of an announcement.

At any rate, a rumor from the Inquirer has it that AMD may start using some of its CPU manufacturing capacity to fab the much-delayed R600 GPU. The original rumor goes back to last month, with a few more details fleshed out in a new article today. I'm not going to put a lot of stock in any of the details here, especially given that the first rumor claims that the 80nm R600 will be "scrapped" in favor of a 65nm part, while the second insists, contrary to the plain sense of the first article, "we never wrote that 80nm GPU would be scrapped..." But the idea of a 65nm GPU that's cooler, faster, and cheaper does make quite a bit of sense, and I think it may happen.

In fact, I'll go a step further and suggest that AMD won't be the only GPU maker at the 65nm process node or smaller. I think it's likely that when Intel finally comes out with its upcoming standalone GPU, that GPU may be fabbed on a process node just one step behind whatever the company's leading-edge process is at the time. And if both Intel and AMD are making GPUs on a process that's one notch behind the leading edge, then this may leave a design-only shop like NVIDIA in a bit of a bind, especially if all three players are eyeing the high-performance computing (HPC) market. Ultimately, the long-term question for NVIDIA may not be so much "can they compete with some theoretical, as-yet-unannounced CPU + GPU fusion products from AMD and Intel?" Instead, the question might be "can a design-only GPU maker compete in the commodity market with an integrated device manufacturer (IDM) at more advanced process nodes, once the game has changed to make process a crucial differentiating factor?"

Let me lay this very preliminary thesis out in more detail, and invite informed readers to poke holes in it.

IDMs vs. foundries, and the long-term outlook for fabless GPU makers

During the early decades of the semiconductor industry, it was common for each semiconductor maker to do both the design and the actual fabrication of integrated circuits (ICs). These integrated device manufacturers (IDMs) were the norm, and their high level of vertical integration gave them efficiencies and other advantages that design-only shops couldn't match. For a number of reasons (e.g., the development of a third-party EDA tools industry and increased economies of scale in manufacturing), these advantages eventually started to erode, and by the late 1990s the industry had begun to shift from an IDM-only model to a partially foundry-based model in which some companies (i.e., foundries) specialized solely in chip fabrication and others did chip design (i.e., fabless semiconductor companies).

This division of labor worked fairly well right up until the 0.13 micron node, at which point a number of new fabrication techniques were introduced that made it difficult for foundries and their fabless customers to sync up and get products out the door on the same timetable and with the same efficiencies as IDMs like Intel.

A recent article at EE Times details the fresh challenges that foundries face at the 45nm node, a node that will see a number of fundamental changes to basic transistor designs and fabrication techniques. These changes, combined with the incredible device complexity that higher transistor densities afford, may make the 45nm transition a difficult one for IP-only companies that rely on foundries for fabrication.

Now, NVIDIA is a fabless company that relies on a foundry for its device manufacturing. As it stands, however, any initial problems at the 45nm node don't concern it, because, as I pointed out earlier, NVIDIA is still at 90nm. Indeed, the GPU in general isn't yet caught up in the cutting-edge process race the way the CPU market is. But this may change dramatically as the GPU begins to encroach further into the CPU's territory and, in some cases, to merge with the CPU itself.

At a hypothetical future point when Intel, AMD, and NVIDIA are locked in a battle to sell generalized data-parallel coprocessors to a fully established and growing commodity HPC market, that's when the process technology on which a GPU is fabbed will begin to matter a great deal. When this happens, Intel's "tick-tock" design/manufacturing cadence will combine with the company's strength at tailoring a microarchitecture to a specific process node, and that combination may give it an edge.
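To make "generalized data-parallel coprocessor" a bit more concrete, here's a minimal sketch of the workload class this whole fight would be about: a SAXPY-style kernel, where the same multiply-add is applied independently to every element of an array. The numpy version stands in for what a coprocessor would spread across its many hardware lanes; the function names and array sizes are mine, chosen purely for illustration.

```python
import numpy as np

def saxpy_scalar(a, x, y):
    """One element at a time: the shape of work a classic CPU core does."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_data_parallel(a, x, y):
    """One logical operation over the whole array: the shape of work a
    GPU-style coprocessor spreads across thousands of hardware lanes."""
    return a * x + y

x = np.arange(8, dtype=np.float32)
y = np.ones_like(x)
assert np.allclose(saxpy_scalar(2.0, x, y), saxpy_data_parallel(2.0, x, y))
print(saxpy_data_parallel(2.0, x, y))  # [ 1.  3.  5.  7.  9. 11. 13. 15.]
```

The data-parallel form doesn't care whether 16 or 16,000 lanes execute it, which is exactly why the extra transistors from a process shrink convert so directly into throughput on this kind of workload.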

To put this in simpler terms: if GPU makers want to encroach on the (increasingly power-sensitive) CPU market, then they're going to have to get closer to the CPU in terms of feature size. Unfortunately for a fabless GPU maker with its eye on competing directly with the CPU in a hot new market, keeping up with the CPU's torrid pace of process innovation may be easier said than done. IDMs like Intel, AMD, and IBM appear to have some fairly solid advantages when it comes to designing circuits that take full advantage of their own in-house, cutting-edge process technology, and it's not clear that a foundry + fabless play can match those advantages in a head-to-head fight on the ultra-competitive turf of the (currently niche, but soon-to-be-huge) commodity HPC market.

Of course, I may be wrong, and I reserve the right to change my mind about events that are at least a year or more out. So if I'm overlooking an angle then I'd love to hear about it. Drop into the discussion thread and give me your feedback.

Addendum: Intel's GPU revelations

Recent revelations at Beyond3D about the type of GPU architecture Intel is planning (summary: ten in-order Vec16 cores with private caches, 40 threads, lots of shared cache) left the most important question about Intel's new project unanswered: what process node will it be manufactured on? To put it another way, Intel showed one slide that placed a block diagram of its "hypothetical" GPU side-by-side with the G80, with the tongue-in-cheek caption "See any similarities?" If I had been in the audience, my first question about that slide would've been, "You just compared a 90nm device to a... what, exactly?"
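Just to show why the process question matters, here's a quick back-of-envelope estimate of the rumored design's peak throughput. The clock speed and the fused multiply-add assumption are mine alone (Intel disclosed neither), so treat the number as illustrative at best:

```python
# Back-of-envelope peak throughput for the rumored design:
# 10 in-order cores, each with a 16-wide vector unit (Vec16).
# The 2 GHz clock and the multiply-add rate (2 flops/lane/cycle)
# are my assumptions, not figures from Intel's slides.
cores = 10
vector_width = 16              # Vec16: 16 single-precision lanes per core
flops_per_lane_per_cycle = 2   # assuming a multiply-add counts as 2 flops
clock_hz = 2.0e9               # hypothetical 2 GHz clock

peak_flops = cores * vector_width * flops_per_lane_per_cycle * clock_hz
print(f"Hypothetical peak: {peak_flops / 1e9:.0f} GFLOPS")  # 640 GFLOPS
```

Whatever the real figures turn out to be, the process node is what ultimately sets the clock speed, the core count, and the power envelope all at once.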

Intel's number one strength has always been its process technology. The company doesn't design chips that it can't fabricate, and even in cases where a chip's architecture leaves something to be desired (*cough*Pentium 4*cough*), its process engineering prowess is able to carry the chip along. This being the case, there's no doubt that Intel plans to leverage its substantial process expertise to make this new GPU work, and that has to be a major concern to any competitor that's relying not only on a foundry, but on a process technology that's a few generations old. Intel's first standalone GPU design since the i740 debacle could have a few rough edges, but if it's fabbed at 45nm (or smaller!) then it's going to be cooler, cheaper, and probably faster than a better design on a significantly older process.
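For a rough sense of what "cooler, cheaper" means here, consider the classic (and admittedly idealized) scaling rules: each full node shrink scales linear dimensions by about 0.7, and that compounds over the two steps from 90nm to 45nm. A sketch of the arithmetic, with the caveat that real post-130nm silicon scales worse than this because supply-voltage scaling has largely stalled:

```python
# Idealized Dennard-style scaling from 90 nm down to 45 nm.
# Each full node shrink scales linear dimensions by ~0.7; capacitance
# and supply voltage ideally follow, so dynamic power P = C * V^2 * f
# drops by ~0.7^3 per node at constant frequency. Read these as
# best-case illustrative numbers, not measurements.
S = 0.7  # linear scale factor per node shrink

def scaled(nodes):
    area = S ** (2 * nodes)   # transistor area shrinks as S^2 per node
    power = S ** (3 * nodes)  # C*V^2 at fixed f shrinks as S^3 per node
    return area, power

for nodes, label in [(1, "90nm -> 65nm"), (2, "90nm -> 45nm")]:
    area, power = scaled(nodes)
    print(f"{label}: {area:.2f}x area, {power:.2f}x dynamic power per transistor")
# 90nm -> 65nm: 0.49x area, 0.34x dynamic power per transistor
# 90nm -> 45nm: 0.24x area, 0.12x dynamic power per transistor
```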

As for what ISA Intel's GPU will use, I've previously reported on rumors that Intel may extend the x86 ISA with a set of GPU-specific extensions. For what it's worth, I still think an x86 extension is the most likely ISA choice, but with all of the talk of VLIW in the slides, it intrigues me to think about the possibility of Intel using an IA-64 subset instead. IA-64 would be a better match for a GPU on any number of technical and business grounds, and it would give Intel a way to slip Itanium in through the back door. But I leave further musings about Intel's forthcoming common systems interconnect (CSI) and the possibility of an IA-64-based GPU as an exercise for the reader.