Nvidia deeply unhappy with TSMC, claims 20nm essentially worthless


One of the unspoken rules of customer-foundry relations is that you virtually never see the former speak poorly of the latter. Only when things have seriously hit the fan do partners like AMD or Nvidia admit to manufacturing problems, and typically only after postponed launches and poor availability have made protestations that everything is fine unsustainable.

That’s why we were surprised — and our source testified to being stunned — that Nvidia gave the following presentation at the International Trade Partner Conference (ITPC) forum last November. Many of the company’s complaints regarding its current partnership with TSMC are exactly what you’d expect given the manufacturing problems the entire industry is facing. What’s surprising are Nvidia’s remarks concerning TSMC’s current cost curves and manufacturing ramps. This is normally the sort of information discussed quietly between a foundry and its customers or by the press with help from various anonymous sources. Discussing the problems publicly is a sign of just how frustrated the company has become.

Watch the underlines for emphasis

TSMC builds hardware for a huge number of companies, but those customers have very different needs and use a wide range of process technologies. Historically, Nvidia (and ATI/AMD) have been regular early adopters. The nature of graphics is that it can easily soak up new processes and the higher transistor counts they enable.

Kepler broke the exponential rise in transistors per GPU

The flip side of that situation is that companies like AMD and Nvidia have also been responsible for assuming the risks associated with “risk production” and footing a hefty bill for the privilege. As those risks mount and costs skyrocket, Nvidia is increasingly unhappy with being asked to shoulder the burden. Nvidia’s slides talk about the need for “real” understanding, compromises on “rough justice,” and a closer relationship that looks more like that of an IDM (Integrated Device Manufacturer). For those of you who don’t know the term, Intel is an IDM — it handles both manufacturing and design. AMD used to be.

When AMD spun GlobalFoundries off, one of the things GF promised to provide that would distinguish it from TSMC was high levels of IDM-style integration. At TSMC, the customization work that is available is highly monetized; specialized work is expensive and time-consuming. In reality, GF’s ability to provide the amount of IDM-like flexibility that it wanted to offer has been sharply constrained by the problems associated with Llano and Bulldozer; our sources tell us that the foundry devoted enormous resources to bringing AMD’s 32nm APU back on track.

According to Nvidia, the current model is unsustainable. Here’s the company’s projected analysis for transistor costs at current and new nodes.

As process nodes shrink, it takes longer and longer for the cost per transistor to fall below that of the previous generation. At 20nm, the gains all but vanish. Want to know why Nvidia rearchitected Fermi into Kepler with a new emphasis on efficiency and performance per watt? You’re looking at the reason. If per-transistor costs remain constant, the only way to improve your cost structure is to make better use of the transistors you’ve got.

As for wafer costs, they’ve become part of the problem.

What this slide states — we can’t even call it a suggestion — is that smaller processes no longer cut costs by delivering a greater number of good chips per wafer. Instead, the complexities and difficulties of manufacturing at the new process create a cost structure that provides precious little incentive to move to the new node at all.
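
To make the arithmetic behind these two slides concrete, here’s a minimal sketch of the cost identity they rest on: wafer cost amortized over good transistors. Every number below is an illustrative assumption, not an Nvidia or TSMC figure.

```python
# Illustrative sketch only: why a denser node can fail to lower the cost
# per transistor. Wafer costs, die counts, yields, and transistor counts
# below are assumptions, not Nvidia or TSMC data.

def cost_per_transistor(wafer_cost, gross_dies, yield_rate, transistors_per_die):
    """Wafer cost amortized over the transistors on the good dies."""
    good_dies = gross_dies * yield_rate
    return wafer_cost / (good_dies * transistors_per_die)

# Hypothetical mature 28nm node: cheaper wafers, healthy yields.
c28 = cost_per_transistor(wafer_cost=5000, gross_dies=200,
                          yield_rate=0.80, transistors_per_die=3.5e9)

# Hypothetical early 20nm node: ~1.9x the transistors per die, but pricier
# wafers and ramp-stage yields eat the density gain.
c20 = cost_per_transistor(wafer_cost=9000, gross_dies=200,
                          yield_rate=0.45, transistors_per_die=6.6e9)

print(f"28nm: {c28:.2e} dollars per transistor")
print(f"20nm: {c20:.2e} dollars per transistor")  # higher, despite the shrink
```

Under these assumed inputs, the denser node is actually the more expensive one per transistor; until the new node’s wafer price falls or its yields mature, the crossover Nvidia’s cost curves depend on simply never arrives.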

If openly criticizing a foundry partner is unusual, showing data that suggests your foundry partner can’t provide a cost-effective strategy for building hardware at next-generation process nodes is… a few steps past that point. The recent launch of the GTX 680, and that card’s trifecta of price, performance, and power efficiency, actually strengthens the impact of this data. NV would’ve had a good idea of how GK104 was shaping up when it spoke at ITPC in November; this isn’t a case of a company being angry about the performance of a particular part and looking for someone to blame.

Again, follow the underlines.

The GK104 is great, but it doesn’t change the nature or severity of the underlying problems. As for whether Nvidia’s unhappiness with TSMC heralds a potential alliance with GlobalFoundries, we’re dubious. Not only has GF only recently ironed out its own 28nm issues, but the nature of the foundry business doesn’t allow for quick shifts. Indeed, part of the reason manufacturers like TSMC have historically exercised such control over their partners’ PR releases is that once you’ve committed to a foundry, you’re locked in for a substantial period of time. The fact that there are now two foundries with cutting-edge technology doesn’t change that, and the Common Platform Alliance favored by IBM, Samsung, and GloFo only mitigates some of the problems of moving a design from foundry to foundry; it doesn’t remove them.

The real question, at least for TSMC’s other customers, is whether the graphs and charts Nvidia has shown are specific to the company’s own products or reflect universal trends. There’s good reason to suspect the latter; Nvidia may have had more trouble than some of TSMC’s other customers, but our analysis of semiconductor industry roadmaps revealed a great deal of uncertainty about the road forward. Nvidia opted to aggressively optimize GK104 precisely because the old strategy of bolting on more cores and ratcheting up transistor counts isn’t sustainable.

Further evidence for the accuracy of NV’s presentation comes, ironically, from the company’s primary GPU competitor. At AMD’s Financial Analyst Day, CEO Rory Read made a point of saying that the company no longer intends to aggressively transition to new process nodes given the diminishing marginal returns from doing so.

Change the color scheme, and Nvidia’s graphs could’ve dropped right into AMD’s presentations in early February.

Nvidia’s willingness to stand up and talk about these problems is an “Emperor’s new clothes” sort of moment. The long-term repercussions, if any, are still unclear.

Intel does not seem to have this issue… I think the real problem is having a foundry trying to solve poor yields for one customer while still having to take care of a huge variety of different chips for other customers. NVidia does not like being “one of the unwashed masses” of customers. Don’t like the yields? Other customers getting in your way? Build your own fab. Oh wait, you decided to go fabless, didn’t you? Then learn to live with it or come up with a way of better testing the layout of your chips before releasing them to the fab.

Joel Hruska

It’s much more accurate to say that Intel is having fewer problems. If you read my earlier discussions of the semiconductor industry, you’ll note that Intel is very much a part of those debates and a member of the group in question.

Think of the semiconductor business as a train track. Intel is in a better position, but it’s not on a fundamentally different line.

John Pombrio

There has been discussion for years about how the cost of fabs at smaller scales is increasing exponentially. What is it now? $5 billion for a new 22nm fab? Eventually most manufacturers will have to get off this treadmill unless they have the deep pockets of Intel and IBM. Nvidia is almost at this point, and AMD simply cannot afford it until its profits climb, hence the breakup with Abu Dhabi.

Intelligent Perspective

I believe your observation is off. Intel does not seem to have this problem because it can absorb the hit to profits; it is worth so much more than AMD and Nvidia. Also, there is always a monetary benefit to scaling down… that is not at issue. The issue is the escalating expenditure the fabricator, TSMC, incurs to make smaller dies. TSMC is putting the whole cost of producing smaller dies on the designer. That is the issue. TSMC is not stepping up and streamlining its production line for small dies; it just dumps the production cost on the designer. Designers like Nvidia have already implemented various methods to offset the profit loss, such as optimizing for performance per watt, but only so much can be done at the design level. The argument is for the production side to implement cost-saving techniques as well.

Increasing transistors per inch is meant to entice customers with better performance, which is why customers are willing to pay the same price for a GPU every few years while the quality improves. These charts show why, for the last five years, GPU prices have been going up dramatically to increase profit margins and offset costs. That is not sustainable, as fewer customers will be able to afford the flagship GPU models.

These charts show that the fabricator is having a hard time manufacturing smaller transistors at a price designers like Nvidia can afford. That is why the chart showing the increase in cost matters to companies making lower and lower profits. It is also why GPUs are becoming more expensive for customers.

The lower profit occurs because the price point is rising beyond what many people consider an acceptable price tag. (I know, I’m beating a dead horse.)

The demand for higher-quality GPUs is there, reflecting the high-fps gameplay players want these days. But if every new GPU releases at $100–$200 more than the previous model while income for many people stays the same, the number of people able to purchase the new model goes down, and thus profit falls. This is the industry’s fear, which is why they are hoping that fabs like GlobalFoundries can partner up and find a way to make 14nm or smaller cheaper for Nvidia.

Marc DeRosa

They are free to try to get a better deal elsewhere, and I’m sure they have looked into that. Staying with TSMC tells me they are getting the best deal the market will offer. Now, maybe their closer-partnership ideas are so compelling that other foundries will step up to fill those needs. If so, since these slides are from November, Nvidia should have a pretty good idea of who is interested by now.

Anonymous

Based on the presentation, it looks like the “better deal” is staying on each process node for a longer time to amortize the costs better. But if wafer defects are starting to impact yields too much, the cost may never come down enough.

In theory you can pack more chips onto a wafer because of the smaller transistor footprint, but if half of the extra chips crap out due to process defects, you don’t make enough to cover the higher cost of the fab. Or at least not enough to satisfy Wall Street.
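
That intuition maps onto the standard Poisson defect-yield model. Here’s a minimal sketch, with assumed defect densities rather than real TSMC data; the die size is roughly GK104-class.

```python
import math

# Poisson defect-yield model: Y = exp(-D * A). The defect densities below
# are assumptions chosen to illustrate a ramp, not measured values.

def poisson_yield(defects_per_cm2, die_area_cm2):
    """Expected fraction of dies with zero killer defects."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

DIE_AREA = 2.94  # cm^2, roughly a GK104-class die (~294 mm^2)

for d in (0.1, 0.5, 1.0):  # mature process vs. progressively rougher ramps
    print(f"D = {d:.1f} defects/cm^2 -> yield = {poisson_yield(d, DIE_AREA):.0%}")
```

At ramp-stage defect densities, a big die loses most of the extra chips the shrink was supposed to buy, which is exactly the scenario described above.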

Anonymous

Joel explained in this article why they haven’t gone elsewhere for their fabs: it’s just really hard for any company to change fabs; it’s a very long-term process. Plus there really isn’t much choice out there. I don’t think Intel’s an option, and then there’s GlobalFoundries, and they have been having manufacturing problems too. There really needs to be more competition out there. It would spur innovation and probably help the public get better prices. If these foundry companies knew there were some good options out there, they would be trying really hard to be innovative. Right now I think TSMC has them by the balls, and TSMC knows it.

Anonymous

The real problem with Nvidia is the same as with any manufacturer: if you only design a car and pay someone else to build it, you end up sharing more of the profits. Intel does very well because it keeps design, manufacturing, and distribution closely tied together; its market share also helps keep its profits high. AMD bought ATI because it hoped to combine the two companies into a CPU/GPU giant and streamline its chip manufacturing. I don’t think it has worked out so hot. Nvidia has even worse troubles because it is mostly known for GPU design, and it is trying to enter markets like mobile, which in my opinion will not help margins because it will only force smaller designs.

The problem with scaling is not new; it is an extension of a long timeline. Remember when Intel had to stop just raising clock rates back in 2002-4? PowerPC died on the desktop, and now even Intel is feeling pressure from ARM-designed CPUs. Design has extended our ability to hide the problem, but it can’t change the fact that one leg of Moore’s Law is already dead (frequency no longer scales with process nodes) and the second leg is falling off as we speak (actual cost per transistor). The last leg is decreasing power use at each new node. If that can no longer be accomplished, then why will we need new fabs? When cost per transistor stops dropping, either an entirely new process is about to start or we are seeing the end of an era. I believe we are going to see both. Neither Intel nor any of the other fabs is going to give up its business without a fight. The real question is: what will we do next?

Joel Hruska

Patrick,

See http://www.extremetech.com/computing/116561-the-death-of-cpu-scaling-from-one-core-to-many-and-why-were-still-stuck and http://www.extremetech.com/extreme/120353-the-future-of-cpu-scaling-exploring-options-on-the-cutting-edge

Your recollections are flawed. Intel stopped scaling CPU speeds in 2004, not 2002. PowerPC died on the desktop due to specific scaling problems with that architecture. That particular problem is unrelated to the issues at hand.

That said, you are correct about the greater long-term issues. I discuss the question of “What do we do next” in the two articles linked above.

Anonymous

One thing about Moore’s Law: even though it describes a curve, it doesn’t specify how frequently you have to pick points on that curve. Maybe we need to soak for a longer period at each node. Maybe we need more radical technology changes: moving away from silicon, going to silicon-on-insulator, novel transistor types, 3D transistors, memristors, etc. I DO believe Moore’s Law can be sustained, but in the macro sense, with occasional major jumps in how we approach the problem.

HopelesslyFaithful

I am still wondering what will happen when we ditch silicon and use something else. I always hear it will be amazing, but I really haven’t heard anything beyond rumors. :/

rickcain2320

Atoms aren’t getting any smaller, cosmic rays are just as prevalent, and die shrinks will eventually hit a technological dead end, unless a flying saucer crashes in New Mexico again.

Anonymous

The death of PowerPC on the desktop had nothing to do with technology issues. It had everything to do with the continuing cost to maintain parity with Intel. Apple simply wasn’t a big enough customer to make it worth IBM’s cost to keep up. Nobody else was in the market for such a product. POWER systems have continued to advance but IBM has focused on other markets, both higher and lower.

Anonymous

* Nvidia complains that the technology shrink no longer reduces the price per transistor. But the shrink still helps reduce power consumption, and both GeForce and Tegra need power optimization.
So why is Nvidia complaining?

* Nvidia complains that, as an early adopter of each technology shrink, it pays the premium. But being an early adopter is critical in the time-sensitive smartphone business, and it brings huge market share.
So why is Nvidia complaining?

If you look at Nvidia’s history (and the Fermi crisis at 40 nanometers), you will see that Nvidia always fails to prepare for technology shrinks.

There were rumors that Nvidia was asking Intel to produce its circuits, but I doubt it.
I think Nvidia CEO Huang would rather try to get some money out of TSMC, and he will not hesitate to ruin TSMC’s brand if it helps.

Joel Hruska

Riboul,

In this case, you’ve failed to grasp the larger trends that impact *everyone.* Reverse your statements. Yes, GK104 is awesome. Really. Great card.

“So why is Nvidia complaining?”

Because this is bigger than GK104. It’s bigger than Nvidia. Yes, there’s some question as to how accurately NV’s slides map to the market as a whole. The trend, however, is universal.

Jake VanWagoner

About power consumption . . .

There are two aspects: dynamic power consumption (the power used per switching event) and static power consumption (transistor leakage in the “off” state). As you scale the transistor size down, dynamic power consumption drops, but static power consumption rises. Without dramatic changes to how the silicon is processed, those two factors have been roughly equal for a while now, and total power has only improved because of dramatic changes to the process — strained silicon, high-k/metal gate, etc.
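
For readers who want the two terms side by side, here’s a rough sketch of the split described above; the constants are placeholders, not measured silicon values.

```python
# Rough sketch of the two halves of CMOS power draw. All constants are
# placeholder assumptions, not measurements from any real process.

def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Switching power: P_dyn = a * C * V^2 * f (falls as C shrinks)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

def static_power(voltage_v, leakage_a):
    """Leakage power: P_stat = V * I_leak (rises as transistors shrink)."""
    return voltage_v * leakage_a

# Without process-level fixes (strained silicon, high-k/metal gate), the
# two terms end up roughly comparable, as the comment above notes.
print(f"dynamic ~ {dynamic_power(0.2, 1e-9, 1.0, 1e9):.2f} W")
print(f"static  ~ {static_power(1.0, 0.15):.2f} W")
```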

Karen Savala

On behalf of SEMI and for the record, the author of this article used conference materials without permission from us or the speaker, nor did the author attend the conference. ITPC is a global conference designed to bring together all aspects of the semiconductor manufacturing ecosystem for the purpose of collaboration and cooperation. The conclusions drawn in this article in no way reflect the spirit in which this presentation was delivered at the conference in November 2011. – Karen Savala, President, SEMI Americas

Joel Hruska

Karen,

You are certainly correct, I did not attend the conference. The presentation I used for this article, however, is not marked “Private”, “Do Not Distribute”, or “For Use at ITPC 2011 Only.” If such marks existed at any time, they were removed prior to my receiving the document.

Given this, I saw no need to seek permission to publish these slides. Neither Nvidia nor SEMI has requested that I remove them, publicly or privately.

You write “The conclusions drawn in this article in no way reflect the spirit in which this presentation was delivered at the conference in November, 2011.”

I take issue with that for two reasons. First, I present this article as the latest link in a chain of coverage I’ve written concerning the future of semiconductor manufacturing. The *reasons* why Nvidia wants greater collaboration and risk-sharing with TSMC, and the rising costs associated with lower process nodes, are not a secret. They are not unique to any single company, foundry, or vendor. They are ubiquitous, and I make that very clear.

You give a presentation like this because you believe things have to change for a situation to remain tenable. You give it because you want to show how a current model doesn’t work. You give it because you believe the people watching will find value and perhaps recognize their own situation in your descriptions.

I didn’t write this story as a witch-hunt, a nose-thumbing bit at Nvidia, or a condemnation of TSMC. If the presentation was given in a spirit other than the one my article implies, I can only assume it was meant far more vindictively, possibly with the goal of inciting trouble between multiple vendors and TSMC.

Anonymous

Damn Joel, way to tell her what’s up! I agree with your analysis completely but she sounded pretty pissed. Acting like they might want to sue you. It looks to me like what you posted would fall under the “fair use doctrine” or something like that.

andhavarapu

I think Samsung should expand their foundry business to Nvidia, since Intel won’t. They’re probably the only other guys with that kind of money. It’ll be nice to see an Intel competitor without teething troubles.

pfperry

Instead of looking at the nanometre end, let’s look at the other side – the systems these chips are going into. I can’t help feeling there has been a tendency to rely on shrinking geometry to solve problems, rather than trying to work smarter with what we have now.
Just because it is (barely) possible to get smaller doesn’t mean that 22 or 15 or 7nm is the optimum for the chip user or the end customer.

Me

POET Technologies!!!! Haaaa, this is awesome! Who will be the first big company to look at POET and say “holy shift! We need that now!”? POET will create a new paradigm in the industry, with 10x speeds and 80% energy use, starting at 100nm. Read up on it, folks.

jasiohopski

Dear Me: the financials of a GaAs (or similar) POET process are far worse than CMOS. POET’s tech is not much of a breakthrough; Vitesse tried that business a long time ago and failed. The wafers are very brittle, very small, and hard to slice. Yes, VCSEL lasers are an economical product (lots of them fit on a small wafer), but 10 billion transistors on a GaAs die is a pipe dream. Intel never touched this technology, for good reason; neither has Samsung. And who delivers the EDA tools for such chips?
Try to contradict me so I may change my reasoning.

Tom Allsup

Geez… freaking shrink the GF110 to 28nm already and relaunch it at higher frequencies with a lower TDP. Kepler’s architecture was a huge step backwards for professional iRay work anyway.

Here we are, three years later, still on 28nm until 2016, so Nvidia’s complaints have been proven TRUE.

fuqthegovt

It would appear Moore’s Law is on its deathbed, given Intel’s latest tick-tock-tock cycle (14nm is hard to keep profitable given the poor yields). The shrink to 10nm is going to cost $10-12B just in new tooling that hasn’t even been invented yet. 7nm probably isn’t going to happen with silicon, since the cost doubles with every shrink along with the transistor count per wafer, so there’s no incentive to do it as far as investors are concerned. The next likely contender is probably GaAs, but that costs 1000x more than silicon, which is why it’s not used to make solar panels even though they are 2-3x more efficient. Thing is, no one knows how to design and lithograph with it, given how poorly understood its properties are, since we’ve been Si-only since the beginning in the 1970s, and the R&D to learn is expensive. Graphene is so new that it probably won’t be used as a transistor material for at least 20 years.
