
Will the GPU wars turn into the foundry wars?

Quizzical, Member Legendary, Posts: 25,509
For many years, Intel has had better fabs than AMD, which gave them a considerable advantage in the CPU competition.  This was obvious when AMD had their own fabs, and remained obvious after AMD sold their fabs to Global Foundries.  It was a huge advantage for Intel back when we were still seeing rapid advancements in CPUs, as having a process node available a year sooner meant Intel could bring a new tier of performance to market a year sooner.

For AMD's Athlon 64 to finally claim the performance crown in 2003, it took a combination of a very good AMD architecture and a terrible Intel one to overcome Intel's considerable foundry advantage.  Even then, Intel was tremendously profitable in that era, while having a good Intel architecture coincide with a bad AMD one in 2011, on top of Intel's foundry advantage, was enough to drive AMD to the brink of bankruptcy.

But GPUs were different.  For a number of years, AMD and Nvidia both had their GPUs fabricated at TSMC.  A process node was available to both GPU vendors at the same time.  However good or bad a node was, it was equally good or bad for both GPU vendors.  TSMC doing their job well or poorly meant gamers could get new generations of performance sooner or later, but didn't directly advantage AMD or Nvidia relative to each other.

But now AMD has moved their GPU production to Global Foundries, which originally bought AMD's fabs.  This is partially because, as part of the agreement for Global Foundries to buy AMD's fabs, AMD was required to buy many wafers from Global Foundries for a number of years.  Abu Dhabi didn't just want empty foundries with no customers.

But AMD moving GPU production to Global Foundries while Nvidia stayed with TSMC has meant that if one foundry has a better process node than the other, that gives them a big advantage in the GPU wars.  So far, that hasn't seemed to matter very much, as it's not at all clear whether Global Foundries 14 nm (actually licensed from Samsung) is better or worse than TSMC 16 nm.

It's not at all difficult to imagine a near future where that difference is enormous.  If one goes purely by the public roadmaps and assumes that all foundries will deliver what they promise when they promise it, that could mean a considerable advantage for AMD as soon as next year.  Whereas Samsung and TSMC focus their early processes at a given size on low-power devices such as cell phone chips, the LP in Global Foundries' 7 nm LP stands for "leading performance", not "low power".  Global Foundries is jumping straight to a high-performance, high-power process node, rather than having that come along a year after some node for cell phones.

Furthermore, Global Foundries is aggressively pushing for 7 nm.  TSMC is focusing on 10 nm now, and only moving to 7 nm later.  Samsung is doing likewise.  Global Foundries doesn't even have a 10 nm process node, but is betting heavily on 7 nm, in part using resources acquired when it purchased IBM's foundries.

I hope that "if all foundries deliver what they promise" clause above threw up red flags for you, because it's a huge if.  Global Foundries' history on this isn't terribly promising.  32 nm was delayed and had considerable yield problems at first.  28 nm was really delayed.  20 nm was irrelevant.  14 nm was canceled in favor of licensing Samsung's 14 nm process node.  With that kind of history, this has the potential to blow up in AMD's face catastrophically.

Global Foundries is hardly the only fab to have problems.  TSMC 40 nm was seriously troubled for a while.  TSMC 32 nm was canceled outright.  TSMC 20 nm was terrible.  But TSMC's 28 nm and 16 nm process nodes delivered well, meaning that TSMC has some real victories in their history of delivering on promises, which Global Foundries is thus far lacking.  Past performance does not necessarily predict future results, of course, but I'd argue that there's a lot more uncertainty in how Global Foundries will perform than TSMC or Samsung.

But if TSMC or Global Foundries has markedly better process nodes for a while, that could mean that AMD or Nvidia GPUs win a generation by default simply because the other vendor doesn't have access to a competitive process node.  Nvidia could switch to Global Foundries if so inclined, of course.  But moving your design to a different foundry than planned imposes such an enormous delay that you effectively lose that generation; you don't bring anything to market until a later one.  Don't be shocked if, as soon as next year, either Nvidia or AMD wins a generation just by virtue of partnering with a foundry that won a generation.

Comments

  • Ridelynn, Member Epic, Posts: 7,383
    AMD may be a bit more at risk because of their contractual obligations with GF, but I'm certain both companies have evaluated the risk of something like this and have some sort of contingency plan in place.
  • Quizzical, Member Legendary, Posts: 25,509
    A lot depends on how late the problems show up.  Rumor has it that AMD's Cayman chip (Radeon HD 6970) was supposed to be on TSMC 32 nm, but the process node was abruptly canceled outright just weeks before the chip taped out.  AMD then had to scramble to port it back to 40 nm.

    But a process node being canceled is hardly the only thing that can go wrong.  What if Nvidia and AMD are both planning to launch new GPUs, and six months out it looks like they'll launch at about the same time?  Then one of the foundries has its process node delayed, and delayed, and delayed again.  Think of the serial delays that afflicted TSMC 40 nm, Global Foundries 32 nm, or Intel 10 nm, for example.  Finally the new GPUs are able to launch a year after intended.  If that only happens to one foundry and not both, the GPU vendor relying on that one foundry basically misses a generation.

    It is possible to design the same chip for multiple foundries as insurance against one foundry having problems.  Apple has done that at least once.  But that's a massive development cost increase and neither Nvidia nor AMD has the volume to justify that expense.
  • Ridelynn, Member Epic, Posts: 7,383
    Contingencies would include a lot more than just having an architecture jump to another foundry. There are a lot of challenges involved in bringing a new architecture to market, and the fab is just one cog in a big machine.

    It's a good part of the reason we see some generations end up being mostly (or entirely) respins. We've had a lot of generation skips, on both sides of the aisle, and they haven't all been due to issues at the fab.
  • 13lake, Member Uncommon, Posts: 719
    AMD can't afford any but the barest of contingencies; just look at the Vega delay.
  • Ozmodan, Member Epic, Posts: 9,726
    13lake said:
    AMD can't afford any but the barest of contingencies; just look at the Vega delay.
    What delay?  Last year they said third quarter.  It's supposed to be out in July.
  • Ridelynn, Member Epic, Posts: 7,383
    True - AMD only "officially" announced Vega this past January. But we've all known it's been coming since ... god knows when. Probably as far back as 2015, as the successor to Fiji with HBM2.

    It was implied - not announced, admittedly - that Vega would launch shortly after Polaris, since Polaris wasn't going to include a "high end" model because Vega was the high-end model.

    And here we are on our second iteration of Polaris now. 

    So if you go strictly based on what their PR says, sure, there's no delay. If you move that to the court of public opinion though, AMD is a year or more late with Vega already.
  • Quizzical, Member Legendary, Posts: 25,509
    It was claimed that Vega 10 taped out about a year ago.  A year from tape-out to launch isn't a huge delay.

    I think that AMD just looked at Polaris and saw that their architecture wasn't competitive.  Big Polaris could have been a 300 W chip that struggled to keep pace with a GTX 1080.  Rather than putting a ton of resources into launching a huge, expensive product that wasn't competitive, they decided to allocate their development resources toward better products that wouldn't arrive until later.  Ryzen and Epyc certainly fit that description, and Vega likely will, too.
  • Ozmodan, Member Epic, Posts: 9,726
    Funny, I was at Microcenter today looking at a basic gaming system for a kid, and I noticed that Nvidia 1070 GPUs were all in the high $400s, many at $499, while 1080 prices were starting at $534.  Why would anyone buy a 1070?
  • Vrika, Member Legendary, Posts: 7,992
    Ozmodan said:
    Funny, I was at Microcenter today looking at a basic gaming system for a kid, and I noticed that Nvidia 1070 GPUs were all in the high $400s, many at $499, while 1080 prices were starting at $534.  Why would anyone buy a 1070?
    Cryptocurrency miners have been buying a lot of them lately, and that's inflated the prices.
  • MrMonolitas, Member Uncommon, Posts: 263
    So I have a question: why don't miners buy the 1080? Why is the 1080's price still the same?
  • Quizzical, Member Legendary, Posts: 25,509
    I haven't looked at the exact algorithm, but as I understand it, ethereum mining scales very well with global memory capacity.  If a Radeon RX 580, GeForce GTX 1070, and GeForce GTX 1080 all have 8 GB of memory, they might well all give similar performance.
  • Ridelynn, Member Epic, Posts: 7,383
    My understanding was that the miners were all looking for Return on Investment:

    If you spend $600 on a 1080 Ti but it only mines 1.5 times as fast as a 1070 that costs $300, you'd pay off the 1070 faster than the 1080 Ti. That's why we see these mid-tier cards getting gobbled up while the very low end and the very high end are avoided - the mid-tier cards represent the best "bang for the buck," as we like to say (a rough sketch of that math follows at the end of this post).

    Also, it seems that most of these algorithms are memory sensitive - I see a lot of chatter about overclocking VRAM and underclocking the GPU itself in order to maximize hash rates while minimizing power draw (these guys do consider the price of electricity as an operating cost, if they're doing it right).

    *edit* Or I should say, the ~smart~ miners are looking at return on investment. There are a whole lot of folks who just look online, see that this or that card is the best, and blindly buy it at any price. The lemming crowd that's into mining is why we see insane prices right now - they aren't actually evaluating price/performance or return on investment.
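
    Here's a minimal sketch of that payback arithmetic, just to illustrate. The $300/$600 prices and the 1.5x speed ratio are the ones from my example above; the hash rate, daily earnings per MH/s, wattages, and electricity price are made-up placeholders, not real numbers:

    ```python
    # Rough mining-payback sketch.  The card prices and the 1.5x speed ratio
    # come from the example above; the hash rate, earnings per MH/s per day,
    # wattages, and electricity price are made-up placeholders.

    def payback_days(price_usd, hash_mhs, usd_per_mhs_day, watts, usd_per_kwh=0.12):
        """Days until a card's mining profit covers its purchase price."""
        daily_income = hash_mhs * usd_per_mhs_day
        daily_power_cost = (watts / 1000) * 24 * usd_per_kwh  # electricity as operating cost
        return price_usd / (daily_income - daily_power_cost)

    # Hypothetical $300 GTX 1070 vs. a $600 GTX 1080 Ti that mines 1.5x as fast:
    print(payback_days(300, 27.0, 0.15, 150))        # ~83 days for the 1070
    print(payback_days(600, 27.0 * 1.5, 0.15, 250))  # ~112 days for the 1080 Ti
    # The cheaper card pays for itself sooner, even though the faster card
    # earns more per day.
    ```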
  • Cleffy, Member Rare, Posts: 6,414
    Actually, the reason the 1080s aren't being bought up is their memory. There is something about the GDDR5X they use that makes it suboptimal for mining. I think the 1080 gets 21 MH/s while the 1070 gets 27 MH/s, and the AMD cards are going over 30 MH/s.
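
    Put those hash rates next to the prices Ozmodan quoted and the value gap jumps out. A quick sketch (the RX 580 price is my guess; the hash rates and Nvidia prices are from this thread):

    ```python
    # MH/s per dollar, using the hash rates above and the Microcenter prices
    # Ozmodan quoted ($499 for a 1070, $534 for a 1080).  The RX 580 price
    # is a guess; the thread doesn't give an AMD price.
    cards = {
        "GTX 1070": (27, 499),  # (MH/s, price in USD)
        "GTX 1080": (21, 534),
        "RX 580":   (30, 400),  # assumed street price
    }
    for name, (mhs, price) in cards.items():
        print(f"{name}: {1000 * mhs / price:.1f} MH/s per $1,000")
    # GTX 1070: 54.1, GTX 1080: 39.3, RX 580: 75.0 - the 1080 is the worst
    # value of the three for mining, which is why miners skip it.
    ```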