Nvidia announces H100 accelerator to burn 700 W

Quizzical · Member Legendary · Posts: 25,499
As with the A100, I hesitate to call it a GPU.  Loosely, it's half GPU and half machine learning ASIC.  The latter half is awesome if you're doing machine learning and is a waste of silicon if you're doing anything else at all.  While not announced today, there are surely Hopper-based GeForce GPUs coming that will chop out most of the machine learning ASIC part and just give you the GPU part for a lot less money.  You could describe the GeForce RTX 3000 parts as being basically that as compared to the A100.  Even so, the process node and architecture are what they are, and what they could do with the big compute chip tells us a lot about what we can expect from GeForce parts.

The headline numbers that Nvidia wants to show off are that they more than tripled performance as compared to the previous generation A100 in several ways.  As usual, cherry-picked numbers are very cherry-picked.  With transistor count up by less than 50%, I'm pretty sure that what they did was to beef up the tensor cores to handle larger matrices and put the basic floating-point FMA in all of the shaders rather than only half, as the A100 (and Volta/Turing) weirdly did.  Outside of that, it looks like they're increasing performance by a little more than 50%--if they can hit their target clock speeds, which they might not.

And that's a disaster for silicon scaling.  The GH100 die on TSMC N4 is only slightly smaller than the GA100 die on TSMC 7 nm, despite this being a full node die shrink and then some.  TSMC N4 isn't really a shrink from TSMC 5 nm; it's just a modified, more mature version of 5 nm.  Even so, 5 nm is still supposed to be a full node die shrink from 7 nm.

For a prototypical full node die shrink, you'd expect to roughly double your transistor density.  Nvidia is only getting about 50% more transistors per mm^2.  Meanwhile, you'd expect power per transistor to go down by about 30%.  Instead, Nvidia has it actually going up by about 19%.  They're claiming that the clock speed goes up, too, and a given chip at a higher clock speed will tend to burn more power.  But as far as the expected power savings go, it looks like they got basically nothing out of the die shrink.
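To put rough numbers on that, here's a quick back-of-envelope check.  The inputs are approximate public spec figures for the two accelerators (around 54.2 billion transistors, 826 mm^2, and 400 W for the A100 SXM; around 80 billion transistors, 814 mm^2, and 700 W for the H100 SXM), so treat them as ballpark figures rather than official data.

```python
# Back-of-envelope scaling check: GA100 (TSMC 7 nm) vs GH100 (TSMC N4).
# Inputs are approximate public spec figures, not official Nvidia data.
a100 = {"transistors_b": 54.2, "die_mm2": 826, "tdp_w": 400}
h100 = {"transistors_b": 80.0, "die_mm2": 814, "tdp_w": 700}

density_gain = (h100["transistors_b"] / h100["die_mm2"]) / (a100["transistors_b"] / a100["die_mm2"])
power_per_transistor = (h100["tdp_w"] / h100["transistors_b"]) / (a100["tdp_w"] / a100["transistors_b"])

print(f"Transistor density: {density_gain:.2f}x of GA100 (a full node shrink would be roughly 2x)")
print(f"Power per transistor: {power_per_transistor:.2f}x of GA100 (a full node shrink would be roughly 0.7x)")
```

That works out to roughly 1.50x density and 1.19x power per transistor, which is where the numbers above come from.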

One problem here is, just how much power do you want your video card to burn?  Would you want a video card that uses 700 W under typical gaming loads?  How about 500 W?  Traditionally, the PCI Express specification capped power usage at 300 W, though the PCI-E 5.0 spec's new 16-pin power connector raises that to as much as 600 W.

Traditionally, die shrinks are what allowed you to get more performance in the same power envelope as before.  (Yes, yes, there were architectural optimizations, too, but the low-hanging fruit there has all long since been plucked.)  If this die shrink doesn't give you any power savings, then will future ones do so?  It's getting harder and harder to do that with every new process node, after all.

Furthermore, the problem isn't just one generation burning a lot of power.  The problem is that it becomes the new baseline, and the next generation will want even more.  Even if you'd accept a video card that uses 700 W, what if the next generation needs 1000 W, and the one after that, 1500 W?  Are we headed for a future where the mid-range, $300 card burns 500 W when you play games?  And then a few years later, the $200 card wants that same 500 W?  I hope not.

It's important to realize that while Nvidia's previous generation GA100 accelerator was on TSMC 7 nm, their consumer Ampere cards were on Samsung 8 nm, which is basically a more mature Samsung 10 nm.  As such, they're basically getting a single node die shrink for their accelerator part, but they'll be getting two nodes worth of die shrinks on their consumer GPUs.  At least assuming that the consumer GPUs are on TSMC 5/4, which they probably are, though that's not officially announced.

It's possible that Hopper is just a bad architecture, akin to Fermi, and that's what makes the power numbers look so bad.  (Ampere wasn't a bad architecture in itself, but being stuck on an inferior process node made its power numbers look bad.)  Apple seems happy with the scaling on TSMC 5 nm.  It's also possible that the GH100 should be regarded as a severely overclocked part, which can badly hurt power efficiency.  Intel's Rocket Lake was quite a power hog, but that didn't mean that their 14 nm++*^&% (or whatever they called the last revision of 14 nm) was a bad node.  It was very dated by the time Rocket Lake came out, but Skylake proved that Intel 14 nm was a fine node in its day.  We'll have to wait and see, but the early specs are not encouraging.

Comments

  • Ridelynn · Member Epic · Posts: 7,383
    edited March 2022
    I don't think that on a node improvement you expect ~both~ transistor count to increase by 50% and power to drop by 30%. You usually get one or the other, i.e., power per transistor drops.

    If you were to increase transistor count by 50%, having power only go up by about 19% would indicate that actual power per transistor did drop, by roughly 30%.

    Also, I wouldn't go so far as to imply that just because this very specific, enterprise-level AI accelerator uses 700 W, consumer RTX GPUs based on the same graphics architecture would burn anywhere near the same amount of power. This accelerator ~also~ has a metric buttload of HBM3 memory, 50% more GPU cores than a 3090, a price tag somewhere in the 5-digit range, and some other stuff that a graphics card probably isn't going to have.

    I mean, you may as well look at the Grace system controller that they also announced - it's a 144-core ARM-based controller that uses 500 W... that doesn't mean that an ARM-based CPU for consumer systems would look or act anything like that.
  • Quizzical · Member Legendary · Posts: 25,499
    Ridelynn said:
    I don't think that on a node improvement you expect ~both~ transistor count to increase by 50% and power to drop by 30%. You usually get one or the other, i.e., power per transistor drops.

    If you were to increase transistor count by 50%, having power only go up by about 19% would indicate that actual power per transistor did drop, by roughly 30%.

    Also, I wouldn't go so far as to imply that just because this very specific, enterprise-level AI accelerator uses 700 W, consumer RTX GPUs based on the same graphics architecture would burn anywhere near the same amount of power. This accelerator ~also~ has a metric buttload of HBM3 memory, 50% more GPU cores than a 3090, a price tag somewhere in the 5-digit range, and some other stuff that a graphics card probably isn't going to have.
    Listed power per transistor goes up by about 19%.  The transistor count goes up by a little under 50%, but TDP goes up by 75%.  That's what I think looks so bad.  Furthermore, I'm comparing the H100 to the A100, which is also an accelerator part, not a pure GPU.
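    To spell out the two possible readings of that 19% figure, using the same rough spec numbers as in the opening post (transistor count up roughly 1.48x):

    ```python
    # Two readings of "power per transistor goes up by about 19%".
    transistor_growth = 1.48   # transistor count up a little under 50%

    # Reading A (what was meant): TDP is up 75%, so per-transistor power rises.
    print(f"TDP 1.75x -> power per transistor {1.75 / transistor_growth:.2f}x")  # ~1.18x

    # Reading B (total power up only 19%): per-transistor power would have fallen.
    print(f"TDP 1.19x -> power per transistor {1.19 / transistor_growth:.2f}x")  # ~0.80x
    ```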

    Hopper-based GPUs (GeForce RTX 4000 series, perhaps?) need to be faster than the GeForce RTX 3000 series.  That's just what people expect in order to justify an upgrade.  If you can get 50% more performance and 50% better performance per watt, then power consumption stays about level and that's a solid improvement for a two-year generation.

    If performance per watt doesn't change, then in order to get 50% more performance, power has to go way up, too.  And that's a problem.  Already, there are rumors that the top Hopper-based GeForce card will take 450 W or even 600 W.  And that's really a lot of power for a consumer part.  (Actually, that's a lot of power for a computer chip, period.)  Yes, yes, a lot of Internet rumors are wrong, but a 700 W accelerator part doesn't exactly dispel rumors of runaway power consumption incoming.
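    As a rough illustration of why performance per watt is the number that matters here, using hypothetical generation-over-generation figures rather than anything Nvidia has announced:

    ```python
    # Power needed for a target performance gain, given a perf-per-watt gain:
    # new_power = old_power * perf_gain / perf_per_watt_gain
    def power_multiplier(perf_gain: float, perf_per_watt_gain: float) -> float:
        return perf_gain / perf_per_watt_gain

    # 50% faster with 50% better performance per watt: power stays flat.
    print(power_multiplier(1.5, 1.5))  # 1.0
    # 50% faster with no perf-per-watt improvement: power has to rise 50%.
    print(power_multiplier(1.5, 1.0))  # 1.5, e.g. a hypothetical 350 W card becoming ~525 W
    ```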
  • Vrika · Member Legendary · Posts: 7,989
    Nvidia also announced the PCIe version, which uses 17% more power than the 80 GB PCIe version of the A100 did.

    I think it's too early to judge either the fab process or the architecture as power hungry. It could also just be Nvidia clocking the H100 SXM really high.


     