https://www.amd.com/system/files/documents/graphics-driver-quality.pdf
AMD hired an outside firm to run GPU stress tests and see how often video cards would crash. With AMD paying for it, you'd expect AMD to win the comparison. And that did happen: AMD GPUs finished 92.8% of the stress tests successfully, while Nvidia GPUs finished only 82.4%. But that's not the comparison that I'm interested in.
Stress testing isn't where the more problematic GPU crashes come from. The bigger problem is when a compiler or something else mishandles something, or a bug in a game or other non-driver software makes the GPU fail in a non-graceful manner, such as a blue screen or a system hang. Handling a wide variety of games without crashing is what you really care about, not just running a single artificial stress test.
Furthermore, the test that was run was chosen to favor AMD's hardware. AMD's GPUs can measure power consumption more directly than Nvidia's and use that to throttle back clock speeds. That makes AMD's GPUs more resilient in the face of an unfamiliar stress test. Now that Nvidia GPUs can at least throttle back on temperature before they overheat, that difference rarely matters.
What I'm more interested in is the breakdown by type of GPU:
Radeon RX 560: 71/72 pass
Radeon RX 580: 71/72 pass
Radeon RX Vega 64: 70/72 pass
Radeon Pro WX 3100: 59/72 pass
Radeon Pro WX 7100: 67/72 pass
Radeon Pro WX 9100: 63/72 pass
GeForce GTX 1050: 69/72 pass
GeForce GTX 1060: 62/72 pass
GeForce GTX 1080 Ti: 70/72 pass
Quadro P600: 41/72 pass
Quadro P4000: 60/72 pass
Quadro P5000: 54/72 pass
To me, the highlight is this: there are 36 possible pairings of a consumer GPU with a professional GPU, and in 34 of the 36, the consumer GPU crashed less than the professional GPU. If the comparison is instead between an AMD GPU and an Nvidia one, AMD wins only 27 of the 36 pairings, loses 8, and ties 1. The difference between consumer and professional was larger than the difference between AMD and Nvidia that AMD wants you to focus on:
Total consumer GPUs: 413/432 pass (95.6%)
Total professional GPUs: 344/432 pass (79.6%)
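For anyone who wants to check the arithmetic, here is a rough Python sketch that recomputes the totals and the head-to-head counts from the per-card pass counts listed above. The data layout and the helper names (`group`, `pass_rate`, `head_to_head`) are my own for illustration and are not taken from the report.

```python
from itertools import product

TESTS_PER_GPU = 72

# (vendor, segment, tests passed), copied from the per-card list above.
RESULTS = {
    "Radeon RX 560":      ("AMD", "consumer", 71),
    "Radeon RX 580":      ("AMD", "consumer", 71),
    "Radeon RX Vega 64":  ("AMD", "consumer", 70),
    "Radeon Pro WX 3100": ("AMD", "professional", 59),
    "Radeon Pro WX 7100": ("AMD", "professional", 67),
    "Radeon Pro WX 9100": ("AMD", "professional", 63),
    "GeForce GTX 1050":   ("Nvidia", "consumer", 69),
    "GeForce GTX 1060":   ("Nvidia", "consumer", 62),
    "GeForce GTX 1080 Ti": ("Nvidia", "consumer", 70),
    "Quadro P600":        ("Nvidia", "professional", 41),
    "Quadro P4000":       ("Nvidia", "professional", 60),
    "Quadro P5000":       ("Nvidia", "professional", 54),
}


def group(field, value):
    """Return pass counts for cards whose vendor (field 0) or segment (field 1) matches."""
    return [r[2] for r in RESULTS.values() if r[field] == value]


def pass_rate(passes):
    """Return (tests passed, tests run, percentage rounded to one decimal)."""
    passed = sum(passes)
    total = TESTS_PER_GPU * len(passes)
    return passed, total, round(100 * passed / total, 1)


def head_to_head(left, right):
    """Count pairings where the left card passed more, fewer, or equally many tests."""
    wins = sum(a > b for a, b in product(left, right))
    losses = sum(a < b for a, b in product(left, right))
    ties = len(left) * len(right) - wins - losses
    return wins, losses, ties


consumer, professional = group(1, "consumer"), group(1, "professional")
amd, nvidia = group(0, "AMD"), group(0, "Nvidia")

print("consumer:", pass_rate(consumer))          # (413, 432, 95.6)
print("professional:", pass_rate(professional))  # (344, 432, 79.6)
print("AMD:", pass_rate(amd))                    # (401, 432, 92.8)
print("Nvidia:", pass_rate(nvidia))              # (356, 432, 82.4)
print("consumer vs professional:", head_to_head(consumer, professional))  # (34, 2, 0)
print("AMD vs Nvidia:", head_to_head(amd, nvidia))                        # (27, 8, 1)
```

The two pairings where a professional card comes out ahead are both against the GTX 1060 (62 passes), which the WX 7100 (67) and WX 9100 (63) each beat.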
All of the professional GPUs had substantial problems with the stress test. Among consumer GPUs, the GTX 1060 that was tested had some issues, but the rest were pretty stable.
Why are the professional GPUs so much less stable than the consumer GPUs? Is it because consumer GPUs have so much more competition with various board partners putting better coolers on and overengineering cards to allow overclocking?
On consumer GPUs, it would be pretty easy to skew the testing through the choice of SKUs. Pick a consumer SKU with a premium cooler, solid power delivery, and little to no overclock for the GPUs that you want to perform well. Pick an extreme overclock or some cut-price aftermarket card with an inferior cooler for the competitor's cards that you want to perform poorly. I'm not sure to what extent that was done, but you largely can't do that with professional cards, as there is only one SKU for a given card. Either way, the testing demonstrated that there are plenty of consumer GPUs from both vendors that handled the stability test better than the professional cards from either vendor.
So what's the point of paying many thousands of dollars for the professional cards, again? Certified drivers for a handful of programs, yes. Driver optimizations for a handful of programs, yes. If you're into compute, sometimes the professional cards offer ECC memory or better double-precision compute, but even those are only available in the top-end GPUs. But beyond that? Stability isn't an advantage of professional cards; at least in this test, they were actually less stable than consumer GPUs. I find that remarkable.
Comments
The nice thing now is that you can get dual purpose cards where you can load a consumer or professional driver.