Everyone who follows GPUs knows that AMD had a far more efficient GPU architecture than Nvidia during Nvidia's Fermi generation; that is, comparing the Radeon HD 5000/6000 series to the GeForce 400/500 series. Then with Kepler, Nvidia took a slight lead, comparing the Radeon HD 7000/R9 200 series to the GeForce 600/700 series. With Maxwell, Nvidia has really pulled away, with the GeForce 900 series massively more efficient than the Radeon R9 300 series. That's true by whatever metric you prefer: performance per mm^2, performance per watt, or performance per dollar to build the cards. Right?
Wrong. That's because you're only considering graphics. Up through Nvidia's Tesla generation, their GPUs were really only built for graphics, and if you wanted to try to use them for anything else, good luck. With Fermi, Nvidia tried to make an architecture more suited to general-purpose compute, and compared to previous Nvidia GPUs or even the contemporary AMD GPUs, they succeeded. AMD, meanwhile, would keep their GPUs focused purely on graphics for another two years. While AMD GPUs could be used for non-graphical things, their VLIW architectures were very restrictive in what they could handle well, so non-graphical workloads were at minimum a pain to code for, and performance was often dismal.
With GCN, the situation flipped. Now AMD and Nvidia were both trying to make their GPUs work both for graphics and non-graphical compute. But Nvidia was focused more heavily on graphics, while AMD put more non-graphical stuff in. The extra stuff AMD put in came at a cost, and made Kepler slightly more efficient than GCN at graphics. With Maxwell, Nvidia made an architecture focused purely on graphics, and the non-graphical stuff was out entirely.
So what is this non-graphical stuff? The most publicized things are double-precision (64-bit floating point) arithmetic and ECC memory, but they're hardly the only things. The FirePro version of AMD's Hawaii chip (Radeon R9 290/290X/390/390X) absolutely slaughters every other chip ever made in double-precision arithmetic, whether GeForce, Quadro, Tesla, Xeon, Opteron, Xeon Phi, POWER, ARM, Cell, FPGAs, or anything else you can think of. It beats a GeForce GTX Titan X by about a factor of 14. Seriously, fourteen. Getting a Quadro version doesn't help the Titan X, either, and there is no Tesla version. It beats AMD's own Fiji chip by about a factor of five. The nearest competitor is Nvidia's best Tesla chip, which the FirePro beats by about 83%.
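To put rough numbers on that, here's a back-of-the-envelope sketch. The figures below are approximate published peak FP64 rates, not measurements of mine, so treat this as an illustration of where the ratios come from rather than a benchmark:

```c
#include <stdio.h>

int main(void)
{
    /* Approximate published peak double-precision rates, in TFLOPS. */
    double firepro_w9100 = 2.62;  /* Hawaii at 1/2 of its FP32 rate  */
    double titan_x       = 0.19;  /* GM200 at 1/32 of its FP32 rate  */
    double fury_x        = 0.54;  /* Fiji at 1/16 of its FP32 rate   */
    double tesla_k40     = 1.43;  /* GK110B at 1/3 of its FP32 rate  */

    printf("vs Titan X:   %.0fx\n", firepro_w9100 / titan_x);                 /* ~14x */
    printf("vs Fiji:      %.0fx\n", firepro_w9100 / fury_x);                  /* ~5x  */
    printf("vs Tesla K40: +%.0f%%\n", 100 * (firepro_w9100 / tesla_k40 - 1)); /* ~83% */
    return 0;
}
```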
It takes a lot of dedicated silicon to offer that sort of world-beating double-precision arithmetic performance. And the silicon to do that is completely disabled in Radeon cards. Not that it would be used at all for graphics even if it weren't disabled. Think that has an impact on making the chip less efficient for graphics? Fiji doesn't have it, which is part of what allows Fiji to be so much more efficient than Hawaii.
Now, double-precision arithmetic tends only to be present in the top end chip of a GPU generation. AMD and Nvidia have figured out that it's not that expensive to design two versions of your shaders for a generation rather than one: one with the double-precision units and one without. But some things that are there primarily for non-graphical compute aren't so easy to cut out; instead, they filter all the way down the product line.
For example, let's consider register space. In launching the Tesla K80, Nvidia used the GK210 chip, which is basically a GK110 with double the register space per SMX, but with fewer SMXes enabled in the K80 to compensate. With 6.5 MB of registers, GK210 has more register space than any other chip Nvidia has ever made. That's still considerably less register space than AMD's Tonga, let alone the higher end chips, Hawaii and Fiji. It's a similar story with local memory capacity and bandwidth, where AMD put in massively more than Nvidia, and far more than is plausibly useful for games.
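As a rough sketch of the register-file arithmetic, assuming the commonly cited figures of 512 KB per SMX on GK210 (with 13 SMXes enabled per GPU in the K80) and 256 KB of vector registers per GCN compute unit:

```c
#include <stdio.h>

int main(void)
{
    /* SM/CU counts times per-unit register file size (in KB), shown in MB. */
    printf("GK210 (as in K80): %4.1f MB\n", 13 * 512 / 1024.0);  /*  6.5 MB */
    printf("Tonga  (32 CUs):   %4.1f MB\n", 32 * 256 / 1024.0);  /*  8.0 MB */
    printf("Hawaii (44 CUs):   %4.1f MB\n", 44 * 256 / 1024.0);  /* 11.0 MB */
    printf("Fiji   (64 CUs):   %4.1f MB\n", 64 * 256 / 1024.0);  /* 16.0 MB */
    return 0;
}
```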
Not that long ago, Phoronix reviewed the Radeon R9 Fury X on Linux and noted that it was substantially slower than the GeForce GTX Titan X at games, but also substantially faster at compute. Their conclusion was that compute works well, but AMD needs to put more work into drivers to bring gaming performance up to the standard set by its compute performance. That conclusion was mistaken, however. They didn't realize it, but the difference they were measuring was in silicon, not drivers. While Fiji isn't intended as a compute chip, it has the stuff that all GCN chips have, put in for non-graphical reasons.
It's also important to understand that this is something that can change instantly from one generation to the next. If, in the next generation, one GPU vendor decides to focus purely on graphics while the other puts a ton of stuff in for non-graphical compute, the former will predictably be better at graphics and the latter at compute. Either vendor could make either decision (or land somewhere in between), and can make that decision anew with every new architecture.
That said, with subsequent die shrinks, there may well be less of a need for GPU vendors to pick their trade-offs here. Performance is increasingly limited by power consumption rather than die space. A "typical" full node die shrink can increase your power use per mm^2 by about 40%, as it doubles the transistor count while only reducing power per transistor by about 30%. An ARM bigwig a while back publicly raised the possibility of "dark silicon" on future chips, that is to say, parts of the chip left completely unused because powering the whole chip at once would blow past its power budget.
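For the shrink figure above, a minimal sketch of the arithmetic, assuming transistor density doubles while power per transistor falls by roughly 30%:

```c
#include <stdio.h>

int main(void)
{
    double density_gain   = 2.0;   /* transistors per mm^2 roughly double */
    double power_per_xtor = 0.70;  /* each transistor burns ~30% less     */
    double power_per_mm2  = density_gain * power_per_xtor;
    printf("power per mm^2: %.0f%% higher\n", 100 * (power_per_mm2 - 1));  /* ~40% */
    return 0;
}
```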
Dark silicon may make it possible for GPU vendors to put in all of the compute stuff they want, but power gate it off on GeForce and Radeon cards so as not to waste power, while still including all of the graphics stuff that they want. Larger on-chip memories like the caches and register files mentioned above don't burn much power when they sit idle. That may be a waste of die space on GeForce and Radeon cards, but if the alternative is dark silicon, so what?
Comments
I'm sure there are more people here with similar experiences.
I agree with you. I ended up trading out two 7870s for a single Nvidia 780 Ti. I have never had as much trouble with graphics cards as I have had with ATI, and I have used several ATI cards over the years. After switching back to Nvidia with the 780 several years back, I won't switch back to ATI for any reason.
There's {a nVidia Fanboy} born every minute
I used to buy only nVidia cards largely because AMD/ATI's driver support and updates sucked. That was several years ago.
Really, though, whether you're a fan of one GPU vendor or another is off-topic. And discovering that one high end card works better than two lower end cards and blaming it on the maker of the lower end cards is further off topic. This thread is about trade-offs between graphics and compute.
The compute element is a wee bit of an esoteric issue for gamers as a result. While it's good for those who want to use the card's power for running complex processes or for splitting work with the CPU, if it doesn't affect graphical performance, it's out of sight and out of mind for those who only care about games.
On a personal end, I'd be happy to see better options for such implementation. Shrinking die sizes and the capacity to gate off parts of the chip depending on its tasks would go a long way toward keeping it efficient while also having the compute power for complex rendering and processing when necessary.
"The knowledge of the theory of logic has no tendency whatever to make men good reasoners." - Thomas B. Macaulay
"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." - Daniel J. Boorstin
And for those who only need a gaming card, they get the dark silicon version, at a better price point.
If I'm getting what you're saying, that is.
I endeavor to understand the thinking of those who have shaped our world, yet I lack the ability to insert my head, that far, up my ass.
I just read that Nvidia has problems supporting DX12 on Maxwell while AMD does not. Async compute, whatever that is, supposedly gives AMD an edge. I just read a little about it this morning, but supposedly it has the developers of Ashes of the Singularity posting and blogging about it. There's a huge post on it over at overclock.net. Has this been talked about here?
http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1210#post_24357053
Indeed, it's a tribute to the increasing compute versatility of GPUs from both vendors that "older" GPUs mostly support DirectX 12 at launch, rather than requiring radical changes in silicon. That typically hasn't happened at all with older versions of the API, at least for major versions rather than minor steps.
I'm not trying to argue that gamers should care about GPU compute. Most probably shouldn't. I was trying to offer a more unified theory of what has happened in the GPU industry over the course of the last six years than merely "boo this vendor, yay that one". Yes, it's esoteric, but I also think it's interesting.
I don't see where you need 64-bit precision for anything there. I'm not saying you don't, but only that I don't see it.
Do remember that 64-bit is always going to carry a large performance hit compared to 32-bit. For starters, you can only do half as many computations, simply because that's all the data you can get from registers. Fiji has about 50 TB/s of register bandwidth (yes really, 50 terabytes per second), and lots of other GPUs are well into the tens of TB/s, so it's not like it's trivial to add more.
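For anyone wondering where a number like 50 TB/s comes from, here's one way to get into that ballpark, assuming Fiji's 4096 shader ALUs at roughly 1.05 GHz, with a fused multiply-add reading three 4-byte operands per lane per clock:

```c
#include <stdio.h>

int main(void)
{
    double lanes          = 4096;    /* Fiji's shader ALU count    */
    double clock_hz       = 1.05e9;  /* approximate engine clock   */
    double bytes_per_lane = 3 * 4;   /* three 32-bit reads per FMA */
    printf("%.0f TB/s\n", lanes * clock_hz * bytes_per_lane / 1e12);  /* ~52 TB/s */
    return 0;
}
```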
Some 64-bit computations manage to be half as fast as 32-bit. For example, a 64-bit xor is really just two 32-bit xors (which is how a GPU will handle it). But double-precision multiplication means you need a 53x53-bit multiply for the significand, as opposed to a mere 24x24 for single precision. Put a huge focus on that and maybe you get 1/4 of the computations you'd have gotten if you had put the focus on single precision instead.
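To illustrate the easy case, here's a minimal sketch of how a 64-bit xor falls apart into two independent 32-bit xors, which is why it can run at half the 32-bit rate; nothing comparable works for the wide multiplier that a double-precision multiply needs:

```c
#include <stdint.h>

/* A 64-bit xor is just two independent 32-bit xors, which is how a GPU
   built around 32-bit ALUs will handle it. */
uint64_t xor64(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a ^ (uint32_t)b;                  /* low  32 bits */
    uint32_t hi = (uint32_t)(a >> 32) ^ (uint32_t)(b >> 32);  /* high 32 bits */
    return ((uint64_t)hi << 32) | lo;
}
```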
For graphical purposes, the GPU doesn't need global coordinates, so wanting to do things on a planetary scale doesn't mean you need 64-bit precision on the GPU. You can keep that data on the CPU and have a GPU handle things in a more local coordinate system where 32-bit precision is plenty.
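A hypothetical sketch of that split (the names here are made up for illustration): keep planetary-scale positions in double precision on the CPU, and hand the GPU only small camera-relative offsets that fit comfortably in float:

```c
typedef struct { double x, y, z; } dvec3;  /* world space, kept on the CPU     */
typedef struct { float  x, y, z; } fvec3;  /* camera-relative, sent to the GPU */

/* Subtract in double, then round to float: positions near the camera keep
   plenty of precision even if the world spans millions of kilometers. */
fvec3 to_camera_relative(dvec3 world, dvec3 camera)
{
    fvec3 local;
    local.x = (float)(world.x - camera.x);
    local.y = (float)(world.y - camera.y);
    local.z = (float)(world.z - camera.z);
    return local;
}
```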
People have said there's no reason to rush out and change card brands. Way too early for that. I just know that if I had just bought a 980 Ti, I'd be upset. Do they have DirectX 12 markings on the packaging? I just saw it and noticed people were getting stirred up. No mention of it here. Thought that I would ask.
No agenda. I use whatever. Have owned both brands.
Apart from that, I bought a 980 a while back. I didn't regret my purchase when the 980 Ti was announced, and I don't regret my purchase today. Maybe in 2-3 years, when I get something that I want to play on DX12 and it actually starts bogging down, I will regret it, but that's a bridge I'll cross if and when I get there. Even then, if I have to upgrade after several years of use, that's about what I had intended when I originally purchased it anyway. DX12-compatible games can be counted on one hand for the near-term future, and the future of DX12 really rests on the reception of that critical first wave of titles.
Just saying, a website where only a small subset of people even cares about or understands your commentary might not be the optimal venue, since a good number of people will have it fly over their heads and give a response like ride and grunty, hopping on the vendor bandwagon.
But for those of us that like reading it, thank you for posting. ^_^
"The knowledge of the theory of logic has no tendency whatever to make men good reasoners." - Thomas B. Macaulay
"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." - Daniel J. Boorstin
People in that thread even stated that more testing needed to be done. To be fair, I haven't found the post, but someone from AMD supposedly said that nobody fully supports DirectX 12 currently, even though their cards seem to be handling it better according to that thread. Seems like a big deal for no one to be talking about. I guess people are just waiting to see.
Quzzical's posts are always interesting to read even if at times I don't understand half of what is said. Educating those who are ignorant of something is a worthwhile cause.
Sorry! >_<
"The knowledge of the theory of logic has no tendency whatever to make men good reasoners." - Thomas B. Macaulay
"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." - Daniel J. Boorstin
This is off topic, but something to remember is that Ashes of the Singularity was originally an ATI Mantle-focused game, and they put a lot of time into the engine to support Mantle. So I take what they say about ATI and Nvidia with a grain of salt when, just a while back, they had all their chips on one side at the expense of all else.
If you're inclined to cause trouble, it's not merely easy to write code that one vendor's GPUs will run massively faster than the other's. It's easy to write code that one vendor's GPUs will run well, while the other's gives random junk rather than correct computations, or even crashes outright.
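As a hypothetical example (written in OpenCL C, the C dialect both vendors' GPUs run), here's a reduction that assumes 64 work-items execute in lockstep, as a GCN wavefront does, and therefore skips the barriers. On hardware with 32-wide warps, the two halves of the work-group have no such guarantee, and the result is junk:

```c
/* Assumes a work-group size of exactly 64.  Correct code would call
   barrier(CLK_LOCAL_MEM_FENCE) inside the loop; this version leaves it
   out on the assumption that all 64 work-items run in lockstep. */
__kernel void sum64_no_barrier(__global const float *in, __global float *out)
{
    __local float scratch[64];
    uint lid = get_local_id(0);
    scratch[lid] = in[get_global_id(0)];
    for (uint stride = 32; stride > 0; stride >>= 1) {
        /* missing: barrier(CLK_LOCAL_MEM_FENCE); */
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
    }
    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
```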
If DirectX 12 games start launching and show a very strong preference for one particular GPU vendor's architecture, that's a different story entirely. But let's see it in launched games, not early experiments, before we make a big fuss about it.
While there are things that one GPU vendor handles very well and the other doesn't, it's generally because the other vendor never anticipated that you might want it for games. See bitcoin mining, for example:
http://www.hardocp.com/article/2011/07/13/bitcoin_mining_gpu_performance_comparison/2
That's got a single AMD GPU beating out the top end 3-way and 4-way SLI configurations of the day in an algorithm that trivially scales to arbitrarily many GPUs. AMD put a rotate instruction in their architecture and Nvidia didn't, so AMD won by ridiculous margins if you needed lots of rotate.
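For the curious, the gap is easy to see in miniature. SHA-256, the heart of bitcoin mining, leans heavily on 32-bit rotations; without a native rotate instruction, each one costs roughly three ALU operations instead of one. A minimal sketch:

```c
#include <stdint.h>

/* Rotate right by n bits, for 1 <= n <= 31.  On hardware with a native
   rotate this is one instruction; without one, it compiles to two shifts
   and an OR.  SHA-256 does this constantly, so the difference compounds. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}
```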
If games decide that they really need something that one GPU vendor included and the other didn't, then things could get interesting. I don't expect that to happen, though.
I'm familiar with bitcoin mining. It's the reason I ended up with a GTX 770. All the new AMD cards had released and were gone as soon as they hit the shelves. I did get a decent deal on the card because of holiday sales. I want to buy a top-end card next time I buy one, though. Other expenses (a new manual transmission for my Jeep) mean I guess I am going to wait and see what next gen offers, even though I have no idea how far away that is.
t0nyd, nothing new from either side. Turning off features that don't work with your card.
Some games like AMD, some like Nvidia.
To me, AMD has always been better at 64-bit double-precision floating point.
And the later Nvidia cards even had that limited, so only the Tesla cards had it.
(Which is BS, since it's just the software that limits it.)
I own and have used both sides for computing power.
I have racks with four 5870s and with four GTX 570s; some calculations run better on the Nvidias, some better on the AMDs.
In gaming I found the biggest problems with AMD cards.
It has something to do with antialiasing, or it's driver related.
Still, I have an R9 290X for gaming :pleased:
And I am writing this on a PC that has a 580 in it, while looking at a rack with two GTX 275s in it, standing on a rack with two 5770s in it.
Some like Nike, some like New Balance, others like (Matrox) Adidas.
I buy a card when I want to buy one; which one, I don't really care. I go for the one that gives me what I want at the time.
Sometimes it's Nvidia, sometimes it's AMD.
Next year my 290X gets old and will get replaced with an HBM2 card; which one, I don't know.
AMD using a Cooler Master pump (which is just a relabeled Chinese pile of shit) on the Fury X sure as hell made me think about going Nvidia again.
In the first week I sold 13 Fury Xs, of which nine (!) were returned; those buyers took a 980 Ti home instead due to loud pump noise.
My mini big ITX game cube :P