At the end of AMD's CPU presentation today, their CEO held up a video card and announced a few benchmark results for it. People immediately took those benchmarks and compared them to what the RTX 3080 is already known to get. In one case, it was a tie; in another, the RTX 3080 was a few percent faster. Add in that there's surely a bit of cherry-picking going on from AMD, and it looks like the top Radeon RX 6000 series GPU will be slightly slower than a GeForce RTX 3080.
If rumors are correct, the Radeon card will be considerably cheaper to build than the GeForce, and also have double the memory. That could make it a compelling value at $600. But that's just rumors.
And, of course, availability is a very important ability for any hardware to have. Depending on how quickly AMD can get cards to market in large volume, they might have a run of a few months where they have the fastest video card you can buy. Or they might not.
Comments
Everyone seems to agree that DLSS 1.0 was complete garbage. That's why Nvidia dropped it and replaced it with DLSS 2.0.
Basically, put all of the RGB values computed for a frame into a big matrix A, along with estimated values from previous frames, reprojected along their motion vectors on the assumption that things keep moving the way they were. Pick a reasonable matrix B of coefficient weights, which will have most of its entries be 0 because the color of one pixel shouldn't depend on a sample computed from the other side of the screen. Compute the matrix product C = AB. C has the RGB values for every pixel in the output of DLSS 2.0. How the data gets laid out isn't that important other than as an internal performance optimization, as that's just permuting rows and columns.
The key is picking the coefficient matrix B and then implementing a way to compute the matrix product efficiently. When most of the matrix is 0, you can skip most of the computations. If AMD were to pick a different but still reasonable matrix B, they could make something very much like DLSS 2.0 from the same data, even if it isn't exactly the same thing as DLSS 2.0.
Nvidia's tensor cores do allow you to compute C = AB more quickly, as doing that huge matrix multiply at low precision is what tensor cores are for. But it's not clear how much that matters, and if nothing else, you could get the same frame rate with probably only slightly degraded image quality just by picking a different matrix B that makes each output sample depend on fewer inputs. If AMD does a good job of it, I don't think Nvidia could say "look how much worse AMD's version of DLSS is than ours" without implicitly pointing out the problems in their own, as Nvidia's output would look far more similar to AMD's than to a proper screenshot rendered without DLSS.
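To make the "mostly zero matrix B" idea concrete, here's a toy CUDA sketch of a temporal upscaler in that spirit. To be clear, this is not Nvidia's DLSS: the 3x3 weights, the fixed blend factor, and the kernel name are all made up for illustration, standing in for coefficients a real upscaler would learn. The point is the structure: each output pixel is a small weighted sum of nearby current-frame samples plus one motion-reprojected sample from the previous output, which is exactly a matrix product where almost every coefficient is zero.

// Toy sketch of a temporal upscaler built as a sparse weighted sum.
// NOT Nvidia's DLSS: the weights, blend factor, and names are placeholders.
#include <cuda_runtime.h>

__global__ void temporalUpscale(const float3* curFrame,   // current low-res samples, W x H
                                const float3* prevOutput, // previous upscaled frame, OW x OH
                                const float2* motion,     // per-pixel motion vectors, OW x OH
                                float3*       output,     // upscaled result, OW x OH
                                int W, int H, int OW, int OH)
{
    int ox = blockIdx.x * blockDim.x + threadIdx.x;
    int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= OW || oy >= OH) return;

    // Nearest source sample in the low-res frame.
    int sx = ox * W / OW;
    int sy = oy * H / OH;

    // Spatial term: a 3x3 weighted sum around the source sample. In the
    // matrix view, these 9 weights are the only nonzero entries in this
    // output pixel's column of B that touch the current frame.
    const float w[3][3] = { {0.0625f, 0.125f, 0.0625f},
                            {0.125f,  0.25f,  0.125f },
                            {0.0625f, 0.125f, 0.0625f} };
    float3 spatial = make_float3(0.f, 0.f, 0.f);
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int x = min(max(sx + dx, 0), W - 1);
            int y = min(max(sy + dy, 0), H - 1);
            float3 c = curFrame[y * W + x];
            float  k = w[dy + 1][dx + 1];
            spatial.x += k * c.x; spatial.y += k * c.y; spatial.z += k * c.z;
        }
    }

    // Temporal term: one sample reprojected from the previous output along
    // the motion vector, i.e. one more nonzero coefficient for this pixel.
    float2 mv = motion[oy * OW + ox];
    int px = min(max((int)(ox - mv.x), 0), OW - 1);
    int py = min(max((int)(oy - mv.y), 0), OH - 1);
    float3 history = prevOutput[py * OW + px];

    // Fixed blend factor; a real upscaler would vary this per pixel.
    const float alpha = 0.1f;
    output[oy * OW + ox] = make_float3(alpha * spatial.x + (1.f - alpha) * history.x,
                                       alpha * spatial.y + (1.f - alpha) * history.y,
                                       alpha * spatial.z + (1.f - alpha) * history.z);
}

Expressed as C = AB, the ten nonzero weights per output pixel form that pixel's column of B; tensor cores would just let you evaluate many such columns at once as dense low-precision tiles, while the loop above computes the same sums one pixel at a time.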
AMD's history is that when they release benchmarks saying how fast a new product will be, there is typically some mild cherry-picking going on, so AMD's numbers look a few percent more favorable to their product than later, independent reviews. It's that company history that I'm counting on when I say that if AMD shows the cards tied, it probably means theirs is slightly slower.
AMD does not have Nvidia's history of releasing wild claims that are completely unsupported by evidence, such as Ampere offering 1.9x the energy efficiency of Turing. Nvidia does far more extreme cherry-picking, so if they announce results, I regard that as an upper bound, and it wouldn't be terribly surprising if their cards are only half as good as they claim.
https://www.techradar.com/news/amds-infinity-cache-could-be-big-navis-secret-weapon-to-beat-nvidias-rtx-3000-gpus
https://www.tomshardware.com/news/amds-infinity-cache-may-solve-big-navis-rumored-mediocre-memory-bandwidth
We shall see if it helps or not.
While GPUs do have L1 and L2 caches, they speculate that AMD's new approach is going to have L1 cache shared between multiple "cores". L1 cache is already shared by all of the shaders in a compute unit--at least 64 of them on everything launched in the last 9 years, and often more than that. Furthermore, on some architectures (including GCN and Turing), L1 cache is already shared between multiple compute units. One could debate whether they're using "core" to mean "shader" (which Nvidia calls "CUDA cores") or "compute unit", or whether they have no clue that such things exist, but either way, they're wrong.
But the main caches that GPUs use are not L1 or L2 cache at all. They're registers and local/shared memory (AMD and Nvidia have different names for the same thing). Those don't implicitly cache global memory (the off-chip memory, such as GDDR6), though a programmer is likely to explicitly copy data from global into registers or local memory to cache it there. L2 cache is very high latency, and even L1 cache is pretty high latency--probably higher than a CPU's L3 cache, and possibly even higher than going to off-chip DRAM on a CPU. Registers and local memory are the lower latency caches. Registers are also typically the largest cache.
When it's not the same cache, L1 is much slower (higher latency and lower bandwidth) than local memory, because its speed doesn't matter much. Its main use is to cache small portions of textures for graphics. It's also available to cache global memory for compute purposes, but really doesn't help much. It doesn't hurt anything, and GPU drivers will try to use it because it's there, but the main benefit is to occasionally make it somewhat less painful (but still quite bad) if the programmer does something stupid with global memory. And if a GPU's L2 cache is larger than the cumulative amount of L1 on the chip, even that benefit mostly goes away.
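For what it's worth, here's a minimal CUDA sketch of the explicit caching described above: the programmer stages data from global memory into shared (local) memory and registers, rather than hoping L1 catches the reuse. The kernel name, the block size of 256 threads, and the simple 3-tap average are all just illustrative choices.

#include <cuda_runtime.h>

// Each block stages a tile of the input into shared memory, so the three
// reads per output element hit shared memory and registers instead of
// going back to global memory (GDDR6) or relying on L1.
__global__ void blur1D(const float* in, float* out, int n)
{
    __shared__ float tile[256 + 2];   // 256 threads per block plus a 1-element halo on each side

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;        // shift by one for the left halo

    // Explicit copy from global memory into shared memory.
    tile[lid] = (gid < n) ? in[gid] : 0.f;
    if (threadIdx.x == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.f;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.f;
    __syncthreads();

    // Each global element is now fetched once per block instead of three times.
    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.f;
}

Launched as blur1D<<<(n + 255) / 256, 256>>>(d_in, d_out, n), this is the pattern GPU code uses to get its "cache hits": the reuse is orchestrated by the program, not discovered by the hardware.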
Imagine this: the new Navi comes in some 10% below 3080 performance and sells for $100 less. AMD then announces that it is doubling down on Crossfire and will continue to support it, including Crossfire profiles for games and all that.
Wouldn't that be fun to watch, following Nvidia's announcement (and the 3080 / 3090 power draw)?
I know they have both more or less moved away from SLI/Crossfire, but AMD has not announced anything officially and could use it to one-up the green team.
Even if the average gains were only around 50%, there would be quite a few enthusiasts going for 2x6900XT for some $1300 - 1400.
All it needs is for AMD to say the magic words and keep crossfire alive...
If the gains were consistent, then yeah, but I don't think it's consistent or transparent to the developer even with DX12 mGPU. You have to buy off devs to get them to put it in their games, and that always limited it to only working well in a few high-profile releases.
There were two big reasons multi-GPU fell out of favor. 1) It doesn't play nicely with rasterization. Alternate frame rendering keeps your frame rate up provided that the host never tries to wait on anything in a frame, but it comes at the expense of latency problems. Trying to split a frame across two GPUs would force much of the work to be replicated on both, as you don't find out where a triangle is on the screen until four of the five programmable pipeline stages are done.
2) Single GPUs became fast enough for all but some weird corner cases. What's the point of buying two GPUs when just one is fast enough? As it is, reviewers commonly have to go out of their way to find games where the conclusion is something other than that all of the GPUs are so fast that you'll never notice a difference.
But those aren't true of ray-tracing. Ray-tracing can pretty trivially scale to multiple GPUs just by having each draw half of the frame. Or each of four GPUs draw a quarter of the frame. All of your model data will have to be replicated across all of the GPUs, but each GPU only needs to cast the rays for the pixels it is responsible for. The computational work doesn't get replicated across multiple GPUs like much of it would for rasterization.
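As a rough illustration of how simple that split can be, here's a hedged host-side CUDA sketch: each device gets its own full copy of the scene but only traces its own band of rows. traceRows() is a placeholder for whatever ray-tracing kernel you already have, and the sketch assumes there are at least as many rows as GPUs; only the partitioning across devices is shown.

#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Placeholder kernel: cast one ray per pixel in rows [rowBegin, rowEnd).
__global__ void traceRows(float3* band, int width, int rowBegin, int rowEnd /*, scene buffers... */)
{
    // ... real ray-tracing work would go here ...
}

void renderSplitFrame(int width, int height, std::vector<float3>& frame)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) return;
    int rowsPerGpu = (height + deviceCount - 1) / deviceCount;

    std::vector<float3*> bands(deviceCount, nullptr);
    for (int d = 0; d < deviceCount; ++d) {
        int rowBegin = d * rowsPerGpu;
        int rowEnd   = std::min(rowBegin + rowsPerGpu, height);
        cudaSetDevice(d);  // scene/model buffers would also be uploaded to every device here
        cudaMalloc((void**)&bands[d], size_t(width) * (rowEnd - rowBegin) * sizeof(float3));
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (rowEnd - rowBegin + 15) / 16);
        // Kernel launches are asynchronous, so all devices trace their bands concurrently.
        traceRows<<<grid, block>>>(bands[d], width, rowBegin, rowEnd);
    }

    // Gather each GPU's band of rows back into the full frame (frame must hold width * height pixels).
    for (int d = 0; d < deviceCount; ++d) {
        int rowBegin = d * rowsPerGpu;
        int rowEnd   = std::min(rowBegin + rowsPerGpu, height);
        cudaSetDevice(d);
        cudaMemcpy(frame.data() + size_t(rowBegin) * width, bands[d],
                   size_t(width) * (rowEnd - rowBegin) * sizeof(float3), cudaMemcpyDeviceToHost);
        cudaFree(bands[d]);
    }
}

The rays in one band never need results from another band, which is why the model data gets duplicated but the computation doesn't.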
Furthermore, single GPUs definitely aren't fast enough for full real-time ray-tracing. Even games targeting an RTX 2080 Ti mostly only used a little bit of ray-tracing here and there. The only exception that I'm aware of offhand is Quake II RTX, which basically consists of adding ray-tracing to a low-polygon game released in 1997. If future games go heavy on ray-tracing, and one GPU can do 20 frames per second while two can do 40 or three can do 60, there's some real benefit to adding more GPUs.
That's a lot of assumptions to stack up when predicting a return to popularity for mGPU hardware anytime in the near future.
Your arguments are correct, but those factors have always been there, well before the SLI decline. Game dev support has always been spotty and many games did not support it at all. But even with those issues, SLI did provide a noticeable performance boost (except when it wouldn't run the game at all or would crash it). I've been using SLI / Crossfire since 2008, and 30-50% would be my estimate of the average performance boost in my games. I saw perhaps 2-3 games in which I had serious problems - like the game would not even start or would crash. Otherwise, it was always a clear performance gain, even in unsupported games.
If AMD makes such a decision, it's simply a matter of continuing what they already do and keeping Crossfire support in their drivers. The combination of the right pricing, performance, and power requirements would make this a viable business option, as a large part of enthusiasts would move from a single 3080 to 2x6900XT for additional performance at acceptable cost (~$1300 - 1400, which would be well within the price range of pre-GeForce 10xx SLI / Crossfire setups - or a single 2080 Ti).
If "2x6900XT" could suddenly start appearing on top of benchmarks and 3DMark scores, why wouldn't AMD do it? Again, it would be an easy business decision to go that way. Add to it targeted support to developers of big AAA titles to implement Crossfire support in their hit games and you turn a slight technical disadvantage (the assumed -10% performance vs 3080) into a good business opportunity.
The reason for the drop in SLI's attractiveness was mostly the pricing on Nvidia's side and the lack of real competition (what would be the point of buying two 590s or something similar just to reach single 2080 Ti performance?).
There are extremely few people willing to shell out $2600+ for two cards. Nvidia's pricing, and the only marginal performance improvements in the previous generation(s), meant that SLI was pushed into the niche where Quad SLI had been before.
The RTX 3080 is about 5 times faster than the GTX 680 from 2012, whereas the Core i9-10900K gives only about 50% more gaming performance than the Core i7-2700K from that era.
GPUs have been the fastest-developing part of our computers, and as a result most of the enthusiasts who wanted SLI a decade ago are happy today with a single powerful GPU. The remaining few who still want to pay for more power are too small a group for anyone to want to do the support work for a special technique, used by nobody else, just for them.
Multi-GPU for games is dying, and unless something changes it's unlikely to come back any time soon.
There were only really ever two markets for SLI/CF:
- Those people for whom the fastest available GPU just wasn't fast enough, and who have the means to pay nearly any cost to get there. That's a pretty small niche.
- Those people who can't afford the fastest card, but find that two less expensive cards outperform the more expensive, faster option. The conditions for this have occurred only a few times, as GPU manufacturers don't necessarily want to undercut themselves.
For the vast majority of the history of SLI/CF, the only place it made any sense was in chaining multiple of the very top tier available. In almost every case (with few exceptions), if you tried to SLI/CF two lower-tiered cards, you were almost inevitably paying more to get worse performance than a faster, higher-tiered GPU that was already available.