Just shy of seven years ago, Nvidia launched CUDA. No longer would video card performance just be about games, they said; soon we would offload other work onto the video card. And there was reason to believe that this could be useful: the GeForce 8800 GTX had 128 shaders; for comparison, the first quad core CPUs had just launched. Video cards had vastly more computational power available, and Nvidia was going to make it available to other programs without having to cram everything into a graphics pipeline.
If you've never used CUDA for anything, you're not alone. Today, all but the most die-hard fanboys have been disabused of the notion that CUDA would ever matter for consumer use. The problem is that all that GPU power was simply hard to put to good use for anything other than games. In order to see major gains from GPU compute, you needed algorithms that, among other things:
1) are embarrassingly parallel (i.e., can have an enormous number of threads operate independently with no knowledge of what is going on in other threads),
2) use almost exclusively 32-bit floating point operations,
3) are very, very SIMD-friendly,
4) can do many, many computations for each access to video memory,
5) don't need to communicate with the CPU very much (relative to the amount of computation),
6) involve little to no branching, and
7) actually have a ton of computational work that needs to be done (because CUDA is pointless if something already runs very fast on a simple CPU).
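To make conditions 1, 4, and 6 a bit more concrete, here is a minimal Python sketch (not CUDA; the function names are hypothetical, and Python floats are 64-bit, so this illustrates only the structure, not GPU speed). Each output element depends only on its own input, does many arithmetic operations per value read, and has no data-dependent branches, so the elements can be computed in any order, or all at once, with the same result. A running sum is shown as a counter-example that fails condition 1.

```python
import random

def kernel(x: float, iterations: int = 64) -> float:
    """Per-element work: many multiply-adds for one value read from memory.
    The loop count is fixed, so there is no branching that depends on the data."""
    acc = 0.0
    for _ in range(iterations):
        acc = acc * 0.5 + x  # one multiply and one add per iteration
    return acc

def run_in_order(xs, order):
    """Evaluate the kernel for each index in the given order. No element
    reads another element's result, so any order (or all at once) works."""
    out = [0.0] * len(xs)
    for i in order:
        out[i] = kernel(xs[i])
    return out

def running_sum(xs):
    """Counter-example: each output needs the previous one, so the
    elements can't simply be handed to independent threads."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out

xs = [float(i) for i in range(1000)]
in_order = run_in_order(xs, range(len(xs)))
shuffled = list(range(len(xs)))
random.shuffle(shuffled)
assert run_in_order(xs, shuffled) == in_order  # same answer in any order
```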
That rules out just about every program that you've ever used. The one widely-used thing that fits all of them is the GPU portion of 3D rendering--i.e., games.
To be fair, there were some other things that could use CUDA, such as synthetic benchmarks or tech demos. Or stuff like Folding@home, which may offer some benefit to society but doesn't really offer any benefit to the particular person running it. That things like these are the best examples of "consumer" uses of CUDA illustrates the problem.
Today, CUDA is basically obsolete for consumer use, too. It's restricted to Nvidia GPUs, so if you do have an algorithm that is GPU-compute friendly and don't want to eliminate a large fraction of your customer base by going with a proprietary API, you use OpenCL instead.
That's not to say that CUDA is completely irrelevant. The proprietary nature of CUDA isn't a problem in supercomputers, where you can buy a zillion of card X and then write code that targets the particular card you bought. If you're building a supercomputer, you probably have some specific intended use for it and can pick the parts that best fit your specific use. But that's not something that random consumers do.
Ironically enough, around 2011, a new class of program took off that random consumers could benefit from running and that satisfies all of the conditions above except for #2: bitcoin mining. In place of #2, it uses almost exclusively 32-bit integer operations. (See, the title of this thread is actually relevant to the thread.)
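For a sense of what "almost exclusively 32-bit integer operations" means here: bitcoin mining is repeated SHA-256 hashing, and SHA-256 (specified in FIPS 180-4) is built entirely from 32-bit rotations, shifts, XORs, ANDs, NOTs, and additions that wrap around modulo 2^32, with no floating point anywhere. The Python sketch below shows a few of those building blocks and one compression round; the helper names are my own, and Python's unbounded integers are masked to 32 bits to mimic the hardware. Note that a rotation costs two shifts and an OR on hardware without a native rotate instruction.

```python
MASK32 = 0xFFFFFFFF  # keep every result to 32 bits, like the hardware

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def add32(*words: int) -> int:
    """Addition modulo 2**32: ordinary unsigned 32-bit overflow."""
    return sum(words) & MASK32

def ch(x: int, y: int, z: int) -> int:
    """'Choose': each bit comes from y where x has a 1, from z where x has a 0."""
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x: int, y: int, z: int) -> int:
    """'Majority': each bit takes the value held by at least two inputs."""
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(a: int) -> int:
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)

def big_sigma1(e: int) -> int:
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)

def round_step(state, k: int, w: int):
    """One of SHA-256's 64 compression rounds. `state` is the eight working
    words (a..h), `k` the round constant, `w` the message-schedule word."""
    a, b, c, d, e, f, g, h = state
    t1 = add32(h, big_sigma1(e), ch(e, f, g), k, w)
    t2 = add32(big_sigma0(a), maj(a, b, c))
    return (add32(t1, t2), a, b, c, add32(d, t1), e, f, g)
```

Every operation in the round is an integer operation on 32-bit words; a miner runs this 64 times per hash, twice per candidate header.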
Then people discovered that Radeon cards were good at integer operations and GeForce cards weren't. Or rather, Radeon cards were much better at integer operations than GeForce cards; all video cards were massively better at floating point operations than integer operations. More directly, people discovered that Radeon cards were vastly better at bitcoin mining than GeForce cards, even if they didn't realize that integer performance was the reason.
The reason for video cards to focus on floating point performance is that 3D graphics (i.e., what people buy video cards for) uses floating point operations for just about everything. Translations, rotations, lighting, and nearly everything else you can think of work wonderfully with floating point computations and badly or not at all with integer computations. Furthermore, some "small" integer computations can be done just as well with floating point data types: a 23-bit mantissa (24 bits of precision, counting the implicit leading 1) represents every integer up to 2^24 = 16,777,216 exactly, so you can add, subtract, and multiply (but not divide!) integers into the millions with no loss of precision, as long as the results stay in that range. While OpenGL (and probably also DirectX) makes available both floating point and integer data types, games use almost exclusively floating point.
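The "into the millions" limit is easy to check. A 32-bit float stores 23 mantissa bits plus an implicit leading 1, so integers up to 2^24 are exact and 2^24 + 1 is the first one that is not. The sketch below round-trips values through a 32-bit float with Python's standard `struct` module; the helper name `to_f32` is just for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

LIMIT = 2 ** 24  # 16,777,216: 24 bits of precision (23 stored + 1 implicit)

# Spot check: the 1,000 integers just below 2**24, and 2**24 itself, are exact.
assert all(to_f32(float(n)) == n for n in range(LIMIT - 1000, LIMIT + 1))
# 2**24 + 1 is not representable; it rounds to an even neighbour, 2**24.
assert to_f32(float(LIMIT + 1)) == LIMIT

# Integer-valued float32 arithmetic is exact while results stay in range.
a, b = to_f32(3000.0), to_f32(4000.0)
assert to_f32(a * b) == 12_000_000  # product < 2**24, so no rounding
# Division is different: floating point gives 2.5, not the integer answer 2.
assert to_f32(5.0 / 2.0) == 2.5
```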
The only graphical use for integer data that I've come up with is as a quick and dirty "random" number generator: take an arbitrary unsigned integer in some interval, multiply it by a fixed, large, odd number (e.g., in the hundreds of millions), keep only the low 32 bits of the product (i.e., let the multiplication overflow), and then treat the output as though it is uniformly distributed in [0, 2^32). If it's not obvious why you would want to do that, that's kind of the point, though I've found it useful for some geometry computations in particle effects.
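A minimal sketch of that trick, assuming unsigned 32-bit wraparound (emulated in Python with a mask). The multiplier 747,796,405 is an assumed, arbitrary odd constant in the range the post describes, not necessarily the one the author used, and the function names are hypothetical. Because the multiplier is odd, it has an inverse modulo 2^32, so the mapping is a bijection on 32-bit values and distinct inputs never collide.

```python
MASK32 = 0xFFFFFFFF
MULTIPLIER = 747_796_405  # assumed: an arbitrary large odd constant

def scramble(i: int) -> int:
    """Multiply by a fixed odd constant and keep the low 32 bits, which is
    what an unsigned 32-bit multiply does when it overflows."""
    return (i * MULTIPLIER) & MASK32

def pseudo_uniform(i: int) -> float:
    """Map the scrambled value into [0, 1) by dividing by 2**32."""
    return scramble(i) / 2**32

# An odd multiplier is invertible modulo 2**32 (pow with -1 needs Python 3.8+),
# so scramble() can be undone: no two 32-bit inputs map to the same output.
INVERSE = pow(MULTIPLIER, -1, 2**32)

def unscramble(h: int) -> int:
    return (h * INVERSE) & MASK32

# Consecutive inputs, such as indices 0, 1, 2, ..., spread out over [0, 1).
samples = [pseudo_uniform(i) for i in range(8)]
```

One caveat: bit k of the product depends only on bits 0 through k of the input, so the lowest output bit simply matches the input's parity. The high bits, which dominate the value in [0, 1), are the useful part.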
But if lots of programs are going to be offloaded onto a GPU, then integer computations are going to matter. AMD is pushing for this, and is making architectural changes (such as having the CPU in the same chip as the GPU) to take #5 off of the requirements list above. Solid GPU integer performance strikes #2 off of the list, too. But Nvidia can't put the GPU in the same chip as the CPU apart from Tegra--and if you need to do intense computations, using a tablet or cell phone for it is doing it wrong.
So ironically enough, after Nvidia had pushed for GPU compute in the consumer space for years, the first consumer application that pushed people to buy video cards for GPU compute pushed people to buy exclusively AMD cards because it needed integer computations, not floating point. That has probably helped AMD's bottom line a bit, but it's still a long, long way from rivaling games as a reason to buy a video card.
But what about the future? Will GPU integer performance matter? Judging by current architectures, AMD is betting that it will and Nvidia is betting that it won't. To say that AMD is betting the company on this is a little too strong, but only a little: if GPU integer performance doesn't matter to consumers, then it's unlikely that heterogeneous computing will ever matter to consumers. And heterogeneous computing (i.e., having programs that traditionally would use only the CPU offload a lot of work to the GPU) is AMD's only real plan for having a CPU that can beat Intel's in the next several years.
But if AMD is right, then we could start seeing programs where a quad core AMD APU completely destroys Intel's best quad core CPU, if the AMD chip can efficiently have the GPU do the bulk of the work that Intel has to do on the CPU. I'd expect to see a trickle of such programs in coming years, but I don't expect them ever to be all that widespread. Even so, AMD doesn't need all or even most programs to see major gains from heterogeneous computing; the only programs that matter are those where some CPUs aren't good enough.
Comments
That is the reason you see Intel putting some semi-serious effort (and I still use that term loosely) into their GPUs after years of being "barely passable". They see where AMD is going and don't want to be too far behind the curve. They will only be able to lean on superior single-threaded CPU performance and their one-generation fab process advantage for so long, and they (along with Microsoft) totally missed the ARM boat, although who aside from Apple could really have seen that one coming 10 years ago? Larrabee, even though it never came to fruition, still has a lot of influence on Intel's strategy today.
That's also part of the reason nVidia jumped on Tegra: they couldn't do x86 (at least not financially; technically they could)... and while Tegra parts today may be nettop/tablet level, they may not stay that way forever, especially as ARM starts to close the performance gap.
I know NVidia tried to get an x86 license before by purchasing VIA and pulling in Transmeta engineers, but I think that went sour because they would still need to get the x86 license extension and get the x86-64 license from AMD. The chances that AMD licenses x86-64 to NVidia are close to nil unless it really saves AMD's ass financially.
In response to the point about GPUs with good integer performance: I wonder if we will start to see pointers to memory addresses for the GPU. That will be fun. I barely use integers, but I guess that's because I work mostly with graphics. I would love to use 64-bit floating point. There is a use for integers in graphics, with GPU-generated geometry, but that's usually still 1 integer for every 32 or so floats.
Also, color information stored as ints would probably be less memory-intensive than color information stored as floats. With ints you really only need 8 bits per channel, whereas an 8-bit float would have precision issues. However, GPU hardware lets you use 32-bit floats for each channel and push them through in the same time it takes to push 8-bit ints.
The issue isn't storing data as integers; rather, it's doing integer computations, as in, 5 / 2 = 2, not 2.5. Color data probably is stored as 8-bit integers (for each of red, green, and blue), both in textures and in the frame buffer. But if you need to make a color 20% darker for lighting reasons, you're not doing integer multiplication to make that happen.
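A small illustration of that distinction, with hypothetical helper names: the color is stored as 8-bit integers, as in a texture or frame buffer, but darkening it by 20% is done by converting to a normalized float, multiplying, and converting back, roughly what a shader does after sampling an 8-bit texture. A pure integer version is shown for contrast; like 5 / 2 = 2, its division truncates.

```python
def to_float(c: int) -> float:
    """8-bit channel (0-255) -> normalized float in [0.0, 1.0]."""
    return c / 255.0

def to_byte(x: float) -> int:
    """Normalized float -> nearest 8-bit value, clamped to 0-255."""
    return max(0, min(255, round(x * 255.0)))

def darken(rgb, factor=0.8):
    """Scale each channel by `factor` (0.8 = 20% darker) in floating point."""
    return tuple(to_byte(to_float(c) * factor) for c in rgb)

stored = (200, 150, 37)                       # 8-bit integer storage
float_path = darken(stored)                   # (160, 120, 30)
int_path = tuple(c * 4 // 5 for c in stored)  # (160, 120, 29): // truncates
```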
Pixel/fragment shaders used to use 16-bit floating point computations, since for color computations that's usually enough. But with the transition to unified shaders and the addition of more shader stages, they decided that it was more efficient to make all shaders able to handle 32-bit floating point computations rather than having some physical shaders (meaning, transistors in silicon) that could only run in one particular shader stage. When you're doing geometry computations, 16-bit precision definitely isn't enough.
OpenGL does make 64-bit floating point numbers available, so if you need that precision, it's there. I don't know about DirectX, but my guess is that it does, too. The problem is that it's slow--typically 1/16 to 1/24 as fast as 32-bit floating point computations. As with integer arithmetic, AMD and Nvidia haven't made 64-bit floating point performance a priority outside of top end cards (which need it because they also have GPU compute duty in Tesla and FirePro cards), precisely because games don't use it much.
If you're trying to do GPU compute, then I could see why you'd want that.
But for graphical purposes, I don't see the need, unless it's problematic for even a single pixel to be wrong every once in a while. You can keep everything at 64-bit precision on the CPU side, subtract the camera position to get where each object is relative to the camera, and then cast the result to 32 bits before sending it along to the GPU.
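A minimal sketch of that approach along one axis, with made-up coordinates about 10,000 km from the origin. Python floats are 64-bit, so the hypothetical helper `to_f32` rounds values through the standard `struct` module to imitate the 32-bit floats the GPU receives. At that distance, adjacent 32-bit floats are a whole unit apart, so a small offset vanishes if the absolute positions are converted first, but survives if the subtraction happens in 64 bits.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical positions along one axis, far from the world origin.
camera_x = 10_000_000.0
object_x = 10_000_000.3

# Naive: send absolute positions as 32-bit floats and subtract on the GPU.
# Near 1e7, float32 values are 1.0 apart, so the 0.3 offset is lost.
naive = to_f32(object_x) - to_f32(camera_x)

# Camera-relative: subtract in 64 bits on the CPU, then cast the small result.
relative = to_f32(object_x - camera_x)
```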
Quite an interesting topic. I'm not used to reading topics like this on these forums.
You mentioned bitcoin mining... but lately not even that is going AMD's way, because a lot of ASICs are being built specifically for bitcoin mining, and they give much more computing power for it.
Now that I think of it, I never really thought about it like that. I never really stopped and asked: what exactly did CUDA bring to the table? You explained a few things I was wondering about.
Though, do you think nVidia doesn't have something planned in case things do go AMD's way? Or the other way around, in case things go along nVidia's planned path? Having nothing prepared would mean one of them is out of the game for good.
"Happiness is not a destination. It is a method of life."
-------------------------------
CUDA is great for encoding videos. I've used a few different pieces of software that, with CUDA enabled, cut the time it takes to encode a 60-minute video by about three quarters.
But like you said, it's hard to put it to real use outside of certain specific applications.
AMD sold a lot of cards for bitcoin mining in 2011. You're correct that bitcoin mining has since moved to FPGAs and, more recently, custom ASICs. Lately, AMD has sold a lot of video cards for similar cryptocurrencies such as litecoin. Those will presumably move to FPGAs and then custom ASICs as well if they don't completely vanish first, but AMD has made some money in the meantime.
Given a simple algorithm such as the SHA-256 hash (a member of the SHA-2 family) used for bitcoin, you can always make a chip that is much better at that particular algorithm than an off-the-shelf, general-purpose part running it. The problem is that a custom ASIC is expensive to build and takes quite a while--probably years. Buying the off-the-shelf part is cheaper and much, much faster, which is why it was the best option available for a while.
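For reference, a toy version of the loop miners run, using Python's standard `hashlib`. Bitcoin's proof of work applies SHA-256 twice to an 80-byte block header, varying a 32-bit nonce, and accepts a header whose hash, read as a 256-bit number, is below a target. The all-zero header prefix and the very easy target here are made up for illustration; real headers have a specific field layout and real targets are astronomically harder. The point is the shape of the work: one fixed hash, repeated over and over, which is exactly what a custom ASIC can be built to do.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as in Bitcoin's proof of work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until double_sha256(prefix + nonce) is below the target.
    Returns (nonce, digest), or None if no nonce in range works."""
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)  # 4-byte little-endian nonce
        digest = double_sha256(header)
        # Bitcoin compares the hash as a little-endian 256-bit integer.
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None

# Made-up 76-byte header prefix (76 + 4-byte nonce = 80 bytes) and a
# deliberately easy target: about 1 in 4,096 hashes succeeds.
prefix = bytes(76)
easy_target = 1 << (256 - 12)
result = mine(prefix, easy_target)
```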
-----
Nvidia could, if they wanted, make a chip with four ARMv8 "Denver" cores, four Kepler SMXes, a bunch of GDDR5 memory, and whatnot, and get impressive performance from it. The problem is, what would they do with such a chip? ARM means that it won't run ordinary x86 Windows programs. Apple isn't going to buy it for Mac OS X. That's going to use way too much power to put it in a tablet, let alone a cell phone. Consoles would be a natural market for the chip, except that AMD just locked up that market.
Nvidia could, if they chose, make future GPU architectures much better at integer performance. And they might well do exactly that. But there are always trade-offs, and if you're allocating more silicon to integer performance, that means less for floating-point performance, so your gaming performance suffers. That's a problem if gaming is the reason most people buy your products. It doesn't necessarily mean "card is terrible at gaming", but if quadrupling your integer performance to catch AMD comes at the expense of performing 3% worse in games, is that worthwhile for Nvidia? Maybe it is, or maybe not. But once Nvidia (or any other CPU or GPU vendor) decides to do something in an architecture, it takes about three years before it shows up at retail.
-----
Also, my threads in the hardware section are guaranteed to be at least 72% more interesting than average forum threads or your money back.*
* guarantee only applies if you paid no money
Do people really buy Nvidia cards specifically for video transcoding? Color me skeptical, for several reasons:
1) If you're buying a chip with video transcoding in mind, Intel CPUs have a dedicated hardware block that is much better at it than Nvidia GPUs using CUDA.
2) Video transcoding done by CUDA yields poor image quality, which makes it a complete non-starter for even many of the people who are really into it.
3) Video transcoding tasks tend not to be especially time sensitive. If it takes 10 minutes instead of 30, either way, you're not going to sit there and watch it run. You're going to start it and leave it alone while you go do something else.
Yet video transcoding is perhaps the most important consumer application of CUDA, and that's basically the problem.
I don't know enough about coding to say whether it will matter on the consumer end or not.
Really, where would INT be preferable to FP? Lots of places, I'm sure, but then we'd need to consider the other limitations inherent to the design. Really though, for something so young (6-7 years), I doubt that the efforts to explore GPGPU's potential at the consumer level have been adequate to make any real judgement.
In the last 6-7 years that OpenCL (I'll ignore the proprietary languages) has existed, PC software has been largely back-burnered in favor of smartphone/tablet apps. It's a simple matter of a growth market vs. a saturated one. However, you also have almost a decade of stagnation: even as general hardware gained the capacity to do things that used to be the domain of professional workstations, the software itself became the barrier to entry for many activities. Video and audio editing were kind of slow to reach the right balance of power vs. ease of use to be viable in the consumer space. As that ratio improves, more people will do those kinds of activities. However, for a decade or more, whether out of routine or self-serving protectionism, the learning curve and ease of use didn't move for anything with reasonable capabilities. That left a large group with minor motivations on the outside looking in; essentially, we stopped doing new things. For all the capacity a newish computer has, we as consumers are largely doing the same stuff we could do on a typical smartphone/tablet or 10-year-old PC. Other than the graphics, games are doing the same or less than they were a decade ago, too.
Where does that leave us?
Well, PCs seem to have bottomed out. Hopefully, that floor will lead to more work that pushes the new abilities that have, IMO, been ignored, and that engages consumers in uses beyond solitaire, email and nut-shot videos. Any potential for GPGPU/heterogeneous computing lies in those other uses first, then maybe in games, if the costs for graphics stall (or even regress) and those funds are reallocated to other areas. I don't know much about audio processing (where integer performance is needed), but TrueAudio's adoption could be an indicator.