Hi everyone,
As I was trying to run a calculation, I just realised my computer only has 16GB of RAM.
I have 2x https://www.amazon.co.uk/HyperX-HX430C15PB3-Predator-288-Pin-Memory/dp/B071ZZCSQZ/
I would like to upgrade to make sure I can run all of the calculations comfortably - this means at least 32GB.
Is the best strategy to buy 2x more of the same? That would be 4 x 8GB DDR4?
Is there any throttling when you fill all 4 slots?
Also, is it true that the RAM performs better if you stick with one model, as opposed to mixing different ones?
Comments
Assuming you have a mainstream consumer CPU and not a HEDT part, your CPU has two 64-bit memory channels. Each channel can drive a single module on its own, or be shared between two modules sitting on that channel. Filling all four slots therefore gives you the same total bandwidth as before, just divided among more modules.
Using more memory modules places more stress on the memory controller, however. This commonly means that it can't clock as high as before. Depending on how much headroom you have, you might be able to run four modules at 3000 MHz (which is already overclocking, incidentally), or maybe you'll have to dial back the clock speeds a ways to make it stable.
Mixing modules of different capacities will hurt your bandwidth considerably. But as long as all modules have the same capacity, the performance hit from mixing different modules is much, much smaller. At worst, if some modules can handle higher clock speeds or tighter timings than others, you'll have to run all of them at the speed of the slowest.
Memory isn't a case of one module being systematically faster than another in every way, however. If one module can handle lower values on one timing and another module tighter values on a different timing, mixing them means you have to choose timings that all of the modules can handle, which will be looser in some respects than what either module could do on its own. In principle that doesn't have to be all that bad, but you won't have much luck auto-detecting XMP profiles if different modules ship with different profiles.
Hell, even Bill Gates supposedly said you’ll never need more than 640kB. And he’s smart.
I believe the calculation scans over a large pool of text looking for patterns, and if it has seen a specific pattern before, it adds +1 to that pattern's entry in a data frame. As you process more text you see more unique patterns, so the table grows a fair bit. Basically it builds up a huge table and then exports it at the end.
I'm not sure it's viable to use anything but RAM, since you need to constantly be checking if your values exist and inserting/modifying them.
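To make it concrete, the counting is roughly along these lines - a minimal sketch assuming the "patterns" are fixed-length character n-grams, which is just an illustration and not the actual script:

```python
# Minimal sketch of the in-memory counting described above, assuming the
# "patterns" are fixed-length character n-grams (illustration only).
from collections import Counter

def count_ngrams(text, n=5):
    """Add +1 to the table entry for every length-n substring seen."""
    counts = Counter()
    for i in range(len(text) - n + 1):
        counts[text[i:i + n]] += 1
    return counts

counts = count_ngrams("the quick brown fox jumps over the lazy dog", n=3)
print(counts.most_common(5))  # the whole table lives in RAM
```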
Now that I've been thinking about it, though, I wonder if it would be possible to batch this process: export the partial tables, flush the memory, and ultimately write something that merges all of it together.
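Something like this, maybe - count each chunk on its own, export the partial table, free the memory, and merge the partial tables at the end. The file layout, chunk size and n-gram assumption here are made up for the sake of the sketch:

```python
# Hedged sketch of the batch-and-merge idea: count one chunk of the corpus
# at a time, spill each partial table to disk, then merge them at the end.
# File names, chunk size and the n-gram definition are illustrative only.
import json
from collections import Counter
from pathlib import Path

def count_ngrams(text, n=5):
    """Count every length-n substring of `text`."""
    counts = Counter()
    for i in range(len(text) - n + 1):
        counts[text[i:i + n]] += 1
    return counts

def process_in_batches(corpus_path, out_dir, n=5, chunk_chars=50_000_000):
    """Count one chunk at a time and export each partial table to disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tail = ""  # carry the last n-1 chars so patterns spanning chunks aren't lost
    batch = 0
    with open(corpus_path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_chars)
            if not chunk:
                break
            partial = count_ngrams(tail + chunk, n)
            tail = chunk[-(n - 1):] if n > 1 else ""
            # Export the partial table; the Counter is discarded before the next chunk.
            with open(out_dir / f"partial_{batch}.json", "w", encoding="utf-8") as out:
                json.dump(partial, out)
            batch += 1

def merge_partials(out_dir):
    """Merge all partial tables into one final table."""
    total = Counter()
    for path in sorted(Path(out_dir).glob("partial_*.json")):
        with open(path, encoding="utf-8") as f:
            total.update(json.load(f))  # Counter.update adds the counts together
    return total
```

The merge step still builds the full table in RAM at the end, so this only saves memory if the same patterns keep turning up across chunks; otherwise I'd need something more like an on-disk key-value store.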
Having only two memory sticks in a dual-channel motherboard is slightly better, but it's normally not that important.
To put it into context, it’s the entire King James Bible, nearly 4,000 times over.
Take a text that size, cut it up into sub-patterns, and yeah, you could get some big tables. But there's also a good deal of room for optimization. I don't know exactly what you're doing; it just seems a bit suspect from my 4,000-mile-away perspective.
I will admit, sometimes it’s cheaper to just throw hardware at the problem. And sometimes there is no other good answer than to throw hardware at the problem.
Regardless, to answer your original question: RAM brands and speeds don't have to match; everything will run at the lowest common denominator. Matching them when you can does eliminate the occasional quirky compatibility problem, though.
5 years ago 8 GB was enough, but buying memory today isn't the same as buying memory 5 years ago. You can reasonably put higher-clocked memory in all four DIMM slots without any adverse effects.
Thanks for the tips. I'll definitely look into Pentaho.
I started working with large data sets 3 years ago and it's been a wild ride. This current project is probably the trickiest so far. We started with an 800GB dataset, with the goal of real-time visualisation; it's now been processed down to 60GB.
I work in a university Psychology department, which means the teams are very small. Most people around me know the theory, but I have to do all the implementation: the analytical back end, the server-database communication, and all of the client side.
The deadline is tomorrow and I can't wait to wrap this up.