Hey, thanks for replying, but anything is hard if you've never done it before.

Hope you had a good Christmas,

I was also wondering what GPU to put into it. I was thinking of putting in an R9 295X2 that I have sitting unused, but I also have a GTX 960 4G. What is your recommendation on what kind of GPU I should be thinking about?

I would also like to set up a RAID array and somehow use Thunderbolt 3, if that is the most effective solution.

So it's...

#1 RAM solution: add a second set of 4x 4 GB RAM, or upgrade to 4x 8 GB sticks and populate only one full quad channel of RAM.

#2 Which is the better GPU for crunching?
- GTX 960 G1 Gaming 4G (very nice card; I always liked it, even though it was only so-so)
or
- XFX R9 295X2 8G Hydro

#3 Storage management and ways to use Thunderbolt 3. I want to build a server.
I will be putting the OS on an Intel Optane drive, an SSD, or both; this is all new tech to me... Plus I need a long-term solution for storage.
I also have 5x 256 GB SSDs that I want to put into a RAID array for speed, for a few specific programs or even the swap file.
I just want to get every ounce of speed out of this system, leaving nothing behind.

I will be using Win 10 Pro 64... unless someone can change my mind.

Not only is this a server cruncher, but it's also me learning more about server tech than overclocking tech.

So let me start by saying that right now, in my house, I have somewhere around 86 cores in total, running in a 64/8/8/6 configuration.

My main compute servers are now on a 100 Gbps Infiniband backbone network/interconnect (albeit currently switchless), and my two primary storage servers are both quad core, but they're ARM processors rather than x86 processors (and both of those are also on a 10 Gbps switchless interconnect as well).

So, I'll address a few of your questions:

I can't speak to the Gigabyte board that you've got because it is a consumer or prosumer grade board rather than a straight server/workstation board, but between the processor, the RAM, and the board, you'll want to match the speeds.

With my system, I have 512 GB of RAM total, spread across four nodes (128 GB each), and even then, I STILL don't have enough - my swap file has hit a peak of around 290 GB. Each node uses eight sticks of 16 GB DDR3-1866 RAM, but because they're dual rank, and the board is quad channel with two DIMMs installed per channel, it can only run at DDR3-1600 speeds (ECC Registered, of course).

My point is that the processor, motherboard, and memory need to be lined up together for you to get the most out of the system. If one component is faster than the rest can handle, you might not be able to make as much use of it as possible. Going from DDR3-800 to DDR3-1600, my simulation work only sped up 20% despite a 100% increase in clock speed/total available bandwidth. Therefore, with smaller increases in speed, you're not likely to really notice the difference, and it is better to have a stable system that can run at full load, 24/7/365, than to go for pure speed. Take it from someone who burned up one of the cores on a Core i7-3930K (which I actually wrote about here) by overclocking it in pursuit of speed. In the long run, it created more problems and ultimately just isn't worth it. (That's also why I moved all my stuff over to server-grade boards, Xeons, and RAM.)
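To make that 20%-from-100% observation concrete, here's a small illustrative sketch using Amdahl's law; the ~33% memory-bound fraction is an assumed number chosen to match the observed result, not a measured one:

```python
# Amdahl's-law estimate: if only a fraction of the runtime is limited by
# memory bandwidth, doubling that bandwidth only speeds up that fraction.
def overall_speedup(mem_bound_fraction: float, bandwidth_gain: float) -> float:
    """Total speedup when only the memory-bound portion is accelerated."""
    return 1.0 / ((1.0 - mem_bound_fraction) + mem_bound_fraction / bandwidth_gain)

# A workload that is ~33% memory-bound, given 2x the bandwidth
# (DDR3-800 -> DDR3-1600), only speeds up by about 20% overall.
print(round(overall_speedup(0.33, 2.0), 2))  # 1.2
```

The takeaway is the same as in the text: unless most of the runtime is actually bandwidth-bound, big interface speedups translate into small overall gains.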

Some crunchers support Hyperthreading, but most people will likely recommend that you disable it: crunching is fundamentally a computationally intensive task, and Hyperthreading generally turns it into a competition for the CPU's computing resources, which may not yield enough benefit.

Your mileage may vary. You can test it and check the forums of the individual projects to see what their official recommendations are. I know that for Folding@Home, they recommend disabling HTT.

I'm not sure which GPU would be better for crunching. I think it will depend on which project. Others will have to chime in on it or you can just test it, if and/or when in doubt.

re: Optane
Some systems will see Optane as a drive; my NUC doesn't. So it's like a weird extension that bridges between HDD and RAM, but not quite like a bootable NVMe SSD. Personally, if you have 16 GB of RAM, Optane is probably going to be relatively useless to you, but again, you are, of course, more than welcome to play with it.

(I bought the NUC because it was a cheap open-box unit, and it replaced another server that was running a low-level FTP/web server - mostly to facilitate file/data transfers.)

re: SSDs on RAID0 (presumably)
Remember that most SSDs can saturate the SATA 6 Gbps interface bandwidth on their own, so you can put them in RAID, but with 5 drives you'll hit the controller's bandwidth limit rather quickly, beyond which you're not going to see any further performance improvement.
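As a rough sketch of why five SATA SSDs in RAID 0 stop scaling, here's a toy calculation; the ~550 MB/s per-drive figure and the 2000 MB/s upstream (chipset/HBA) limit are assumed round numbers, not measurements from any particular board:

```python
# Toy model: RAID 0 throughput is capped by the smaller of (a) the sum of
# the per-drive rates and (b) the upstream controller/chipset link.
def raid0_throughput(drives: int, per_drive_mbs: float, upstream_mbs: float) -> float:
    return min(drives * per_drive_mbs, upstream_mbs)

for n in range(1, 6):
    print(n, raid0_throughput(n, 550, 2000), "MB/s")
# With these assumed numbers, the 4th and 5th drives add little to nothing.
```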

If you have the SSDs already, sure. If not, you'd probably be better off getting an NVMe PCIe SSD (either M.2 or as an add-in card) so that it would be faster.

I have also found that Windows Server 2016 is faster than Windows 10 despite it being built off the same fundamental code base.

re: TB3
My understanding is that Thunderbolt is a switchless interface that can be daisy chained up to a certain limit.

So, for a storage server, if it is to be directly connected to something else, that'll probably work.

But if you want to make it so that lots of different devices can get to it simultaneously, then it might not work because of it being switchless.

As a 40 Gbps interface, it is unlikely that you will be able to hit or even get close to the peak bitrates, mostly because of how the interface works. (For example, Infiniband supports remote direct memory access (RDMA), where you bypass the entire OS/kernel stack and go straight from application to NIC to NIC and back to application when linking two computers together. It is my present understanding that TB, in general, does route through the OS/kernel stack, which adds latency and slows things down.)

Again, this is coupled with the fact that unless you have PCIe SSDs running, the next bottleneck will be the SATA 6 Gbps interface on the SATA SSDs, and you'll hit that bottleneck pretty quickly. It doesn't take much.

So you want to set up or sequence it so that the ratio of the slowest interface to the fastest interface is very close to 1.
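That slowest-to-fastest ratio can be sketched as a quick helper; the interface speeds below are nominal, assumed values for illustration:

```python
# Find the slowest link in a chain of interfaces and express it as a
# fraction of the fastest; a ratio near 1.0 means a well-balanced chain.
def bottleneck_ratio(speeds_gbps: dict) -> tuple:
    slowest = min(speeds_gbps, key=speeds_gbps.get)
    ratio = speeds_gbps[slowest] / max(speeds_gbps.values())
    return slowest, round(ratio, 3)

chain = {"SATA SSD": 6, "Thunderbolt 3": 40}  # assumed nominal rates, Gbps
print(bottleneck_ratio(chain))  # SATA is the limit, at only 15% of TB3
```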

That doesn't mean you can't do it. It just means that you might not get the benefits you were hoping for or expecting based on the theoretical bandwidth limits.

(I'm dealing with that now with my 100 Gbps (12.5 GB/s) 4x EDR Infiniband interconnect because I don't have NFSoRDMA or SMB Direct configured.

With Windows Server 2016, ramdisk to ramdisk transfers topped out at around 9 Gbps (through the OS/kernel stack) and in linux, repeating the same test, it topped out at 2.4 Gbps. But with RDMA, I was getting 97 Gbps in Windows Server 2016 and 96 Gbps in Linux which is pretty close to the 100 Gbps interface limit without having spent any time tuning the performance of it to get every ounce of bandwidth out of it. My applications, however, are running with RDMA and have substantially improved the internode performance vs. GbE.)
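Expressed as a fraction of the 100 Gbps link, those measurements make the kernel-stack vs. RDMA gap easy to see:

```python
# The measured throughputs from the post, as a share of the 100 Gbps link.
link_gbps = 100.0
measured = {
    "Win2016, kernel stack": 9.0,
    "Linux, kernel stack": 2.4,
    "Win2016, RDMA": 97.0,
    "Linux, RDMA": 96.0,
}
for path, gbps in measured.items():
    print(f"{path}: {gbps / link_gbps:.0%} of the link")
```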

@alpha754293 - Thanks for taking your time and effort to give me that great overview. I kinda understood a lot of it, but I'm not going to pretend that I got it all.

I remember reading another reply you made a while ago, then I read your sig, lol.
I said to myself, who is this guy?...

Now I found out. Thanks again for posting; taking your time and effort makes me happy. Time is the most precious thing we have, so top notch!

This is all very new to me; my main focus with PCs in general is getting a nice overclock plus stability.
I also overclock for points, so I know about popping chips. I am planning a massive Socket 478/479, 775, and 1366 winter overclocking session, but I will talk about that more after this project.

I love the idea of balance. It makes sense to have it all tuned to the least common denominator, or you're just wasting extra horsepower that doesn't get translated to the output.

I am fine with no Hyperthreading; if that is the best option, then I like it.

I do have several limiting factors, personal and hardware-wise, that make some decisions for me. I will go into the hardware limitations.

My biggest problem has been getting started; I have a lot going on, so I get a little way down the road and then focus on something else, hence the false start. I have probably done this 3 or 4 times with crunching, so this time I am determined to do nothing else until I have this rig up and running and crunching. Then I can worry about my server and future projects...

I really want some pie, so I am laser focused on getting it up and running with what I have on hand.

I will be putting the board in a case tomorrow, so then it will be software time. To me, the first month will be learning the lay of the land: the lingo and the best configurations for crunching. I have seen some bigger and better hardware, but I have an X99 and everything right here, so that is my starting point.

- OS: Win 10 Pro 64? I have a key for this, so I will start on Win 10... but soon I will try to find a server key and upgrade with an M.2 or some other solution.

This is the starting build; it will be in the case and running tomorrow night. It is running now, but on my bench.

So all that is left is what I need to install to get it working this week; then I will follow the advice of the pros, or just what you say, to get it running smoothly and figure out what my next upgrades should be.

To me that sounds like a plan... it's cool to know that the first thing I will do, system-wise, is crunching in 2019! Very cool!

It seems like I have a LOT to work towards, but I love projects like this, where I can expand my horizons and knowledge and do what I love most: mess with hardware.

Thanks again @ alpha754293 May you and your family have Health and Happiness in 2019

All the best to everyone!
-steve
SystemViper

Steve - you're very welcome. And same to you - Happy Holidays and a Happy New Year to you and your family as well. I wish you health, fortune, and happiness for 2019.

(Yeah, I've been on this forum for quite some time, although I've also been inactive here for quite some time. As life would have it, I've been busy. Family. Kid. Work. Same 'ol, same 'ol.)

My experience with overclocking (see my thread here asking about overclocking the Core i7-3930K from the stock speed of 3.2 GHz to 4.5 GHz 24/7, and subsequently killing at least one of the six cores on account of having done so) taught me that overclocking is great for short-term gains, but long term, I don't recommend it.

It'd be different if we were made of money and could just replace hardware whenever it fails, but that's generally not the case. At least for me, it wasn't. I'm still running 3930Ks, 7 or so years after that processor launched. My daily driver was originally built in 2011, and my "new" hardware is all pretty much off eBay. (Which can be GREAT!)

So given that, longevity is now my top priority, above overclocking, because I can't keep pumping in money for my stupidity.

Hence, I'm not an advocate for overclocking. It's great for setting world records and short-term runs, but with the stuff that I do (computational fluid dynamics), a single run can last 42 days. Straight. So... a hardware failure during that time is VERY, VERY bad.

My take on overclocking is that it's great for super short runs, but when you're working on the kind of stuff that I'm working on, it ultimately ends up being detrimental, so I don't bother with it.

I speed up the parts that I can within the limits of the hardware and try to make it run within those constraints/limitations.

re: Hyperthreading
Again, YMMV.

Some programs will show up to about a 7% performance increase. Most average a 0% difference, +/- about 2 to 2.5%. So... it's not great. But it's also not quite as bad as when Hyperthreading first launched, when we were seeing differences of up to -10% or more.

I typically don't run with Hyperthreading enabled on any of my systems because trying to manage the processor affinities is a giant pain and not particularly worth the effort. (Sidenote though: if you do go through the steps of assigning processor affinities, that CAN help speed things up, with, again, varying results, because it prevents the data movement that comes with core migration. But having to set it and reset it every time is a pain, so I generally don't bother with it.)
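On Linux, pinning a process to specific cores can be scripted with the standard library; here is a minimal sketch (this API is not available on Windows, where you'd use Task Manager or `start /affinity` instead):

```python
import os

def pin_to_cores(cores):
    """Pin the current process to the given CPU set (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cores))   # pid 0 = the current process
        return sorted(os.sched_getaffinity(0))
    return None  # platform without this API (e.g. Windows)

# Pin to the first core we're currently allowed to run on.
if hasattr(os, "sched_getaffinity"):
    first = min(os.sched_getaffinity(0))
    print(pin_to_cores({first}))
```

This at least avoids having to reset affinities by hand every run, though the "is it worth the effort" caveat above still applies.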

re: RAM
More RAM is usually and generally better. The only times when it isn't are:

1) When a specific configuration of RAM forces the memory to run at suboptimal speeds (e.g. quad-ranked DIMMs forcing the system down to DDR3-800 instead of DDR3-1600).

2) Mixed or mismatched types/speeds/timings, which cause everything to run slower, due in part to synchronization issues.

So if you have 4x 8 GB available, I'd recommend using that. It is unlikely that your cruncher will actually need all of it, but depending on what you're running for your file server, you might. (e.g. ZFS is notoriously RAM-heavy due to its background RAID-scrubbing abilities.)

re: SSD
Samsung 960 Pro is fine.

Really, most SSDs will be plenty. Most crunchers (with a few minor exceptions, sometimes only within a project's beta testing) will not give you a large volume of data. Usually, each work unit will be small, per the principles of distributed parallel computing. Even with "large" data, because you have to send the results back to their server, and since they can't know the speed of your connection, they have to make sure it isn't too big; otherwise you'd spend more time uploading the results than you would spend crunching them. So the project teams have designed things to balance this.

(I was on the Folding@Home beta team for quite some time because, for a time, I had advanced hardware relative to my peers, so I was able to test bigger work units for them. Now they're GPU dominant, and most of my stuff isn't GPGPU capable, so that has limited how much I can contribute with just pure CPU-based hardware.)

re: GPU
Again, someone else who is more knowledgeable in this area might be able to speak to it more, but it will also be project dependent.

The better metric that I tend to use for relative comparison is floating-point operations per second (FLOPS), for single- and/or double-precision data types.

The higher the number in those metrics, the better. Some GPU crunchers use double precision, so for those you want a card that has high double-precision performance. Most other crunchers will be single precision (for the cancer-type stuff, as was suggested). But if you do any of the distributed AI computing, then they're now using basically 4x4 half-precision FMA (read: "tensor cores"), so that's becoming a new metric to look out for if you're looking to get into that game (because more traditional single-precision cores CAN execute 4x4 half-precision FMA, but SIGNIFICANTLY slower, since tensor cores are specialized hardware built with the express intent of handling this class of problems).
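A rough peak-FLOPS comparison can be done on the back of an envelope; every number below is a placeholder, so substitute the shader-core counts and clocks from the actual spec sheets of the cards being compared:

```python
# Peak GFLOPS ~= cores x clock (GHz) x ops per core per clock.
# The factor of 2 assumes a fused multiply-add (FMA) counts as two ops.
def peak_gflops(shader_cores: int, clock_ghz: float, ops_per_clock: int = 2) -> float:
    return shader_cores * clock_ghz * ops_per_clock

card_a = peak_gflops(1024, 1.2)  # hypothetical card, single precision
card_b = peak_gflops(2816, 1.0)  # hypothetical card, single precision
print(card_a, card_b)
```

For double precision or half precision, the ops-per-clock factor changes substantially per architecture, so check the per-precision figures rather than scaling the single-precision number.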

re: OS
If that's what you have, go with that.

re: everything else
Please don't take any of this as a reason NOT to do something just because the benefits might be limited for the goals you have in mind.

Messing with hardware can be fun, but it can also be EXTREMELY frustrating (as I found out when I started messing with my Infiniband network adapters. Turned out that one of the four adapters that I had ordered was DOA).

But I will also say this: IF this is going to pull double duty as both a cruncher and a file server, and you don't have some kind of CPU offload (e.g. TCP offload onto the network interface card), transfers to and from this server can potentially be very slow while it is crunching, because the network traffic has to go through (usually) core 0 of your CPU. So if that core is busy crunching, data will only trickle in and out. Just keep that in mind.

I used to have my file servers crunch too. Except that my file servers were designed with low-power, slow processors in mind, since they're just dummy servers, and transfers would slow to a crawl (e.g. ~5 MB/s on a gigabit ethernet network, which should be capable of about 116 MB/s peak). The moment I turned off the cruncher, the transfer speeds resumed. The moment I turned the cruncher back on, it slowed back down again.
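The ~116 MB/s peak figure checks out with simple arithmetic; the 93% framing-efficiency value below is an assumed round number for Ethernet/IP/TCP overhead, not a measured one:

```python
# Effective throughput of a link after protocol overhead.
def effective_mbs(link_gbps: float, efficiency: float = 0.93) -> float:
    return link_gbps * 1000 / 8 * efficiency  # Gbps -> MB/s, minus overhead

print(round(effective_mbs(1.0)), "MB/s")  # 116 MB/s on gigabit ethernet
```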

So I just kept it off, made something else do the crunching, and let the dummy file server be the dummy file server.

(Course, now that I am using NAS appliances, I've completely offloaded the file server task onto an entirely different device altogether.)

But I wanted to bring this to your attention as something to keep in mind, because this can and likely will happen unless you put mitigation and management tools and hardware in place to deal with it; otherwise, again, things might not turn out the way you might have otherwise thought or hoped they would.

Thanks.

*edit*
re: "I kinda understood a lot of it but i not going to pretend that I got it all."
a) Story of my life.

b) It's okay to not know things. Learning is fun IMO.

c) When in doubt, ask. There are very few stupid questions, and the stupidest ones of all are the ones that aren't asked.

Some people here might know me a little bit to know that I've been doing this stuff for quite a long time, with 99% of it out of sheer necessity. So...

And there will also always be people who know more about this stuff than even I do. The difference (usually) is that it's their job to know. I do it because I have, or had, to in order to support the needs of other activities that I am or was working on. (I've scared and surprised a few sysadmins on account of that before.)

re: "The QPI on my server (with first-gen E5-2690 processors) is capable of 32 GB/s (256 Gbps), vs. the SATA 6 Gbps link, vs. a PCIe 3.0 x16 link at 15.75 GB/s (128 Gbps)." and "seems like i have a LOT to work towards, but i love projects like this, that i can expand my horizons and knowledge and do what i love most. Mess with hardware.."

Actually, your system might have a faster main bus than my systems do, but it also depends on how well you can make use of it. I doubt that any distributed computing project is going to be nearly as strenuous as the simulation stuff that I run myself, so it's not likely that any of that would be a limiting factor for crunching performance. Distributed computing projects tend NOT to be disk I/O heavy, so I wouldn't really worry about that too much.

Again, the stuff that I normally run is more along the lines of "real" HPC stuff, so the demands are greater. Distributed computing is meant to be broken down so that the average computer can perform those tasks.