So there seems to be an improvement, but not as much as expected (~4 minutes, comparing the 8/32 core times). I'm also wondering what impact the Quadro has on rendering performance, since Terragen is purely (?) CPU-based. While rendering, the Quadro idles at about 11% usage.

I recall an example from a few years ago (and I hope I'm recalling correctly, since I couldn't find the original thread) where someone was trying to explain why a render did not finish any faster than it did. I think the example was an image rendered on a 12-thread machine, with the image broken down into 16 parts, each part assigned to a single thread. My assumption was that as each thread completed its original image part, it would start rendering one of the remaining 4 parts, and that all 12 threads would subdivide those remaining 4 parts among themselves. But what really happens is that each image part is handled by exactly one thread, so at least 8 threads sit idle while the other 4 threads finish their parts. The apparent effect is that the rendering process has slowed down.
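The scheduling behavior described above can be sketched with a toy model (an illustration only, not Terragen's actual scheduler): 16 equal-cost image parts, 12 threads, and each part worked on by exactly one thread.

```python
# Toy model of one-part-per-thread bucket assignment.
# Assumptions: 16 parts, 12 threads, every part takes 1 time unit.
parts, threads, part_time = 16, 12, 1.0

# Track when each thread becomes free; each new part goes to
# whichever thread frees up first.
free_at = [0.0] * threads
for _ in range(parts):
    i = free_at.index(min(free_at))
    free_at[i] += part_time

total_time = max(free_at)
ideal_time = parts * part_time / threads
print(f"actual: {total_time:.2f}, ideal: {ideal_time:.2f}")
# actual: 2.00, ideal: 1.33
```

The last 4 parts occupy only 4 threads while the other 8 sit idle, so the wall-clock time is 2.0 units instead of the ideal ~1.33 — which is why the tail end of a render can look "slow" even though nothing is wrong.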

Hope I got that right. If not there are people here who are sure to jump in and correct me.

Norman, that's strange, because I have that same setting on my Quadro driver and it fixes the performance issues. You can test some of the other "3D App" presets; I recall one or two others also having a positive effect, but "3D App - Game Dev" worked best for me. Since your machine is quite similar to mine, I suggest you test with either the default scene or the Terragen 3 benchmark so you can see more precisely whether your render times match what's expected. With the benchmark, your machine should finish in 4 minutes or under.

We're not sure why this Quadro setting affects render time; it's very odd. You're right, of course, that Terragen does not use the GPU to render, so it's a bit of a mystery. It's likely something to do with the way we use OpenGL or the way the Quadro drivers implement a particular function. It's hard to know for certain, but I have found some evidence suggesting there are broken or inefficient "auto-optimize" settings that Nvidia tries to apply by default. For example, the driver may be polling the application many times per second to decide whether to apply a particular optimization, and the polling itself slows the application down. That's not necessarily exactly what's happening, but it's an example of something that might be related to the problem.

Thank you for your explanation and for suggesting the benchmark, Oshyan. I've just rendered the Terragen 3 benchmark in 3:37 minutes, so I think the driver settings are fine now, since this time matches your score in the benchmark results. I've also had a closer look at thread usage while rendering. Masonspappy pointed out an interesting idea about thread allocation, and I think this may contribute to the divergence in render time vs. core usage. With my little test scene, the render starts at 100% usage on all cores, but shortly after slows to 70% and gradually lower, even though there are still unrendered parts of the image. Strangely, although I would expect some cores to go idle at that point, the lower usage is spread evenly across all cores. But my expectation may be incorrect, depending on how the threading is implemented.

Ok, now I'm scratching my head. I just disabled "Defer atmo/cloud" and rendered my test scene again with all cores. Whoosh: 3:33 minutes (it was 9:38 minutes with the option on). Rendered again with only 8 cores: 5:20 minutes. So the overall render is much faster, but the timing difference between 8 and 32 cores matches my previous measurements (a factor of ~1.5).

The TG3 benchmark renders as follows:

32 cores - 3:37 minutes
8 cores - 7:03 minutes (factor ~2.0)

So either the 32 cores are much slower than one would expect due to the guts of the Z820, or the 8 cores render incredibly fast. But I think my initial issue is solved: the more cores used, the faster.
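For anyone wanting to double-check the factors quoted above, here is a small snippet that converts the m:ss times to seconds and computes the ratios (the times are the ones reported in this thread):

```python
# Convert an "m:ss" render time to seconds.
def secs(t):
    m, s = t.split(":")
    return int(m) * 60 + int(s)

# TG3 benchmark, 8 cores vs. 32 cores:
print(round(secs("7:03") / secs("3:37"), 2))  # -> 1.95, roughly the 2x factor
# Test scene with "Defer atmo/cloud" off, 8 vs. 32 cores:
print(round(secs("5:20") / secs("3:33"), 2))  # -> 1.5
```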

Since disabling "Defer atmo/cloud" offers a great improvement in speed, are there any drawbacks when not using this option?

It sounds like your machine is rendering as expected now, so that's good to hear. Multithreaded performance does not scale strictly linearly (twice the threads does not equal twice the performance) due to various factors that we tend to refer to as "overhead" for multithreading. In the case of your 8 thread vs. 32 thread test, however, I would say the likely biggest contributor is the fact that you have 16 physical cores, so when rendering with 8 threads, the render calculation can be distributed entirely across physical cores, which are much faster than hyperthreading resources. To compare fairly, I would suggest 8 vs. 16 threads, where you should see close to that 2x performance difference. Adding an additional 16 hyperthreading threads should only increase performance by a *maximum* of 20% due to known limitations of hyperthreading itself. The actual typical hyperthreading performance increase is usually more like 10%, and this can easily be lost to multithreading overhead when dealing with large numbers of threads (>16).
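The scaling described above can be put into a back-of-the-envelope model. The numbers here (16 physical cores, ~10% typical hyperthreading gain) are the ones from the post, not measurements:

```python
# Rough throughput model: threads up to the physical core count scale
# linearly; threads beyond that only add the small hyperthreading bonus.
PHYSICAL_CORES = 16
HT_BONUS = 0.10  # typical ~10% gain from hyperthreading, per the post

def throughput(threads):
    if threads <= PHYSICAL_CORES:
        return float(threads)                  # one physical core per thread
    return PHYSICAL_CORES * (1 + HT_BONUS)     # extra threads add only the HT bonus

print(round(throughput(16) / throughput(8), 2))   # 8 -> 16 threads: ~2x
print(round(throughput(32) / throughput(16), 2))  # 16 -> 32 threads: only ~1.1x
```

This is why 8 vs. 16 threads is the fair comparison: both counts fit on physical cores, while going from 16 to 32 threads only buys the modest hyperthreading gain (before any multithreading overhead eats into it).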

Defer Atmo/Cloud renders the atmosphere and clouds differently, with higher quality vs. when disabled. You can generally get equivalent performance with adjustment of antialiasing settings (which control the number of samples taken when rendering the atmosphere elements with Defer Atmo enabled). If you want very fast rendering, reduce AA to low levels (e.g. 2). You will generally get higher quality results with Defer Atmo, but you do need to get familiar with how to best tune the AA settings, including customizing the "First Sampling Level" and "Pixel Noise Threshold". There are some extensive discussions of all of that elsewhere on the forums.

Is there going to be a benchmark for Version 4 with the new clouds? Going by the emails I receive from customers, I think this is very important. The new clouds are great, but they require long render times. This needs troubleshooting, and a standard benchmark scene can help everyone understand them better by comparing results, etc. I know things are busy back there. Perhaps in the near future?

I think we do not have to wait for PS to do this. Anybody can mock up a sky with the new cloud(s) and share it so people can render it and see the numbers. No?

We do intend to do a new benchmark specifically for TG4. It will likely include at least one free object (not necessarily new) to properly test the object rendering aspect, as well as v3 clouds. We don't have a timeline for delivery of the benchmark, but I would guess some time in the next couple months.

Wow, those are some incredibly impressive render times. They'd be 2nd on our list of the older TG3 benchmark times, bested only by a dual CPU, 14 core-per-CPU Xeon E5-2690 v4 machine, which of course is far, far more expensive than Threadripper. That's an absolutely amazing result for a single CPU machine. I'm looking forward to seeing how Threadripper does on the upcoming Terragen 4 benchmark too.

It's unfortunate that 2p (dual CPU) motherboards seem to be reserved for Epyc at the moment because it runs at significantly lower clock speeds, even in the 16 core model. Imagine a dual Threadripper machine...

- Oshyan

Dual Threadripper....?

If only the cost of my liver, kidney & limbs would cover the cost of these.

Interestingly enough, I am on Windows 10 64-bit, and my score is slower than the same CPU running under Mac OS 10.9. I got a time of 4:22 (TG3 v3.4.00), but my RAM is only running at 1333MHz. I would be curious to see how much difference it would make to overclock my RAM to 1600MHz. My rendering time in TG4 4.1.17 was 4:28.