Posts by M_M

To make it even worse, on Pascal nVidia is limiting compute tasks to the P2 power state, i.e. throttling the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

According to nVidia, the 980 Ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. Raw memory bandwidth is almost the same, but the 1080 should also benefit from better memory compression, around 20% as claimed by nVidia.

My PC idles (i.e. ordinary desktop work, web surfing etc.) with a 24" LCD at around 170 W (100 W in real idle with the monitor sleeping).
With S&H running just on the CPU, power draw is around 275 W (i7-2600k, overclocked to 4.5 GHz).
With S&H running on CPU + GTX 1080, power draw is around 390 W. So the GTX 1080 is responsible for around 115 W of draw, which is around 64% of its TDP, close to what GPU-Z reports as average power consumption.
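The attribution above is simple subtraction; a quick sketch of the arithmetic (the 180 W TDP is nVidia's rated figure for the GTX 1080, the other numbers are the wall-power measurements quoted above):

```python
# Estimate GPU power draw by subtracting the CPU-only baseline
# from total wall power, then express it as a fraction of TDP.
CPU_ONLY_W = 275.0     # S&H on CPU only (measured at the wall)
CPU_GPU_W = 390.0      # S&H on CPU + GTX 1080
GTX1080_TDP_W = 180.0  # nVidia's rated TDP for the GTX 1080

gpu_draw = CPU_GPU_W - CPU_ONLY_W        # ~115 W attributed to the GPU
tdp_fraction = gpu_draw / GTX1080_TDP_W  # ~0.64, i.e. ~64% of TDP

print(f"GPU draw: {gpu_draw:.0f} W ({tdp_fraction:.0%} of TDP)")
```

Note this attributes all of the delta to the GPU; in reality a few watts go to the extra CPU load feeding it, so 115 W is a slight overestimate.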

5)
Message boards :
Number crunching :
GPU FLOPS: Theory vs Reality
(Message 1816323)
Posted 11 Sep 2016 by M_M
Post:
Another observation: in raw processing power (cores, but also nVidia's declared TFLOPS) the GTX 1060 is basically "a half" of the GTX 1080, yet here it achieves around 80% of the GTX 1080's processing speed. In games it achieves on average just 60-65% at most, meaning that games take advantage of high-end GPUs more easily.

Also, Cr/Wh as calculated and presented here is a rough picture, since we have seen that actual TDP usage differs between cards. The GTX 750 Ti often averages above 80% of TDP, while for example the GTX 1080 with the current application stays below 65% of its TDP, regardless of CPU and the number of GPU instances.
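One way to make the Cr/Wh comparison fairer is to scale each card's TDP by its measured average utilization before dividing, rather than assuming full TDP. A minimal sketch; the credit rates here are hypothetical placeholders, only the TDP fractions come from the observations above:

```python
# Correct a Cr/Wh estimate by using measured average draw
# (TDP x observed TDP fraction) instead of nominal TDP.
def cr_per_wh(credits_per_hour, tdp_w, tdp_fraction):
    """Credits per watt-hour based on actual average power draw."""
    actual_watts = tdp_w * tdp_fraction
    return credits_per_hour / actual_watts

# Hypothetical credit rates; TDP fractions as observed above.
gtx750ti = cr_per_wh(credits_per_hour=300, tdp_w=60, tdp_fraction=0.80)
gtx1080 = cr_per_wh(credits_per_hour=1500, tdp_w=180, tdp_fraction=0.65)
print(f"GTX 750 Ti: {gtx750ti:.1f} Cr/Wh, GTX 1080: {gtx1080:.1f} Cr/Wh")
```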

6)
Message boards :
Number crunching :
MB v8: CPU vs GPU (in terms of efficiency)
(Message 1815848)
Posted 9 Sep 2016 by M_M
Post:
Just to mention: if efficiency is a primary concern, undervolting and underclocking your GPU can significantly boost its power efficiency. For example, if you underclock your GPU by just 10% (and undervolt by another 10-15%, actually by as much as you can while staying 100% stable), its power usage will go down by 25-30%. This is essentially how mobile GPUs are binned: tested at slightly lower clocks and much lower voltages.

On the other hand, this means that overclocking (especially with overvolting) significantly decreases power efficiency, which is nothing new but is something people usually overlook.
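The 25-30% figure is roughly consistent with the standard dynamic-power approximation P ∝ f·V². A sketch under that assumption (it ignores static/leakage power, so treat it as an upper-bound estimate):

```python
# Dynamic power scales roughly as frequency x voltage^2,
# so a modest underclock plus undervolt compounds quickly.
def power_scale(freq_ratio, volt_ratio):
    """Relative dynamic power after scaling clock and voltage."""
    return freq_ratio * volt_ratio ** 2

# 10% underclock combined with 10% and 15% undervolts:
mild = power_scale(0.90, 0.90)  # ~0.73 -> ~27% power saved
deep = power_scale(0.90, 0.85)  # ~0.65 -> ~35% power saved
print(f"Saved: {1 - mild:.0%} to {1 - deep:.0%}")
```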

Worth mentioning is also that GPU apps are far from their optimal efficiency, which is not so much the case for CPU apps. For example, Petri33's custom optimized nV GPU application is 2-2.5x more efficient (and 2.5-3x faster) than the standard app, and he is convinced there is still room for further improvement.

The reason is that GPU applications are much harder to optimize properly, due to GPUs' heavy parallelism and varied architectures.

I can only hope that this means that the modern GPUs are merely under-utilized.

From the discussion in this and other threads, I would say you are right... It seems that current applications cannot fully utilize modern GPUs. We have seen that Petri33's custom Linux binary is about 2.5-3x more efficient than SoG, so room for improvement obviously exists (especially for new and high-end GPUs).

Some patience is needed, but I have no doubt that it is just a matter of time before new optimized applications become available...

So effectively, for compute tasks (where it matters most) nVidia is limiting memory bandwidth to less than the advertised 320 GB/s. Why, I don't know; this was never the case with the 7x0 or earlier GPU series, but it was first seen on the 9x0 series (where a workaround using nvidia-smi is possible) and now on the 10x0 series (where no workaround is possible yet).
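The bandwidth cost of the P2 state is easy to quantify: memory bandwidth scales linearly with memory clock, so the ~10% clock reduction noted above takes the GTX 1080's rated 320 GB/s down to roughly 288 GB/s (a sketch; the exact P2 clock varies by card and driver):

```python
# Memory bandwidth scales linearly with memory clock, so the
# P2-state clock cut translates directly into lost peak bandwidth.
ADVERTISED_GBPS = 320.0  # GTX 1080 rated bandwidth (P0 state)
P2_CLOCK_RATIO = 0.90    # approx. 10% memory clock reduction in P2

effective_gbps = ADVERTISED_GBPS * P2_CLOCK_RATIO
print(f"Effective compute-state bandwidth: ~{effective_gbps:.0f} GB/s")
```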

I am sure that you, Richard, Raistmer and others are putting in effort to improve the applications to use this "spare" GPU potential...

11)
Message boards :
Number crunching :
Low RAC with GTX 1080
(Message 1810302)
Posted 19 Aug 2016 by M_M
Post:
The i7-2600k is 4 cores / 8 threads. I have experimented a bit and found this to be the optimal setting on my config for maximum RAC. Even though CPU load shows 100%, the system (Win10) remains reasonably responsive for normal work (surfing, office work etc.).

If I set more than 50% CPU with 2 SoG tasks in parallel, the system becomes a bit less responsive and GPU WU times increase (also visible as lower GPU TDP usage, which is a better indication of GPU use than the GPU load indicator itself). I have not experimented with the CPU time usage limit; so far it has always been set to 100%...

12)
Message boards :
Number crunching :
Low RAC with GTX 1080
(Message 1810299)
Posted 19 Aug 2016 by M_M
Post:
Guppies on my GTX 1080 take about 13-14 min each (latest SoG r3500, running 2 in parallel) on my i7-2600k. I have limited CPU usage to 50% to be able to feed the monster properly and still have a reasonably responsive system to work on at the same time.

So this is the catch: most developers probably rely fully on the generic GPU usage indicator when optimizing their GPU code and judging whether it is squeezing the maximum from the GPU, but this is misleading; they should rely more on power consumption if they want to optimize their code for maximum efficiency and performance...
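As a crude proxy, "effective utilization" can be taken as average board power over TDP rather than the driver's load counter; a minimal sketch using the GTX 1080 numbers observed above:

```python
# Treat average board power / TDP as a utilization proxy:
# a GPU reporting 100% "load" while drawing only ~65% of TDP
# likely still has idle execution resources.
def effective_utilization(avg_power_w, tdp_w):
    """Fraction of TDP actually drawn - a rough 'real work' proxy."""
    return avg_power_w / tdp_w

util = effective_utilization(avg_power_w=115, tdp_w=180)
print(f"Effective utilization: {util:.0%}")
```

GPU-Z's average power reading (or NVML's power query on nVidia cards) supplies the numerator in practice.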

16)
Message boards :
Number crunching :
GPU FLOPS: Theory vs Reality
(Message 1808446)
Posted 11 Aug 2016 by M_M
Post:
I am also a bit surprised that the difference between the GTX 1080 and GTX 1070 is so small, since the GTX 1080 has 33% more compute units (2560 vs 1920 shaders), 25% faster memory (10 GHz vs 8 GHz), and is even clocked a bit higher, so is something holding the GTX 1080 back? Even nVidia advertised the GTX 1080 as 8.9 TFLOPS and the GTX 1070 as 6.5 TFLOPS.

I would guess that the current application implementation is not using the extra resources well... Maybe it is time for a new, optimized application?
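nVidia's advertised TFLOPS figures follow directly from shader count × boost clock × 2 FLOPs per shader per cycle (one fused multiply-add); a sketch using the published reference boost clocks (1733 MHz and 1683 MHz):

```python
# Theoretical SP TFLOPS = shaders x boost clock x 2 (FMA = 2 FLOPs).
def sp_tflops(shaders, boost_mhz):
    """Peak single-precision throughput in TFLOPS."""
    return shaders * boost_mhz * 1e6 * 2 / 1e12

gtx1080 = sp_tflops(2560, 1733)  # ~8.9 TFLOPS, matching nVidia's figure
gtx1070 = sp_tflops(1920, 1683)  # ~6.5 TFLOPS
print(f"GTX 1080: {gtx1080:.1f} TFLOPS, GTX 1070: {gtx1070:.1f} TFLOPS")
```

That these peaks are not reflected in real throughput here is consistent with the application being bound by something other than raw shader FLOPS.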

17)
Message boards :
Number crunching :
GPU FLOPS: Theory vs Reality
(Message 1808366)
Posted 10 Aug 2016 by M_M
Post:
As far as I know, it is 32-bit float, i.e. single precision (SP), where nVidia is in general slightly faster in the same price bracket. In DP, however, AMD is usually faster, as nVidia is "saving" DP performance for much more expensive dedicated compute cards, like the Tesla P100 for example.

However, the R9 Fury/Fury X is still more powerful compared to the Ellesmere RX 470/480, which are AMD's new power-efficient mid-range GPUs, with an overall performance level similar to the AMD Hawaii R9 290/290X.

Probably most of the improvements in the medium term will come from recent contributions from Petri33. Longer term, we are probably looking at trying to leverage some of the AI-targeted features not yet explored in the setiathome code.

Yes, I noticed the efficiency of Petri33's crunching with his custom applications, 2-3x faster than the stock applications. He is obviously a very skilled programmer, so it is very good that he is willing to contribute to the whole community.

For the long term, I agree; we should always be open-minded and willing to explore new features and techniques to improve "the quest".