[Scilab-users] More rapid calculation

I am using Scilab 6.0.0 for Windows.
I am doing a simulation using a random number matrix with a huge size: the
random matrix is 10^4 x 25000.
I am using a PC equipped with 8 cores and 16 threads.
It takes a considerable time to finish the simulation, but CPU utilization
is as low as 10-20%. Is there any way to increase the CPU usage and finish
the calculation sooner?
No loops are used; I am using matrix functions.
I also considered parallel computing, but it is said that it cannot be used
on Windows.

Re: More rapid calculation

On 14/02/2018 at 14:35, fujimoto2005 wrote:
> I am using Scilab 6.0.0 for Windows.
> I am doing a simulation using a random number matrix with a huge size: the
> random matrix is 10^4 x 25000.
> I am using a PC equipped with 8 cores and 16 threads.
> It takes a considerable time to finish the simulation, but CPU utilization
> is as low as 10-20%. Is there any way to increase the CPU usage and finish
> the calculation sooner?

So maybe it is a RAM issue. If you need a lot of intermediate memory and
you don't have it (here you need 2 GB per copy, if you are using
double-precision numbers), it spills to disk, which is unbearably slow.
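For scale, a back-of-the-envelope check (not from the original message): each entry of a real Scilab matrix is an 8-byte double, so one 10^4 x 25000 matrix already occupies about 2 GB, and every temporary copy costs the same again.

```scilab
// Memory footprint of one dense 1e4 x 25000 matrix of doubles.
rows = 1e4;
cols = 25000;
bytes = rows * cols * 8;          // 8 bytes per double-precision entry
gib = bytes / 2^30;               // convert to GiB
disp(gib)                         // about 1.86 GiB per copy
```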

Re: More rapid calculation

A priori, there is no reason why your calculation should use more than
one CPU core, which explains why you see only 1/8 = 12.5% CPU use.

S.

On 14/02/2018 at 14:35, fujimoto2005 wrote:

> I am using Scilab 6.0.0 for Windows.
> I am doing a simulation using a random number matrix with a huge size: the
> random matrix is 10^4 x 25000.
> I am using a PC equipped with 8 cores and 16 threads.
> It takes a considerable time to finish the simulation, but CPU utilization
> is as low as 10-20%. Is there any way to increase the CPU usage and finish
> the calculation sooner?
> No loops are used; I am using matrix functions.
> I also considered parallel computing, but it is said that it cannot be used
> on Windows.
>
> Best regards.
>
>
>
> --
> Sent from: http://mailinglists.scilab.org/Scilab-users-Mailing-Lists-Archives-f2602246.html
> _______________________________________________
> users mailing list
> [hidden email]
> http://lists.scilab.org/mailman/listinfo/users

Re: More rapid calculation

Hello,

If your problem is embarrassingly parallel (i.e. you run your simulations
many times independently for different random matrices), you might speed
up the overall simulation by running more than one instance of Scilab in
parallel.
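A minimal sketch of that idea, with every name and number below made up for illustration: each instance runs the same worker script with its own id and seed (e.g. launched by hand with `scilab -nw -f worker.sce`), and the partial results saved by each instance are merged afterwards.

```scilab
// worker.sce -- hypothetical worker for one of several Scilab instances.
instanceId = 1;                            // edit per instance: 1, 2, 3, ...
samplesPerInstance = 2500;                 // e.g. 10000 samples over 4 instances
grand("setsd", 1000 + instanceId);         // independent random seed per instance
X = grand(samplesPerInstance, 25000, "nor", 0, 1);  // this instance's slice
partialResult = mean(X, "r");              // stand-in for the real simulation
save("result_" + string(instanceId) + ".sod", "partialResult");
```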

Antoine

On 14/02/2018 at 18:00, Stéphane Mottelet wrote:

> Hello,
>
> A priori, there is no reason why your calculation should use more than
> one CPU core, which explains why you see only 1/8 = 12.5% CPU use.
>
> S.
>
>
> On 14/02/2018 at 14:35, fujimoto2005 wrote:
>> I am using Scilab 6.0.0 for Windows.
>> I am doing a simulation using a random number matrix with a huge size:
>> the random matrix is 10^4 x 25000.
>> I am using a PC equipped with 8 cores and 16 threads.
>> It takes a considerable time to finish the simulation, but CPU
>> utilization is as low as 10-20%. Is there any way to increase the CPU
>> usage and finish the calculation sooner?
>> No loops are used; I am using matrix functions.
>> I also considered parallel computing, but it is said that it cannot be
>> used on Windows.
>>
>> Best regards.
>>
>>
>>

Re: More rapid calculation

On 14/02/2018 at 19:15, [hidden email] wrote:
>
> If your program does not take advantage of the MKL Intel library, it
> means that its CPU usage is not dominated by linear algebra stuff.
>
I was actually wondering whether the bottleneck is something other than
the CPU. This is why I was thinking about the RAM.
But with 64 GB, this would mean that more than 32 copies (at 2 GB each) are
simultaneously defined/reserved (internally and/or in the Scilab program).
fujimoto2005, could that be the case in your Scilab code?
Are you properly clearing all intermediate variables after use?
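For example (a sketch with made-up variable names and sizes), each large temporary can be released with clear as soon as its result has been consumed, so that several multi-GB copies do not accumulate:

```scilab
// Release big intermediate matrices as soon as they are no longer needed.
A = grand(1000, 1000, "def");   // much smaller here, same principle
B = A .* A;                     // large temporary
s = sum(B);                     // the only value actually kept
clear A B                       // free the ~16 MB immediately
disp(s)
```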

Re: More rapid calculation

Dear all,
Thank you for your replies.

I have attached my code and a snapshot of the Task Manager.
The snapshot shows a typical situation.
CPU utilization is usually between 10 and 20%.
About three times during the run, CPU utilization briefly peaks at
40-50%.
Memory usage does not exceed 20 GB; at least 40 GB is always free.

Re: More rapid calculation

In your code, most of the CPU time is spent between lines 40-54 (random
generation of big matrices), then between lines 54-60, where one of the
bottlenecks is the use of repmat (and you use it twice) and the
"cumsum". In previous posts of Heinz Nabielek related to code
optimization, you may have noticed that using matrix multiplication by a
vector of ones gives the same result BUT uses the BLAS! For example,
compare these timings, with size(timePoints_V)=[1 25000] and sample=5000:

tic;repmat(timePoints_V,2*sample,1);disp(toc())

12.372273

tic;ones(2*sample,1)*timePoints_V;disp(toc())

1.823105

On my machine (MacPro, OSX, Scilab 6.0.0), this last piece of code
uses 100% CPU (four cores).

Re: More rapid calculation

Dear Mottelet,
Thank you for your useful advice.

1. By changing repmat(timePoints_V,2*sample,1) to
timePoints_M = ones(2*sample,1)*timePoints_V and using it, the calculation
time improved by 25 seconds.

2. "cumsum" is not a bottleneck, because it takes only 2 seconds to finish.
Also, if I change cumsum(wY1_M,'c') to the linear-algebra version
wY1_M*triu(ones(time_step,time_step)), the calculation time increases to 2
minutes, although the CPU usage rate rises greatly. The "cumsum" function
seems to be efficient.
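The equivalence of the two formulations can be checked on a small made-up example: right-multiplying by triu(ones(n,n)) reproduces the row-wise cumsum, but as an n x n matrix product it does far more arithmetic, which is consistent with cumsum winning despite its lower CPU utilization.

```scilab
// Row-wise cumulative sums two ways: dedicated routine vs. matrix product.
n = 6;
X = grand(4, n, "def");            // small random test matrix
C1 = cumsum(X, "c");               // cumulate along each row
C2 = X * triu(ones(n, n));         // same values via an upper-triangular product
disp(max(abs(C1 - C2)))            // ~0: equal up to rounding
```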

> On 15/02/2018 at 00:02, fujimoto2005 wrote:
>> .../...
>
> Hello,
>
> In your code, most of the cpu time is spent between lines 40-54
> (random generation of big matrices), then between lines 54-60, where
> one of the bottlenecks is the use of repmat (and you use it twice) and
> the "cumsum". In previous posts of Heinz Nabielek related to code
> optimization, you may have noticed that using matrix multiplication by
> a vector of ones gives the same result BUT uses the BLAS ! For
> example, compare these timings, with size(timePoints_V)=[1 25000] and
> sample=5000) :
>
> tic;repmat(timePoints_V,2*sample,1);disp(toc())
>
> 12.372273
>
> tic;ones(2*sample,1)*timePoints_V;disp(toc())
>
> 1.823105
>
> On my machine (MacPro, OSX, Scilab 6.0.0), this last piece of code
> uses 100% cpu (four cores).

Thank you Stéphane for having pointed out the repmat() slowness.

Additional tests show that the Kronecker product is even slightly faster
than .*
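On a small made-up example, the three formulations can be seen to agree: the Kronecker product ones(m,1) .*. v stacks m copies of v, exactly like repmat and the ones-vector product.

```scilab
// Three equivalent ways to stack m copies of a row vector.
m = 5;
v = [1 2 3 4];
R1 = repmat(v, m, 1);        // generic replication (the slow one here)
R2 = ones(m, 1) * v;         // matrix product, goes through the BLAS
R3 = ones(m, 1) .*. v;       // Kronecker product (kron)
disp(and(R1 == R2) & and(R2 == R3))   // %T: all three matrices are identical
```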

It is a good initiative. Looking at your proposed code, I see that you
use "execstr" on strings which are forged on the fly. Although the
obtained expression will be faster (this was the goal), AFAIK such
constructs are not "compilable" the same way as the straight expression.
For the time being, Scilab does not use JIT compilation, but I think
that such constructs are typically not optimal and that if/then/else
constructs should be used instead.

S.

On 16/02/2018 at 07:48, Samuel Gougeon wrote:

> On 15/02/2018 at 11:45, Stéphane Mottelet wrote:
>> On 15/02/2018 at 00:02, fujimoto2005 wrote:
>>> .../...
>>
>> Hello,
>>
>> In your code, most of the cpu time is spent between lines 40-54
>> (random generation of big matrices), then between lines 54-60, where
>> one of the bottlenecks is the use of repmat (and you use it twice)
>> and the "cumsum". In previous posts of Heinz Nabielek related to code
>> optimization, you may have noticed that using matrix multiplication
>> by a vector of ones gives the same result BUT uses the BLAS ! For
>> example, compare these timings, with size(timePoints_V)=[1 25000] and
>> sample=5000) :
>>
>> tic;repmat(timePoints_V,2*sample,1);disp(toc())
>>
>> 12.372273
>>
>> tic;ones(2*sample,1)*timePoints_V;disp(toc())
>>
>> 1.823105
>>
>> On my machine (MacPro, OSX, Scilab 6.0.0), this last piece of code
>> uses 100% cpu (four cores).
>
> Thank you Stéphane for having pointed out the repmat() slowness.
>
> Additional tests show that the Kronecker product is even slightly
> faster than .*
>
> A new version of repmat() is proposed on review:
> https://codereview.scilab.org/19782
> It is rewritten mainly using .*., which simplifies the code a lot.
>
> This version is more than 7x faster than the current one, and uses
> both CPU of my PC.
> It is roughly the ratio 12.37/1.82 ~ 6.8 that you give here-above.
>
> Best regards
> Samuel
>

Re: More rapid calculation

> It is a good initiative. Looking at your proposed code, I see that
> you use "execstr" on strings which are forged on the fly. Although
> the obtained expression will be faster (this was the goal), AFAIK
> such constructs are not "compilable" the same way as the straight
> expression. For the time being, Scilab does not use JIT
> compilation, but I think that such constructs are typically not
> optimal and that if/then/else constructs should be used instead.

Here, an if/then/else or rather a select/case construct is not
possible, since the number of cases is unknown, open, not limited.
So the construct would anyway include a final else branch containing
an execstr() instruction.

But even if avoiding execstr() is not a priority, there is here
another solution, which is now implemented.

The final execstr() for the overloading routing can't be avoided.
This is typically the case when processing open, unknown cases.
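A hedged sketch of the pattern being described, with a hypothetical helper (this is not the actual repmat() code under review): explicit branches cover the known types, and only the open-ended set of overloaded types falls through to a forged execstr() call.

```scilab
// Hypothetical dispatcher: explicit cases first, execstr() only for the
// open set of user-overloaded types that cannot be enumerated in advance.
function M = replicateRows(v, m)
    select typeof(v)
    case "constant" then                  // ordinary numeric matrix
        M = ones(m, 1) .*. v;
    case "boolean" then
        M = ones(m, 1) .*. bool2s(v);     // booleans promoted to 0/1 doubles
    else                                  // unknown type: rely on its .*. overload
        execstr("M = ones(m, 1) .*. v;");
    end
endfunction

disp(replicateRows([1 2; 3 4], 3))        // three stacked copies of the matrix
```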