I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
Can anybody shed some light on why it's behaving this way?
case_A:
-------------------------------
foreach (i,index; indices) {
foreach (k, inout value; centers[belongs_to[i]])
value += vecs[index][k];
}
----------------------------------
case_B:
-------------------------------
void addTo(T,U)(T[] a, U[] b) {
foreach(index, inout value; a) {
value += b[index];
}
}
....
foreach (i,index; indices) {
addTo(centers[belongs_to[i]],vecs[index]);
}
_______________________________
Marius

Marius Muja wrote:
> I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
> Can anybody shed some light on why it's behaving this way?
> > case_A:
> -------------------------------> foreach (i,index; indices) {
> foreach (k, inout value; centers[belongs_to[i]])
> value += vecs[index][k];
> }
> ----------------------------------> > case_B:
> -------------------------------> void addTo(T,U)(T[] a, U[] b) {
> foreach(index, inout value; a) {
> value += b[index];
> }
> }
> ....
> foreach (i,index; indices) {
> addTo(centers[belongs_to[i]],vecs[index]);> }
> _______________________________
My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one.
If so then this should be just as fast as case_B:
foreach (i,index; indices) {
auto vecs_i = vecs[index];
foreach (k, inout value; centers[belongs_to[i]])
value += vecs_i[k];
}
--bb

Bill Baxter wrote:
> Marius Muja wrote:
>> I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
>> Can anybody shed some light on why it's behaving this way?
>>>> case_A:
>> ------------------------------->> foreach (i,index; indices) {
>> foreach (k, inout value; centers[belongs_to[i]])
>> value += vecs[index][k];
>> }
>> ---------------------------------->>>> case_B:
>> ------------------------------->> void addTo(T,U)(T[] a, U[] b) {
>> foreach(index, inout value; a) {
>> value += b[index];
>> }
>> }
>> ....
>> foreach (i,index; indices) {
>> addTo(centers[belongs_to[i]],vecs[index]);>> }
>> _______________________________> > My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one.
> If so then this should be just as fast as case_B:
> > foreach (i,index; indices) {
> auto vecs_i = vecs[index];
> foreach (k, inout value; centers[belongs_to[i]])
> value += vecs_i[k];
> }
> Forgot the footnote!:
[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'.
--bb

Bill Baxter wrote:
> Bill Baxter wrote:
>> Marius Muja wrote:
>>> I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
>>> Can anybody shed some light on why it's behaving this way?
>>>>>> case_A:
>>> ------------------------------->>> foreach (i,index; indices) {
>>> foreach (k, inout value; centers[belongs_to[i]])
>>> value += vecs[index][k];
>>> }
>>> ---------------------------------->>>>>> case_B:
>>> ------------------------------->>> void addTo(T,U)(T[] a, U[] b) {
>>> foreach(index, inout value; a) {
>>> value += b[index];
>>> }
>>> }
>>> ....
>>> foreach (i,index; indices) {
>>> addTo(centers[belongs_to[i]],vecs[index]);>>> }
>>> _______________________________>>>> My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B:
>>>> foreach (i,index; indices) {
>> auto vecs_i = vecs[index];
>> foreach (k, inout value; centers[belongs_to[i]])
>> value += vecs_i[k];
>> }
>>> Forgot the footnote!:
> > [*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'.
> > --bb
Yes, that was it! Thanks for clearing this up for me.

Bruce Adams wrote:
> Bill Baxter Wrote:
> >> Bill Baxter wrote:
>>> Marius Muja wrote:
>>>> I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
>>>> Can anybody shed some light on why it's behaving this way?
>>>>>>>> case_A:
>>>> ------------------------------->>>> foreach (i,index; indices) {
>>>> foreach (k, inout value; centers[belongs_to[i]])
>>>> value += vecs[index][k];
>>>> }
>>>> ---------------------------------->>>>>>>> case_B:
>>>> ------------------------------->>>> void addTo(T,U)(T[] a, U[] b) {
>>>> foreach(index, inout value; a) {
>>>> value += b[index];
>>>> }
>>>> }
>>>> ....
>>>> foreach (i,index; indices) {
>>>> addTo(centers[belongs_to[i]],vecs[index]);>>>> }
>>>> _______________________________>>> My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one.
>>> If so then this should be just as fast as case_B:
>>>>>> foreach (i,index; indices) {
>>> auto vecs_i = vecs[index];
>>> foreach (k, inout value; centers[belongs_to[i]])
>>> value += vecs_i[k];
>>> }
>>>>> Forgot the footnote!:
>>>> [*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'.
>>>> --bb
> > <shields up>
> This is exactly the sort of thing where const would help :)
> </shields down>
Uh oh... you should have left the shields up. Now you're unprotected...
--bb

"Bruce Adams" <tortoise_74@yeah.who.co.uk> wrote in message news:fct9gb$uhd$1@digitalmars.com...> I was invariant but now I'm just const but you still can't modify me through this interface :p
*(cast(void*)&BruceAdams) = 0;
Helloooo, void*.

Bruce Adams wrote:
> Bill Baxter Wrote:
> >> Bruce Adams wrote:
>>> Bill Baxter Wrote:
>>>>>>> Bill Baxter wrote:
>>>>> Marius Muja wrote:
>>>>>> I have noticed the following strange (at least for me) performance behavior with one of my programs. It is a program that does some scientific computations and while trying to optimize it I noticed that the code from case_B below executes faster (almost twice as fast) as the code in case_A. This is a bit counterintuitive for me, since in case _B there is also the cost of the function call (or should be the same if the function is inlined).
>>>>>> Can anybody shed some light on why it's behaving this way?
>>>>>>>>>>>> case_A:
>>>>>> ------------------------------->>>>>> foreach (i,index; indices) {
>>>>>> foreach (k, inout value; centers[belongs_to[i]])
>>>>>> value += vecs[index][k];
>>>>>> }
>>>>>> ---------------------------------->>>>>>>>>>>> case_B:
>>>>>> ------------------------------->>>>>> void addTo(T,U)(T[] a, U[] b) {
>>>>>> foreach(index, inout value; a) {
>>>>>> value += b[index];
>>>>>> }
>>>>>> }
>>>>>> ....
>>>>>> foreach (i,index; indices) {
>>>>>> addTo(centers[belongs_to[i]],vecs[index]);>>>>>> }
>>>>>> _______________________________>>>>> My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one.
>>>>> If so then this should be just as fast as case_B:
>>>>>>>>>> foreach (i,index; indices) {
>>>>> auto vecs_i = vecs[index];
>>>>> foreach (k, inout value; centers[belongs_to[i]])
>>>>> value += vecs_i[k];
>>>>> }
>>>>>>>>> Forgot the footnote!:
>>>>>>>> [*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'.
>>>>>>>> --bb
>>> <shields up>
>>> This is exactly the sort of thing where const would help :)
>>> </shields down>
>> Uh oh... you should have left the shields up. Now you're unprotected...
>>>>>> --bb
> > I was invariant but now I'm just const but you still can't modify me through this interface :p
Ooh, you better watch out. Somebody's going to put you in a cast for making such volatile statements. But still, if it's a pure literal expression of how you feel, then I won't take exception.
ugh. :-P
--bb