This all sounds reasonable to me. I agree -- now that the dramatic
performance difference has been explained and we're within a tiny
delta (that's likely simply the normal testing variance), let's put
the MQ stuff in 1.2.4.

(BTW, I didn't surmise that Pak had debugging enabled; I think he
said it somewhere :-) [perhaps on a trac ticket? I don't remember
offhand...])

On Sep 19, 2007, at 3:59 PM, Terry Dontje wrote:

> Nikolay and Community,
>
> Sorry to be so late in responding to your email, but I've been working
> with Pak to determine whether my hasty decision as RM yesterday was
> hasty or not. To answer your question, we are still trying to determine
> whether the message queue support can go in, and below is my
> perspective on whether it should.
>
> Community,
>
> A couple of things have transpired in the 24 hours since our concall.
> As Jeff surmised earlier this morning, Pak did accidentally have
> debugging enabled, which skewed the numbers quite a bit. After making
> sure debugging was disabled for both builds (v1.2 and the tmp branch
> with the message queue fixes), we then fretted over the numbers. It
> looks to me that there is quite a bit of variance in the numbers that
> the OSU latency, IMB latency and mpi_ping tests all produce.
>
> For example, using the OSU latency tests we see that the MX MTL has a
> .01 us difference between v1.2 and the tmp branch (in favor of v1.2).
> However, the mean, trimmed mean and median have about a .02-.07 us
> difference (in favor of the tmp branch). To me the data looks pretty
> much the same, and the fact that we are measuring averages (i.e., none
> of the tests pick out the minimum value) makes these numbers even
> harder to really nail down IMHO. I've seen essentially this effect for
> the other tests (IMB and mpi_ping).
>
> For the SM timings using the mpi_ping tests, we have seen a range of
> average latencies from 1.47-1.5 us for both the tmp branch and v1.2, so
> they seem like moral equivalents to me. Rich Graham has led me to
> believe that he might get more consistent numbers, but we are not able
> to, and so I can only deduce that the numbers are essentially the same.
>
> In conclusion, I believe that both the CM PML (MX MTL) and the SM BTL
> performance are about the same between the tmp branch and v1.2. Because
> of this I would like to request that the 1097 CMR be put into 1.2.4. If
> others disagree with my assessment above, I think a discussion will
> need to ensue, and I would welcome further testing by others that may
> show that the changes have regressed performance (or not). I would like
> to set a timeout of 12 noon ET for others to comment on whether these
> new findings put our fears at ease. At that time, if no dissenting
> comments have been received, I would like to ask Tim to pull in these
> changes and rebuild 1.2.4.
>
> thanks,
>
> --td
>
>
>
> Nikolay Piskun wrote:
>> Hi,
>>
>> Just to verify, before I start testing this: there will be no
>> message queue debugging support in this version, correct? That all
>> goes into the 1.3 release.
>> Best Regards,
>>
>> P.S. It looks like it is time for us to be more formally involved in
>> this work.
>>
>> Nikolay Piskun
>> Director of Continuing Engineering, TotalView Technologies
>> 24 Prime Parkway, Natick, MA 01760
>> http://www.totalviewtech.com
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel