Jeff Squyres wrote:
> ignored it whenever presenting competitive data. The 1,000,000th time I
> saw this, I gave up arguing that our competitors were not being fair and
> simply changed our defaults to always leave memory pinned for
> OpenFabrics-based networks.

Instead, you should have told them that caching memory registration is
unsafe and asked them why they don't care whether their customers get the
right answer. Then you could have followed up by asking whether they
actually have a way to check that there is no data corruption. It's not
really FUD; it's tit for tat :-)
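To make the "unsafe" point concrete, here is a toy Python model of a
registration cache keyed only by virtual address. It is not Open MPI's or
any vendor's code. All class names are hypothetical, and pinning is
simplified to "registration records which physical frame backs a virtual
page". If the application frees a registered buffer and later gets the same
virtual address back on a different frame, the cache hit hands the NIC the
old frame, and the wrong bytes go on the wire with no error reported.

```python
# Toy model (hypothetical names) of a stale registration-cache hit.
# A dict stands in for the page table, a list for physical memory,
# and "registration" for the NIC pinning a physical frame.


class ToyMemory:
    def __init__(self, num_frames):
        self.frames = [b""] * num_frames          # physical frame contents
        self.free_frames = list(range(num_frames))
        self.page_table = {}                      # virtual page -> frame

    def mmap(self, vpage):
        self.page_table[vpage] = self.free_frames.pop(0)
        return vpage

    def munmap(self, vpage):
        # Drop the mapping. The old frame is still pinned by the NIC,
        # so this toy never hands it out again.
        del self.page_table[vpage]

    def write(self, vpage, data):
        self.frames[self.page_table[vpage]] = data


class ToyNic:
    def __init__(self, mem):
        self.mem = mem

    def register(self, vpage):
        # The handle captures the physical frame at registration time.
        return self.mem.page_table[vpage]

    def send(self, handle):
        # "DMA" reads the pinned frame directly, bypassing the page table.
        return self.mem.frames[handle]


class NaiveRegCache:
    """Caches registrations by virtual address only (the unsafe part)."""

    def __init__(self, nic):
        self.nic = nic
        self.cache = {}

    def get(self, vpage):
        if vpage not in self.cache:
            self.cache[vpage] = self.nic.register(vpage)
        return self.cache[vpage]


def demo():
    mem = ToyMemory(num_frames=4)
    nic = ToyNic(mem)
    cache = NaiveRegCache(nic)

    buf = mem.mmap(vpage=0x10)        # backed by frame 0
    mem.write(buf, b"old")
    nic.send(cache.get(buf))          # registers (pins) frame 0

    mem.munmap(buf)                   # app frees; the cache is never told
    buf = mem.mmap(vpage=0x10)        # same virtual address, frame 1
    mem.write(buf, b"new")
    return nic.send(cache.get(buf))   # cache hit sends the stale frame


if __name__ == "__main__":
    print(demo())  # prints b'old' although the app just wrote b"new"
```

A cache like this is only safe if it observes every free/munmap of
registered memory and invalidates the entry, and guaranteeing that from
inside a library is exactly the hard part.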

> 2. Even if you tag someone in public for not being fair, they always say
> the same thing, "Oh sorry, my mistake" (regardless of whether they
> actually forgot or did it intentionally). I told several competitors
> *many times* that they had to use leave_pinned, but in all public
> comparison numbers, they never did. Hence, they always looked better.

Looked better on what, micro-benchmarks? The same micro-benchmarks that
have already been manipulated to death, like OSU using a stream-based
bandwidth test to hide the start-up overhead? If the option improves
real applications at large, then it should be on by default and there is
no debate (users should never have to know about knobs). If it only
helps micro-benchmarks, stand your ground and do the right thing. It does
not do the community any good if MPI implementations are tuned for a
broken micro-benchmark penis contest. If you want to play that game, at
least write your own micro-benchmarks.
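How a stream-based test hides start-up overhead can be seen with a simple
latency/bandwidth cost model. This is a hedged sketch, not the OSU code:
the start-up cost and link rate below are made-up numbers, and the stream
case assumes the per-message start-ups overlap so that only one is exposed
per window. A ping-pong test pays the start-up cost on every message; a
windowed stream pays it roughly once per window, so small-message bandwidth
looks much better even though the per-message overhead is unchanged.

```python
# Hypothetical alpha-beta cost model; parameter values are assumptions.
ALPHA_US = 2.0               # assumed per-message start-up cost (microseconds)
BETA_BYTES_PER_US = 1000.0   # assumed link rate (1000 bytes/us = 1000 MB/s)


def pingpong_bw(n_bytes, alpha=ALPHA_US, beta=BETA_BYTES_PER_US):
    """One message at a time: every message pays the full start-up cost."""
    return n_bytes / (alpha + n_bytes / beta)


def stream_bw(n_bytes, window, alpha=ALPHA_US, beta=BETA_BYTES_PER_US):
    """Back-to-back window of messages; only one start-up is exposed."""
    return window * n_bytes / (alpha + window * n_bytes / beta)


if __name__ == "__main__":
    for n in (64, 1024, 65536, 4194304):
        print(f"{n:>8} B  ping-pong {pingpong_bw(n):7.1f} MB/s"
              f"  stream(64) {stream_bw(n, 64):7.1f} MB/s")
```

With these assumed parameters the two numbers only converge at large
message sizes, which is exactly where start-up cost stops mattering anyway.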

Believe me, I know what it is to hear technical atrocities from these
marketing idiots. There is nothing you can do; they are paid to talk
and you are not. In the end, HPC gets what HPC deserves; people should
do their homework.

For applications at large, performance gains due to core binding are
suspect. Memory binding may be on firmer ground, but the OS should already
be able to do a good job with NUMA allocation and page migration.

> - The Linux scheduler does no/cannot optimize well for many HPC apps;
> binding definitely helps in many scenarios (not just benchmarks).

Then fix the Linux scheduler. Only the OS scheduler can allocate
resources meaningfully, because it sees everything and you don't.