I'll definitely plan an upgrade to the latest LSF release (7.0 update 2),
then. Given the roadmap, I think I'm way better off forging ahead with MPI
on LSF than implementing a separate solution. I didn't really expect
production-ready code at this point. Just checking whether it was still
planned for 1.3, really (the last thing I saw in the mailing lists was fairly
discouraging).

I'm willing to dedicate some time to testing code if you think it would be
helpful.

Cheers,
Eric

Jeff Squyres wrote:
> There are two issues:
>
> - You must have a recent enough version of LSF. I'm afraid I don't
> remember the LSF version number offhand, but we both (OMPI and LSF)
> had to make some changes/fixes to achieve compatibility.
>
> - LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't
> exist in the v1.2 series). As Ralph indicated, we're aware that it's
> currently broken in the trunk -- it'll be fixed by the v1.3 release,
> but I don't know exactly when. To be blunt: I wouldn't count on it in
> a production environment until v1.3 is officially released. Betas may
> become available before v1.3 goes gold that would be suitable for
> testing, though.
>
> Here's the OMPI v1.3 roadmap document -- it's more-or-less continually
> updated:
>
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
>
> On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote:
>
>> Jeff and I chatted about this today, in fact. We know the LSF support is
>> borked, but neither of us has time right now to fix it. We plan to do so,
>> though, before the 1.3 release - just can't promise when.
>>
>> Ralph
>>
>>
>>
>> On 2/11/08 8:00 AM, "Eric Jones" <ejon_at_[hidden]> wrote:
>>
>>> Greetings, MPI mavens,
>>>
>>> Perhaps this belongs on users@, but since it's about development status
>>> I thought I'd start here. I've fairly recently gotten involved in getting
>>> an MPI environment configured for our institute. We have an existing
>>> LSF cluster because most of our work is more High-Throughput than
>>> High-Performance, so if I can use LSF to underlie our MPI environment,
>>> that'd be administratively easiest.
>>>
>>> I tried to compile the LSF support in the public SVN repo and noticed it
>>> was, er, broken. I'll include the trivial changes we made below. But
>>> the behavior is still fairly unpredictable, mostly involving mpirun
>>> never spinning up daemons on other nodes.
>>>
>>> I saw mention that work was being suspended on LSF support pending
>>> technical improvements on the LSF side (mentioning that Platform had
>>> provided a patch to try).
>>>
>>> Can I assume, based on the inactivity in the repo, that Platform hasn't
>>> resolved the issue?
>>>
>>> Thanks,
>>> Eric
>>>
>>> ------------------------
>>> Here're the diffs to get LSF support to compile. We also made a change
>>> so it would report the LSF failure code instead of an uninitialized
>>> variable when it fails:
>>>
>>> Index: pls_lsf_module.c
>>> ===================================================================
>>> --- pls_lsf_module.c (revision 17234)
>>> +++ pls_lsf_module.c (working copy)
>>> @@ -304,7 +304,7 @@
>>> */
>>> if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
>>> ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
>>> - opal_output(0, "lsb_launch failed: %d", rc);
>>> + opal_output(0, "lsb_launch failed: %d", lsberrno);
>>> rc = ORTE_ERR_FAILED_TO_START;
>>> goto cleanup;
>>> }
>>> @@ -356,7 +356,7 @@
>>>
>>> /* check for failed launch - if so, force terminate */
>>> if (failed_launch) {
>>> - if (ORTE_SUCCESS !=
>>> +/* if (ORTE_SUCCESS != */
>>> orte_pls_base_daemon_failed(jobid, false, -1, 0,
>>> ORTE_JOB_STATE_FAILED_TO_START);
>>> }
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel