It has been on my to-do list for a while to start a FAQ listing the
various resilience/FT-related activities in and around Open MPI. This would
give users and new developers a starting point for an overview of each
feature and how to activate and use it.

I'll try to bump that up the priority list and post a message once it is
ready. Probably a month or so off since I need to collect some information
from various developers.

> I think we're some ways away from declaring a "resilient ORTE". Josh and I
> have been committing pieces of it over the last two years, and Wes just
> committed another piece the other day that might have been titled "fault
> tolerant OOB" as it primarily addressed maintaining comm routing during node
> failures.
>
> Setting aside the obvious MPI issues, there are several
> branches/organizations working on different aspects of the ORTE problem,
> including:
>
> * fault prediction and proactive migration
>
> * mapping algorithms to minimize failure cascades
>
> * simultaneous failure handling
>
> * alternative wiring methods that eliminate the OOB routing issues
>
> etc. We expect most of those developments to arrive over the next 6-12
> months. Once that has occurred, we'll probably be close to what we would
> call a "resilient" system.
>
> Until then, we are improving, but still far from "resilient".
>
>
> On Jun 24, 2011, at 10:24 AM, Ken Lloyd wrote:
>
> Josh and Wesley,
>
> Will you be presenting Resilient ORTE at Resilience 2011 in Bordeaux?
>
> http://xcr.cenit.latech.edu/resilience2011/
> =====================
> *Kenneth A. Lloyd*
> CEO - Director of Systems Science
> Watt Systems Technologies Inc.
> www.wattsys.com
> kenneth.lloyd_at_[hidden]
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel