13-276
To: J3
From: Nick Maclaren/Steve Lionel
Subject: Changes for failed image wording
Date: 2013 June 25
The following changes are intended to make it clear that the failed
image facility provides the processor with way to indicate failure
that can be tested in a portable program, but does not require a
processor to support failure recovery or specify exactly what the
consequences are following failure.
This paper also reflects a change to make
STAT_FAILED_IMAGE the lowest priority failure status rather than the
highest priority, and an editorial change from Malcolm.
Edits to N1967
--------------
[Introduction:iv:p2] After sentence 2 add "The existing system does
not provide a mechanism for a processor to identify what images have
failed during execution of a program. This adversely affects the
resilience of programs executing on large systems."
[Introduction:iv:p3] After the first comma in sentence 1, add "for the
processor to indicate which images have failed during execution and
allow continued execution of the program on the remaining images, "
[5.6 STAT FAILED IMAGE:p11] This should be replaced by:
"The value of the default integer scalar constant STAT_FAILED_IMAGE is
different from the value of STAT_STOPPED_IMAGE, STAT_LOCKED,
STAT_LOCKED_OTHER_IMAGE, or STAT_UNLOCKED. If the processor has the
ability to detect that an image of the current team has failed, and
does so, its value is assigned to the variable specified in a
STAT=specifier in an execution of an image control statement, or the
STAT argument in an invocation of a collective procedure. A failed image
is one for which references or definitions of variables fail when that
variable should be accessible, or the image fails to respond as part of
a collective activity. A failed image remains failed for the remainder
of the program execution. If more than one nonzero status value is valid
for the execution of a statement, the status variable is defined with a
value other than STAT_FAILED_IMAGE. The conditions that cause an image to
fail are processor dependent.
NOTE 5.4 A failed image is usually associated with a hardware failure of
the processor, memory system, or interconnection network. A failure
that occurs while a coindexed reference or definition, or collective
action, is in progress may leave variables on other images that would be
defined by that action in an undefined state. Similarly, failure while
using a file may leave that file in an undefined state. A failure on
one image may cause other images to fail for that reason."
[7.2;p15:22-29] Replace paragraph with:
If the STAT argument is present in an invocation of a collective subroutine
and its execution is not successful, the argument may become defined with a
nonzero value and the effect is otherwise the same as that of executing the
SYNC_MEMORY statement. If execution involves synchronization with an image
that has stopped, the argument becomes defined with the value of
STAT_STOPPED_IMAGE in the intrinsic module ISO_FORTRAN_ENV (13.8.2);
otherwise, if no image of the current team has stopped or failed, the argument
becomes defined with a processor-dependent positive value that is different
from the value of STAT_STOPPED_IMAGE or STAT_FAILED_IMAGE in the intrinsic
module ISO_FORTRAN_ENV (13.8.2). If an image had failed, but no other error
condition occurred, the argument may become defined with the value of the
constant STAT_FAILED_IMAGE.
[8.5 Edits to clause 8;p27:30] "becomes" should be changed to "may
become".
[8.6 Edits to clause 13;p28:22-29] replace paragraph with the following
(includes revisions from success4.txt):
If the STAT argument is present in an invocation of a collective subroutine
and its execution is not successful, the argument may become defined with a
nonzero value and the effect is otherwise the same as that of executing the
SYNC_MEMORY statement. If execution involves synchronization with an image
that has stopped, the argument becomes defined with the value of
STAT_STOPPED_IMAGE in the intrinsic module ISO_FORTRAN_ENV (13.8.2);
otherwise, if no image of the current team has stopped or failed, the argument
becomes defined with a processor-dependent positive value that is different
from the value of STAT_STOPPED_IMAGE or STAT_FAILED_IMAGE in the intrinsic
module ISO_FORTRAN_ENV (13.8.2). If an image had failed, but no other error
condition occurred, the argument may become defined with the value of the
constant STAT_FAILED_IMAGE.