To: J3 12-136r1
From: Bill Long
Subject: Coarray TS features
Date: 2012 February 16
References: 09-184, 11-256r2, N1858, N1906
Summary:
=======
Proposal 11 in 11-256r2 concerns the overall scope of the proposed
Technical Specification for Fortran Coarray extensions. The consensus
of J3 is that the scope should be limited. This recommendation is in
keeping with the general purpose of a TS, which is to address
circumstances "when there is an urgent market requirement for such
documents".
Of the remaining 10 Proposals in 11-256r2, these features, as
described in this paper, are recommended by J3 for inclusion in the TS
in this order of priority, assuming no serious technical flaw is
found:
1) Teams
2) Collectives
3) Atomic Operations
4) Synchronization using events
5) Parallel I/O
These features from 11-256r1 are not recommended for inclusion in the
TS.
6) Remove restrictions on coarray components
7) THIS_IMAGE(X) be scalar if X has corank 1
8) Global pointers / Copointers
9) Asymmetric allocatable and pointer objects
10) Predicated copy_async intrinsic
Beyond the Proposals in 11-256r1, no other features are recommended.
Feature Descriptions:
=====================
Feature 1: Teams
----------------
Teams provide a capability to restrict the image set of remote memory
references, coarray allocations, and synchronizations to a subset of
all the images of the program. This simplifies code for some
applications that involve segregated activities (parts of a climate
model, for example). Teams also provide a mechanism for limiting
activity to a subset of the computer system that might result in
better performance within the team (such as within a local SMP
domain.)
The simplest form of a team feature, as described in N1858, is not
adequate. A richer form of teams is proposed with these
characteristics:
- A team of all images exists at the beginning of program execution.
- An image is always a member of some team, and a member of only one
team at a time.
- Team variables, of type TEAM_TYPE (defined in ISO_FORTRAN_ENV), are
used to identify new teams. The type has one public component with
its value equal to the number of images in the team.
- New teams can be formed with a new statement (SPLIT TEAM, or FORM
TEAM) that defines the specified teams. The aggregate number of
images in the teams shall equal the number of images in the current
team. The new teams are composed of images with consecutive image
numbers in the current team. A team variable cannot be defined other
than by execution of the statement used to form teams.
- A construct is provided to specify a new current team for the
executing image. Possibilities are a WITH TEAM ... END WITH TEAM
construct, or a SELECT TEAM ... END SELECT construct. Execution
within the construct is on the context of the specified team. Image
numbers are relative to the team, starting at 1 and ending with the
number of images in the team. Collective activities, such as SYNC
ALL, allocation and deallocation of coarrays, collective subroutine
execution, and inquiry intrinsics such as THIS_IMAGE and NUM_IMAGES
are relative to the team. When execution of the construct
completes, the current team reverts to its previous value.
- Access to variables on images outside the current team is not
permitted.
The feature will require significant revision of the text in the
standard, but the capabilities are sufficiently useful to justify the
effort.
Feature 2: Collectives
----------------------
Collective subroutines offer the possibility of substantially more
efficient execution of reduction operations that would be possible by
non-expert programmers. Corresponding routines are widely used in MPI
programs. This feature has been the most widely requested in the
discussion of coarrays.
Intrinsic Collective subroutines:
CO_SUM
CO_MAX
CO_MIN
CO_BROADCAST
CO_REDUCE
See 11-193 for detailed descriptions of these subroutines. If Feature
1 (Teams) is accepted, the collective subroutines would not have a
team argument. Instead, the collective operation is done by the
members of the current team. If there is no form of teams
implemented, a different method to specify a subset of the images is
needed. An array specifying the list of images, similar to the array
used for specifying an image list for SYNC IMAGES, should be
considered.
Feature 3: Atomic operations
-----------------------------
Intrinsic subroutines implementing atomic memory operations provide
the capability to write simple and efficient code for several common
operations used in parallel programs. These capabilities are provided
in other parallel programming models and have been available as
extensions in some Fortran implementations (though with different
syntax). A standard specification would improve the prospects for code
portability.
All of the proposed subroutines are intrinsic atomic subroutines,
expanding the Fortran 2008 list of ATOMIC_DEFINE and ATOMIC_REF.
Paper 09-184 provides a detailed description of ATOMIC_CAS. The other
routines have similar semantics. The ATOM argument is atomically
modified, based on the other arguments. If an optional OLD argument is
present, it is assigned the value of ATOM immediately before the
specified operation is performed.
New Intrinsic Atomic subroutines:
ATOMIC_CAS (ATOM, OLD, COMPARE, NEW) ! Compare-and-swap
ATOMIC_ADD (ATOM, VALUE [,OLD]) ! Atomic integer add
ATOMIC_AND (ATOM, VALUE [,OLD]) ! Atomic bitwise AND
ATOMIC_OR (ATOM, VALUE [,OLD]) ! Atomic bitwise OR
ATOMIC_XOR (ATOM, VALUE [,OLD]) ! Atomic bitwise exclusive OR
ATOMIC_SWAP (ATOM, VALUE) ! Atomic swap
ATOMIC_AX (ATOM, MASK, VALUE [,OLD])! Atomic bitwise and-xor
The atomic and-xor operation replaces ATOM with
ieor(iand(ATOM,MASK),VALUE). The atomic and-xor operation also
provides the capability to perform atomic definitions of a subset of
the bits of ATOM.
Integrating this feature into the Fortran standard should be
straightforward. New intrinsic procedures are added. Implications for
the memory model semantics are already specified for execution of an
atomic subroutine.
Feature 4: Synchronization using events
---------------------------------------
Events provide a capability for an image to signal (an)other image(s)
which can then detect that the event has been posted. This replaces the
NOTIFY/QUERY feature of N1858. It is superior to the old feature
because the events are associated with user variable, so multiple events
can exist for an image, and a library routine could establish events
internally that will not interfere with notifications that might be
occurring on an image separate from the library code.
Terminology:
An "event" is an abstraction of an issue of common interest to two or
more images that is supposed to "occur" at a moment the "interested"
images are "active".
An image can "post" notification of the occurrence of an event to
other image(s).
An image can "query" for notification of the occurrence of an event, or
it can "wait" for the notification of its occurrence.
Functionality offered:
1. An image can notify other image(s) of the occurrence of an event.
2. An image can query the status of the event or wait for it to
be notified.
3. Allow multiple images to wait/query for the notification of a single event.
On waiting, one of them will quit waiting when notified.
Syntax:
EVENT POST (EVENT) ! Post notification of the occurrence of EVENT
EVENT QUERY (EVENT) ! Query whether EVENT has occurred or not
EVENT WAIT (EVENT) ! Wait for notification on EVENT having occurred
Semantics:
EVENT is a data item accessible on all images interested in being
notified of the occurrence of the associated event.
EVENT is established as being a data item describing an event by the
execution of EVENT POST on the image that wants to signal the
occurrence of EVENT and by EVENT QUERY or EVENT WAIT on the image(s)
that want to query or wait for EVENT to have occurred.
Challenges/Questions:
1. It is an open question whether it is necessary to have a separate type
for EVENTs; A scalar coarray of type integer might be sufficient.
2. Another open question is how we specify that EVENT should start being
"cleared", i.e., that the associated event hasn't occurred yet.
1. and 2. are decided/solved by defining a derived type for EVENT
with a default initializer that sets it "cleared".
3. The following has to be translated to standardeze to prevent illogical
programs from being valid:
The dynamical scope of EVENT must be such that it "exists" before,
during and after all NOTIFY's and WAIT's/QUERY's on it are/have been
processed.
4. The following syntax is proposed for (non-)local access to EVENT:
POST(EVENT ) Notification on the local EVENT variable.
POST(EVENT[n]) Notification on image n in the current TEAM.
POST(EVENT[*]) Notification on all images in the current TEAM.
QUERY/WAIT(EVENT) will likely refer to the local variable, as that
is more efficient.
Feature 5: Parallel I/O
-----------------------
The purpose of the proposed parallel I/O feature is to allow
multiple images to access the same file.
This proposal extends the OPEN statement to allow a file to be
connected to a collection of images. If teams are available,
a TEAM= specifier that takes the values YES or NO is added to
the OPEN statement. If the TEAM= specifier appears and its
value is YES, the current team becomes the connect team for the
unit; otherwise, the file is connected for the current image
only. If the teams proposal is not adopted, a new specifier,
possibly IMAGES=, is added to specify the connect team for the
unit.
The unit number specified in the OPEN statement references the
same file on all images in the connect team. All images in a
connect team shall execute an OPEN statement with the same
connect-specs except for the ERR=, IOMSG=, IOSTAT=, and
NEWUNIT= specifiers. The OPEN statement acts as an implicit
image synchronization for the images in the connect team.
Files connected for parallel I/O shall be opened for direct-
access only.
If an image executes a CLOSE statement on a unit, all images in
the unit's connect team shall close the unit with the same file
disposition. The CLOSE statement performs a implicit image
synchronization for the images in the connect team.
The FLUSH statement shall make data written to an external unit
available to all images in the unit's connect team which execute
a FLUSH statement for that unit in a subsequent segment.
If teams are available, a TEAM= specifier will be added to the
INQUIRE statement. If the teams proposal is not adopted, a new
specifier, possibly IMAGES=, will be added instead.
The NEXT_REC= specifier in an INQUIRE statement executed in an
image will be assigned the value n + 1, where n is the record
number of the last record read or written by the image.
J3 recommends against allowing parallel I/O for access methods
other than direct-access.
Feature 6: Removing restrictions on coarray components
------------------------------------------------------
The current restrictions on coarray components result in a poor
integration of coarrays with the object oriented programming features
of Fortran.
The restriction in C432 that a coarray component cannot be added to a
type through type extension unless the parent type has a coarray
component limits the use of coarray components. It would prohibit
using a base type that had no components from ever having a coarray
component added, for example.
The restriction in C444 that the parent of a coarray component cannot
be a coarray limited the declaration of some variables as a coarray.
However, there were reasons for these restrictions. For example, in
the code fragment
! The definition of type t does not include a coarray component
class(t),allocatable :: x
class(t) :: y
x = y
! which is equivalent to
! deallocate(x)
! allocate(x, source=y)
may or may not involved synchronization depending on whether the
dynamic type of t is one that has a coarray. With that possibility a
code section would need to be called on all images of the current team
to avoid deadlock in the case of a coarray component. If the user
needs to add coarray components, the original parent type could
include an allocatable coarray scalar, avoiding the problems here.
The possible problems with the feature outweigh the possible benefits.
Feature 7: THIS_IMAGE(X) be scalar if X has corank 1
----------------------------------------------------
The function this_image should allow a scalar return value for coarray
arguments with just one codimension as:
integer :: me
real :: x[*]
me = this_image(x)
As concluded in 11-251r1, this is not accepted since it would be
inconsistent with the characteristics of similar intrinsics. The
functionality is available as
this_image(x, 1)
Feature 8: Global pointers / Copointers
---------------------------------------
Some algorithms involve large data structures, such as graphs or
linked lists, that span many images. These codes would benefit from a
pointer-like object for which a remote target is allowed.
Copointers provide a mechanism for associating a pointer with a target
on a different image. Association is allowed with a local target or a
coindexed target. Like ordinary pointers, copointer assignment to
another copointer results in association with the target of the other
copointer. A method is provided to determine the image number of the
target of a copointer. Copointer references include an empty [] to
signal a potentially remote reference. Copointers are allowed as
components. If the parent object is a coarray, it is possible to
associate a remote copointer with a target.
The interaction of copointers with the existing memory model is
nontrivial. In the context of teams the status of copointers would be
difficult to determine if the target was on an image that is no longer
part of the current team. Significant revision of the standard would
be needed to accommodate what is fundamentally a new data concept. The
benefit from this feature it too high to justify for a TS.
Feature 9: Asymmetric allocatable and pointer objects
-----------------------------------------------------
If the teams feature is adopted, this feature is less important since
allocation is done only on images of the current team. It is also
possible to have different size allocations on different images by
employing components of coarray structures.
The potential usefulness of this feature so not rise to the amount of
implementation work required.
Feature 10: Predicated copy_async intrinsic
-------------------------------------------
This feature is inconsistent with the basic design of coarrays that
provides for definition and reference of variables on different images
using simple syntax. In many of the cases where this might be used, a
compiler could perform comparable optimization anyway.