GASNet portals4-conduit documentation -*- text -*-
Original Author: Brian Barrett <bwbarre@sandia.gov>
Contact info: Portals Dev Team <ng-portals@sandia.gov>
=====================================================================
NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
=====================================================================
The portals4-conduit is a BETA conduit. It may be removed in a future release.
Users who need a high level of system stability should select another
conduit using the --enable-<conduit> options to the GASNet configure script.
=====================================================================
NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
=====================================================================
User Information:
-----------------
Portals 4 is the next generation network programming interface developed
at Sandia National Laboratories, building on the lessons learned from the
Portals 3.3 specification. Information on Portals, including a copy of
the specification, can be found at the Sandia Portals page:
http://www.cs.sandia.gov/Portals/
A reference implementation of Portals 4.0 which runs over InfiniBand and
UDP is available at:
https://github.com/Portals4/portals4
Where this conduit runs:
-----------------------
portals4-conduit is a portable conduit that should run anywhere with a
working implementation of the Portals 4 API. See the Portals documentation
for more information on current implementations.
Users with InfiniBand hardware are encouraged to use ibv-conduit instead,
which implements GASNet directly over IB verbs.
Users with Ethernet hardware are encouraged to use udp-conduit instead,
which implements GASNet directly over UDP.
Optional compile-time settings:
------------------------------
* All the compile-time settings from extended-ref (see the extended-ref README)
* Portals 4 implementations which do not support creating a memory
descriptor to bind all of memory may set two configure-time flags to
activate a workaround:
--with-portals4-max-md-size=SIZE
--with-portals4-max-va-size=SIZE
For each flag, SIZE is the base-2 logarithm of a value in bytes: for
max-md-size, the largest memory space that can be covered by a single
memory descriptor; for max-va-size, the largest user-accessible virtual
address. In general, these arguments should not be needed.
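As an illustration of the Log2 convention used by the SIZE arguments, the
following C sketch converts a byte count into the exponent form. The helper
name size_to_log2 is hypothetical and is not part of GASNet or Portals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GASNet or Portals): returns
 * floor(log2(bytes)), the form expected by the SIZE arguments. */
static unsigned size_to_log2(uint64_t bytes)
{
    unsigned exponent = 0;
    assert(bytes > 0);              /* log2(0) is undefined */
    while (bytes >>= 1) {           /* count how many times we can halve */
        exponent++;
    }
    return exponent;
}
```

For example, under the assumed (not measured) limits of 1 GiB per memory
descriptor and a 47-bit user address space, size_to_log2 yields 30 and 47,
which would correspond to --with-portals4-max-md-size=30 and
--with-portals4-max-va-size=47.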
Recognized environment variables:
---------------------------------
* All the standard GASNet environment variables (see top-level README)
* Conduit-specific environment variables:
GASNET_AM_BUFFER_SIZE=<size>
Size of each bounce buffer, in bytes, for receiving active messages.
Default is 1MB. This, combined with GASNET_AM_NUM_ENTRIES,
determines how much memory per process is set aside for receiving
active messages. Increasing this value may help performance if
frequent retransmissions are necessary, but it is likely that
increasing GASNET_AM_NUM_ENTRIES will be more fruitful.
Note that, unlike many interfaces, Portals 4 does not require a
buffer per active message. Instead, active messages are tightly
packed into a receive buffer until the buffer fills. So 16 entries
of 1MB means that the conduit can receive approximately 256k
messages of 16 arguments.
GASNET_AM_NUM_ENTRIES=<count>
Number of bounce buffers that should be used for receiving active
messages. Default is 16. Increasing this value may help
performance if frequent retransmissions are necessary. See
note in GASNET_AM_BUFFER_SIZE regarding receive buffer sizing.
GASNET_AM_EVENT_QUEUE_LENGTH=<count>
Number of event queue entries in the send and receive event queues
for active messages. Default is 8192 and the value should be a
multiple of two. The number of outstanding outgoing active messages
is limited by this queue length, with one entry required for every
short / medium active message and two entries required for every
long active message. The number of queued received active messages
is similarly limited.
GASNET_EXTENDED_EVENT_QUEUE_LENGTH=<count>
Number of event queue entries in the event queue used for extended
API communication. One entry is required for each outstanding put
or get operation at the initiator and no resources are required on
the target. The default is 8192.
* Portals 4 environment variables:
The Portals 4 implementation may provide environment variables to modify its
behavior. The reference implementation offers several settings that may be
relevant when using portals4-conduit:
* PTL_IFACE_NAME allows for the explicit naming of the network interface to
be used by Portals, for example ib0, eno1, etc. This may be necessary on
systems utilizing non-standard interface naming.
* PTL_DISABLE_MEM_REG_CACHE=[0|1] enables (0) or disables (1) the IB memory
registration cache. With the cache enabled, Portals can amortize memory
registration overheads across multiple RMA operations, which provides the
best performance but requires the ummunotify Linux kernel module in order
to run correctly. Disabling the cache removes the requirement for
ummunotify, and the implementation does not keep a registered memory
cache; this may lower performance but helps ensure correctness.
* PTL_ENABLE_MEM=[0|1] will deactivate/activate local memory bypass transport at
the Portals level, if one is compiled in. This path will only be
reached if the GASNet PSHM bypass has been disabled, and it requires the
KNEM kernel module.
* PTL_DEBUG=1 activates Portals-level debug tracing
* PTL_LOG_LEVEL=[0|1|2|3] will set the trace level.
See the Portals 4 documentation for the most up-to-date information.
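The sizing arithmetic described for GASNET_AM_BUFFER_SIZE,
GASNET_AM_NUM_ENTRIES and GASNET_AM_EVENT_QUEUE_LENGTH can be sketched in C
as follows. This is a rough estimate only: the function names are
hypothetical, each argument is assumed to occupy 4 bytes (as in the message
layouts in the Design Overview), and any per-message header overhead is
ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each AM argument occupies 4 bytes; header overhead ignored. */
#define AM_ARG_BYTES 4u

/* Approximate number of short AMs with `nargs` arguments that fit in
 * `num_entries` receive buffers of `buffer_size` bytes each. */
static size_t am_recv_capacity(size_t buffer_size, size_t num_entries,
                               size_t nargs)
{
    assert(nargs > 0);
    return (buffer_size * num_entries) / (nargs * AM_ARG_BYTES);
}

/* Event queue entries consumed by outstanding outgoing AMs:
 * one per short/medium AM, two per long AM. */
static size_t am_eq_entries_needed(size_t short_medium, size_t longs)
{
    return short_medium + 2 * longs;
}
```

With the defaults (16 buffers of 1MB), am_recv_capacity(1 << 20, 16, 16)
gives 262144, matching the "approximately 256k messages" figure above.
Likewise, 4096 short or medium AMs plus 2048 long AMs in flight would use
all 8192 entries of the default send event queue.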
Known problems:
---------------
* See the GASNet Bugzilla server for details on known bugs:
http://gasnet-bugs.lbl.gov/
Future work:
------------
* Optimization
* Use counting events to track implicit handle events
* Improved cleanup handling
* Native barrier support
==============================================================================
Design Overview:
----------------
Core API:
* Usage of Portals header fields...
header data: op count
match bits used on AM_PT
 0 1 23 4567 0 1234567 0 1234567 01234567 01234567 01234567 01234567 01234567
| | |  |      |         |                   payload length
 ^ ^ ^    ^        ^
 | | |    |        +--- handler id
 | | |    +--- AM argument count
 | | +--- Protocol: AM_SHORT, AM_MEDIUM, AM_LONG, or AM_LONG_PACKED
 | +--- Request/Response: AM_REQUEST or AM_REPLY (0 if LONG_DATA)
 +--- Type: ACTIVE_MESSAGE or LONG_DATA
* Short messages:
Type: ACTIVE_MESSAGE
Req/Rep: as specified
Protocol: AM_SHORT
AM argument count: as specified
handler id: as specified
payload length: 0
data:
<AM argument count * 4 bytes of arguments>
* Medium messages:
Type: ACTIVE_MESSAGE
Req/Rep: as specified
Protocol: AM_MEDIUM
AM argument count: as specified
handler id: as specified
payload length: as specified (nbytes)
data:
<AM argument count * 4 bytes of arguments>
<nbytes of payload>
* Long Packable messages (nbytes < 2048 - 8)
Type: ACTIVE_MESSAGE
Req/Rep: as specified
Protocol: AM_LONG_PACKED
AM argument count: as specified
handler id: as specified
payload length: as specified (nbytes)
data:
<AM argument count * 4 bytes of arguments>
<uint64_t remote address>
<nbytes of payload>
* Long Messages
Type: ACTIVE_MESSAGE
Req/Rep: as specified
Protocol: AM_LONG
AM argument count: as specified
handler id: as specified
payload length: 0
data:
<AM argument count * 4 bytes of arguments>
Type: LONG_DATA
Req/Rep: as specified
Protocol: AM_LONG
AM argument count: 0
handler id: 0
payload length: 0
data:
<nbytes of data delivered directly to target address>
Note that the start + offset in the LONG_DATA gives you the remote
address for the active message callback.
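The match bits layout described under "Usage of Portals header fields" can
be sketched in C as a pack/unpack pair. The field widths (1, 1, 2, 5, 8 and
47 bits) are read off the diagram. Everything else is an assumption made for
illustration: that bit 0 in the diagram is the most significant bit, the
numeric values of the EX_* constants, and all type and function names. The
conduit's real encoding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts assume bit 0 of the diagram is the most significant bit
 * (an assumption; the diagram does not state the bit order). */
#define TYPE_SHIFT     63   /* 1 bit  */
#define REQREP_SHIFT   62   /* 1 bit  */
#define PROTOCOL_SHIFT 60   /* 2 bits */
#define ARGC_SHIFT     55   /* 5 bits */
#define HANDLER_SHIFT  47   /* 8 bits */
#define LENGTH_MASK    ((UINT64_C(1) << 47) - 1)   /* low 47 bits */

/* Hypothetical numeric values; the conduit's constants may differ. */
enum { EX_ACTIVE_MESSAGE = 0, EX_LONG_DATA = 1 };
enum { EX_AM_REQUEST = 0, EX_AM_REPLY = 1 };
enum { EX_AM_SHORT = 0, EX_AM_MEDIUM = 1, EX_AM_LONG = 2,
       EX_AM_LONG_PACKED = 3 };

typedef struct {
    unsigned type, reqrep, protocol, argc, handler;
    uint64_t length;            /* payload length in bytes */
} am_fields_t;

/* Combine the six fields into one 64-bit match bits value. */
static uint64_t am_match_bits_pack(am_fields_t f)
{
    assert(f.type <= 1 && f.reqrep <= 1 && f.protocol <= 3);
    assert(f.argc <= 31 && f.handler <= 255 && f.length <= LENGTH_MASK);
    return ((uint64_t)f.type     << TYPE_SHIFT)
         | ((uint64_t)f.reqrep   << REQREP_SHIFT)
         | ((uint64_t)f.protocol << PROTOCOL_SHIFT)
         | ((uint64_t)f.argc     << ARGC_SHIFT)
         | ((uint64_t)f.handler  << HANDLER_SHIFT)
         | f.length;
}

/* Recover the six fields from a 64-bit match bits value. */
static am_fields_t am_match_bits_unpack(uint64_t bits)
{
    am_fields_t f;
    f.type     = (unsigned)((bits >> TYPE_SHIFT) & 0x1);
    f.reqrep   = (unsigned)((bits >> REQREP_SHIFT) & 0x1);
    f.protocol = (unsigned)((bits >> PROTOCOL_SHIFT) & 0x3);
    f.argc     = (unsigned)((bits >> ARGC_SHIFT) & 0x1f);
    f.handler  = (unsigned)((bits >> HANDLER_SHIFT) & 0xff);
    f.length   = bits & LENGTH_MASK;
    return f;
}
```

Under these assumptions, a medium request with 2 arguments, handler 130 and
a 1024-byte payload round-trips through pack and unpack unchanged, and the
message body that follows would carry 2 * 4 bytes of arguments and then the
1024 payload bytes.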
Source-side message notes:
The extended API messages are not associated with a source-side
fragment. Therefore, we use the top 4 bits of the user_ptr attached to
each operation to record the type of operation (active message, long
data for active message, explicit handle extended, or implicit handle
extended). The remainder of the user_ptr is the actual virtual
address for the fragment (active message) or operation handle.
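The user_ptr tagging described above can be sketched in C as follows. The
tag values and all names are hypothetical; only the four operation
categories and the use of the top 4 bits come from the text. The sketch
also assumes 64-bit pointers whose top 4 bits are unused for user-space
addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; only the four categories come from the text. */
enum op_type {
    OP_AM              = 1,  /* active message fragment */
    OP_AM_LONG_DATA    = 2,  /* long data for an active message */
    OP_EXPLICIT_HANDLE = 3,  /* explicit handle extended operation */
    OP_IMPLICIT_HANDLE = 4   /* implicit handle extended operation */
};

#define OP_TYPE_SHIFT 60
#define OP_ADDR_MASK  ((UINT64_C(1) << OP_TYPE_SHIFT) - 1)

/* Store the operation type in the top 4 bits of the address. */
static uint64_t user_ptr_encode(enum op_type type, const void *addr)
{
    uint64_t raw = (uint64_t)(uintptr_t)addr;
    assert((raw & ~OP_ADDR_MASK) == 0);   /* top 4 bits must be free */
    return ((uint64_t)type << OP_TYPE_SHIFT) | raw;
}

/* Extract the operation type from the top 4 bits. */
static enum op_type user_ptr_type(uint64_t user_ptr)
{
    return (enum op_type)(user_ptr >> OP_TYPE_SHIFT);
}

/* Recover the original fragment or handle address. */
static void *user_ptr_addr(uint64_t user_ptr)
{
    return (void *)(uintptr_t)(user_ptr & OP_ADDR_MASK);
}
```

An event handler written this way would switch on user_ptr_type() first and
then cast the result of user_ptr_addr() to the fragment or handle type that
matches the tag.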
Extended API:
The extended API is implemented using list entries instead of the
match list entries used by the core API, for higher message rate.
The target side generates no events unless an error has occurred.
The source side is rate limited by the size of the event queue,
although the default event queue should be large enough to cover the
maximum number of operations that can be in flight for current
hardware.
The Portals 4 conduit was designed with an end-to-end reliability
protocol in mind. In implementations that use such a protocol, the SEND
event will arrive only shortly before the ACK event (probably
simultaneously from the view of the process). Therefore, we use the ACK
event rather than the SEND event to detect local completion. Modifying
the EOP case to support releasing local-completion blocking calls at
the SEND event would be straightforward. The IOP case would be
slightly harder, since the IOP associated with an event is also
associated with all other operations launched in that thread. One
solution would be to use counting events for remote completion and
then the SEND event could have a pointer to a completion word.