Trapeze messages are short (128-byte) control messages with
optional attached payloads, typically containing application data not
interpreted by the messaging system, e.g., file blocks, virtual memory pages,
or TCP segments. The data structures in NIC memory include two message
rings, one for sending and one for receiving. Each message ring is a circular
producer/consumer array of 128-byte control message buffers and related state,
shown in Figure 1.
The host attaches a payload buffer
to a message by placing its DMA address in a designated field of the
control message header.
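The ring structure described above can be sketched in C as follows. This is a minimal illustration, not Trapeze's actual definitions: the ring size, field names, and the ownership flag used for the producer/consumer handshake are all assumptions.

```c
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES  64     /* illustrative ring size */
#define CTL_MSG_BYTES 128    /* control messages are 128 bytes */

/* One slot in a send or receive ring: a 128-byte control message
 * plus an optional DMA address for an attached payload buffer. */
struct msg_slot {
    uint8_t  ctl[CTL_MSG_BYTES]; /* control message, including header */
    uint64_t payload_dma;        /* 0 if no payload is attached */
    int      owned_by_nic;       /* producer/consumer handshake flag */
};

struct msg_ring {
    struct msg_slot slot[RING_ENTRIES];
    unsigned head;               /* next slot the host fills */
};

/* Attach a payload by placing its DMA address in the designated
 * field of the slot, then hand the slot to the NIC. */
static int send_msg(struct msg_ring *r, const void *ctl, uint64_t payload_dma)
{
    struct msg_slot *s = &r->slot[r->head];
    if (s->owned_by_nic)
        return -1;               /* ring full */
    memcpy(s->ctl, ctl, CTL_MSG_BYTES);
    s->payload_dma = payload_dma;
    s->owned_by_nic = 1;         /* pass ownership to the NIC */
    r->head = (r->head + 1) % RING_ENTRIES;
    return 0;
}
```

A real NIC would consume slots asynchronously and clear the ownership flag on completion; the flag here only stands in for whatever doorbell or status word the hardware uses.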

The Trapeze messaging system
has several features useful for high-speed network storage access:

Separation of header and payload. A Trapeze
control message and its payload (if any) are sent as a single packet
on the network, but they are handled separately by
the message system, and the separation is preserved at the
receiver. This enables the TCP/IP socket layer and NetRPC to
avoid copying, e.g., by remapping aligned payload buffers. To simplify zero-copy block fetches, the NIC
can demultiplex incoming payloads into a specific frame, based on a token
in the message that indirects through
an incoming payload table on the NIC.
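The token-indirected demultiplexing step might look like the sketch below. The table size, the token width, and the function names are illustrative assumptions; the point is only that the host posts a frame under a token and the NIC later indirects through the table to find the frame for an arriving payload.

```c
#include <stdint.h>

#define IPT_ENTRIES 256  /* illustrative table size */

/* Incoming payload table: maps a token carried in the control
 * message to the DMA address of the host frame that should
 * receive the payload.  (Names are illustrative, not Trapeze's.) */
static uint64_t incoming_payload_table[IPT_ENTRIES];

/* Host side: post a frame for a pending block fetch under a token. */
static void post_payload_frame(uint16_t token, uint64_t frame_dma)
{
    incoming_payload_table[token % IPT_ENTRIES] = frame_dma;
}

/* NIC side: demultiplex an arriving payload by indirecting through
 * the table with the token found in the control message header. */
static uint64_t demux_payload(uint16_t token)
{
    return incoming_payload_table[token % IPT_ENTRIES];
}
```

Because the frame is chosen before the reply arrives, the payload DMAs directly into its final destination and no copy is needed at the receiver.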

The NetRPC package based on Trapeze is derived from the original RPC
package for the Global Memory Service (gms_net), which was
extended to use Trapeze with zero-copy block handling and support
for asynchronous
prefetching at high bandwidth [1].

To complement the zero-copy features of Trapeze, the
socket layer, TCP/IP driver, and NetRPC share a common pool of aligned network
payload buffers allocated from the virtual memory page frame pool.
Since FreeBSD exchanges file block buffers between the virtual memory
page pool and the file cache, this allows
unified buffering among the
network, file, and VM systems. For example, NetRPC can send any virtual
memory page or cached file block out to the network by attaching it as
a payload to an outgoing message. Similarly, every incoming payload
is deposited in an aligned physical frame that can be mapped into a user
process or hashed into the file cache or VM page cache.
This unified buffering also enables the socket layer to reduce
copying by remapping pages, which significantly reduces overheads
for TCP streams [7].
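The key property of the shared pool is that every payload buffer is a page-aligned, page-sized frame, so the same memory can serve as a network payload, a cached file block, or a mapped user page. A minimal user-level stand-in for frame allocation, assuming a 4 KB page size, is:

```c
#include <stdlib.h>

#define PAGE_SIZE 4096  /* assumed page size */

/* Shared pool of page-aligned payload buffers.  In the system
 * described, these come from the VM page frame pool; here C11
 * aligned_alloc merely stands in for frame allocation so the
 * alignment invariant is visible. */
static void *alloc_payload_buffer(void)
{
    return aligned_alloc(PAGE_SIZE, PAGE_SIZE);
}
```

The alignment invariant is what makes page remapping possible: a receiver can swap an incoming payload frame into a user address space or the file cache instead of copying its contents.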

High-bandwidth network I/O requires support for asynchronous block
operations for prefetching or write-behind. NFS clients
typically support this asynchrony by handing off outgoing
RPC calls to a system I/O daemon that can wait for RPC replies,
allowing the user process that originated the request to
continue. NetRPC supports a lower-overhead alternative
using nonblocking RPC, in which the calling thread or process
supplies a
continuation procedure to be executed -- typically from the receiver
interrupt handler -- when the reply arrives. The
issuing thread may block at a later time, e.g.,
if it references a page that is marked in the I/O
cache for a pending prefetch. In this case, the thread sleeps and is
awakened directly from the receiver interrupt handler.
Nonblocking RPCs are a simple extension
of kernel facilities already in place for asynchronous I/O on disks;
each
network I/O operation applies to a buffer in the I/O cache,
which acts as a convenient point for synchronizing with the operation
or retrieving its status.
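The nonblocking RPC mechanism described above can be sketched as follows. All names here are illustrative: the caller registers a continuation procedure at call time and returns immediately; when the reply arrives, the handler (in the real system, the receive interrupt handler) runs the continuation, which would mark the I/O cache buffer valid and wake any thread sleeping on it.

```c
#include <stddef.h>

typedef void (*continuation_fn)(void *buf, int status);

struct pending_rpc {
    continuation_fn cont;   /* run when the reply arrives */
    void *buf;              /* I/O cache buffer this RPC fills */
    int done;
};

/* Issue the call and return without waiting for the reply. */
static void rpc_call_nonblocking(struct pending_rpc *rpc,
                                 continuation_fn cont, void *buf)
{
    rpc->cont = cont;
    rpc->buf = buf;
    rpc->done = 0;
    /* ...transmit the request here; do not block... */
}

/* Invoked on reply arrival (interrupt context in the paper). */
static void rpc_reply_handler(struct pending_rpc *rpc, int status)
{
    rpc->done = 1;
    rpc->cont(rpc->buf, status);
}

/* Demo continuation: record the completion status. */
static int demo_status;
static void demo_cont(void *buf, int status)
{
    (void)buf;
    demo_status = status;
}
```

Compared with handing the call off to an I/O daemon, this avoids a context switch per operation: the only thread that ever blocks is one that actually touches a buffer before its prefetch completes.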