Intel ArBB Segregated Storage and Data Copies

Summary

This article answers the question of when, and how many, copies occur for the inputs and outputs of an Intel ArBB function. It is based on a webinar; the webinar material can be found here.

Intel ArBB Functions

First of all, we need to have a closer look at the term "Intel ArBB function".

An Intel ArBB function is a C/C++ function as shown above which can be called using the Intel ArBB call-operator.

Intel ArBB Call-Operator

The figure below illustrates the Just-In-Time (JIT) compilation process as invoked by the call-operator. The vertical line acts as a time axis, first showing the "as-if" execution, followed by the generation of an Intermediate Representation (IR) of the code. On the right-hand side of the time axis, the execution is ready to proceed with the supplied actual arguments. The variable named "my_function" (green) is a pointer to a function; the call-operator only calls it the first time it is seen, in order to collect the IR of the Intel ArBB function. The function pointer also serves as the key for caching the generated code. Passing the same function pointer to the call-operator again proceeds immediately with the native execution of the JIT-generated code.

Please note that when the Intel ArBB runtime is switched into emulation mode (ARBB_EMULATE=1, or ARBB_OPT_LEVEL=O0), the call-operator looks the same, except that the "as-if" execution actually executes the effect of the Intel ArBB function (no JIT compilation, serial execution).
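The caching behavior described above (the function pointer as cache key, with capture happening only on first sight) can be sketched with a small toy model. This is not Intel ArBB code: `ToyRuntime`, `Kernel`, `Compiled`, `captures`, and `scale_by_two` are hypothetical names chosen for illustration, and the "compiled" code is simply the original function wrapped in a `std::function`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these names are part of the Intel ArBB API.
using Kernel = void (*)(std::vector<float>& out, const std::vector<float>& in);
using Compiled = std::function<void(std::vector<float>&, const std::vector<float>&)>;

struct ToyRuntime {
    // Cache keyed by function pointer, mirroring the pointer's role as cache key.
    std::unordered_map<Kernel, Compiled> cache;
    int captures = 0;  // counts the first-time "capture" work per distinct function

    void call(Kernel fn, std::vector<float>& out, const std::vector<float>& in) {
        auto it = cache.find(fn);
        if (it == cache.end()) {
            // First sighting: stands in for the "as-if" execution and code generation.
            ++captures;
            it = cache.emplace(fn, Compiled(fn)).first;
        }
        // Every call (including the first) then runs the cached "generated code".
        it->second(out, in);
    }
};

// An example kernel for the toy runtime: out = 2 * in, element-wise.
void scale_by_two(std::vector<float>& out, const std::vector<float>& in) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = 2.0f * in[i];
}
```

Calling `ToyRuntime::call` twice with `scale_by_two` increments `captures` only once; the second call goes straight to the cached entry, which is the property the function-pointer key provides.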

Bind Interface

The "bind interface" refers to the use of one of the overloaded Intel ArBB bind functions. The bind interface can be used to bind a buffer to a default-constructed dense container. Passing such a container into an Intel ArBB function is subject to data copies into the segregated Intel ArBB data storage space. A distinction can be made between "copy-in" and "copy-out" with respect to the inputs and outputs of an Intel ArBB function. Please note that an in-out argument is subject to two copies in cases where the argument is actually used as an input of an Intel ArBB function. Using an in-out argument only on the left-hand side (lhs, or destination) can prevent the "copy-in".

A "copy-in" occurs on each input (including an in-out argument that is used as an input) every time an Intel ArBB function is called. A "copy-out" occurs on each output (including an in-out argument that is used as an output) every time an Intel ArBB function is called. Here, calling an Intel ArBB function means invoking it via the call-operator (not in emulation mode) or directly using a closure. (Note that unused arguments do not fall under these rules. Further, this only applies to native execution and not to the emulation mode of Intel ArBB.)
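The copy-in/copy-out rule above amounts to simple bookkeeping per argument. The following sketch counts the copies for one call, assuming none of the copy-omitting conditions apply. `ArgUse`, `CopyCount`, and `copies_per_call` are hypothetical names for illustration, not Intel ArBB types or functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of how one bound argument is used inside the function.
struct ArgUse {
    bool read;     // used as an input
    bool written;  // used as an output (appears as a destination)
};

struct CopyCount {
    int in;   // number of copy-in operations for one call
    int out;  // number of copy-out operations for one call
};

// Counts copies for a single call, assuming no copy can be omitted.
CopyCount copies_per_call(const std::vector<ArgUse>& args) {
    CopyCount c{0, 0};
    for (const ArgUse& a : args) {
        if (a.read) ++c.in;      // copy-in for every argument used as an input
        if (a.written) ++c.out;  // copy-out for every argument used as an output
        // Arguments that are neither read nor written contribute no copies.
    }
    return c;
}
```

For example, an in-out argument that is both read and written contributes one copy-in and one copy-out, whereas an in-out argument used only as a destination contributes a copy-out alone.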

Copies are omitted when all of the following conditions are met:

Sufficient alignment according to the Intel ArBB malloc functions

Default pattern specified when calling bind (no stride/pitch)

Non-remote execution

To summarize, whether or not copies are made depends on:

Signature of the bind function, i.e. when using user-defined types (UDTs), or explicitly using a non-default pitch (stride)

Where the execution happens, i.e. on the host-side, or as a remote execution (e.g. in case of executions targeting Intel® MIC architecture)

Sufficient buffer alignment
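The three conditions can be read as a single predicate in which all terms must hold. Below is a minimal sketch of that logic. `BindInfo`, `is_aligned`, and `copy_can_be_omitted` are hypothetical names, and the required alignment is passed in as a parameter because its actual value is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical summary of one bind call; not an Intel ArBB type.
struct BindInfo {
    const void* buffer;              // host buffer passed to bind
    std::size_t required_alignment;  // assumption: alignment the runtime's allocator would provide
    bool default_pattern;            // true if no stride/pitch was specified
    bool remote_execution;           // true if execution happens off the host
};

// True if the address is a multiple of the given (non-zero) alignment.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// All three conditions must hold for the copy to be omitted.
bool copy_can_be_omitted(const BindInfo& b) {
    return is_aligned(b.buffer, b.required_alignment)
        && b.default_pattern
        && !b.remote_execution;
}
```

Failing any single condition (a misaligned buffer, a non-default pitch, or remote execution) is enough to make the predicate false, which matches the "all of the following" wording.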

One can explicitly avoid these copies by using the range interface, which consists of the following dense container methods:

write_only_range()

read_only_range()

read_write_range()

To write a correct Intel ArBB program, these methods must be used to make sure bound buffers are updated accordingly. This mechanism helps to avoid the automatic "copy-in" and the automatic "copy-out" on each use of the call-operator. This also applies to successive calls that continue to use a dense container argument previously involved in an Intel ArBB call. Intel ArBB might enforce the use of the above three methods in the future, such that one cannot rely on an automatically updated buffer when returning from the execution of the Intel ArBB call-operator.
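The idea of updating a bound buffer only on explicit request, rather than after every call, can be illustrated with a toy container. This is a conceptual sketch only: `ToyBoundDense` is a hypothetical class, its `read_only_range()` merely borrows the name of the real method and returns a raw pointer for simplicity, and the real methods' return types and semantics are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy container with its own (segregated) storage, bound to a host buffer.
class ToyBoundDense {
public:
    ToyBoundDense(float* host, std::size_t n)
        : host_(host), store_(host, host + n) {}  // one initial copy into the segregated storage

    // Stands in for work done inside a call: only the segregated storage is touched.
    // Conservatively marks the host buffer as stale on every access.
    std::vector<float>& storage() {
        host_stale_ = true;
        return store_;
    }

    // Explicit request for a readable, up-to-date host view (toy analogue only).
    const float* read_only_range() {
        if (host_stale_) {
            std::copy(store_.begin(), store_.end(), host_);
            ++copies_out_;
            host_stale_ = false;
        }
        return host_;
    }

    int copies_out() const { return copies_out_; }

private:
    float* host_;
    std::vector<float> store_;
    bool host_stale_ = false;
    int copies_out_ = 0;
};
```

In this model, several successive updates to `storage()` leave the host buffer untouched, and a single `read_only_range()` call performs one copy-out at the end instead of one per update.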

Memory Mapping or Range Interface

Dense containers with an initial size given at construction time are meant to be used together with the memory mapping interface. Since there are no host-side buffers explicitly bound to such dense containers, there is no need to keep such buffers updated automatically. Dense containers constructed with an initial size are correctly aligned in memory. The memory mapping interface (also called the "range interface") consists of the same three methods (write_only_range, read_only_range, read_write_range) already described in the bind interface section above.