Allocate a section of linear memory on the host which is page-locked and
directly accessible from the device. The storage is sufficient to hold the
given number of elements of a storable type. The runtime system automatically
accelerates calls to functions such as peekArrayAsync and pokeArrayAsync
that refer to page-locked memory.

Note that page-locking reduces the amount of pageable memory available to the
system, so overall system performance may suffer. This is best used sparingly,
to allocate staging areas for data exchange.
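As a rough illustration of the staging-area pattern, the sketch below round-trips a buffer through device memory using a page-locked host array. It assumes these descriptions belong to the `cuda` package's `Foreign.CUDA.Runtime.Marshal` module; the names `mallocHostArray`, `freeHost`, `useHostPtr`, and `sync`, and the argument orders shown, are assumptions to check against the installed version.

```haskell
import           Control.Exception            (bracket)
import qualified Foreign.Marshal.Array        as F
import           Foreign.CUDA.Ptr             (useHostPtr)
import           Foreign.CUDA.Runtime.Device  (sync)
import           Foreign.CUDA.Runtime.Marshal
  (allocaArray, freeHost, mallocHostArray, peekArrayAsync, pokeArrayAsync)

main :: IO ()
main = do
  let n  = 1024 :: Int
      xs = map fromIntegral [1 .. n] :: [Float]
  -- Assumed API: an empty flag list requests plain page-locked memory.
  bracket (mallocHostArray [] n) freeHost $ \staging -> do
    -- Fill the staging buffer through its ordinary host pointer.
    F.pokeArray (useHostPtr staging) xs
    allocaArray n $ \dev -> do
      pokeArrayAsync n staging dev Nothing   -- host -> device, default stream
      peekArrayAsync n dev staging Nothing   -- device -> host, default stream
      sync                                   -- wait before reading the buffer
    ys <- F.peekArray n (useHostPtr staging)
    print (ys == xs)
```

The `bracket` keeps the page-locked region alive only for the duration of the exchange, in line with the advice to use such allocations sparingly.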

Device Allocation

Allocate a section of linear memory on the device, and return a reference to
it. The memory is sufficient to hold the given number of elements of a storable
type. It is suitably aligned, and not cleared.
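A minimal sketch of explicit device allocation follows. It assumes the function described is `mallocArray` from the `cuda` package's `Foreign.CUDA.Runtime.Marshal`, paired with `free`; `pokeListArray` and `peekListArray` are likewise assumed from that module. Because the memory is not cleared, the example writes to it before reading.

```haskell
import Control.Exception            (bracket)
import Foreign.CUDA.Runtime.Marshal
  (free, mallocArray, peekListArray, pokeListArray)

main :: IO ()
main = do
  let xs = [1 .. 8] :: [Float]
      n  = length xs
  -- bracket releases the device array even if the body throws.
  bracket (mallocArray n) free $ \dev -> do
    pokeListArray xs dev        -- contents are undefined until written
    ys <- peekListArray n dev
    print (ys == xs)
```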

Execute a computation, passing a pointer to a temporarily allocated block of
memory sufficient to hold the given number of elements of storable type. The
memory is freed when the computation terminates (normally or via an
exception), so the pointer must not be used after this.

Note that kernel launches can be asynchronous, so you may need to add a
synchronisation point at the end of the computation.
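The scoped pattern might look like the sketch below, which assumes the `cuda` package's runtime bindings (`allocaArray`, `pokeListArray`, `peekListArray` from `Foreign.CUDA.Runtime.Marshal`, and `sync` from `Foreign.CUDA.Runtime.Device`). The kernel launch is left as a comment, since it depends on code not described here.

```haskell
import Data.Int                     (Int32)
import Foreign.CUDA.Runtime.Device  (sync)
import Foreign.CUDA.Runtime.Marshal
  (allocaArray, peekListArray, pokeListArray)

main :: IO ()
main = do
  let xs = [0 .. 15] :: [Int32]
      n  = length xs
  ys <- allocaArray n $ \dev -> do
    pokeListArray xs dev
    -- A kernel operating on dev would be launched here.
    sync                  -- make sure device work has finished
    peekListArray n dev   -- return the data, never the pointer itself
  print ys
```

Returning the copied list rather than `dev` keeps the device pointer from escaping the scope in which it is valid.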

Copy the given number of elements from the first device array (source) to the
second (destination). The copied areas may not overlap. This operation is
asynchronous with respect to the host, but will not overlap other device
operations.

Copy the given number of elements from the first device array (source) to the
second (destination). The copied areas may not overlap. This operation is
asynchronous with respect to the host, and may be associated with a
particular stream.
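The device-to-device copy can be sketched as below, assuming `copyArrayAsync` in `Foreign.CUDA.Runtime.Marshal` takes the element count, source, destination, and an optional stream in that order. Passing `Nothing` is assumed to select the default stream; `withListArray`, `allocaArray`, `peekListArray`, and `sync` are also assumed from the same package.

```haskell
import Foreign.CUDA.Runtime.Device  (sync)
import Foreign.CUDA.Runtime.Marshal
  (allocaArray, copyArrayAsync, peekListArray, withListArray)

main :: IO ()
main = do
  let xs = [1 .. 4] :: [Double]
      n  = length xs
  withListArray xs $ \src ->
    allocaArray n $ \dst -> do
      -- src and dst are separate allocations, so the areas cannot overlap.
      copyArrayAsync n src dst Nothing
      sync                          -- the call returns before the copy is done
      ys <- peekListArray n dst
      print (ys == xs)
```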

Copy a 2D memory area from the first device array (source) to the second
(destination). The copied areas may not overlap. This operation is
asynchronous with respect to the host, but will not overlap other device
operations.

Copy a 2D memory area from the first device array (source) to the second
device array (destination). The copied areas may not overlap. This operation
is asynchronous with respect to the host, and may be associated with a
particular stream.

Write a list of storable elements into a newly allocated device array,
returning the device pointer together with the number of elements that were
written. Note that this requires two copy operations: first from a Haskell
list into a heap-allocated array, and then from there into device memory. The
array should be freed when no longer required.
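A short sketch of this allocate-and-fill operation, assuming it corresponds to `newListArrayLen` in the `cuda` package's `Foreign.CUDA.Runtime.Marshal` and that `free` releases the resulting pointer:

```haskell
import Control.Exception            (bracket)
import Foreign.CUDA.Runtime.Marshal (free, newListArrayLen, peekListArray)

main :: IO ()
main = do
  let xs = [2.5, 3.5, 4.5] :: [Float]
  -- The result pairs the device pointer with the element count.
  bracket (newListArrayLen xs) (free . fst) $ \(dev, n) -> do
    ys <- peekListArray n dev
    print (n == length xs && ys == xs)
```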

Temporarily store a list of elements into a newly allocated device array. An
IO action is applied to the array, the result of which is returned. Similar
to newListArray, this requires two marshalling operations of the data.

As with allocaArray, the memory is freed once the action completes, so you
should not return the pointer from the action, and you must ensure that any
asynchronous operations (such as kernel execution) have completed before it
returns.
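The temporary variant can be sketched as follows, assuming it is `withListArray` from `Foreign.CUDA.Runtime.Marshal` in the `cuda` package. The action computes its result from a host-side copy so that nothing refers to the device array after it is freed.

```haskell
import Foreign.CUDA.Runtime.Device  (sync)
import Foreign.CUDA.Runtime.Marshal (peekListArray, withListArray)

main :: IO ()
main = do
  let xs = [1, 2, 3, 4] :: [Float]
  total <- withListArray xs $ \dev -> do
    -- Any asynchronous work on dev would be issued here.
    sync                                  -- finish before the array is freed
    ys <- peekListArray (length xs) dev
    return (sum ys)                       -- a plain value, not the pointer
  print total
```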