The VECTORISE pragma

The vectoriser needs to know about all types and functions whose vectorised variants are implemented directly in the DPH library (instead of being generated by the vectoriser), and it needs to know what those vectorised versions are. That is the purpose of the VECTORISE pragma, which comes in a number of flavours.

Scalar versus parallel types and values

In addition to tracking the vectorised versions of types and values, the vectoriser needs to keep track of whether the computation of values and functions involves data parallelism and whether types embed parallel arrays. Knowing whether a type or value has a vectorised version is not sufficient to decide whether it embeds parallelism. In particular, every higher-order function must be vectorised, because its execution may involve parallel computation if any of its functional arguments do. Nevertheless, the higher-order function itself may be purely scalar. An example is function application itself:

($) :: (a -> b) -> a -> b
f $ a = f a

It clearly does not directly involve any data parallelism, but mapP f $ arr invariably does.

The basic VECTORISE pragma for values

Given a function f, the vectoriser generates a vectorised version f_v, which comprises the original, scalar version of the function and a second version lifted into array space. The lifted version operates on arrays of inputs and produces arrays of results in one parallel computation. The original function name is then rebound to the scalar component of f_v. This scalar component differs from the original definition in that it uses the vectorised versions of any embedded parallel array computations.
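The pairing of a scalar and a lifted version can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: ordinary lists stand in for parallel arrays (so nothing actually runs in parallel), and the names Clo, scalarApp, liftedApp and inc_v are hypothetical illustrations, not the DPH library's definitions. The closures the vectoriser actually uses also capture an environment, which is omitted here.

```haskell
{-# LANGUAGE TypeOperators #-}

-- Hypothetical, simplified model of the array closure type (:->):
-- a scalar function paired with its lifted counterpart.
-- Lists stand in for parallel arrays; there is no real parallelism here.
data a :-> b = Clo (a -> b) ([a] -> [b])

-- Apply the scalar component to a single argument.
scalarApp :: (a :-> b) -> a -> b
scalarApp (Clo f _) = f

-- Apply the lifted component to a whole array of arguments at once.
liftedApp :: (a :-> b) -> [a] -> [b]
liftedApp (Clo _ fl) = fl

-- Hypothetical vectorised version of inc :: Int -> Int.
inc_v :: Int :-> Int
inc_v = Clo (+ 1) (map (+ 1))
```

Loaded into GHCi, liftedApp inc_v [1, 2, 3] evaluates to [2, 3, 4], while scalarApp inc_v 41 evaluates to 42.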

There are two exceptions to this rule. Firstly, if the body of a function f is scalar (i.e., it does not involve any parallel array computations and has scalar argument and result types), then we leave it as is and omit the generation of f_v. Whether a function is scalar is determined by the rules described in the Vectorisation Avoidance paper.

Secondly, if a variable f is accompanied by a pragma of the form

{-# VECTORISE f = e #-}

then the vectoriser defines f_v = e and refrains from rebinding f. This implies that, for f :: t, the type of e is the vectorised version of t; in particular, e's type uses the array closure type (:->) instead of the vanilla function space (->). The vectoriser checks that e has the appropriate type.
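The type check on e can be pictured as computing the vectorised version of t and comparing it with e's type. The sketch below uses a tiny, hypothetical type representation (Ty, vectTy and rhsTypeOk are illustrative names, not GHC internals). It only replaces (->) by (:->) and [::] by PArray, and simplifies by assuming every other type constructor represents itself; the real vectoriser also substitutes vectorised versions of other type constructors.

```haskell
-- Hypothetical model of source types, used only to illustrate how the
-- vectorised version of a type t might be computed.
data Ty
  = TyCon String [Ty]   -- a type constructor applied to arguments
  | Fun Ty Ty           -- the vanilla function space (->)
  | Clo Ty Ty           -- the array closure type (:->)
  deriving (Eq, Show)

-- Vectorise a type: (->) becomes (:->) and [::] becomes PArray.
-- Simplifying assumption: all other type constructors represent themselves.
vectTy :: Ty -> Ty
vectTy (Fun a b)           = Clo (vectTy a) (vectTy b)
vectTy (Clo a b)           = Clo (vectTy a) (vectTy b)
vectTy (TyCon "[::]" args) = TyCon "PArray" (map vectTy args)
vectTy (TyCon c args)      = TyCon c (map vectTy args)

-- The right-hand side e is accepted only if its type is exactly the
-- vectorised version of f's type t.
rhsTypeOk :: Ty -> Ty -> Bool
rhsTypeOk t eTy = vectTy t == eTy
```

For t = [:Int:] -> Int, vectTy yields PArray Int :-> Int, so an e of that type passes the check while an e of the unvectorised type t does not. The exact equality test mirrors the implementation restriction that the Core types must be identical.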

This pragma can also be used for imported functions f. In this case, f_v and a suitable vectorisation mapping of f to f_v are exported implicitly, just like RULES applied to imported identifiers. By vectorising imported functions, we can vectorise functions of modules that have not been compiled with -fvectorise. This is crucial to using the standard Prelude in vectorised code.

Parallelism: A vectorised value is marked as parallel if its code includes a parallel value or if it includes any parallel types. The detailed rules are in the Vectorisation Avoidance paper.
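As a rough illustration of this rule, the following sketch computes the set of parallel bindings as a fixpoint: a binding becomes parallel once it mentions a parallel variable or a parallel type constructor, and iteration continues until nothing changes, which also covers recursive groups. This deliberately simplifies the rules in the Vectorisation Avoidance paper; Binding and parallelVars are hypothetical names, and the seed sets of known-parallel variables and type constructors are assumed inputs.

```haskell
import qualified Data.Set as Set

-- Hypothetical summary of a top-level binding: the variables and type
-- constructors that its right-hand side mentions.
data Binding = Binding
  { bName  :: String
  , bVars  :: [String]
  , bTypes :: [String]
  }

-- Compute the set of parallel bindings, starting from seed sets of variables
-- and type constructors already known to be parallel. Iterating until the set
-- stops growing handles bindings that refer to each other.
parallelVars :: Set.Set String -> Set.Set String -> [Binding] -> Set.Set String
parallelVars parTypes seedVars binds = go seedVars
  where
    go par
      | par' == par = par
      | otherwise   = go par'
      where
        par' = Set.union par (Set.fromList [bName b | b <- binds, isPar par b])
    -- A binding is parallel if it mentions a parallel variable or type.
    isPar par b =
      any (`Set.member` par) (bVars b) || any (`Set.member` parTypes) (bTypes b)
```

With mapP and sumP as seed variables and [::] as the seed type, a binding that calls sumP is parallel, and so is any binding that calls it in turn. A binding modelling ($), which mentions nothing parallel, stays scalar, matching the discussion of higher-order functions above.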

IMPLEMENTATION RESTRICTION: Currently, the right-hand side of the equation (i.e., e) may only be a simple identifier, and it must be at the correct type instance. More precisely, the Core type of the right-hand side must be identical to the vectorised version of t.

The NOVECTORISE pragma for values

If a variable f is accompanied by a pragma

{-# NOVECTORISE f #-}

then it is ignored by the vectoriser; i.e., no function f_v is generated and f is left untouched.

This pragma can only be used for bindings in the current module (exactly like an INLINE pragma). If the pragma is used on any binding of a recursive group, it must be used on all bindings of that group.
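The all-or-nothing rule for recursive groups can be checked by computing strongly connected components of the binding dependency graph. The sketch below uses stronglyConnComp and flattenSCC from Data.Graph in the containers package; the Bind record and badGroups function are hypothetical and only model what such a check might look like, not how GHC performs it.

```haskell
import Data.Graph (flattenSCC, stronglyConnComp)

-- Hypothetical representation of a binding for this check: its name, the
-- names it refers to, and whether it carries a NOVECTORISE pragma.
data Bind = Bind { name :: String, refs :: [String], noVect :: Bool }

-- Return the recursive groups that violate the rule, i.e., groups in which
-- some, but not all, bindings carry the pragma. References to names that are
-- not in the list are ignored by stronglyConnComp.
badGroups :: [Bind] -> [[String]]
badGroups binds =
  [ map name grp
  | scc <- stronglyConnComp [(b, name b, refs b) | b <- binds]
  , let grp = flattenSCC scc
  , let flags = map noVect grp
  , or flags && not (and flags)
  ]
```

Marking both even' and odd' in a mutually recursive pair yields no violations, whereas marking only one of them reports the two-element group.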

Parallelism: f will not be marked as parallel.

Caveat: If f's definition contains bindings that are floated to the top level, those bindings may still be vectorised. (TODO: We might want to ensure that we never float anything out of bindings, or at least out of NOVECTORISE bindings, before the vectoriser is invoked.)

The VECTORISE SCALAR pragma for functions

Removed.

The basic VECTORISE pragma for type constructors

Without right-hand side

For a type constructor T, the pragma

{-# VECTORISE type T #-}

indicates that the type T should be automatically vectorised even if it is imported. This is the default for all data types declared in the current module. If the type embeds no parallel arrays, no special vectorised representation will be generated.

The type constructor T must be in scope, but it may be imported. PData and PRepr instances are automatically generated by the vectoriser.

Examples are types defined in the Prelude, such as Maybe and [], which are vectorised in this way.

Parallelism: T is marked as parallel by the vectoriser if T's definition includes any parallel type constructor.
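Because data types can be (mutually) recursive, this rule is naturally computed as a fixpoint starting from the parallel array type. The following is a minimal sketch under simplifying assumptions: each declaration is summarised by the type constructors mentioned in its fields, and TyDecl and parallelTyCons are hypothetical names rather than vectoriser internals.

```haskell
import qualified Data.Set as Set

-- Hypothetical summary of a data type declaration: its name and the type
-- constructors mentioned in the fields of its data constructors.
data TyDecl = TyDecl { tyName :: String, fieldTyCons :: [String] }

-- A type constructor is parallel if its definition mentions any parallel
-- type constructor. Starting from [::], iterate until no new type constructor
-- is added, so that (mutually) recursive types are handled.
parallelTyCons :: [TyDecl] -> Set.Set String
parallelTyCons decls = go (Set.singleton "[::]")
  where
    go par
      | par' == par = par
      | otherwise   = go par'
      where
        par' = Set.union par
                 (Set.fromList [ tyName d | d <- decls
                                          , any (`Set.member` par) (fieldTyCons d) ])
```

For example, if Forest mentions [::] and Rose mentions Forest, both end up parallel, while Maybe (whose fields only use its type parameter) and a Point type with Double fields do not.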

With right-hand side

For a type constructor T, the pragma

{-# VECTORISE type T = T' #-}

directs the vectoriser to replace T by T' in vectorised code. Vectorisation of T is abstract in that constructors of T may not occur in vectorised code.

The type constructor T must be in scope, but it may be imported. PData and PRepr instances must be explicitly defined; they are not automatically generated.

An example is the vectorisation of parallel arrays, where [::] is replaced by PArray during vectorisation, but the vectoriser never looks at the representation of [::].

Parallelism: The type constructor T is marked as parallel.

The VECTORISE SCALAR pragma for type constructors

All types imported from modules that have not been vectorised are regarded as scalar types, and they can be used in encapsulated scalar code. If custom instances of the PData and PRepr classes are provided, these types can also be used in vectorised code. Such types have no specialised vectorised representation (as they are scalar) and are abstract; i.e., their constructors cannot be used in vectorised code. An example is the treatment of Int: Ints can be used in vectorised code and remain unchanged by vectorisation. However, the representation of Int as the I# data constructor wrapping an Int# is not exposed in vectorised code. Instead, computations involving that representation need to be confined to scalar code.
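The following ordinary GHC code (not DPH-specific) shows the kind of computation on the I#/Int# representation meant here. The function name sumOfSquares is hypothetical; the point is that vectorised code could call such a function as encapsulated scalar code, but could not itself pattern match on I#.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), (*#), (+#))

-- Scalar code that looks inside the Int representation by unboxing to Int#
-- and using primitive arithmetic, then reboxing the result with I#.
sumOfSquares :: Int -> Int -> Int
sumOfSquares (I# x) (I# y) = I# ((x *# x) +# (y *# y))
```

For instance, sumOfSquares 3 4 evaluates to 25.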

Without right-hand side

For a type constructor T, the pragma

{-# VECTORISE SCALAR type T #-}

indicates that the type T is scalar; i.e., it cannot have any embedded arrays. Hence, the type T represents itself in vectorised code. (No special vectorised version needs to be generated.)

The type constructor T must be in scope, but it may be imported. PData and PRepr instances for T need to be manually defined if needed at all. (This is the fundamental difference from types for which the vectoriser determines automatically that they don't need a vectorised version: for the latter, the vectoriser automatically generates instances for PData and PRepr.)

An example is the handling of Bool, which is scalar and represents itself in vectorised code, but we want to use the custom instances of 'PData' and 'PRepr' defined in the DPH libraries.

Parallelism: The type T is not marked as parallel.

With right-hand side

For a type constructor T, the pragma

{-# VECTORISE SCALAR type T = T' #-}

directs the vectoriser to replace T by T' in vectorised code, but the type is abstract; i.e., its constructors cannot be used in vectorised code. Although the representation of the type changes during vectorisation, it is still regarded as scalar and hence can be used in encapsulated scalar code.

The type constructor T must be in scope, but it may be imported. The PData and PRepr instances for T need to be manually defined.

An example is the handling of (->), which the vectoriser maps to (:->), but it never looks at the implementation of (->) and allows its use in encapsulated scalar code.

Parallelism: The type T is not marked as parallel.

The NOVECTORISE pragma for types

If a type constructor T is accompanied by a pragma

{-# NOVECTORISE type T #-}

then it is ignored by the vectoriser; i.e., no type T_v and no class instances are generated.

This pragma can only be used for definitions in the current module.

Parallelism: The type T is not marked as parallel.

TODO

Not implemented yet.

The VECTORISE pragma for type classes

For a type class C, the pragma

{-# VECTORISE class C #-}

indicates that the class C should be automatically vectorised, even if it is imported. This is the default for all classes declared in the current module.

The class C must be in scope, but it may be imported. 'PData' and 'PRepr' instances are generally not used for type classes and their dictionary representations. This pragma is only needed for classes declared in non-vectorised modules, and only if we want to declare instances of those classes in vectorised code.

An example is the handling of Eq.

Parallelism: The class tycon of C is marked as parallel if the class methods include any type constructors marked as parallel.

TODO

We want something like {-# VECTORISE class C = C' #-} (but what about the instances?). Do we still need that?

The VECTORISE SCALAR pragma for class instances

For a class instance C t, the pragma

{-# VECTORISE SCALAR instance C t #-}

indicates that the instance dfun C t should be vectorised by proceeding as for VECTORISE SCALAR on each individual class method of C.
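To illustrate what proceeding method by method means, the sketch below models a dictionary as a record of functions and lifts each method independently. It assumes that the lifted version of a scalar function simply applies it to corresponding elements of its argument arrays, and it uses lists in place of parallel arrays. NumDict, LiftedNumDict, liftScalarDict and numInt are hypothetical names; this is not how GHC represents the Num dictionary.

```haskell
-- Hypothetical record-of-functions model of a class dictionary with two
-- methods; not GHC's actual dictionary representation.
data NumDict a = NumDict
  { plus  :: a -> a -> a
  , times :: a -> a -> a
  }

-- Lifted dictionary: each method works element-wise on arrays (lists here).
data LiftedNumDict a = LiftedNumDict
  { plusL  :: [a] -> [a] -> [a]
  , timesL :: [a] -> [a] -> [a]
  }

-- Treat every method as scalar: its lifted version applies the scalar method
-- to corresponding elements of the argument arrays.
liftScalarDict :: NumDict a -> LiftedNumDict a
liftScalarDict d = LiftedNumDict
  { plusL  = zipWith (plus d)
  , timesL = zipWith (times d)
  }

-- Hypothetical dictionary for Int.
numInt :: NumDict Int
numInt = NumDict { plus = (+), times = (*) }
```

Here plusL (liftScalarDict numInt) [1, 2, 3] [10, 20, 30] evaluates to [11, 22, 33].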

C t must exactly match a single instance declaration provided for C.

An example is {-# VECTORISE SCALAR instance Num Int #-}.

RESTRICTION: Instance dictionaries vectorised with this pragma cannot be mutually recursive. (They may be recursive.)

TODO

This pragma should now be redundant, as uses of unvectorised class instances should just be encapsulated. (Do we still need the VECTORISE class pragma? Probably only if we want to allow custom instances in vectorised modules.)

Vectorising imported definitions

The various VECTORISE pragmas can be applied to imported identifiers (both variables and types). The resulting vectorisation mappings and the vectorised versions of the identifiers are implicitly exported, much as is the case for RULES defined on imported identifiers.