General

Scope

This Technical Specification describes requirements for implementations of an
interface that computer programs written in the C++ programming language may
use to invoke algorithms with parallel execution. The algorithms described by
this Technical Specification are realizable across a broad class of
computer architectures.

This Technical Specification is non-normative. Some of the functionality
described by this Technical Specification may be considered for standardization
in a future version of C++, but it is not currently part of any C++ standard.
Some of the functionality in this Technical Specification may never be
standardized, and other functionality may be standardized in a substantially
changed form.

The goal of this Technical Specification is to build widespread existing
practice for parallelism in the C++ standard algorithms library. It gives
advice on extensions to those vendors who wish to provide them.

1.2

Normative references

The following referenced document is indispensable for the
application of this document. For dated references, only the
edition cited applies. For undated references, the latest edition
of the referenced document (including any amendments) applies.

ISO/IEC 14882:— is herein called the C++ Standard.
The library described in ISO/IEC 14882:— clauses 17-30 is herein called
the C++ Standard Library. The C++ Standard Library components described in
ISO/IEC 14882:— clauses 25, 26.7 and 20.7.2 are herein called the C++ Standard
Algorithms Library.

Unless otherwise specified, the whole of the C++ Standard's Library
introduction (C++14 §17) is included in this
Technical Specification by reference.

1.3

Namespaces and headers

Since the extensions described in this Technical Specification are
experimental and not part of the C++ Standard Library, they should not be
declared directly within namespace std. Unless otherwise specified, all
components described in this Technical Specification are declared in namespace
std::experimental::parallel::v1.

[ Note:
Once standardized, the components described by this Technical Specification are expected to be promoted to namespace std.
— end note ]

Unless otherwise specified, references to other entities described in this
Technical Specification are assumed to be qualified with
std::experimental::parallel::v1, and references to entities described in the C++
Standard Library are assumed to be qualified with std::.

Extensions that are expected to eventually be added to an existing header
<meow> are provided inside the <experimental/meow> header,
which shall include the standard contents of <meow> as if by

#include <meow>

Execution policies

In general

This clause describes classes that are execution policy types. An object
of an execution policy type indicates to an algorithm whether it is allowed to execute
in parallel and expresses the requirements on the element access functions.

[ Note:
Because different parallel architectures may require idiosyncratic
parameters for efficient execution, implementations of the Standard Library
may provide, as extensions, execution policies in addition to those described
in this Technical Specification.
— end note ]

2.2

Execution policy type trait

is_execution_policy
can be used to detect parallel execution policies for the purpose of
excluding function signatures from otherwise ambiguous overload
resolution participation.

is_execution_policy<T> shall be a UnaryTypeTrait with a BaseCharacteristic of true_type if T is the type of a standard or implementation-defined execution policy, otherwise false_type.

The behavior of a program that adds specializations for is_execution_policy is undefined.

2.4

Sequential execution policy

The class sequential_execution_policy
is an execution policy type used as a unique type to disambiguate
parallel algorithm overloading and to require that a parallel algorithm's
execution not be parallelized.

2.5

Parallel execution policy

The class parallel_execution_policy
is an execution policy type used as a unique type to disambiguate
parallel algorithm overloading and indicate that a parallel algorithm's
execution may be parallelized.

2.6

Parallel+Vector execution policy

The class parallel_vector_execution_policy
is an execution policy type used as a unique type to disambiguate
parallel algorithm overloading and indicate that a parallel algorithm's
execution may be vectorized and parallelized.

Parallel exceptions

Exception reporting behavior

During the execution of a standard parallel algorithm, if temporary memory resources are required by the algorithm and none are available,
the algorithm throws a std::bad_alloc exception.

During the execution of a standard parallel algorithm, if the invocation of an element
access function terminates with an uncaught
exception, the behavior of the program is determined by the type of execution policy used to
invoke the algorithm:

If the execution policy object is of type parallel_vector_execution_policy,
std::terminate shall be called.

If the execution policy object is of type sequential_execution_policy or
parallel_execution_policy, the execution of the algorithm terminates with an
exception_list exception. All uncaught exceptions thrown during
the invocations of element access functions shall be contained in the exception_list.

[ Note:
For example, the number of invocations of the user-provided function object in
for_each is unspecified. When for_each is executed sequentially,
only one exception will be contained in the exception_list object.
— end note ]

[ Note:
These guarantees imply that, unless the algorithm has failed to allocate memory and
terminated with std::bad_alloc, all
exceptions thrown during the execution of
the algorithm are communicated to the caller. It is
unspecified whether an algorithm implementation will "forge ahead" after
encountering and capturing a user exception.
— end note ]

[ Note:
The algorithm may terminate with the std::bad_alloc
exception even if one or more
user-provided function objects have terminated with an
exception. For example, this can happen when an algorithm fails to
allocate memory while
creating or adding elements to the exception_list object.
— end note ]

If the execution policy object is of any other type, the behavior is implementation-defined.

The class exception_list owns a sequence of exception_ptr objects. The parallel
algorithms may use the exception_list to communicate uncaught exceptions encountered during parallel execution to the
caller of the algorithm.

The type exception_list::iterator shall fulfill the requirements of
ForwardIterator.

size_t size() const noexcept;

Returns:

The number of exception_ptr objects contained within the exception_list.

Complexity:

Constant time.

exception_list::iterator begin() const noexcept;

Returns:

An iterator referring to the first exception_ptr object contained within the exception_list.

exception_list::iterator end() const noexcept;

Returns:

An iterator that is past the end of the owned sequence.

Effect of execution policies on algorithm execution

Parallel algorithms have template parameters named ExecutionPolicy which describe
the manner in which the execution of these algorithms may be parallelized and the manner in
which they apply the element access functions.

The invocations of element access functions
in parallel algorithms invoked with an execution policy object of type sequential_execution_policy
execute in sequential order in the calling thread.

The invocations of element access
functions in parallel algorithms invoked with an execution policy object of
type parallel_execution_policy are permitted to execute in an unordered
fashion in unspecified threads, and indeterminately sequenced within each thread.
[ Note:
It is the caller's responsibility to ensure correctness, for example that the invocation does
not introduce data races or deadlocks.
— end note ]

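[ Example:
An element access function that locks a mutex around an increment of a shared object x
synchronizes access to x, ensuring that it is incremented correctly even when invocations
run concurrently.
— end example ]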

The invocations of element access functions
in parallel algorithms invoked with an execution policy of type
parallel_vector_execution_policy
are permitted to execute in an unordered fashion in unspecified threads, and unsequenced
with respect to one another within each thread.
[ Note:
This means that multiple function object invocations may be interleaved on a single thread.
— end note ]

[ Note:
As a consequence, function objects governed by the parallel_vector_execution_policy
policy must not synchronize with each other. Specifically, they must not acquire locks.
— end note ]

[ Note:
This overrides the usual guarantee from the C++ Standard, Section 1.9 [intro.execution], that
function executions do not interleave with one another.
— end note ]

Since parallel_vector_execution_policy allows the execution of element access functions to be
interleaved on a single thread, synchronization, including the use of mutexes, risks deadlock. Thus the
synchronization with parallel_vector_execution_policy is restricted as follows:

A standard library function is vectorization-unsafe if it is specified to synchronize with
another function invocation, or another function invocation is specified to synchronize with it, and if
it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions
may not be invoked by user code called from parallel_vector_execution_policy algorithms.

[ Example:
A program whose element access function locks a mutex m, invoked with
parallel_vector_execution_policy, is invalid because the applications of the function object are not
guaranteed to run on different threads.
— end example ]

[ Note:
The application of the function object may result in two consecutive calls to
m.lock on the same thread, which may deadlock.
— end note ]

[ Note:
The semantics of the parallel_execution_policy or the
parallel_vector_execution_policy invocation allow the implementation to fall back to
sequential execution if the system cannot parallelize an algorithm invocation due to lack of
resources.
— end note ]

A parallel algorithm invoked with an execution policy object of type
parallel_execution_policy or parallel_vector_execution_policy may apply
iterator member functions of a stronger category than its specification requires. In this
case, the applications of these member functions are subject to the same provisions as the
invocations of element access functions under parallel_execution_policy and
parallel_vector_execution_policy, respectively, as described above.

[ Note:
For example, an algorithm whose specification requires InputIterator but
receives a concrete iterator of the category RandomAccessIterator may use
operator[]. In this case, it is the algorithm caller's responsibility to ensure
operator[] is race-free.
— end note ]

Algorithms invoked with an execution policy object of type execution_policy
execute internally as if invoked with instances of type sequential_execution_policy,
parallel_execution_policy, or an implementation-defined execution policy type, depending
on the contained execution policy object.

The semantics of parallel algorithms invoked with an execution policy object of
implementation-defined type are implementation-defined.

4.1.3

ExecutionPolicy algorithm overloads

Parallel algorithms coexist alongside their sequential counterparts as overloads
distinguished by a formal template parameter named ExecutionPolicy. This
is the first template parameter and corresponds to the parallel algorithm's first function
parameter, whose type is ExecutionPolicy&&.

The Parallel Algorithms Library provides overloads for each of the algorithms named in
Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library.
For each algorithm in Table 1, if there are overloads for
corresponding algorithms with the same name
in the C++ Standard Algorithms Library,
the overloads shall have an additional template type parameter named
ExecutionPolicy, which shall be the first template parameter.
In addition, each such overload shall have an additional function parameter of type
ExecutionPolicy&&, which shall be the first function parameter.

Unless otherwise specified, the semantics of ExecutionPolicy algorithm overloads
are identical to those of their overloads without an ExecutionPolicy parameter.

Parallel algorithms shall not participate in overload resolution unless
is_execution_policy<decay_t<ExecutionPolicy>>::value is true.

The algorithms listed in Table 1 shall have ExecutionPolicy overloads.

Table 1 — Table of parallel algorithms

adjacent_difference

adjacent_find

all_of

any_of

copy

copy_if

copy_n

count

count_if

equal

exclusive_scan

fill

fill_n

find

find_end

find_first_of

find_if

find_if_not

for_each

for_each_n

generate

generate_n

includes

inclusive_scan

inner_product

inplace_merge

is_heap

is_heap_until

is_partitioned

is_sorted

is_sorted_until

lexicographical_compare

max_element

merge

min_element

minmax_element

mismatch

move

none_of

nth_element

partial_sort

partial_sort_copy

partition

partition_copy

reduce

remove

remove_copy

remove_copy_if

remove_if

replace

replace_copy

replace_copy_if

replace_if

reverse

reverse_copy

rotate

rotate_copy

search

search_n

set_difference

set_intersection

set_symmetric_difference

set_union

sort

stable_partition

stable_sort

swap_ranges

transform

uninitialized_copy

uninitialized_copy_n

uninitialized_fill

uninitialized_fill_n

unique

unique_copy

[ Note:
Not all algorithms in the Standard Library have counterparts in Table 1.
— end note ]

4.2

For each

Effects:

Applies f to the result of dereferencing every iterator in the range [first,last).
[ Note:
If the type of first satisfies the requirements of a mutable iterator, f may
apply nonconstant functions through the dereferenced iterator.
— end note ]

Complexity:

Applies f exactly last - first times.

Remarks:

If f returns a result, the result is ignored.

Notes:

Unlike its sequential form, the parallel overload of for_each does not return a copy of
its Function parameter, since parallelization may not permit efficient state
accumulation.

Requires:

Unlike its sequential form, the parallel overload of for_each requires
Function to meet the requirements of CopyConstructible.

For the sequential overload of for_each_n:

Requires:

Function shall meet the requirements of MoveConstructible. [ Note: Function need not meet the requirements of CopyConstructible.
— end note ]

For both the sequential and the parallel overloads of for_each_n:

Effects:

Applies f to the result of dereferencing every iterator in the range
[first,first + n), starting from first and proceeding to first + n - 1.
[ Note:
If the type of first satisfies the requirements of a mutable iterator,
f may apply nonconstant functions through the dereferenced iterator.
— end note ]

Returns:

first + n for non-negative values of n and first for negative values.

Remarks:

If f returns a result, the result is ignored.

Notes:

Unlike its sequential form, the parallel overload of for_each_n requires
Function to meet the requirements of CopyConstructible.

The primary difference between exclusive_scan and inclusive_scan is that
exclusive_scan excludes the ith input element from the ith sum.
If the operator+ function is not mathematically associative, the behavior of
exclusive_scan may be non-deterministic.

The primary difference between exclusive_scan and inclusive_scan is that
exclusive_scan excludes the ith input element from the ith
sum. If binary_op is not mathematically associative, the behavior of
exclusive_scan may be non-deterministic.

The primary difference between exclusive_scan and inclusive_scan is that
inclusive_scan includes the ith input element in the ith sum.
If the operator+ function is not mathematically associative, the behavior of
inclusive_scan may be non-deterministic.

The primary difference between exclusive_scan and inclusive_scan is that
inclusive_scan includes the ith input element in the ith sum.
If binary_op is not mathematically associative, the behavior of
inclusive_scan may be non-deterministic.