P0574r1: Algorithm Complexity Constraints and Parallel Overloads

This paper addresses NB comment CH 10.

Introduction

The C++17 draft adds overloads of many standard library
algorithms that take an ExecutionPolicy argument, in order to
allow parallel implementations. However, the requirements of many of these
algorithms are written in such a way as to require a sequential
implementation, or a sub-optimal parallel implementation, thus eliminating
some or all of the benefit of the new overloads.

P0523r1 attempts to address this
problem by a blanket relaxation of the constraints for overloads with
an ExecutionPolicy. I think this is an incorrect solution. It
simultaneously grants too much leeway for some algorithms, while not fixing
the problems with others. Instead, I think it is better to address each
algorithm individually.

In this paper I have gathered those algorithms where the requirements are
specified in such a way as to rule out an efficient parallel implementation,
and proposed alternative specifications for those algorithms.

This paper has been updated
since P0574r0 to incorporate changes
suggested by SG1 at Kona 2017. Notably:

std::execution::sequential_policy is no longer special;
it is treated as any other ExecutionPolicy from the point
of view of complexity constraints.

The new constraints on the iterators and value types have now been
changed to notes about performance consequences rather than normative
wording.

Section numbers in the proposed wording now refer to the
latest C++17 draft.

The parallel specification of adjacent_difference has been fixed and moved to P0467r2.

Algorithms and proposed wording

The following changes are relative to the latest working paper as of 03-02-2017 with the changes from P0467r2 and P0452r1 applied.

25.5.8 Adjacent find [alg.adjacent.find]

Modify the complexity for adjacent_find paragraph 2 as follows:

2 Complexity: For the overloads with no ExecutionPolicy, exactly min((i - first) + 1, (last - first) - 1) applications of the corresponding predicate, where i is adjacent_find's return value. For the overloads with an ExecutionPolicy, O(last - first) applications of the corresponding predicate.
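To illustrate why an exact count effectively forces a sequential scan, the following sketch instruments the predicate. It is not taken from any implementation: count_adjacent_find, sequential_bound and chunked_adjacent_find are hypothetical names, and the chunked search runs its chunks one after another to stand in for work that a parallel implementation might distribute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FindStats {
    std::size_t index;         // offset of the result from the start (size() if none)
    std::size_t applications;  // predicate calls observed
};

// Sequential std::adjacent_find with an instrumented predicate.
inline FindStats count_adjacent_find(const std::vector<int>& v) {
    std::size_t calls = 0;
    auto it = std::adjacent_find(v.begin(), v.end(),
                                 [&calls](int a, int b) { ++calls; return a == b; });
    return {static_cast<std::size_t>(it - v.begin()), calls};
}

// min((i - first) + 1, (last - first) - 1) from the sequential wording.
inline std::size_t sequential_bound(std::size_t i, std::size_t n) {
    assert(n >= 1);  // the formula is only meaningful for a nonempty range
    return std::min(i + 1, n - 1);
}

// Hypothetical chunked search: every chunk scans its own adjacent pairs and
// stops at its first match, without knowing what earlier chunks found.
inline FindStats chunked_adjacent_find(const std::vector<int>& v, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t n = v.size();
    std::size_t calls = 0;
    std::size_t found = n;
    for (std::size_t b = 0; b + 1 < n; b += chunk) {
        const std::size_t e = std::min(b + chunk, n - 1);  // last pair starts at n - 2
        for (std::size_t k = b; k < e; ++k) {
            ++calls;
            if (v[k] == v[k + 1]) { found = std::min(found, k); break; }
        }
    }
    return {found, calls};
}
```

For the input {1, 1, 2, 3, 4, 5, 6, 7} the sequential call applies the predicate once, matching the exact bound, whereas the chunked search with a chunk size of 4 applies it four times: more than the exact bound allows, but still within O(last - first).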

25.6.1 Copy [alg.copy]

Modify the requirements for copy_if in paragraph 12 as follows:

12 Requires: The ranges [first, last) and [result, result + (last - first)) shall not overlap. [Note: For the overloads with an ExecutionPolicy, there may be a performance cost if iterator_traits<ForwardIterator1>::value_type is not MoveConstructible. — end note]
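As background for why a parallel copy_if does extra work, here is a minimal sketch of one well-known approach (stream compaction). It is an assumption for illustration only, not the method of any particular implementation, and it does not model the MoveConstructible cost mentioned in the note. The name compacting_copy_if and the chunk parameter are hypothetical, and the chunk loops run sequentially where a parallel implementation could fan out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

template <class T, class Pred>
std::vector<T> compacting_copy_if(const std::vector<T>& in, Pred pred, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t n = in.size();
    const std::size_t chunks = (n + chunk - 1) / chunk;

    // Phase 1: apply the predicate once per element and remember the answer,
    // so that later phases do not have to call it again.
    std::vector<char> keep(n, 0);
    std::vector<std::size_t> count(chunks, 0);
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t e = std::min(n, (c + 1) * chunk);
        for (std::size_t i = c * chunk; i < e; ++i) {
            keep[i] = pred(in[i]) ? 1 : 0;
            count[c] += static_cast<std::size_t>(keep[i]);
        }
    }

    // Phase 2: an exclusive prefix sum gives each chunk its output offset.
    std::vector<std::size_t> offset(chunks, 0);
    for (std::size_t c = 1; c < chunks; ++c) offset[c] = offset[c - 1] + count[c - 1];
    const std::size_t total = chunks == 0 ? 0 : offset.back() + count.back();

    // Phase 3: each chunk copies its kept elements into its own output slice.
    std::vector<T> out(total);  // simplification: needs T to be default-constructible
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t e = std::min(n, (c + 1) * chunk);
        std::size_t o = offset[c];
        for (std::size_t i = c * chunk; i < e; ++i) {
            if (keep[i]) out[o++] = in[i];
        }
    }
    return out;
}
```

The sequential algorithm needs none of the auxiliary storage used here; the flags and offsets exist only because a chunk cannot know its output position until the predicate results of all earlier chunks are known.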

25.6.8 Remove [alg.remove]

Modify the requirements for remove_copy and remove_copy_if in paragraph 7 as follows:

7 Requires: The ranges [first, last) and [result, result + (last - first)) shall not overlap. The expression *result = *first shall be valid. [Note: For the overloads with an ExecutionPolicy, there may be a performance cost if iterator_traits<ForwardIterator1>::value_type is not MoveConstructible. — end note]

25.7.2 Nth element [alg.nth.element]

Modify the complexity for nth_element in paragraph 3 as follows:

3 Complexity: For the overloads with no ExecutionPolicy, linear on average. For the overloads with an ExecutionPolicy, O(N) applications of the predicate, and O(N log N) swaps, where N = last - first.

25.7.4 Partitions [alg.partitions]

Modify the complexity for partition in paragraph 7 as follows:

7 Complexity: Let N = last - first:

For the overloads with no ExecutionPolicy, exactly N applications of the predicate. At most N / 2 swaps if ForwardIterator meets the BidirectionalIterator requirements and at most N swaps otherwise.

For the overloads with an ExecutionPolicy, O(N log N) swaps and O(N) applications of the predicate.
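The relaxed bounds can be motivated by a divide-and-conquer strategy, sketched below under stated assumptions. The names dc_partition, run_dc_partition and count_std_partition are hypothetical, the recursion runs sequentially where a parallel implementation could process the two halves concurrently, and nothing here is claimed to match an actual library implementation. Each element is tested once at a leaf, and each level of the recursion combines results with std::rotate, which is where the O(N log N) element movement comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// The two recursive calls touch disjoint halves, so they are independent.
template <class It, class Pred>
It dc_partition(It first, It last, Pred& pred) {
    const auto n = std::distance(first, last);
    if (n == 0) return first;
    if (n == 1) return pred(*first) ? std::next(first) : first;
    It mid = std::next(first, n / 2);
    It left = dc_partition(first, mid, pred);  // [first, left) true, [left, mid) false
    It right = dc_partition(mid, last, pred);  // [mid, right) true, [right, last) false
    // Move the true block of the right half in front of the false block of
    // the left half; std::rotate returns the new partition point.
    return std::rotate(left, mid, right);
}

struct PartitionStats {
    std::vector<int> data;     // the partitioned sequence
    std::size_t point;         // offset of the partition point
    std::size_t applications;  // predicate calls observed
};

inline PartitionStats run_dc_partition(std::vector<int> v) {
    std::size_t calls = 0;
    auto is_even = [&calls](int x) { ++calls; return x % 2 == 0; };
    auto p = dc_partition(v.begin(), v.end(), is_even);
    const std::size_t point = static_cast<std::size_t>(p - v.begin());
    assert(point <= v.size());
    return {std::move(v), point, calls};
}

// Reference: the sequential std::partition applies the predicate exactly N times.
inline std::size_t count_std_partition(std::vector<int> v) {
    std::size_t calls = 0;
    std::partition(v.begin(), v.end(), [&calls](int x) { ++calls; return x % 2 == 0; });
    return calls;
}
```

Because the combining step only rotates adjacent blocks, this particular sketch also preserves the relative order of elements within each group.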

Modify the complexity for stable_partition in paragraph 11 as follows:

11 Complexity: Let N = last - first:

For the overloads with no ExecutionPolicy, at most N log N swaps, but only O(N) swaps if there is enough extra memory. Exactly N applications of the predicate.

For the overloads with an ExecutionPolicy, O(N log N) swaps and O(N) applications of the predicate.

If init is provided, all of binary_op(init, *first), binary_op(init, init), and binary_op(*first, *first) shall be convertible to T; otherwise, binary_op(*first, *first) shall be convertible to ForwardIterator1's value type.

If init is provided, all of binary_op(init, unary_op(*first)), binary_op(init, init), and binary_op(unary_op(*first), unary_op(*first)) shall be convertible to T; otherwise, binary_op(unary_op(*first), unary_op(*first)) shall be convertible to ForwardIterator1's value type.