I checked the OpenMP reference guide, and it says that the parallel for construct only allows one of the following relational operators in the loop test: <, <=, >, >=.

I don't understand why i != j is not allowed. I could understand it for the static schedule, since OpenMP needs to pre-compute the number of iterations assigned to each thread. But I can't understand the limitation in a case such as the dynamic schedule, for example. Any clues?

EDIT: it is rejected even if I write for(i = 0; i != 100; i++), although I could just as easily put "<" or "<=".

@SamIam No, Programmers is for discussions. There is one clear reason why it works this way, so it belongs here.
–
Mr. kbok Nov 9 '12 at 17:17


It probably has to do with how the framework determines how to parallelize the given code segment. I remember doing something like this in college, and it boiled down to chunking the loop (say, into 4 chunks) such that 0 -> 24, 25 -> 49, etc. were given to processors 0-3. It may be strict in the sense that i != 100 could mean values from 0 -> 99, or, if 100 is ever skipped, 101 -> infinity. It is probably just a necessary evil to help the framework.
–
ShelbyZ Nov 9 '12 at 17:21


Isn't it obvious that an element in a parallel programming framework cannot depend on the != condition to decide whether a loop should end? What happens if an individual check finds i != j and refuses to proceed with the computation, even though i < j?
–
PermanentGuest Nov 9 '12 at 18:07

6 Answers

I sent an email to the OpenMP developers about this subject. The answer:

For signed int, the wrap-around behavior is undefined. If we allowed !=, programmers might get an unexpected trip count. The problem is whether the compiler can generate code to compute a trip count for the loop.

For a simple loop, like:

for( i = 0; i < n; ++i )

the compiler can determine that there are 'n' iterations if n >= 0, and zero iterations if n < 0.

For a loop like:

for( i = 0; i != n; ++i )

again, a compiler should be able to determine that there are 'n' iterations if n >= 0; if n < 0, we don't know how many iterations it has.

For a loop like:

for( i = 0; i < n; i += 2 )

the compiler can generate code to compute the trip count (loop iteration count) as floor((n+1)/2) if n >= 0, and 0 if n < 0.

For a loop like:

for( i = 0; i != n; i += 2 )

the compiler can't determine whether 'i' will ever hit 'n'. What if 'n' is an odd number?

For a loop like:

for( i = 0; i < n; i += k )

the compiler can generate code to compute the trip count as floor((n+k-1)/k) if n >= 0, and 0 if n < 0, because the compiler knows that the loop must count up; in this case, if k < 0, it's not a legal OpenMP program.

For a loop like:

for( i = 0; i != n; i += k )

the compiler doesn't even know if i is counting up or down. It doesn't know if 'i' will ever hit 'n'. It may be an infinite loop.

Contrary to what it may look like, schedule(dynamic) does not work with a dynamic number of iterations. Rather, it is the assignment of iteration blocks to threads that is dynamic. With static scheduling this assignment is precomputed at the beginning of the worksharing construct. With dynamic scheduling, iteration blocks are given out to threads on a first-come, first-served basis.

The OpenMP standard is pretty clear that the number of iterations is precomputed once the worksharing construct is encountered, hence the loop counter may not be modified inside the body of the loop (OpenMP 3.1 specification, §2.5.1 - Loop Construct):

The iteration count for each associated loop is computed before entry to the outermost
loop. If execution of any associated loop changes any of the values used to compute any
of the iteration counts, then the behavior is unspecified.

The integer type (or kind, for Fortran) used to compute the iteration count for the
collapsed loop is implementation defined.

A worksharing loop has logical iterations numbered 0,1,...,N-1 where N is the number of
loop iterations, and the logical numbering denotes the sequence in which the iterations
would be executed if the associated loop(s) were executed by a single thread. The
schedule clause specifies how iterations of the associated loops are divided into
contiguous non-empty subsets, called chunks, and how these chunks are distributed
among threads of the team. Each thread executes its assigned chunk(s) in the context of
its implicit task.

The chunk_size expression is evaluated using the original list items of any variables that are made private in the loop construct. It is unspecified whether, in what order, or how many times, any side-effects of the evaluation of this expression occur. The use of a variable in a schedule clause expression of a loop construct causes an implicit reference to the variable in all enclosing constructs.

The rationale behind this relational-operator restriction is quite simple: it gives a clear indication of the direction of the loop, it allows easy computation of the number of iterations, and it gives the OpenMP worksharing directive similar semantics in C/C++ and Fortran. Other relational operations would also require close inspection of the loop body to understand how the loop progresses, which would be unacceptable in many cases and would make the implementation cumbersome.

OpenMP 3.0 introduced the explicit task construct, which allows for parallelisation of loops with an unknown number of iterations. There is a catch, though: tasks introduce severe overhead, and one task per loop iteration only makes sense if the iterations take quite some time to execute. Otherwise the overhead would dominate the execution time.

But ultimately compiler writers have to implement OpenMP directives which must support, amongst other things, efficient static decomposition of the loop between processors with a finite amount of resources. The standard is the document that tries to strike a balance between flexibility and usability for developers on the one hand, and tractability for implementers on the other.

You could certainly try to give feedback to the committee; I can't imagine this particular case would be especially difficult for the implementers. On the other hand, I don't see that particular notation for a loop all that often, so I don't know how high a priority it would be; it's not like asking developers to change a != to a < or > is an enormous imposition.

A colleague of mine is on the OpenMP language committee. I might provide him with the feedback once he is back from SC but I doubt they would even consider such a proposition.
–
Hristo Iliev Nov 9 '12 at 18:26

The answer is simple.
OpenMP does not allow premature termination of a team of threads.
With == or !=, OpenMP has no way of determining when the loop stops.
1. One or more threads could hit the termination condition, which might not be unique.
2. OpenMP has no way to shut down the other threads that might never detect the condition.

I would be left wondering why the programmer had made that choice, never mind that it can mean the same thing. It may be that OpenMP is making a hard syntactic choice in order to force a certain clarity of code.

Here's code which raises challenges for the use of != and may help explain why it isn't allowed.

Notice that i is incremented both in the for statement and within the loop body, leading to the possibility (but not the guarantee) of an infinite loop.

If the predicate is <, then the loop's behavior can still be well-defined in a parallel context without the compiler having to check within the loop for changes to i and determine how those changes will affect the loop's bounds.

If the predicate is !=, then the loop's behavior is no longer well-defined and it may be infinite in extent, preventing easy parallel subdivision.

Your example does not conform to the OpenMP specification. You must not modify the loop counter inside the loop body. The single-threaded version produces 0 2 4 6 8 as expected, but even with two threads it produces the following output: 0 2 4 5 7 9.
–
Hristo Iliev Nov 9 '12 at 18:16

Your example also fails if you do that in a normal sequential C program. Nevertheless, the compiler allows it.
–
dreamcrash Nov 9 '12 at 18:21


The example is not meant to fail (though feel free to change the bounds such that it does); it is meant to demonstrate a difficulty the compiler has: it is easy to define behavior for <, but quite difficult to define behavior for !=.
–
Richard Nov 9 '12 at 18:27

@dreamcrash, how is his example failing as sequential C code? It is perfectly valid serial C and works as expected as such, but it is not valid OpenMP code.
–
Hristo Iliev Nov 9 '12 at 18:32

The most important part of this answer, I think, is that the loop clauses also mean something to the programmer. < carries more information than != (on average by a factor of two), and just as @Richard states, if I see that notation used in a loop, I then have to read through the loop body before I can feel I understand what is actually changing through the loop.
–
Jonathan Dursi Nov 9 '12 at 18:33