Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.

Is that correct, and is the compiler allowed to print "Bye" for the above program?

There is an even more insightful thread here, about an analogous change to C, started by the guy who wrote the above-linked article. Among other useful facts, they present a solution that also seems to apply to C++0x (Update: this won't work anymore with n3225 - see below!):

endless:
goto endless;

A compiler is not allowed to optimize that away, it seems, because it's not a loop but a jump. Another guy summarizes the proposed change in C++0x and C201X:

By writing a loop, the programmer is asserting either that the
loop does something with visible behavior (performs I/O, accesses
volatile objects, or performs synchronization or atomic operations),
or that it eventually terminates. If I violate that assumption
by writing an infinite loop with no side effects, I am lying to the
compiler, and my program's behavior is undefined. (If I'm lucky,
the compiler might warn me about it.) The language doesn't provide
(no longer provides?) a way to express an infinite loop without
visible behavior.

Update on 3.1.2011 with n3225: the committee moved the text to 1.10/24, which says:

The implementation may assume that any thread will eventually do one of the following:

- terminate,
- make a call to a library I/O function,
- access or modify a volatile object, or
- perform a synchronization operation or an atomic operation.

while(1) { MyMysteriousFunction(); } must be independently compilable without knowing the definition of that mysterious function, right? So how can we determine if it makes calls to any library I/O functions? In other words: surely that first bullet could be phrased "makes no calls to functions".
–
Daniel Earwicker Aug 28 '10 at 21:44

@Daniel: If it has access to the function's definition, it can prove a lot of things. There is such a thing as interprocedural optimization.
–
Potatoswatter Aug 28 '10 at 21:46

Right now, in C++03, is a compiler allowed to change int x = 1; for(int i = 0; i < 10; ++i) do_something(&i); x++; into for(int i = 0; i < 10; ++i) do_something(&i); int x = 2;? Or possibly the other way, with x being initialized to 2 before the loop. It can tell do_something doesn't care about the value of x, so it's a perfectly safe optimization, if do_something doesn't cause the value of i to change such that you end up in an infinite loop.
–
Dennis Zickefoose Aug 29 '10 at 4:21

7 Answers

This is intended to allow compiler transformations, such as removal of empty loops, even when termination cannot be proven.

Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)

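The kind of code being discussed can be sketched like this (the loop body is invented for illustration; complex_io_operation is the name used in this answer):

```cpp
#include <cstdio>

// A long-running, side-effect-free loop followed by an I/O call that does
// not depend on its result. Termination of the loop is hard to prove in
// general; the quoted clause lets the compiler assume it anyway.
long expensive_loop(long n) {
    long acc = 0;
    for (long i = 0; i < n; ++i)
        acc += i % 7;              // no observable side effects in the body
    return acc;
}

void complex_io_operation() {
    std::puts("I/O that is independent of the loop's result");
}
```

With the termination assumption in hand, the compiler may start complex_io_operation() on another thread while the loop runs, since the call depends on nothing the loop computes.
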
Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.

I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).

It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.

EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.

Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".
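
The classic instance of that analogous rule can be sketched as follows (the function name is invented):

```cpp
// Pre-C++11 sequence-point wording: modifying x more than once between
// sequence points makes the behavior undefined.
int once_per_full_expression(int x) {
    // x = x++ + ++x;  // undefined: x modified twice with no sequence point
    x = x + 1;         // well defined: one modification per full expression
    x = x + 1;         // the ';' above is a sequence point between the two
    return x;
}
```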

"Proving 1) is pretty easy" - in fact doesn't it follow immediately from the 3 conditions for the compiler to be allowed to assume loop termination under the clause Johannes is asking about? I think they amount to, "the loop has no observable effect, except perhaps spinning forever", and the clause ensures that "spinning forever" isn't guaranteed behaviour for such loops.
–
Steve Jessop Aug 28 '10 at 22:45

@Steve: it's easy if the loop doesn't terminate; but if the loop does terminate then it may have nontrivial behaviour which affects the processing of the complex_io_operation.
–
Philip Potter Aug 28 '10 at 23:16

Oops, yes, I missed that it might modify non-volatile locals/aliases/whatever which are used in the IO op. So you're right: although it doesn't necessarily follow, there are many cases in which compilers can and do prove that no such modification occurs.
–
Steve Jessop Aug 28 '10 at 23:43

"It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing." Just compile with optimizations off and it should still work
–
KitsuneYMG Aug 30 '10 at 18:16

I think the correct interpretation is the one from your edit: empty infinite loops are undefined behavior.

I wouldn't say it's particularly intuitive behavior, but this interpretation makes more sense than the alternative one, that the compiler is arbitrarily allowed to ignore infinite loops without invoking UB.

If infinite loops are UB, it just means that non-terminating programs aren't considered meaningful: according to C++0x, they have no semantics.

That does make a certain amount of sense too. They are a special case, where a number of side effects just no longer occur (for example, nothing is ever returned from main), and a number of compiler optimizations are hampered by having to preserve infinite loops. For example, moving computations across the loop is perfectly valid if the loop has no side effects, because eventually, the computation will be performed in any case.
But if the loop never terminates, we can't safely rearrange code across it, because we might just be changing which operations actually get executed before the program hangs. Unless we treat a hanging program as UB, that is.

“empty infinite loops are undefined behavior”? Alan Turing would beg to differ, but only when he gets over spinning in his grave.
–
Donal Fellows Sep 1 '10 at 19:34

@Donal: I never said anything about its semantics in a Turing machine. We're discussing the semantics of an infinite loop with no side effects in C++. And as I read it, C++0x chooses to say that such loops are undefined.
–
jalf Sep 1 '10 at 20:56

Empty infinite loops would be silly, and there'd be no reason to have special rules for them. The rule is designed to deal with useful loops of unbounded (hopefully not infinite) duration, which calculate something that will be needed in future but not immediately.
–
supercat Sep 2 '10 at 15:48

Does this mean that C++0x is not suited for embedded devices? Almost all embedded devices are non-terminating and do their job inside a big fat while(1){...}. They even routinely use while(1); to invoke a watchdog-assisted reset.
–
vsz Jun 19 '14 at 3:20

@vsz: the first form is fine. Infinite loops are perfectly well defined, as long as they have some sort of observable behavior. The second form is trickier, but I can think of two very easy ways out: (1) a compiler targeting embedded devices could just choose to define stricter behavior in that case, or (2) you create a body which calls some dummy library function. As long as the compiler doesn't know what that function does, it has to assume that it may have some side effect, and so it can't mess with the loop.
–
jalf Jun 19 '14 at 7:49
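
jalf's second workaround can be sketched like this (all names are invented; in a real firmware build the opaque function would be defined in a separate translation unit, out of the compiler's sight):

```cpp
// The compiler must assume that a call to a function whose body it cannot
// see may perform I/O or touch volatiles, so it has to preserve the loop.
extern "C" void opaque_spin();

[[noreturn]] void wait_for_watchdog_reset() {
    for (;;)
        opaque_spin();   // potential side effects keep the loop alive
}

// Stand-in definition so this sketch links on its own; a real build would
// hide it from the optimizer (separate translation unit, no LTO).
int spin_count = 0;
extern "C" void opaque_spin() { ++spin_count; }
```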

Nice question. It seems like that guy had exactly the problem that this paragraph allows the compiler to cause. In the discussion linked by one of the answers, it is written that "Unfortunately, the words 'undefined behavior' are not used. However, anytime the standard says 'the compiler may assume P,' it is implied that a program which has the property not-P has undefined semantics." This surprises me. Does this mean my example program above has undefined behavior, and may just segfault out of nowhere?
–
Johannes Schaub - litb Aug 28 '10 at 21:58

@Johannes: the text "may be assumed" doesn't occur anywhere else in the draft I have to hand, and "may assume" only occurs a couple of times. (Although I checked this with a search function which fails to match across line breaks, so I may have missed some.) So I'm not sure the author's generalisation is warranted on the evidence, but as a mathematician I have to concede the logic of the argument, that if the compiler assumes something false then in general it may deduce anything...
–
Steve Jessop Aug 28 '10 at 23:06

...Permitting a contradiction in the compiler's reasoning about the program certainly hints very strongly at UB, since in particular it allows the compiler, for any X, to deduce that the program is equivalent to X. Surely permitting the compiler to deduce that is permitting it to do that. I agree with the author, too, that if UB is intended it should be explicitly stated, and if it's not intended then the spec text is wrong and should be fixed (perhaps by the spec-language equivalent of, "the compiler may replace the loop with code that has no effect", I'm not sure).
–
Steve Jessop Aug 28 '10 at 23:09

@SteveJessop: What would you think of simply saying that execution of any piece of code--including infinite loops--may be postponed until such time as something the piece of code did would affect an observable program behavior, and that for purposes of that rule, the time required to execute a piece of code--even if infinite--is not an "observable side-effect"? If a compiler can demonstrate that a loop cannot exit without a variable holding a certain value, the variable may be deemed to hold that value, even if it could also be shown that the loop could not exit with it holding that value.
–
supercat Mar 13 '14 at 17:03

@supercat: as you've stated that conclusion, I don't think it improves things. If the loop provably never exits then for any object X and bit-pattern x, the compiler can demonstrate the loop does not exit without X holding bit-pattern x. It's vacuously true. So X could be deemed to hold any bit pattern, and that's as bad as UB in the sense that for the wrong X and x it will swiftly cause some. So I believe you need to be more precise in your standardese. It's difficult to talk about what happens "at the end of" an infinite loop, and show it equivalent to some finite operation.
–
Steve Jessop Mar 13 '14 at 17:09

The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.

I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write one with observable behavior.
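
For example, a loop whose condition reads a volatile object performs observable behavior on every iteration, so the implementation may neither remove it nor reorder other evaluations across it (a sketch; the flag name is invented):

```cpp
volatile int keep_spinning = 1;   // e.g. cleared by an interrupt handler

void spin_until_cleared() {
    while (keep_spinning)
        ;   // each read of keep_spinning is an observable volatile access
}
```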

@JohannesSchaub-litb: If a loop--endless or not--doesn't read or write any volatile variables during execution, and does not call any functions that might do so, a compiler is free to defer any portion of the loop until the first effort to access something computed therein. Given unsigned int dummy; while(1){dummy++;} fprintf(stderr,"Hey\r\n"); fprintf(stderr,"Result was %u\r\n",dummy);, the first fprintf could execute, but the second could not (the compiler could move the computation of dummy between the two fprintf calls, but not past the one that prints its value).
–
supercat Mar 13 '14 at 16:51

I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.

In other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.

I think the issue could perhaps best be stated as: "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete." For example, consider a loop that interleaves slow calculations with progress-bar updates.
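
A hedged sketch of such a loop (all names are invented for this example):

```cpp
#include <vector>

std::vector<int> progress_log;          // stands in for updating a progress bar

int slow_step(int i) { return i * i; }  // pure computation, no side effects

int compute_with_progress() {
    int total = 0;
    for (int i = 0; i < 100; ++i) {
        total += slow_step(i);          // independent of the update below, so
        progress_log.push_back(i);      // the compiler may run all slow steps
    }                                   // first and emit the updates in a burst
    return total;
}
```

Because slow_step has no observable side effects, a conforming implementation may perform all one hundred steps on one CPU while another emits the hundred updates; the "progress" shown then no longer tracks the real work.
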

By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.

I think that your progress bar case could not be separated, because displaying a progress bar is a library I/O call. Optimisations should not change visible behaviour in this way.
–
Philip Potter Sep 2 '10 at 15:55

@Philip Potter: If the slow routine had side-effects, that would certainly be true. In my example before, it would be meaningless if it didn't, so I changed it. My interpretation of the spec is that the system is allowed to defer the execution of the slow code until such time as its effects (other than the time it takes to execute) would become visible, i.e. the show_result() call. If the progress-bar code made use of the running total, or at least pretended to do so, that would force it to sync up with the slow code.
–
supercat Sep 3 '10 at 15:22

For non-trivial cases, it is not even decidable for the compiler whether a loop is infinite at all.

In some cases, the optimiser can reach a better complexity class for your code (e.g. it was O(n^2) and you get O(n) or O(1) after optimisation).

So including a rule in the C++ standard that disallowed removing an infinite loop would make many optimisations impossible. And most people don't want this. I think this largely answers your question.
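
A sketch of that kind of win (the function name is invented; optimisers such as GCC and Clang can perform this particular rewrite at -O2):

```cpp
// An O(n) loop that a compiler can collapse to the O(1) closed form
// n*(n-1)/2. Termination is provable here; the quoted rule matters for
// loops where it is not, which would otherwise block such final-value
// replacement.
unsigned sum_below(unsigned n) {
    unsigned s = 0;
    for (unsigned i = 0; i < n; ++i)
        s += i;
    return s;
}
```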

Another thing: I have never seen any valid example where you need an infinite loop which does nothing.

The one example I have heard about was an ugly hack that really should be solved otherwise: It was about embedded systems where the only way to trigger a reset was to freeze the device so that the watchdog restarts it automatically.

If you know any valid/good example where you need an infinite loop which does nothing, please tell me.