On Sat, 4 Nov 2006, Oliver Wong wrote:
>
> I'm a novice to compiler theory, and I've been doing some reading on
> control flow analysis.
>
> It looks like the literature all agrees that the first step to
> CFA, once you have the AST, is to determine what the basic blocks are
> for your code. From what I understand a basic block is a sequence of
> instructions such that if any one of those instructions are executed,
> then all the instructions in that sequence are executed.
>
> In the case of Java (and probably other languages with exception,
> though I don't have experience with any of them), every instruction
> would be its own basic block, because an exception could potentially
> be thrown at any point, thus guaranteeing that there does not exist
> any pair of points such that you could be certain that the second
> instruction would execute, given that the first was executed.
>
> In particular, Java has an Throwable java.lang.ThreadDeath which
> is "thrown in the victim thread when the stop method with zero
> arguments in class Thread is called"
> (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ThreadDeath.html)
>
> It seems pretty pointless to have every instruction be in its own
> basic block, so I was wondering how can the basic-block system be
> reconciled with languages which support Exceptions?

At least in C++, I think the answer is typically "don't use
exceptions," and most optimizing compilers have a switch to turn
off support for exceptions (GCC's -fno-exceptions, for instance).

Also, I see that Thread.stop() has been deprecated, and AIUI that's
the only method that ever throws ThreadDeath, so a really clever Java
translator that sees all of a program's translation units together
might even go so far as to figure out that ThreadDeath can never be
thrown, and optimize accordingly.

Okay, now that that's all out of the way... Even in Java, even with
this crazy ThreadDeath thing that can go off at any time, you still
only have to ensure that the program's /visible state/ is updated
correctly. Consider
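
(A sketch of the sort of function meant here, written in C; a Java
version would differ only in syntax. Only the name 'foo', the local
'a', and the 'while' loop come from the discussion. The loop body and
the name 'foo_table' are assumptions made up for illustration.)

```c
/* Hypothetical stand-in example: all of foo's state lives in the local 'a'. */
int foo(int n)
{
    int a = 0;
    n &= 15;                 /* keep only the low four bits */
    while (n != 0) {         /* the "weird" loop: count the set bits */
        a += n & 1;
        n >>= 1;
    }
    return a;
}

/* What an optimizer could substitute, since nobody outside foo can
 * observe 'a' while the loop is running. */
int foo_table(int n)
{
    static const int bits[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                 1, 2, 2, 3, 2, 3, 3, 4};
    return bits[n & 15];
}
```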

Since 'foo' doesn't catch ThreadDeath, a thrown ThreadDeath will make
'foo' unwind, immediately discarding all its local variables. So 'foo'
has no "visible state" to preserve. Therefore, the compiler is free to
optimize 'foo' as much as it likes! (For example, it might use a lookup
table instead of that weird 'while' loop.)

However, if the programmer changed 'a' from local to global, all of
a sudden the compiler would be unable to optimize quite as much, because
if ThreadDeath were received in the middle of 'foo', 'a' would have to
have a reasonable value --- we can't optimize out all the writes to 'a'.
But we could still optimize out /all but one/ of the writes! We could
have the generated code look like
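
(Again a sketch in C under assumed details: the loop body and the
names 'foo_naive' and 'foo_one_write' are hypothetical. The first
function writes the global 'a' on every iteration; the second is what
the generated code could amount to, accumulating in a temporary and
storing to 'a' exactly once.)

```c
int a;   /* now global, so part of the program's visible state */

/* Straightforward translation: 'a' is written on every iteration. */
void foo_naive(int n)
{
    a = 0;
    n &= 15;
    while (n != 0) {
        a += n & 1;
        n >>= 1;
    }
}

/* Optimized form: work in a temporary, then do a single store. */
void foo_one_write(int n)
{
    int tmp = 0;
    n &= 15;
    while (n != 0) {
        tmp += n & 1;
        n >>= 1;
    }
    a = tmp;   /* the one remaining write to the global */
}
```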

and it wouldn't violate any of the rules about preserving the "visible
state".

> As a bonus question: A lot of my interest in this is due to my
> trying to understand how the control flow analysis for the FindBugs
> software works (http://sourceforge.net/projects/findbugs). FindBugs
> defines a class called Location, and puts in the documentation:
>
> <quote>
> Because of JSR subroutines, the same instruction may actually
> be part of multiple basic blocks (with different facts
> true in each, due to calling context)
> </quote>
>
> This surprised me quite a bit. In all of the literature I've seen
> thus far, instructions should only belong to a single basic block at
> one time, so I'm trying to understand what deviations from the
> literature FindBugs did to implement CFA in Java.

I don't have much idea of what they're doing, but it looks like a
"jsr subroutine" in the JVM is a special kind of subroutine used to
implement 'finally' blocks, and FindBugs treats it like an inline
function. The situation is therefore analogous to the following
one in C:
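
(A minimal sketch of that situation. The names 'inlineme', 'foo',
'bar' and 'x' are the ones used in the discussion; the function
bodies, the -1 fallback and the value 42 are assumptions chosen only
to make the example concrete.)

```c
#include <stddef.h>

/* The same instructions run in two calling contexts. */
static inline int inlineme(const int *x)
{
    /* Whether 'x' is null here depends entirely on the caller. */
    return (x != NULL) ? *x : -1;
}

/* In this context 'x' is always null. */
int foo(void)
{
    return inlineme(NULL);
}

/* In this context 'x' is never null. */
int bar(void)
{
    static const int value = 42;
    return inlineme(&value);
}
```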

In 'inlineme', is 'x' null or non-null? That depends on whether it's
being inlined in 'foo' (thus, FindBugs would say it belongs to foo's
basic block) or in 'bar' (thus, FindBugs would say it also belongs to
bar's basic block).