The team had been ignoring it because it just didn't make any sense, and they had other
things to do. (Fortunately there weren't too many other crashes; the app is pretty solid at this
point.) When they turned to me, I was also initially puzzled, because all this should
do on x86 is stuff a zero into %eax and return. This cannot possibly crash[1],
so everyone just assumed that the stack traces were off, as they frequently are.

Fortunately, I had just looked at the project settings and noticed that we were compiling
with -O0, i.e. with optimizations disabled, and my suspicion was that ARC was doing some
unnecessary retaining. That suspicion turned out to be on the money: otool -Vt revealed that
ARC had turned our innocuous return NO; into the following monstrosity:
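Roughly, the -O0 code has the following shape (a sketch rather than the actual otool -Vt output; offsets and registers are invented, and it assumes the method takes at least one object parameter, which is what gives ARC something to retain):

```asm
pushq   %rbp
movq    %rsp, %rbp
subq    $0x20, %rsp
movq    %rdi, -0x8(%rbp)        ## spill self
movq    %rsi, -0x10(%rbp)       ## spill _cmd
leaq    -0x18(%rbp), %rdi
movq    %rdx, %rsi
callq   _objc_storeStrong       ## retain the object argument
xorl    %eax, %eax              ## the actual "return NO;"
movl    %eax, -0x1c(%rbp)       ## save return value across the call
leaq    -0x18(%rbp), %rdi
xorl    %esi, %esi
callq   _objc_storeStrong       ## store nil = release it again
movl    -0x1c(%rbp), %eax
addq    $0x20, %rsp
popq    %rbp
retq
```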

Yikes! Of course, this is how ARC works: it generates an insane number of retains and
releases (hidden inside objc_storeStrong()), then relies on a special optimization pass to remove the insanity and
leave behind only the necessary retains/releases. Turn on the "standard" optimization level
-Os and we get the following, much more reasonable result:
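At -Os the method collapses to essentially the zero-and-return described at the top (again a sketch, not the exact listing; clang on OS X typically keeps the frame pointer):

```asm
pushq   %rbp
movq    %rsp, %rbp
xorl    %eax, %eax              ## return NO;
popq    %rbp
retq
```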

It isn't clear why those retains/releases were crashing; all the objects involved looked OK in the debugger. But at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it.

Another issue is performance. I just benchmarked the following equivalent program:
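A minimal stand-in for that program (my reconstruction, not the original source; the names and iteration count are invented) is a tight loop whose only work is assigning an object pointer to a strong local, which under ARC becomes one objc_storeStrong(), i.e. a retain/release pair, per iteration:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    NSString *source = @"hello";
    NSString *target = nil;
    for (long i = 0; i < 100000000; i++) {
        target = source;   // with ARC: objc_storeStrong() per iteration
    }
    return target == source ? 0 : 1;
}
```

Build it twice, e.g. clang bench.m -framework Foundation -fobjc-arc and again with -fno-objc-arc, and time both runs.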

On my 13" MBPR, it runs in roughly 0.5 seconds with ARC disabled and in 13 seconds with
ARC enabled. That's 26 times slower, meaning we now have a highly non-obvious performance
model, where performance is extremely hard to predict and control. The simple and obvious
performance model was one of the main reasons Objective-C code tended to actually be
quite fast if even minimal effort was expended on performance, despite the fact that
some parts of Objective-C aren't all that fast.

I find the approach of handing off all control and responsibility to the optimizer
writers worrying. My worries stem partly from the fact that I've never actually
seen that approach work in the past. With ARC, it also happens that
the optimizer can't figure out that a retain/release isn't needed, so you need to
sprinkle a few __unsafe_unretained qualifiers throughout your code (not many, but you need to
figure out which).
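For illustration, a hypothetical snippet (the qualifier is real; the surrounding names are invented): marking a local __unsafe_unretained tells ARC to treat it as a raw pointer, so no retain/release is emitted for stores into it, and keeping the referenced object alive becomes your problem again:

```objc
__unsafe_unretained NSString *cached = someString;  // no retain emitted
for (long i = 0; i < count; i++) {
    [buffer appendString:cached];  // plain message send, no ARC traffic
}
```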

Good optimization has always been something that
needed a human touch (with automatic assistance), so the message "just trust
the compiler" doesn't resonate with me. Especially since, and this is
the other part I am worried about, compiler optimizations have been
getting crazier and crazier. clang, for example, thinks
there is nothing wrong with producing two different values for
de-referencing the same pointer at the same time, with no
stores in between (source: http://blog.regehr.org/archives/767):

I tested this with clang-600.0.34.4 on my machine and it also gives
this nonsensical result: 1 2. There are more examples, which I also wrote about
in my post cc -Osmartass.
Of course, Swift moves further in this direction, with expensive default semantics
and reliance on the compiler to remove the resulting glaring inefficiencies.

In what I've seen reported and tested myself, this approach results in differences between normal
builds and -Ofast-optimized builds of more than a factor of 100. That's
not close to being OK, and it makes code much harder to understand and optimize. My
guess is that we will be looking at assembly a lot more when optimizing Swift than
we ever did in Objective-C, and then scratching our heads as to why the optimizer
didn't manage to optimize that particular piece of code.

I fondly remember the "Java optimization" WWDC sessions back when we were supposed to rewrite
all our code in that particular new hotness. In essence, we were given a model of the
capabilities of HotSpot's JIT optimizer, so in order to optimize code we had to know
what the resulting generated code would be, what the optimizer could handle (not a lot),
and then translate that information back into the original source code. At that point,
it's simpler to just write the assembly you are trying to goad the
JIT into emitting for you yourself. Or portable Macro Assembler. Or object-oriented portable
Macro Assembler.

5 comments:

Anonymous said...

I would think that dereferencing "p" after the call to "realloc" (in particular, storing something into that location without first making sure that p == q) invokes Undefined Behaviour (TM), since "realloc" could just as well have moved the memory, invalidating the originally allocated address. So, technically, clang would have the right to make the snippet do whatever it feels like...

How is it silly? If you write something that invokes undefined behaviour, the compiler is free to do anything up to and including make demons fly out of your nose. You can't write broken code and complain that the compiler didn't agree with your invented semantics of the chosen language. Well, you can, but you won't get much sympathy.

Best to assume that realloc always *re*-allocs. If the standard explicitly declared that realloc sometimes does not realloc (it declares no such thing: http://man.cx/realloc), then you could do what you did; otherwise, the previous pointer points to freed memory, and what you did was summon arbitrary demons.

Otherwise, I liked the post. Automatic optimization is always a bit hairy and very hard to lean on. For all the noise that gets made about optimization and smart compilers, there's really something to be said for a transparent execution/performance model.