It looks to me like you're doing things right.
I'd say the blame rests with the compilers.

Back when I was still at Rice, I wrote a large collection of test
cases, similar in spirit to the ones you have discussed, along with a
driver that would allow me to test C compilers in a black box fashion.
Friends around the net helped me refine the tests and collect a lot of
numbers on a lot of machines.

I actually didn't test that many things. I looked at dead code
elimination, constant propagation, value numbering (where some
variants would have caught Massey's example above), strength
reduction, and code motion. Just basic scalar optimization -- no
interprocedural analysis, no vectorization, no pointer analysis. In
each case, I used a series of about a dozen tests, starting with
simple cases and progressing to cases that could only be handled by
the best approaches I knew about, and in some instances beyond what I
knew how to do (to see if anyone had discovered some new ideas that
they hadn't published :-)

The results were very disappointing. I found no available compiler
(free, cheap, or expensive) that approached what I had ignorantly
considered "the state of the art."

Talking to various people about this, I've come across several
possible explanations:

C compilers don't work that hard, traditionally.
More interesting would be a similar set of tests on Fortran
compilers.

Nobody in the industry reads recently published compiler papers.
^^^^^ or understands
^^^^^^^^^^^ or believes

where only "reads" is a criticism of the industry.
"Understands" and "believes" are meant to suggest
possible weaknesses in the papers.

None of the things I test really matter. The only things that
matter these days are software pipelining and cache management.

Writing optimizers is expensive and time consuming.
Everything I tested was fairly old (had to be, since they were
released products). How could I expect them to incorporate
recent research results?

Probably all of these excuses contribute, in different proportions for
different compilers, to the actual explanation.

BTW, after I began talking to people about my results, some of the
compiler people at SGI put me in touch with Chris Glaeser (the
originator of this thread). He has written a similar package that's
commercially available from his company Nullstone. Here "similar"
means "similar in spirit." His is much more complete in every
dimension: documentation, correctness, test coverage, etc.