[Please reply to me personally via email, rather than posting a
follow-up article in this newsgroup, so as to minimize traffic. I will
summarize any informative responses in a few days.]

I am working on a thesis based on the idea that it is worthwhile to
feed profiler output (from actual runs of the program by the real
end-users using real data on real machines) back to the COMPILER and
not just the human programmer... especially if the human programmer
used annotations in the source code to give hints to the compiler as
to the kinds of more drastic and aggressive program transformations it
can attempt to perform.

I am seeking real-world examples where this would have been useful.
Ideally, I'd like to hear both the application context and actual
(short) snippets of code illustrating the point. (C/C++ code
preferred but not necessary... I'm asking for snippets of code for
concreteness.) Also, if you know of existing run-time systems,
libraries, compilers, etc., which already do this kind of automatic
adaptation in an interesting fashion, please let me know about them.

----------------

This may seem a bit vague, so I've given some (perhaps contrived)
examples below. I don't want to overly bias the kinds of examples I
might get back from people by the choice of these examples --- I'm
interested in hearing about all sorts of cases where if the compiler
could just know how long the resulting program *ACTUALLY* takes to
run, it could make better decisions during a recompile without having
to involve a human programmer at that point in the program lifecycle.
In other words, scenarios where the compiler could have generated
better-performing code if it could have conducted experiments with the
actual compiled code on the destination platform.

* On large data sets, common for scientific and engineering
simulations, for example, it's pretty important to size the working
set of data being accessed at any given time to fit into main
memory and the cache. The larger the "blocking size", however,
the better the performance, at least until the working set gets
*TOO* large. Profiling data would allow the compiler to
dynamically experiment with the blocking size and data alignments
to minimize cache misses and virtual memory thrashing for both
high-end and low-end machines of a given binary-compatible
architecture line. While it is possible to try to model such
effects, the large variety of cache and memory organizations out
there make analysis difficult.
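
To make the blocking idea concrete, here is a minimal sketch in C of a
blocked (tiled) matrix multiply. The BLOCK constant is the hypothetical
tuning knob a profile-driven compiler could vary per target machine; the
function name and the default of 32 are mine, for illustration only:

```c
#include <stddef.h>

/* Hypothetical tuning knob: a profile-driven compiler could experiment
 * with different values of BLOCK on each target machine. */
#ifndef BLOCK
#define BLOCK 32
#endif

/* Blocked (tiled) matrix multiply: C += A * B, all N x N, row-major.
 * Tiling keeps a roughly BLOCK x BLOCK working set hot in the cache
 * instead of streaming entire rows and columns through it. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                /* Work on one tile at a time. */
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double aik = a[i*n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            c[i*n + j] += aik * b[k*n + j];
                    }
}
```

The results are identical for any BLOCK; only the memory-access pattern
changes, which is exactly why a compiler could safely experiment with it.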

* Quicksort (implemented by qsort() in the C library) performs very
well on average. However, other sorts sometimes perform better; in
particular, Quicksort performs poorly on almost-sorted arrays (and
even randomized Quicksort has higher overhead than other sorts for
almost-sorted arrays). While in this particular instance, it would
not be difficult to write qsort() to check its input data to see if
it's already almost sorted and not use Quicksort if that is the
case, that check would itself not be computationally cheap;
furthermore, in general, it would not be easy even to see how to
write a test to decide which implementation would likely run
faster, much less to write such a test which would be computationally
cheap. A compiler could experiment with linking in such an
alternative qsort() implementation, and notice that application A
wins, while application B loses.
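
For concreteness, here is a sketch of the kind of alternative sort such
a compiler could try linking in: an insertion sort with a
qsort()-compatible signature, which runs in near-linear time on
almost-sorted input. The function name and the 64-byte element limit
are assumptions of this sketch, not part of any real library:

```c
#include <stddef.h>
#include <string.h>

/* Insertion sort with the same signature as qsort(): a hypothetical
 * drop-in alternative the compiler could link in when profiling shows
 * the input is usually almost sorted. */
void insertion_sort(void *base, size_t nmemb, size_t size,
                    int (*cmp)(const void *, const void *))
{
    char *b = base;
    char tmp[64];                       /* sketch assumes size <= 64 */
    if (size > sizeof tmp)
        return;                         /* out of scope for this sketch */
    for (size_t i = 1; i < nmemb; i++) {
        memcpy(tmp, b + i*size, size);
        size_t j = i;
        /* Shift larger elements right until tmp's slot is found. */
        while (j > 0 && cmp(b + (j-1)*size, tmp) > 0) {
            memcpy(b + j*size, b + (j-1)*size, size);
            j--;
        }
        memcpy(b + j*size, tmp, size);
    }
}

/* Example comparator, as one would pass to qsort(). */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b2 = *(const int *)y;
    return (a > b2) - (a < b2);
}
```

On an almost-sorted array the inner while loop rarely runs, so each
element costs O(1) on average here, versus qsort()'s O(log n).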

* Performance of programs which use mathematical functions which are
expensive to compute can often be improved by memoization, where
the function remembers the results for arguments which it has
recently been called with. However, memoization isn't free: it has
space costs (past arguments and results need to be cached somewhere
in memory) and time costs (every call to the function now checks if
the result is already known... the more past calls are "memoized",
the longer this check is likely to take). A compiler could, with
programmer help, generate versions of functions which do or do not
do memoization, and use the appropriate one for applications which
do often call the function with the same arguments vs. applications
which do not.
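
A minimal sketch in C of the two versions a compiler could generate.
slow_fib() stands in for an expensive mathematical function, and
memo_fib() adds a small direct-mapped memo table; all names and the
table size are illustrative. The table is the space cost the text
mentions, the lookup the time cost:

```c
/* Hypothetical "expensive" function; stands in for a costly
 * mathematical routine.  Exponential time when computed naively. */
static long slow_fib(int n)
{
    return n < 2 ? n : slow_fib(n - 1) + slow_fib(n - 2);
}

/* Memoized variant: the compiler could emit this alongside the plain
 * version and select whichever profiles faster per application. */
#define MEMO_SLOTS 256
static long memo_fib(int n)
{
    static int  keys[MEMO_SLOTS];   /* space cost of memoization */
    static long vals[MEMO_SLOTS];
    static char used[MEMO_SLOTS];
    int slot = (unsigned)n % MEMO_SLOTS;
    if (used[slot] && keys[slot] == n)  /* time cost: check on every call */
        return vals[slot];              /* hit: skip recomputation */
    long r = n < 2 ? n : memo_fib(n - 1) + memo_fib(n - 2);
    used[slot] = 1;
    keys[slot] = n;
    vals[slot] = r;
    return r;
}
```

Whether memo_fib() wins depends entirely on the argument distribution at
run time, which is precisely what the profiler (and not static analysis)
can tell the compiler.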

* Cross-platform variance of performance of algorithms. I can't
come up with an example off the top of my head, but I'd really love
to see something along this line.

* Cross-user variance of how programs are used. A very vague example
might be a novice who uses very simple database queries vs. someone
who almost always constructs very complex precise queries.

----------------

[Repeat: please reply to me personally via email, rather than posting
a follow-up article in this newsgroup, so as to minimize traffic. I
will summarize any informative responses in a few days.]