> By running two or three of these calculations in parallel, we got rid
> of nearly all the fp pipeline stalls, resulting in final speedups for
> his simulations between 30% and 100%.
>
> It was quite easy to do this using portable C, so I cannot really see
> the need to modify either hardware, languages or compilers to realize
> these kinds of speedups?

You obviously do not live on the same planet as me. Most people do not
want to change languages (that is, Fortran -> C) in most scientific /
technical codes. Most people do not even want to change their
code. Most people do not even know there is an -O flag in their
compiler. They just want to run their code FAST. Ergo, the compilers,
languages, operating systems, and the hardware must live to serve.