> DSPs are a different breed of cat. The performance in a DSP is
> gained from the use of multiple address spaces, all fetched
> simultaneously. For good DSP performance, the programmer has to be
> able to specify which arrays of data go into which memories. The
> typical multiplier accumulator needs two operands to stuff into the
> multiplier on each cycle. Typically the DSP has a pair of data
> memories (A and B). Multiplication is quick when multiplying data in
> A by data in B, and slow (or impossible) when multiplying data in A by
> data in A (or B by B). C just doesn't have the syntax to say "locate
> this array in the A data memory and that array in the B data memory."
> In C, memory is memory. In theory a set of #pragmas could be defined
> to specify where to put data, but these pragmas would be machine
> dependent and vary from manufacturer to manufacturer, which defeats
> the main purpose of using a high level language (portability).

This doesn't sound too difficult to solve in a compiler: whenever a
MUL between array elements occurs, a constraint is generated that the
two arrays should be in different memories. Once all constraints have
been found, the compiler tests whether they can all be satisfied; if
they can't, it uses a cost-based heuristic (similar to register
spilling heuristics) to decide which constraints to relax. This can be
done at link time (it is the linker that decides where to store arrays
anyway).
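
To make the idea concrete, here is a minimal sketch in C of one
possible greedy heuristic. It is an illustration only, not how any
particular compiler or linker does it. The names (struct constraint,
assign_banks) and the use of an estimated execution count as the cost
are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical constraint: arrays a and b (given as indices) are
   multiplied together somewhere in the program. The weight is an
   assumed cost, e.g. an estimate of how often that multiply runs. */
struct constraint {
    int a, b;
    long weight;
};

/* Greedy placement: arrays are visited in index order and each one is
   put into the memory (0 = A, 1 = B) that conflicts least with the
   arrays already placed. Returns the total weight of the constraints
   that remain violated, i.e. the ones that had to be relaxed. */
long assign_banks(int n_arrays, const struct constraint *c, size_t n_c,
                  int bank[])
{
    for (int i = 0; i < n_arrays; i++) {
        long cost[2] = {0, 0};   /* cost of putting array i in A or B */
        for (size_t k = 0; k < n_c; k++) {
            int other;
            if (c[k].a == i)
                other = c[k].b;
            else if (c[k].b == i)
                other = c[k].a;
            else
                continue;
            /* Only arrays placed earlier matter; an array multiplied
               by itself can never satisfy the constraint. */
            if (other < i)
                cost[bank[other]] += c[k].weight;
        }
        bank[i] = (cost[1] < cost[0]) ? 1 : 0;
    }

    long violated = 0;
    for (size_t k = 0; k < n_c; k++)
        if (bank[c[k].a] == bank[c[k].b])
            violated += c[k].weight;
    return violated;
}
```

With three arrays where every pair is multiplied, one constraint must
give way; the sketch sacrifices whichever conflict is cheapest given
the visiting order. A real implementation would probably order the
arrays by total weight first, much as a register allocator orders
spill candidates.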

> In practice, the tough DSP algorithms are things like FFT, FIR,
> convolution. You should be able to obtain optimized assembler code
> for this stuff and call it from C. So your main routine and much of
> the code would be C, and the key routines would be hand done assembler
> from a library supplied by the DSP chip maker.

I agree that things like the bit-reversed addressing found in some
DSPs for FFT etc. are difficult or even impossible to handle in a
compiler for C. One approach is to put such things into libraries,
which can then be optimized for each machine, either using assembly
language or specialized idioms that the compiler can recognize as
being equivalent to a particular machine feature. For example, some
compilers compile the expression (x<<(32-n))|(x>>n) to a rotate right
by n bits (on 32-bit processors).
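
As a sketch of what such an idiom might look like when wrapped in a
library routine (the name rotr32 is made up for the example): the
count is masked so that the function never shifts by 32, which is
undefined behaviour in C for a 32-bit operand. Whether a given
compiler turns this into a single rotate instruction depends on the
compiler and target; it is not guaranteed by the language.

```c
#include <stdint.h>

/* Rotate the 32-bit value x right by n bits. The count is reduced
   modulo 32, so n == 0 and n == 32 both return x unchanged. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}
```

The portable C form keeps the source machine independent, while a
compiler that knows the idiom is still free to use the hardware
rotate.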