> I believe software pipelining can improve performance even when there
> are only a few functional units.
> For example, suppose the following code is run on a machine with a
> single FU.
> It is just incrementing an array of 10 elements.
>
> ldc r0, 0
> L1:
> 0. load r1, 0(r0) # latency 2
> 1. add r1, r1, 1 # latency 1
> 2. store 0(r0), r1 # latency 1
> 3. add r0, r0, 1 # latency 1
> 4. cmp r0, 10 # latency 1
> 5. blt L1 # latency 1
>
> For a non-SW pipelined code, a single iteration would take 7 cycles.
> Now, for the SW pipelined code, a single iteration would take 6 cycles
> on average.
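To check the 7 vs 6 cycle figures, here is a toy cycle model in Python. It assumes a single-issue, in-order machine with one FU, no branch penalty, and a result that becomes readable `latency` cycles after issue. The `run` helper, the `cc` stand-in for the condition flags, and the prologue/loop/epilogue schedule are my own illustrative assumptions, not necessarily the schedule the original poster had in mind. The `ldc` setup instruction is left out of both counts.

```python
"""Toy cycle model: one FU, in-order, single issue, no branch penalty.
An instruction waits until its source registers are ready; its result
is ready LATENCY[op] cycles after it issues. (Assumed model.)"""

LATENCY = {"load": 2, "add": 1, "store": 1, "cmp": 1, "blt": 1}

def run(trace):
    """Return the cycle count for a straight-line trace of
    (opcode, dest, sources) tuples; dest is None if nothing is written."""
    ready = {}  # register -> first cycle its value may be read
    cycle = 0   # earliest cycle the next instruction may issue
    for op, dest, srcs in trace:
        # stall until every source operand is available
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        if dest is not None:
            ready[dest] = issue + LATENCY[op]
        cycle = issue + 1
    return cycle

N = 10  # array length from the example

# Original loop body; "cc" is a hypothetical name for the condition flags.
naive_body = [
    ("load",  "r1", ["r0"]),
    ("add",   "r1", ["r1"]),
    ("store", None, ["r0", "r1"]),
    ("add",   "r0", ["r0"]),
    ("cmp",   "cc", ["r0"]),
    ("blt",   None, ["cc"]),
]

# Hypothetical two-stage software pipeline: the load for element i+1 is
# issued in the same iteration that increments and stores element i.
prologue = [("load", "r1", ["r0"])]
pipe_body = [
    ("add",   "r1", ["r1"]),        # finish element i
    ("store", None, ["r0", "r1"]),
    ("add",   "r0", ["r0"]),
    ("load",  "r1", ["r0"]),        # start element i+1
    ("cmp",   "cc", ["r0"]),        # compare against N-1: body runs N-1 times
    ("blt",   None, ["cc"]),
]
epilogue = [
    ("add",   "r1", ["r1"]),        # finish the last element (no extra load)
    ("store", None, ["r0", "r1"]),
]

naive = run(naive_body * N)
pipelined = run(prologue + pipe_body * (N - 1) + epilogue)
print(naive, naive / N)          # 70 7.0
print(pipelined, pipelined / N)  # 58 5.8
```

Under these assumptions the plain loop stalls one cycle per iteration waiting on the load, giving 7 cycles per iteration. The pipelined body hides that stall and settles at 6 cycles per iteration in steady state, matching the quoted figure; the total of 58 is slightly below 6 per element only because the peeled epilogue skips the last compare and branch.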

Even though I understand that this is just an example, I still
have the feeling that with a single functional unit the benefits
of software pipelining are marginal: in your example the saving
is only about 1 cycle per iteration (7 down to 6). But I also agree
that this could have a noticeable positive effect when the loop has
a high iteration count.
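To put numbers on the iteration-count argument, a rough sketch: take the quoted per-iteration costs (7 cycles plain, 6 cycles pipelined) and assume the pipelined version pays some fixed prologue/epilogue overhead of $c$ cycles. Both $c$ and the constant-cost model are assumptions, since $c$ depends on the exact schedule.

```latex
T_{\text{plain}}(N) = 7N, \qquad T_{\text{pipe}}(N) \approx 6N + c

T_{\text{plain}}(N) - T_{\text{pipe}}(N) \approx N - c,
\qquad
\lim_{N \to \infty} \frac{T_{\text{plain}}(N)}{T_{\text{pipe}}(N)} = \frac{7}{6} \approx 1.17
```

So under these assumptions the absolute saving grows roughly linearly with the iteration count, while the relative gain is bounded at about 1/7 (roughly 14%) fewer cycles.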