Modern network processors such as the Intel IXP family hide the latency of slow instructions by supporting multiple threads of execution. Context switches in the IXP architecture are designed to be very fast, but the low overhead is partly achieved by leaving register management to programs, with little support from the hardware. The complexity of the multi-engine, multi-threaded environment makes manual register management a daunting task, one that is better left to the compiler. However, a purely static analysis may not achieve full utilization of the register file because its liveness estimates are conservative: a register that is live across a context switch point must be considered live for the duration of all other threads, and so must be assumed unavailable to them. In addition, aliasing further reduces the effectiveness of static analysis. The net effect is that a large number of idle cycles remain even after static optimization.
We propose a dynamic solution that requires minimal software and hardware support. On the software side, we take a pre-allocated binary file and annotate the potential context switch instructions with information about the registers that are dead at those points. On the hardware side, we try to rename all transfer registers and addresses to dead general-purpose registers and update the vector of used registers accordingly. We then use this dynamic context to replace the long-latency memory instructions in the architecture with fast move instructions. The results show up to a 51% reduction in idle cycles and up to a 14% increase in throughput for hand-coded applications.
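As a rough illustration of the renaming step, the sketch below models the dead-register annotation as a bitmask and the vector of used registers as a second bitmask. The register file size, the names `RenameState`, `try_rename` and `release`, and the lowest-index-first selection policy are all assumptions made for this example; they are not taken from the IXP hardware or from the proposed design.

```python
from dataclasses import dataclass, field

NUM_GPRS = 8  # hypothetical register file size, chosen only for illustration


@dataclass
class RenameState:
    """Dynamic context kept by the (hypothetical) renaming hardware."""
    used: int = 0                                 # bit i set => GPR i holds a renamed value
    mapping: dict = field(default_factory=dict)   # transfer register name -> GPR index


def try_rename(state, dead_mask, xfer_reg):
    """Map xfer_reg onto a dead GPR that is not already in use.

    dead_mask stands in for the software annotation attached to a
    potential context switch instruction (bit i set => GPR i is dead).
    Returns the chosen GPR index, or None if no dead register is free,
    in which case the original long-latency access would be kept.
    """
    free = dead_mask & ~state.used & ((1 << NUM_GPRS) - 1)
    if free == 0:
        return None
    gpr = (free & -free).bit_length() - 1   # index of the lowest set bit
    state.used |= 1 << gpr                  # update the vector of used registers
    state.mapping[xfer_reg] = gpr
    return gpr


def release(state, xfer_reg):
    """Free the GPR backing xfer_reg once the renamed value is no longer needed."""
    gpr = state.mapping.pop(xfer_reg, None)
    if gpr is not None:
        state.used &= ~(1 << gpr)
```

With an annotation marking GPRs 2 and 4 as dead, two transfer registers can be renamed and a third request falls back to the memory access. A real implementation would also have to decide when a renamed register may be released, which this sketch leaves to the caller.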