CMOS technology scaling improves the speed and functionality of microprocessors by reducing the size of transistors. Static power dissipation also increases as a result of scaling however, and has been identified as a limiting factor in technology scaling. As current technology approaches that limit, techniques are required both at the technology-level and in the architecture design to reduce sub-threshold leakage, which accounts for the majority of static power dissipation. This thesis presents an approach to predict the idle periods of execution units at runtime and power-gate them during these periods to eliminate their static power leakage. We exploit similar execution characteristics across loop iterations to build a prediction of the units required to execute an entire loop from the units used over the first few iterations. The utilisation of each execution unit is monitored for each iteration, and thresholds are used to determine which units should be power-gated for the remainder of the loop. Three techniques are presented: Loop-Directed Mothballing (LDM), Extended Loop-Directed Mothballing (ELDM) and schedule balancing. LDM power-gates execution units only during innermost loops, which are simple to detect at runtime. ELDM extends this method to all loops using loop entry and exit information gathered offline. The balancing scheduler is developed to balance the types of instruction issued each cycle, to encourage reuse of execution units and make unnecessary units easier to detect. Extensive simulation using traces of 16 benchmarks from the SPEC CPU2006 suite demonstrates that LDM reduces the energy-delay product of our simulated superscalar processor by 10.3%. For traces with a low proportion of executed instructions inside innermost loops, ELDM improves the energy-delay product by up to 13% by allowing the technique to be applied to other loops in the trace. Employing schedule balancing with ELDM achieves similar savings, and simplifies the hardware required to make predictions.