This paper addresses the efficient exploitation of task-level parallelism, present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. The strategies considered here, referred to as the Slack Reduction Algorithm (SRA) and the Race-to-Idle Algorithm (RIA), take very different approaches to saving energy, but both adjust the operating frequency of the cores during the execution of a collection of tasks (into which many dense linear algebra algorithms can be decomposed). The procedures are evaluated using an energy-aware simulator, which is in charge of scheduling/mapping the execution of these tasks to the cores and leverages the dynamic voltage and frequency scaling (DVFS) featured by current technology. Experiments with this tool, together with the practical integration of the RIA strategy into a runtime, show the energy gains for two versions of the QR factorization.
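To make the contrast between the two strategies concrete, the following is a minimal toy sketch, not the authors' simulator or algorithms. It considers a single task that has some slack before its successor needs it. A race-to-idle policy runs the task at the highest frequency and then idles, while a slack-reduction policy picks the lowest frequency that still finishes within the available window. The power model P(f) = p_static + k_dyn * f^3, the idle power, the frequency set, and the functions `task_energy`, `race_to_idle`, and `slack_reduction` are all hypothetical assumptions chosen for illustration. The real SRA derives slack from the task dependency graph rather than from a given window.

```python
from typing import Sequence, Tuple


def task_energy(freq: float, work: float, window: float,
                p_static: float, k_dyn: float, p_idle: float) -> float:
    """Energy (J) to execute `work` Gcycles at `freq` GHz inside a `window` (s),
    idling for the remainder of the window.

    Hypothetical power model: P(f) = p_static + k_dyn * f**3 while busy,
    and a constant p_idle while idle.
    """
    busy = work / freq
    if busy > window:
        raise ValueError("frequency too low to finish within the window")
    return (p_static + k_dyn * freq ** 3) * busy + p_idle * (window - busy)


def race_to_idle(freqs: Sequence[float], work: float, window: float,
                 **model: float) -> Tuple[float, float]:
    """Run at the maximum frequency, then idle (toy race-to-idle policy)."""
    f = max(freqs)
    return f, task_energy(f, work, window, **model)


def slack_reduction(freqs: Sequence[float], work: float, window: float,
                    **model: float) -> Tuple[float, float]:
    """Use the lowest frequency that still meets the window (toy slack reduction)."""
    feasible = [f for f in freqs if work / f <= window]
    if not feasible:
        raise ValueError("no frequency meets the window")
    f = min(feasible)
    return f, task_energy(f, work, window, **model)


# Illustrative, made-up parameters: GHz levels, 2 Gcycles of work, 1.6 s window.
FREQS = [1.0, 1.5, 2.0, 2.5]
MODEL = {"p_static": 10.0, "k_dyn": 2.0, "p_idle": 8.0}

if __name__ == "__main__":
    for name, policy in [("race-to-idle", race_to_idle),
                         ("slack reduction", slack_reduction)]:
        f, e = policy(FREQS, 2.0, 1.6, **MODEL)
        print(f"{name}: f = {f} GHz, energy = {e:.2f} J")
```

Under these particular parameters, slack reduction uses less energy. With a larger static or idle power relative to the dynamic term, however, finishing early and idling can become the better choice. Which strategy wins depends entirely on the assumed power model and the available slack.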