Abstract

Computing elementary functions on large arrays is an essential part of many machine learning and signal processing algorithms. Since the introduction of floating-point computation in mainstream processors, table lookups, division, square root, and piecewise approximations have been essential components of elementary function implementations. However, we suggest that these operations cannot deliver high throughput on modern processors, and argue that algorithms relying only on multiplication, addition, and integer operations achieve higher performance. We propose four design principles for high-throughput elementary functions and show how to apply them to implementations of the log, exp, sin, and tan functions. We evaluate the performance and accuracy of the new algorithms on three recent x86 microarchitectures and demonstrate that they compare favorably to previously published research and to vendor-optimized libraries.

Keywords

Notes

Acknowledgements

This work was supported in part by the National Science Foundation (NSF) under NSF CAREER award number 0953100 and by the U.S. Department of Energy (DOE), Office of Science, Advanced Scientific Computing Research, under award DE-FC02-10ER26006/DE-SC0004915. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the NSF or the DOE.