This class introduces attendees to a thorough approach to optimization techniques for contemporary computing architectures. It is based on material from Andrei's upcoming book Fastware. The book, in turn, draws on Andrei's career-long experience tuning the performance of various software systems, from machine learning research to high-performance libraries to Facebook-scale computing backends.

Reliable information on this subject is scarce and difficult to find, even though software engineering folklore is rife with tales of optimizations. Programmers commonly discuss and argue about whether one piece of code should be faster than another, or what to do to improve the performance of a system, small or large.

Optimization is big. Arguably it is bigger today, when serial execution speed has stalled and, after parallelizing what's possible, single-thread speed is the remaining bottleneck. A large category of applications has no upper bound on desired speed, meaning there is no point of diminishing returns in making code faster. Better speed means less power consumed for the same work, more workload for the same data center expense, better features for the end user, more features for machine learning, better analytics, and more.

Optimizing has always been an art, and optimizing C++ on contemporary hardware in particular has become a task of formidable complexity. This is because modern hardware has a few peculiarities that are not sufficiently understood and explored. This class offers a thorough dive into this fascinating world.

Outline

Please note: This course is being actively developed. The actual course might contain more topics and slight variations on the topics outlined below.

The Art of Benchmarking

Conducting Time Measurements

Baselines

Strength Reduction

Minimizing Indirections

Eager Computation: Tables vs. Computation

Lazy Computation

Memoization

Computation vs. Tables

Lazy Structuring

Instruction-Level Parallelism

Inlining

Smart Resource Optimizations

Copy Elision

Scalable Use of the STL

Building Structure on Top of Arrays

Large Set Operations and Derivatives

Contention Minimization

Attendee Profile

This class is aimed at C++ programmers for whom the efficiency of generated code is a primary concern.

Format

The format is a highly interactive lecture. Questions during the lecture are encouraged. Use of laptops for trying out examples is allowed.