Scott Meyers Training Courses

Fastware for C++

Fastware is software that's fast —
that gets the job done quickly. Low latency is the name of the
game, and achieving it calls for insights from software
engineering, computer science, and the effective use of C++. This
presentation addresses crucial issues in each of these areas,
covering topics as diverse as CPU caches, speed-sensitive use
of the STL, data structures supporting concurrency,
profile-guided optimization, and more.

Much of the material in "Fastware for C++" is
unique to this seminar, i.e., unavailable in Scott's
publications or his other training courses. However, as the
successor to Scott's acclaimed "High-Performance C++
Programming" seminar, "Fastware for C++"
also includes updated discussions of topics from that course as
well as from Scott's books, Effective C++, More Effective C++,
and Effective STL.

Course Highlights

Participants will gain:

Recognition of the importance and implications of treating performance as a correctness criterion.

Understanding of how effective use of third-party APIs can improve system performance.

Knowledge of specific C++ practices that improve the speed of code using both the language and the STL.

Familiarity with concurrent data structures and algorithms poised to become de facto standards.

Who Should Attend

Systems designers, programmers, and technical managers
involved in the design, implementation, and maintenance of
performance-sensitive libraries and applications using
C++. Participants should already know the basic features of C++
(e.g., classes, inheritance, virtual functions, templates), but
expertise is not required. Knowledge of common threading
constructs (e.g., threads, mutexes, condition variables)
is helpful. People who have learned C++ recently, as well as
people who have been programming in C++ for many years, will
come away from this seminar with useful, practical, proven
information.

Format

Lecture and question/answer. There are no hands-on exercises, but participants
are welcome — encouraged! — to bring computers to experiment with the material as it is presented.

Length

Two full days (six to seven lecture hours per day).

Detailed Topic Outline

Treating speed as a correctness criterion.

Why "first make it right, then make it fast" is misguided.

Latency, initial and total.

Other performance measures.

Designing for speed.

Optimizing systems versus optimizing programs.

Most system components are "foreign."

Exercising indirect control over "foreign" components.

Examples.

CPU Caches and why they're important.

Data caches, instruction caches, TLBs.

Cache hierarchies, cache lines, prefetching, and traversal orders.

Cache coherency and false sharing.

Cache associativity.

Guidelines for effective cache usage.
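
To make the traversal-order topic above concrete, here is a minimal sketch that is not taken from the course materials. It assumes a matrix stored contiguously in row-major order; the function names sumRowMajor and sumColumnMajor are hypothetical. Both functions compute the same sum, but the first visits adjacent memory on each step, while the second jumps by a full row each time, which tends to use cache lines less effectively on large matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored contiguously in row-major order.
// Row-by-row traversal: consecutive accesses touch adjacent elements,
// so each cache line that is fetched is fully used before moving on.
long long sumRowMajor(const std::vector<int>& m,
                      std::size_t rows, std::size_t cols)
{
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same result, but column-by-column traversal: consecutive accesses
// are cols elements apart, so for large matrices each access may
// land on a different cache line.
long long sumColumnMajor(const std::vector<int>& m,
                         std::size_t rows, std::size_t cols)
{
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Any speed difference between the two depends on matrix size, hardware, and compiler, so it should be measured rather than assumed.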

Optimizing C++ usage:

Move semantics.

Avoiding unnecessary object creations.

When custom heap management can make sense.
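
As one illustration of the move-semantics and object-creation topics above, the following sketch (an assumption of this write-up, not an excerpt from the seminar) builds a vector of strings without deep-copying each string. The helper name makeLines is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds `count` lines of text.
std::vector<std::string> makeLines(std::size_t count)
{
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < count; ++i) {
        std::string line = "line " + std::to_string(i);
        // std::move lets push_back take over the string's buffer
        // instead of allocating a new one and copying the characters.
        lines.push_back(std::move(line));
    }
    assert(lines.size() == count);
    // Returned by value: the vector is moved or the copy is elided,
    // so the strings themselves are not copied here either.
    return lines;
}
```

For short strings, the small-string optimization in many standard library implementations can make a move cost about the same as a copy, so the benefit is largest for longer strings and other heap-backed objects.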

Optimizing STL usage:

reserve and shrink_to_fit.

Range member functions.

Using function objects instead of function pointers.

Using sorted vectors instead of associative containers.

A comparison of STL sorting-related algorithms.
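
The sorted-vector topic above can be sketched as follows. This is a minimal illustration under the assumption of a build-once, query-many-times workload, not the seminar's own code; the names makeSortedLookup and contains are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: turns an arbitrary list of values into a
// sorted, duplicate-free lookup table held in contiguous memory.
std::vector<int> makeSortedLookup(std::vector<int> values)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    // Non-binding request to release capacity left over after erase.
    values.shrink_to_fit();
    return values;
}

// Membership test by binary search; requires `sorted` to be sorted.
bool contains(const std::vector<int>& sorted, int key)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```

Compared with std::set, this layout stores elements contiguously with no per-node overhead, but inserting into the middle of the vector is linear time, so it fits best when lookups greatly outnumber modifications.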

An overview of concurrent data structures.

Meaning of "concurrent data structure."

Use cases.

Common offerings in TBB and PPL.

Writing your own.

An overview of concurrent STL-like algorithms.

Thread management and exception-related issues.

Common offerings in TBB and PPL.

OpenMP.

Other TBB and PPL offerings.

Exploiting "free" concurrency.

Meaning of "free."

Multiple-approach problem solving.

Speculative execution.

Making use of PGO (profile-guided optimization) and WPO
(whole-program optimization).