Optimization manuals

This series of five manuals describes everything you need to know about optimizing
code for x86 and x86-64 family microprocessors, including optimization advice for C++
and assembly language, details about the microarchitecture and instruction
timings of most Intel, AMD and VIA processors, and details about different compilers and
calling conventions.

3. The microarchitecture of Intel, AMD and VIA CPUs:
An optimization guide for assembly programmers and compiler makers

This manual contains details about the internal workings of various microprocessors
from Intel, AMD and VIA. Topics include: Out-of-order execution, register renaming,
pipeline structure, execution unit organization and branch prediction algorithms
for each type of microprocessor. It describes many details that cannot be found
in manuals from microprocessor vendors or anywhere else. The information is
based on my own research and measurements rather than on official sources.
This information will be useful to programmers who want to make CPU-specific
optimizations as well as to compiler makers and students of microarchitecture.

Contains detailed lists of instruction latencies, execution unit throughputs,
micro-operation breakdown and other details for all common application instructions
of most microprocessors from Intel, AMD and VIA. Intended as an appendix to the
preceding manuals. Available as a PDF file and as a spreadsheet (ODS format).

5. Calling conventions for different C++ compilers and operating systems

This document contains details about data representation,
function calling conventions, register usage conventions, name mangling schemes,
etc. for many different C++ compilers and operating systems. Discusses compatibilities
and incompatibilities between different C++ compilers. Includes information that
is not covered by the official Application Binary Interface standards (ABIs).
The information provided here is based on my own research and is therefore
descriptive rather than normative.
Intended as a source of reference for programmers who want to make function
libraries compatible with multiple compilers or operating systems and for
makers of compilers and other development tools who want their tools to be
compatible with existing tools.
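
One common way to keep a function library usable from several compilers is to export its functions with C linkage, so that the symbol name does not depend on any compiler's name mangling scheme. The following is only a minimal sketch of that idea. The function name lib_sum_i32 is hypothetical and is not taken from the manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical library entry point, used only for illustration.
// extern "C" switches off C++ name mangling for this function, so the
// exported symbol name does not depend on the compiler's mangling scheme.
// Fixed-width integer types avoid platform differences in the size of
// types such as 'long'.
extern "C" std::int64_t lib_sum_i32(const std::int32_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += data[i];
    }
    return sum;
}
```

C linkage only addresses the symbol name. The calling convention, register usage and data layout still have to match between the caller and the library.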

This is a collection of C++ classes, functions and operators that makes it easier to
use the vector instructions (Single Instruction Multiple Data instructions) of
modern CPUs without using assembly language. Supports the SSE2, SSE3, SSSE3, SSE4.1,
SSE4.2, AVX, AVX2, AVX512, FMA, and XOP instruction sets. Includes standard mathematical functions.
Can compile for different instruction sets from the same source code.
Description and instructions.
Message board.
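
To illustrate the general idea of wrapping vector instructions in C++ classes and operators, here is a minimal sketch built directly on the SSE2 intrinsics from emmintrin.h. The type Int32x4 and the function add_arrays are hypothetical names made up for this example. They are not the classes or functions of the library itself, and the sketch assumes an x86 or x86-64 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics; x86 and x86-64 compilers only

// Hypothetical wrapper type for illustration; NOT the library's own class.
// It holds four 32-bit integers in one 128-bit SSE2 register.
struct Int32x4 {
    __m128i v;

    // Load four integers from memory (no alignment requirement).
    static Int32x4 load(const std::int32_t* p) {
        return Int32x4{_mm_loadu_si128(reinterpret_cast<const __m128i*>(p))};
    }

    // Store four integers to memory (no alignment requirement).
    void store(std::int32_t* p) const {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
    }
};

// Element-wise addition of four integers in one SIMD operation.
inline Int32x4 operator+(Int32x4 a, Int32x4 b) {
    return Int32x4{_mm_add_epi32(a.v, b.v)};
}

// Adds two arrays element by element, four elements per iteration.
// In this simplified sketch n must be a multiple of 4.
inline void add_arrays(const std::int32_t* a, const std::int32_t* b,
                       std::int32_t* out, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        (Int32x4::load(a + i) + Int32x4::load(b + i)).store(out + i);
    }
}
```

With operator overloading, the loop body reads like ordinary arithmetic while each + processes four elements at once. A complete library would also handle remaining elements when n is not a multiple of the vector size, and would provide wider vectors for AVX2 and AVX512.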

This utility can be used for converting object files between COFF/PE, OMF, ELF and
Mach-O formats for all 32-bit and 64-bit x86 platforms.
Can modify symbol names in object files. Can build, modify and convert function libraries
across platforms. Can dump object files and executable files.
Also includes a very good disassembler supporting the SSE4, AVX, AVX2, AVX512, FMA3, FMA4, XOP
and Knights Corner instruction sets.
Source code included (GPL). Manual.

This is a library of optimized subroutines coded in assembly language. The functions in
this library can be called from C, C++ and other compiled high-level languages.
Supports many different compilers under Windows, Linux, BSD and Mac OS X operating systems, 32 and 64 bits.
This library contains faster versions of common C/C++ memory and string functions,
fast functions for string search and string parsing,
fast integer division and integer vector division,
as well as several useful functions not found elsewhere.
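
As an illustration of how integer division can be made faster when the same divisor is used many times, the sketch below replaces the division by a multiplication and shifts, following the well-known method published by Granlund and Montgomery for unsigned 32-bit division by an invariant divisor. The struct DividerU32 is a hypothetical name for this example. Whether the library's functions work in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not the library's interface.
// Precomputes a multiplier and two shift counts for a fixed divisor d, so
// that later divisions need only a multiplication, additions and shifts.
struct DividerU32 {
    std::uint32_t multiplier;
    std::uint32_t shift1;
    std::uint32_t shift2;

    explicit DividerU32(std::uint32_t d) {
        assert(d != 0);
        // L = ceil(log2(d))
        std::uint32_t L = 0;
        while ((std::uint64_t{1} << L) < d) {
            ++L;
        }
        // multiplier = floor(2^32 * (2^L - d) / d) + 1, which fits in 32 bits
        const std::uint64_t pow2L = std::uint64_t{1} << L;
        multiplier = static_cast<std::uint32_t>(((pow2L - d) << 32) / d + 1);
        shift1 = (L < 1) ? L : 1;      // min(L, 1)
        shift2 = (L > 1) ? L - 1 : 0;  // max(L - 1, 0)
    }

    // Returns n / d without executing a division instruction.
    std::uint32_t divide(std::uint32_t n) const {
        // High 32 bits of the 64-bit product multiplier * n.
        const std::uint32_t t =
            static_cast<std::uint32_t>((std::uint64_t{multiplier} * n) >> 32);
        return (t + ((n - t) >> shift1)) >> shift2;
    }
};
```

The constructor is relatively expensive, so the approach pays off only when one DividerU32 object is reused for many dividends, for example inside a loop with a fixed divisor.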

Test programs that I have used for my research.
Can measure clock cycles and performance monitor counters such as
cache misses, branch mispredictions, resource stalls etc. in a small piece of
code in C, C++ or assembly. Can also set up performance monitor counters for
reading inside another program.
Supports Windows and Linux, 32-bit and 64-bit mode, multiple threads.

For experts only. Useful for analyzing small pieces of code but not for profiling
a whole program.
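
The basic principle of timing a small piece of code can be sketched with the time stamp counter, read through the __rdtsc intrinsic that GCC, Clang and Microsoft compilers provide on x86 platforms. This is not the test programs' own code. The helper name min_tsc_ticks is hypothetical, and the sketch leaves out serialization and the performance monitor counters, which need additional setup.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>     // __rdtsc on Microsoft compilers
#else
#include <x86intrin.h>  // __rdtsc on GCC and Clang
#endif

// Hypothetical helper for illustration; not part of the test programs.
// Runs 'code' several times and returns the smallest time stamp counter
// difference observed. Taking the minimum reduces the influence of
// interrupts, cache misses and other disturbances on a short measurement.
template <typename F>
std::uint64_t min_tsc_ticks(F code, int repetitions) {
    assert(repetitions > 0);
    std::uint64_t best = UINT64_MAX;
    for (int i = 0; i < repetitions; ++i) {
        const std::uint64_t start = __rdtsc();
        code();
        const std::uint64_t stop = __rdtsc();
        if (stop - start < best) {
            best = stop - start;
        }
    }
    return best;
}
```

The result includes the overhead of the two counter reads. On many processors the time stamp counter runs at a fixed reference frequency, so the count may differ from the number of core clock cycles when the clock frequency varies.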

Intel resources

Reference manuals and other documents can be found at Intel's web
site. Intel's web site is reorganized so often that any link I could provide here
to specific documents would be broken after a few months. I therefore
recommend that you use the search facilities at developer.intel.com
and search for "Software Developer's Manual" and "Optimization Reference Manual".