Understanding SIMD Optimization Layers and Dispatching in the Intel® IPP

This is a computer translation of the original content. It is provided for general information only and should not be relied upon as complete or accurate.

This article describes the Intel® Integrated Performance Primitives (Intel® IPP) optimization layers present in version 8.2 of the library. The article titled Understanding CPU Dispatching in the Intel® IPP Library describes the same features for earlier versions of the library (5.3 through 6.1).

The standard distribution of the Intel IPP library contains multiple functionally identical, SIMD-specific optimized libraries (or layers) that are automatically “dispatched” at runtime. The “dispatcher” directs your calls to the appropriate optimized library layer based on the SIMD capabilities discovered during library initialization. This maximizes each function’s use of the runtime processor's SIMD instructions and other architecture-specific features.

Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. See the Intel IPP linkage models article for information on how to build custom versions of the Intel IPP library.

Dispatching selects the Intel IPP optimized library layer that corresponds to the runtime CPU's SIMD instruction set. For example, on a Windows installation, the $(IPPROOT)\..\redist\intel64\ipp directory contains a file named ippil9-8.2.dll, which holds version ‘8.2’ of the image processing library optimized for 64-bit processors that support the Intel AVX2 instructions: ‘ippi’ denotes the image processing domain, ‘l9’ denotes the AVX2 instruction set for 64-bit processors, and ‘8.2’ denotes the library’s version number.
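To make the naming convention concrete, here is a minimal sketch that splits a Windows dispatched-library file name such as ippil9-8.2.dll into its domain prefix, two-character SIMD layer code, and version. The helper parse_ipp_lib_name() and the struct ipp_lib_name type are hypothetical illustrations, not part of Intel IPP, and the sketch assumes the layout <domain><layer code>-<version>.<extension> seen in this example; Linux shared-object names are laid out differently and are rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parts of a dispatched library file name such as "ippil9-8.2.dll". */
struct ipp_lib_name {
    char domain[16];  /* functional domain, e.g. "ippi" (image processing) */
    char cpu[3];      /* two-character SIMD layer code, e.g. "l9" */
    char version[16]; /* library version, e.g. "8.2" */
};

/* Hypothetical helper (not an Intel IPP API): assumes the layout
 * <domain><2-char layer code>-<version>.<extension>.
 * Returns 0 on success, -1 if the name does not fit that layout. */
static int parse_ipp_lib_name(const char *file, struct ipp_lib_name *out)
{
    assert(file != NULL && out != NULL);

    const char *dash = strchr(file, '-');  /* separates prefix from version */
    const char *ext = strrchr(file, '.');  /* start of the ".dll" extension */
    if (dash == NULL || ext == NULL || ext <= dash + 1)
        return -1;

    size_t prefix_len = (size_t)(dash - file);     /* domain + layer code */
    size_t version_len = (size_t)(ext - dash - 1);
    if (prefix_len < 3 || prefix_len - 2 >= sizeof out->domain ||
        version_len >= sizeof out->version)
        return -1;

    memcpy(out->domain, file, prefix_len - 2);
    out->domain[prefix_len - 2] = '\0';
    memcpy(out->cpu, dash - 2, 2);                 /* last two prefix chars */
    out->cpu[2] = '\0';
    memcpy(out->version, dash + 1, version_len);
    out->version[version_len] = '\0';
    return 0;
}
```

Called with "ippil9-8.2.dll", the helper yields the domain "ippi", the layer code "l9", and the version "8.2".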

In the general case, the “dispatcher” identifies the runtime processor only once, at library initialization, and sets a variable internal to the library that directs your calls to the SIMD-specific functions that match the runtime processor. For example, ippsCopy_8u() has multiple implementations stored in the library, each optimized for a specific SIMD instruction set. The dispatcher calls the l9_ippsCopy_8u() version of ippsCopy_8u() when running on an Intel® Haswell processor in 64-bit addressing mode, because l9_ippsCopy_8u() is optimized for the AVX2 instruction set that this processor supports in 64-bit mode.
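The "detect once, then call through an internal variable" idea can be sketched as a function pointer chosen at initialization. This is a simplified model, not Intel IPP source code: the feature bits, dispatcher_init(), demo_copy_8u() and the "base" fallback layer are hypothetical, and every variant shares one plain C body. Only the layer prefixes l9 (AVX2, Intel 64) and y8 (SSE4.1, Intel 64) follow the Intel IPP naming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU feature bits; a real dispatcher derives these from CPUID. */
enum { FEAT_SSE41 = 1u << 0, FEAT_AVX2 = 1u << 1 };

typedef void (*copy_8u_fn)(const unsigned char *src, unsigned char *dst, int len);

/* Functionally identical variants of one primitive. Here they share a plain
 * C body; in a real library each is compiled for its own SIMD set. */
static void base_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    memcpy(dst, src, (size_t)len);
}
static void y8_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    base_copy_8u(src, dst, len);   /* stand-in for SSE4.1 code */
}
static void l9_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    base_copy_8u(src, dst, len);   /* stand-in for AVX2 code */
}

static copy_8u_fn g_copy_8u = NULL;  /* the internal variable set at init */
static const char *g_layer = NULL;   /* code of the selected layer */

/* Runs once at initialization: choose the most capable supported layer. */
static void dispatcher_init(unsigned features)
{
    if (features & FEAT_AVX2) {
        g_copy_8u = l9_copy_8u;
        g_layer = "l9";
    } else if (features & FEAT_SSE41) {
        g_copy_8u = y8_copy_8u;
        g_layer = "y8";
    } else {
        g_copy_8u = base_copy_8u;
        g_layer = "base";
    }
}

/* Public entry point: one indirect call, no per-call CPU detection. */
static void demo_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    assert(g_copy_8u != NULL && "dispatcher_init() must run first");
    g_copy_8u(src, dst, len);
}
```

Because the CPU check happens only in dispatcher_init(), each later call costs a single indirect jump rather than a repeated feature test.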

Initializing the IPP Dispatcher

Identifying the runtime processor and initializing the dispatcher should be the first thing you do with the Intel IPP library. If you use the standard dynamic link library, this happens automatically when the Intel IPP shared library is initialized. If you use a static library, you must perform this step manually. See the article on the ipp*Init*() functions for more information on how to do this.

Because the Intel IPP library requires at least the SSE instruction set on IA-32 and Intel 64 processors, it is recommended that you ALWAYS call the ippInit() function before making any other calls to the Intel IPP library. This advice applies whether you link against the static or the dynamic form of the library (even though the dynamic library also performs this call).

When you call the ippInit() function with the shared libraries (DLL and SO), it displays an error message in a dialog box or on the error console if it detects that the runtime CPU is not supported by the Intel IPP library. The static versions of ippInit() do not display a console or dialog message. Both versions of ippInit() return an error code when they detect an unsupported CPU.

It is important to call the ippInit() function at the beginning of your application to ensure that the processor your application is running on supports the Intel IPP library. If ippInit() returns an error code, close your application gracefully; otherwise it may terminate unexpectedly with an invalid instruction fault when it tries to execute instructions the unsupported processor cannot run.
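The start-up pattern recommended above can be modeled without the library itself. In real code you would call ippInit(), which returns an IppStatus value, and compare the result with ippStsNoErr. In this sketch, lib_init(), lib_status, FEAT_SSE and app_start() are hypothetical stand-ins so that the control flow is self-contained. The application prints its own message because, as noted above, the static library does not display one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for an IppStatus result. */
typedef enum { LIB_OK = 0, LIB_ERR_UNSUPPORTED_CPU = -1 } lib_status;

#define FEAT_SSE (1u << 0)   /* hypothetical "minimum SIMD set present" bit */

static int g_initialized = 0;

/* Hypothetical stand-in for ippInit(): fails if the minimum SIMD set is missing. */
static lib_status lib_init(unsigned cpu_features)
{
    if ((cpu_features & FEAT_SSE) == 0)
        return LIB_ERR_UNSUPPORTED_CPU;
    g_initialized = 1;
    return LIB_OK;
}

/* Application start-up: initialize first and exit gracefully on failure,
 * instead of risking an invalid instruction fault later. */
static int app_start(unsigned cpu_features)
{
    lib_status st = lib_init(cpu_features);
    if (st != LIB_OK) {
        fprintf(stderr, "error: this processor is not supported (status %d)\n",
                (int)st);
        return 1;   /* nonzero exit code for main() to return */
    }
    /* ... from here on it is safe to call the optimized primitives ... */
    return 0;
}
```

A real main() would return the value of app_start() directly, so that an unsupported processor produces a clean error exit instead of a crash.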

The following table lists the SIMD architecture codes supported by the Intel IPP library.


Creating two separate optimization layers within the IPP library for the small set of instructions added by SSE4.2 and AES-NI would be very space inefficient, so they are bundled into the SSE4.1 library (p8/y8) as minor variants to that optimization layer. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library if your runtime processor supports these instructions. Because the enhancements affect the optimization of only a small number of Intel IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor that supports these extra instructions.
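The secondary jump inside the p8/y8 layer can be sketched as a flag captured once at initialization and tested only by the few functions that have an AES-NI variant. This is an illustrative model, not Intel IPP internals: y8_layer_init(), y8_aes_step() and its two variants are hypothetical names, and their bodies are placeholders that report which path ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Which code path ran; used only to make the sketch observable. */
typedef enum { PATH_SSE41, PATH_SSE41_AESNI } code_path;

/* Two variants of one function inside the SSE4.1 (p8/y8) layer. The bodies
 * are placeholders; real variants would contain the actual SIMD code. */
static code_path y8_aes_step_sse41(void) { return PATH_SSE41; }
static code_path y8_aes_step_aesni(void) { return PATH_SSE41_AESNI; }

static bool g_layer_ready = false;
static bool g_has_aesni = false;   /* captured once during initialization */

/* Hypothetical layer initialization, e.g. fed by a CPUID check. */
static void y8_layer_init(bool cpu_has_aesni)
{
    g_has_aesni = cpu_has_aesni;
    g_layer_ready = true;
}

/* Only functions that have an AES-NI variant pay for this extra branch;
 * the rest of the layer is entered directly by the main dispatcher. */
static code_path y8_aes_step(void)
{
    assert(g_layer_ready && "y8_layer_init() must run first");
    return g_has_aesni ? y8_aes_step_aesni() : y8_aes_step_sse41();
}
```

This mirrors the trade-off described above: one extra branch in a handful of functions costs far less than shipping whole additional optimization layers.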

S8/N8 (Atom) Dispatch

Unlike earlier versions, version 7.0 of the Intel IPP library includes Atom-optimized variants in all formats (static and dynamic) of the library. Because the Atom-specific optimizations have been fully merged into the standard library files, the Linux distribution of version 7.0 no longer includes a separate Atom-specific version of the library.

The following table was copied from an Intel Compiler Pro article that describes several compiler architecture options. It lists Intel processors and the SIMD instructions each one supports. For the latest version of the table, refer to the original article, which is updated regularly. Note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.

The Intel IPP library dispatching mechanism behaves differently from the one in the Intel Compiler products, and it may also differ from the mechanisms used in other Intel library products.