Change in threading model: In Intel MKL 10.0, when the environment variable OMP_NUM_THREADS is undefined, Intel MKL may create multiple threads depending on the problem size and the value of MKL_DYNAMIC or other threading environment variables. In releases earlier than Intel MKL 10.0, when OMP_NUM_THREADS was undefined, the number of threads for MKL defaulted to 1.

New threading functions and variables: Besides the OpenMP* environment variables and functions supported in previous releases, Intel MKL 10.0 also offers new environment variables, such as MKL_NUM_THREADS and MKL_DOMAIN_NUM_THREADS, that give users greater control over MKL threading behavior.

New Intel MKL dynamic interfaces for Windows* OS have been supported since version 10.3. For more details, please see here.

What are the new threading controls in Intel MKL 10.x?

Intel MKL 10.0 introduces new optional threading controls: environment variables and service functions. They behave similarly to their OpenMP equivalents, but take precedence over them. By using these controls along with the OpenMP variables, users can thread the part of the application that does not call Intel MKL independently of the library itself.

These controls enable you to specify the number of threads for Intel MKL independently of the OpenMP settings. Intel MKL may still use a number of threads that differs from the one suggested, but the controls also let you instruct the library to try the suggested number when the application calling the library has threading behavior that Intel MKL cannot detect.

Employing Intel MKL threading controls in your application is optional. If you do not use them, the library's threading behavior is mostly the same as in Intel MKL 9.1, with the possible exception of a different default number of threads.

Note: Intel MKL cannot always choose the number of threads freely, for example because of limited system resources.

How do I set the number of threads for Intel MKL functions?

Users can employ different techniques to specify the number of threads to use in Intel MKL.

Set an OpenMP or Intel MKL environment variable:

OMP_NUM_THREADS

MKL_NUM_THREADS

MKL_DOMAIN_NUM_THREADS

Call an OpenMP or Intel MKL function:

omp_set_num_threads()

mkl_set_num_threads()

mkl_domain_set_num_threads()
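As a minimal sketch of the environment-variable route, the Python snippet below prepares an environment containing MKL_NUM_THREADS and OMP_NUM_THREADS and starts a child process with it. The helper names thread_env and run_with_env are hypothetical, and the child here is only a stand-in that echoes the variable; a real program linked against Intel MKL would take its place.

```python
import os
import subprocess
import sys


def thread_env(mkl_threads=None, omp_threads=None):
    """Return a copy of the current environment with the thread variables set."""
    env = dict(os.environ)  # copy, so the parent's own environment is untouched
    if mkl_threads is not None:
        env["MKL_NUM_THREADS"] = str(mkl_threads)
    if omp_threads is not None:
        env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def run_with_env(command, env):
    """Start a child process with the given environment and return its stdout."""
    result = subprocess.run(command, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


# Stand-in for an MKL-linked program: it just prints what it was given.
CHILD = [sys.executable, "-c",
         "import os; print(os.environ['MKL_NUM_THREADS'])"]

env = thread_env(mkl_threads=4, omp_threads=8)
seen_by_child = run_with_env(CHILD, env)
print(seen_by_child)
```

The variables are placed in the environment before the child starts, so they are already in place when the program first reads them.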

How do I choose a way to set the number of threads?

When choosing an appropriate way to set the number of threads for MKL functions, take into account the following rules:

OpenMP threading controls vs Intel MKL threading controls: Users can choose either Intel MKL threading controls or OpenMP techniques. If users choose only the OpenMP techniques (OMP_NUM_THREADS and omp_set_num_threads()), the library behaves much as earlier Intel MKL versions did. Intel MKL threading controls take precedence over the OpenMP threading controls, which lets users specify the number of threads for Intel MKL independently of the OpenMP settings. By using MKL threading controls along with OpenMP variables, users can thread the part of the application that does not call Intel MKL independently of the library itself.

Threading subroutine calls vs environment variables: Both environment variables and threading subroutine calls can affect the number of threads. However, environment variables cannot be used to change behavior in the course of a run, because they are read only once. Users need to use a subroutine call if they want to change the number of threads dynamically at run time.

A subroutine call takes precedence over any environment variables. The exception is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS.
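The precedence rules above can be summarized in a small Python model. This is only a sketch of the ordering as described in this article, not Intel MKL's implementation; the function suggested_threads and its parameter names are hypothetical, and domain-specific settings are left out.

```python
def suggested_threads(mkl_call=None, mkl_env=None, omp_call=None, omp_env=None):
    """Return the thread count suggested to Intel MKL, or None if nothing is set.

    Sources are checked from highest to lowest precedence:
      1. mkl_set_num_threads()   (Intel MKL subroutine call)
      2. MKL_NUM_THREADS         (Intel MKL environment variable)
      3. omp_set_num_threads()   (OpenMP subroutine call)
      4. OMP_NUM_THREADS         (OpenMP environment variable)
    """
    for value in (mkl_call, mkl_env, omp_call, omp_env):
        if value is not None:
            return int(value)
    return None  # nothing set: the library picks its own default


# An MKL subroutine call overrides the MKL environment variable.
a = suggested_threads(mkl_call=2, mkl_env="4")
# omp_set_num_threads() does NOT override MKL_NUM_THREADS (the stated exception).
b = suggested_threads(mkl_env="4", omp_call=8)
# With OpenMP controls only, the call overrides the environment variable.
c = suggested_threads(omp_call=8, omp_env="16")
print(a, b, c)
```

The value returned is only a suggestion; as noted elsewhere in this article, Intel MKL may still use a different number of threads.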

MKL_DOMAIN_NUM_THREADS and MKL_DYNAMIC: Intel MKL provides a domain-specific environment variable and functions to control the number of threads for a specific function domain (BLAS, FFT, VML, etc.). Intel MKL also provides MKL_DYNAMIC, which enables or disables dynamic adjustment of the number of threads by Intel MKL. Please refer to MKL_DYNAMIC and MKL_DOMAIN_NUM_THREADS for details.

How many threads should I use on a system with Hyper-Threading?

Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.

If Hyper-Threading Technology is enabled on the system, it is recommended that the number of threads be set equal to the number of physical processors or cores, which is half the number of logical processors.

Note: If the requested number of threads exceeds the number of physical cores (perhaps because of hyper-threading), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores.
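The arithmetic behind this recommendation and the note can be sketched in Python as follows. Both functions are hypothetical illustrations: recommended_threads assumes exactly two logical processors per physical core when HT Technology is enabled, and effective_threads models only the cap described in the note, not every adjustment Intel MKL may make.

```python
def recommended_threads(logical_processors, hyper_threading=True):
    """Threads to request: one per physical core."""
    # Assumption: HT Technology exposes two logical processors per core.
    if hyper_threading:
        return logical_processors // 2
    return logical_processors


def effective_threads(requested, physical_cores, mkl_dynamic=True):
    """Model of the note: MKL_DYNAMIC=TRUE caps the request at the core count."""
    if mkl_dynamic:
        return min(requested, physical_cores)
    # MKL_DYNAMIC=FALSE: the library tries to honor the request as given.
    return requested


cores = recommended_threads(16)                          # 16 logical -> 8 cores
capped = effective_threads(16, cores)                    # scaled down to 8
forced = effective_threads(16, cores, mkl_dynamic=False) # request kept at 16
print(cores, capped, forced)
```

On a real system the physical core count should come from the operating system or hardware documentation rather than from halving the logical count.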

MKL_DYNAMIC

MKL_DYNAMIC being TRUE means that Intel MKL will always try to pick what it considers the best number of threads, up to the maximum specified by the user. MKL_DYNAMIC being FALSE means that Intel MKL will not deviate from the number of threads the user requested, unless there are reasons why it has no choice. The value of MKL_DYNAMIC is by default set to TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE.

In general, you should set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, when nested parallelism is desired and the library is already called from a parallel section. Please refer to "MKL_DYNAMIC" in the Intel MKL User's Guide for details.

MKL_DOMAIN_NUM_THREADS

MKL_DOMAIN_NUM_THREADS allows the user to suggest the number of threads for a particular function domain. The domain-specific settings take precedence over the overall ones. For example, the "MKL_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests that Intel MKL try 4 threads for BLAS, regardless of any later setting of MKL_NUM_THREADS. Please refer to "MKL_DOMAIN_NUM_THREADS" in the Intel MKL User's Guide for details.
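To illustrate how a domain-specific setting overrides the overall one, here is a small Python model. It is a sketch only: the helper names are hypothetical, and the comma-separated "NAME=N" syntax is an assumption made for the example, so consult the User's Guide for the exact format that Intel MKL accepts.

```python
def parse_domain_threads(value):
    """Parse a string such as "MKL_BLAS=4, MKL_FFT=1" into a dict of ints."""
    settings = {}
    for item in value.split(","):  # assumed separator, see the lead-in
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition("=")
        settings[name.strip()] = int(count)
    return settings


def threads_for_domain(domain, domain_value="", overall=None):
    """Domain-specific suggestion if present, otherwise the overall one."""
    settings = parse_domain_threads(domain_value)
    return settings.get(domain, overall)


# MKL_DOMAIN_NUM_THREADS="MKL_BLAS=4" with MKL_NUM_THREADS=8:
blas = threads_for_domain("MKL_BLAS", "MKL_BLAS=4", overall=8)  # domain wins
fft = threads_for_domain("MKL_FFT", "MKL_BLAS=4", overall=8)    # falls back
print(blas, fft)
```

A domain that is not mentioned in the domain string simply falls back to the overall suggestion.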

Note on FFT Usage

The introduction of additional threading controls made it possible to optimize the commit stage of the FFT implementation and eliminate double data initialization. However, this optimization requires a change in FFT usage. Suppose you create threads in the application yourself after initializing all FFT descriptors. In this case, threading is employed for the parallel FFT computation only, the descriptors are released upon return from the parallel region, and each descriptor is used only within the corresponding thread. Starting with Intel MKL 10.0, you must explicitly instruct the library before the commit stage to work on one thread. To do this, set MKL_NUM_THREADS=1 or MKL_DOMAIN_NUM_THREADS="MKL_FFT=1", or call the corresponding pair of service functions. Otherwise, the actual number of threads may be different because the DftiCommitDescriptor function is not in a parallel region. See Example C-27a "Using Parallel Mode with Multiple Descriptors Initialized in One Thread" in the Intel MKL Reference Manual.

The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors.

While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

Intel recommends that you evaluate other library products to determine which best meets your requirements.