Document Number: 310709-001US
Purpose
Compiler Support
Using Intel® MKL Parallelism
Memory Management
Performance
Configuration File
Obtaining Version Information
Custom Shared Object Builder
FFT and DFT Functions
FFTW Interface Support
GMP* Functions
Technical Support
Disclaimer and Legal Information
The Intel® Math Kernel Library (Intel® MKL) 8.1 for Linux* Technical User Notes describe how to compile, link, and run with Intel® MKL 8.1 for Linux*. Use them in conjunction with the Intel® MKL 8.1 for Linux* Release Notes and the Getting Started with the Intel® MKL 8.1 for Linux* document when using Intel® MKL 8.1 for Linux* in your application.
Intel supports Intel® MKL for use only with compilers identified in the release notes. However, the library has been successfully used with other compilers as well.
When using the cblas interface, the header file mkl.h will simplify program development since it specifies enumerated values as well as prototypes for all the functions. The header determines if the program is being compiled with a C++ compiler and, if it is, the included file will be correct for use with C++ compilation.
Intel® MKL is threaded in a number of places: sparse solver, LAPACK
(*GETRF, *POTRF, *GBTRF, *GEQRF, *ORMQR, *STEQR, *BDSQR
routines), all Level 3 BLAS, Sparse BLAS matrix-vector and
matrix-matrix multiply routines for the compressed sparse row and diagonal formats, all DFTs
(except 1D transformations when DFTI_NUMBER_OF_TRANSFORMS=1
and sizes are not power-of-two), and all FFTs. The library uses OpenMP* threading software.
There are situations in which conflicts can exist in the execution environment that make the use of threads in Intel® MKL problematic. We list them here with recommendations for dealing with these. First, a brief discussion of why the problem exists is appropriate.
If the user threads the program using OpenMP* directives and uses the Intel compilers to compile the program, Intel® MKL and the user program will both use the same threading library. Intel® MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads. But Intel® MKL can be aware that it is in a parallel region only if the threaded program and Intel® MKL are using the same threading library. If the user program is threaded by some other means, Intel® MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases with recommendations for the user:
- OMP_NUM_THREADS=1 in the environment. This is the default with Intel® MKL except sparse solver.
- OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library with Intel® MKL. At this time, the safe approach is to set MKL_SERIAL=YES (or MKL_SERIAL=yes), which forces Intel® MKL to serial mode regardless of the OMP_NUM_THREADS value.
- OMP_NUM_THREADS should be set to 1.
Setting the number of threads: The OpenMP* software responds to the environment variable OMP_NUM_THREADS. To change the number of threads, enter the following in the command shell in which the program is going to run:
export OMP_NUM_THREADS=<number of threads to use>
To force the library to serial mode, set the environment variable MKL_SERIAL to YES. This works regardless of the OMP_NUM_THREADS value. MKL_SERIAL is not set by default.
If the variable OMP_NUM_THREADS is not set, Intel® MKL software will run on one thread. We recommend always setting OMP_NUM_THREADS to the number of processors you wish to use in your application.
Note. Currently the default number of threads for the sparse solver is the number of processors in the system.
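The rules above can be summarized in a short C sketch. The helper names is_yes and expected_mkl_threads are hypothetical and are not part of Intel® MKL; the sketch only restates the documented behavior for routines other than the sparse solver. How it treats values that the text does not cover (an MKL_SERIAL value other than YES or yes, or an unparsable OMP_NUM_THREADS) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero when s is "YES" or "yes". */
int is_yes(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "YES") == 0 || strcmp(s, "yes") == 0;
}

/* Hypothetical helper, not an Intel MKL function. Given the values of
 * MKL_SERIAL and OMP_NUM_THREADS (NULL when unset), return the thread
 * count that the rules in this section imply. */
int expected_mkl_threads(const char *mkl_serial, const char *omp_num_threads)
{
    long n;

    /* MKL_SERIAL=YES forces serial mode regardless of OMP_NUM_THREADS.
     * Assumption: any other MKL_SERIAL value is treated as unset. */
    if (mkl_serial != NULL && is_yes(mkl_serial))
        return 1;

    /* With OMP_NUM_THREADS unset, the library runs on one thread. */
    if (omp_num_threads == NULL)
        return 1;

    /* Otherwise OMP_NUM_THREADS applies. Assumption: unparsable or
     * non-positive values fall back to one thread. */
    n = strtol(omp_num_threads, NULL, 10);
    return n > 0 ? (int)n : 1;
}
```

A caller would typically pass getenv("MKL_SERIAL") and getenv("OMP_NUM_THREADS") as the two arguments.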
MKL_FreeBuffers(). If another call is made to a library function that needs a memory buffer, then the memory manager will again allocate the buffers and they will again remain allocated until either the end of the program or the program deallocates the memory.
This memory management software is turned on by default. To disable it, set the environment variable MKL_DISABLE_FAST_MM to any value, which will cause memory to be allocated and freed from call to call. Disabling this feature will negatively impact performance of routines such as the level 3 BLAS, especially for small problem sizes.
Memory management has a restriction for the number of allocated buffers in each thread. Currently this number is 32. The maximum number of supported threads is 514. To avoid the default restriction, disable memory management.
To obtain the best performance with Intel® MKL, make sure the following conditions are met:
- leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16
Note on the LAPACK packed routines performance:
The routines with the names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position
(the second and third letters respectively) operate on the matrices in the packed format (see "LAPACK Routine Naming
Conventions" sections in the MKL manual). Their functionality is strictly equivalent to the functionality of the
unpacked routines with the names containing the letters HE, OR, PO, SY, TR, UN in the corresponding positions,
but the performance is significantly lower.
If the memory restriction is not too tight, use an unpacked routine for better performance. Note that in such a case you need to allocate N^2/2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations).
For example, solving a symmetric eigenproblem with an expert driver can be sped up by using an unpacked routine:
call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)
where a has dimension lda-by-n, which is at least N^2 elements, instead of
call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)
where ap has dimension N*(N+1)/2.
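To make the memory trade-off concrete, the following C sketch computes the element counts for both storage schemes and shows where an element lives in packed storage. The function names are hypothetical and nothing here calls Intel® MKL; the index formula is the standard LAPACK convention for an upper triangle packed column by column, rewritten with 0-based indices.

```c
#include <assert.h>
#include <stddef.h>

/* Elements needed for full (unpacked) storage of an n-by-n matrix
 * when the leading dimension equals n. */
size_t unpacked_elems(size_t n)
{
    return n * n;
}

/* Elements needed for packed storage of one triangle: n*(n+1)/2. */
size_t packed_elems(size_t n)
{
    return n * (n + 1) / 2;
}

/* 0-based offset of element (i, j), with i <= j, in packed storage of
 * the upper triangle laid out column by column (uplo = 'U'). */
size_t packed_upper_index(size_t i, size_t j)
{
    assert(i <= j);
    return i + j * (j + 1) / 2;
}
```

For N = 1000 the unpacked array holds 1,000,000 elements and the packed array 500,500, so the unpacked routine needs 499,500 additional elements, which is close to N^2/2.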
There are additional conditions for the FFT functions:
On IA-32 based applications the addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by cache line size (32 bytes for Pentium® III processor, 64 bytes for Pentium® 4 processor, and 128 bytes for Intel® EM64T processor).
On Itanium®-based applications the sufficient conditions are as follows:
- for the C-style FFT, the distance L between arrays that represent real and imaginary parts is not divisible by 64. The best case is when L=k*64 + 16.
- leading dimension values, in bytes (n*element_size), of two-dimensional arrays are not power-of-two.
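One way to satisfy the divisibility conditions (16 bytes in general, or the cache line size for the FFT functions on IA-32) is to pad the leading dimension and allocate the array on a suitable boundary. The C sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, posix_memalign is a POSIX call and not an Intel® MKL facility, and the sketch does not address the separate Itanium® recommendation to avoid power-of-two leading dimensions.

```c
#define _POSIX_C_SOURCE 200809L  /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Smallest leading dimension (in elements) >= n whose size in bytes
 * is a multiple of `boundary` bytes. */
size_t padded_ld(size_t n, size_t elem_size, size_t boundary)
{
    size_t ld = n;
    assert(elem_size > 0 && boundary > 0);
    while ((ld * elem_size) % boundary != 0)
        ++ld;
    return ld;
}

/* Nonzero when the address p is divisible by `boundary`. */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}

/* Allocate ld*ncols elements starting on a `boundary`-byte address.
 * posix_memalign requires `boundary` to be a power of two and a
 * multiple of sizeof(void *). Returns NULL on failure; release the
 * memory with free(). */
void *alloc_aligned_matrix(size_t ld, size_t ncols, size_t elem_size,
                           size_t boundary)
{
    void *p = NULL;
    if (posix_memalign(&p, boundary, ld * ncols * elem_size) != 0)
        return NULL;
    return p;
}
```

For example, a double-precision array with 1001 rows gets a leading dimension of 1002 for the 16-byte condition and 1008 for a 64-byte cache line.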
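The Intel® MKL configuration file lets you customize several features of Intel® MKL, namely the names of the shared objects that are loaded, the serial or parallel operation mode, and input parameter checking.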
The configuration file is named mkl.cfg by default. The file contains several variables that can be changed.
Below is an example of a configuration file containing all possible variables with their default values:
//
// Default values for mkl.cfg file
//
// SO names for IA-32
MKL_X87so = mkl_def.so
MKL_SSE1so = mkl_p3.so
MKL_SSE2so = mkl_p4.so
MKL_SSE3so = mkl_p4p.so
MKL_VML_X87so = mkl_vml_def.so
MKL_VML_SSE1so = mkl_vml_p3.so
MKL_VML_SSE2so = mkl_vml_p4.so
MKL_VML_SSE3so = mkl_vml_p4p.so
// SO names for Intel(R) EM64T
MKL_EM64TDEFso = mkl_def.so
MKL_EM64TSSE3so = mkl_p4n.so
MKL_VML_EM64TDEFso = mkl_vml_def.so
MKL_VML_EM64TSSE3so = mkl_vml_p4n.so
// SO names for Intel(R) Itanium(R) processor family
MKL_I2Pso = mkl_i2p.so
MKL_VML_I2Pso = mkl_vml_i2p.so
// DLL names for LAPACK libraries
MKL_LAPACK32so = mkl_lapack32.so
MKL_LAPACK64so = mkl_lapack64.so
// Serial or parallel mode
//   YES - single threaded
//   NO  - multi threaded
//   OMP - control by OMP_NUM_THREADS
MKL_SERIAL = YES
// Input parameters check
//   ON  - checkers are used (default)
//   OFF - checkers are not used
MKL_INPUT_CHECK = ON
When an Intel® MKL function is called for the first time, Intel® MKL checks whether the configuration file exists and, if so, operates with the specified variables.
The path to the configuration file is specified by the environment variable MKL_CFG_FILE.
If this variable is not defined, the current directory is searched first, and then the directories specified in the PATH environment variable. If the MKL configuration file does not exist, the library operates with default values of variables (standard names of libraries, checkers on, non-threaded operation mode).
If a variable is not specified in the configuration file, or is specified incorrectly, the default value is used.
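The lookup order described above can be mirrored in a few lines of C. This is only an illustrative sketch, not the library's implementation: find_mkl_cfg is a hypothetical helper, and returning the MKL_CFG_FILE value without checking that the file exists is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* for access() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of Intel MKL. Writes the configuration
 * file path into out (capacity cap) and returns 1, or returns 0 when
 * no file is found, in which case the defaults would apply. */
int find_mkl_cfg(char *out, size_t cap)
{
    const char *explicit_path = getenv("MKL_CFG_FILE");
    const char *path_var;
    char *dirs, *dir;
    int found = 0;

    assert(out != NULL && cap > 0);

    /* 1. An explicit MKL_CFG_FILE setting takes precedence. */
    if (explicit_path != NULL) {
        snprintf(out, cap, "%s", explicit_path);
        return 1;
    }

    /* 2. Otherwise look in the current directory. */
    if (access("mkl.cfg", R_OK) == 0) {
        snprintf(out, cap, "%s", "./mkl.cfg");
        return 1;
    }

    /* 3. Finally walk the directories listed in PATH. */
    path_var = getenv("PATH");
    if (path_var == NULL)
        return 0;
    dirs = malloc(strlen(path_var) + 1);
    if (dirs == NULL)
        return 0;
    strcpy(dirs, path_var);
    for (dir = strtok(dirs, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        snprintf(out, cap, "%s/mkl.cfg", dir);
        if (access(out, R_OK) == 0) {
            found = 1;
            break;
        }
    }
    free(dirs);
    return found;
}
```

When the function returns 0, the contents of out are not meaningful and should be ignored.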
Below is an example of the configuration file that only redefines the library names:
// SO redefinition
MKL_X87so = matlab_x87.so
MKL_SSE1so = matlab_sse1.so
MKL_SSE2so = matlab_sse2.so
MKL_SSE3so = matlab_sse2.so
MKL_ITPso = matlab_ipt.so
MKL_I2Pso = matlab_i2p.so
Intel® MKL provides a facility by which you can obtain information about the library (e.g., the version number). Two methods are provided for extracting this information. First, you may extract a version string using the function MKLGetVersionString. Alternatively, you can use the MKLGetVersion function to obtain an MKLVersion structure that contains the version information. Example programs for extracting this information are provided in the examples/versionquery directory. A makefile is also provided to automatically build the examples and output summary files containing the version information for the current library.
The custom shared object builder creates a dynamic library (shared object) containing selected functions. It is located in the tools/builder folder.
The builder contains a makefile and a definition file with the list of functions.
The makefile has three targets: "ia32", "ipf", and "em64t". The ia32 target is used for IA-32, ipf is used for the Intel® Itanium® processor family, and em64t is used for the Intel® Xeon® processor with Intel® EM64T.
There are several macros (parameters) for the makefile:
- export: the file that contains the list of functions. The default name is functions_list.
- name: the name of the library to create. By default, the library mkl_custom.so is built.
- xerbla: the object file that contains the user's error handler. By default, that is, when this parameter is not specified, the standard MKL xerbla is used.
None of the parameters is mandatory. In the simplest case, the command line could be make ia32, and the remaining parameters take their default values. As a result, the mkl_custom.so library for IA-32 is created, the functions list is taken from the functions_list file, and the standard MKL error handler xerbla is used.
Another example for a more complex case:
make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o
In this case the mkl_small.so library for IA-32 is created, the functions list is taken from the my_func_list.txt file, and the user's error handler my_xerbla.o is used.
Entry points in the functions_list file should be adjusted to the interface:
dgemm_
ddot_
dgetrf_
If the selected functions have several processor-specific versions, all of them are included in the custom library and managed by the dispatcher.
The chapter on Fourier transforms (Chapter 11) of the Intel® MKL Reference Manual (mklman.pdf) describes the Discrete Fourier Transform (DFT) functions and the Fast Fourier Transform (FFT) functions. Use only the DFT functions. The FFT functions are deprecated and retained only for legacy reasons. The newer DFT functions have broader functionality and higher performance than the older functions.
Intel MKL offers two collections of C routines (wrappers) that allow the FFTW interface to call the Intel
MKL discrete Fourier transform interface (DFTI). These collections correspond to the FFTW versions 2.x and 3.x,
respectively, and the Intel MKL versions 7.0 and later.
The purpose of these wrappers is to enable developers whose programs currently use FFTW to achieve the performance
of the Intel MKL Fourier transforms without changing the program source code.
See FFTW to Intel® MKL Wrappers Technical User Notes for FFTW 2.x (fftw2xmkl_notes.htm)
for details on the use of the FFTW 2.x wrappers and FFTW to Intel® MKL Wrappers
Technical User Notes for FFTW 3.x
(fftw3xmkl_notes.htm) for details on the use of the FFTW 3.x wrappers.
If you currently use the GMP* library, you need to modify the INCLUDE statements in your programs to reference mkl_gmp.h.
The information in this manual is subject to change without notice and Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. The information in this document is provided in connection with Intel products and should not be construed as a commitment by Intel Corporation.
EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The software described in this document may contain software defects which may cause the product to deviate from published specifications. Current characterized software defects are available on request.
Intel, the Intel logo, Intel SpeedStep, Intel NetBurst, Intel NetStructure,
MMX, Intel386, Intel486, Celeron, Intel Centrino, Intel Xeon, Intel XScale, Itanium, Pentium, Pentium II Xeon,
Pentium III Xeon, Pentium M, and VTune are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2000-2006, Intel Corporation.