The Intel® compiler supports the OpenMP* version 2.5 API specification and an automatic parallelization capability. OpenMP provides symmetric multiprocessing (SMP) with the following major features:
Relieves the user from having to deal with the low-level details of iteration space partitioning, data sharing, and thread scheduling and synchronization.
Provides the performance benefits of shared-memory multiprocessor and dual-core processor systems, and of IA-32 processors with Hyper-Threading Technology (HT Technology).
For information on HT Technology, refer to the IA-32 Intel® Architecture Optimization Reference Manual (http://developer.intel.com/design/pentium4/manuals/index_new.htm).
The compiler performs transformations to generate multithreaded code based on your placement of OpenMP directives in the source program, making it easy to add threading to existing software. The Intel compiler supports all of the current industry-standard OpenMP directives, except WORKSHARE, and compiles parallel programs annotated with OpenMP directives.
As with many advanced compiler features, you must understand the functionality of the OpenMP directives in order to use them effectively and to avoid unwanted program behavior. See the parallelization options summary for all of the OpenMP-related options in the Intel C++ Compiler.
In addition, the compiler provides Intel-specific extensions to the OpenMP C/C++ version 2.5 specification including run-time library routines and environment variables.
For complete information on the OpenMP standard, visit the OpenMP* (http://www.openmp.org) web site. For complete C++ language specifications, see the OpenMP C/C++ version 2.5 specifications (http://www.openmp.org/specs).
To compile with OpenMP, you need to prepare your program by annotating the code with OpenMP directives. The Intel compiler first processes the application and produces a multithreaded version of the code, which is then compiled. The output is an executable with the parallelism implemented by threads that execute parallel regions or constructs.
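A typical build and run might look like the following sketch. The option spellings shown are assumptions that depend on your compiler version and platform (check your compiler's documentation); `OMP_NUM_THREADS` is a standard OpenMP environment variable.

```shell
# Enable OpenMP directive processing (option names vary by version/platform):
icc  -openmp  myprog.c -o myprog     # Linux*
icl  /Qopenmp myprog.c               # Windows*

# Choose the number of threads at run time via the standard
# OpenMP environment variable, then run the executable:
export OMP_NUM_THREADS=4
./myprog
```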
The OpenMP specification does not define interoperability between implementations; therefore, the OpenMP implementation of another compiler might not interoperate with the OpenMP support in Intel compilers for Windows. To avoid possible linking or run-time problems, keep the following guidelines in mind:
Avoid using multiple copies of the OpenMP runtime libraries from different compilers.
Compile all the OpenMP sources with one compiler, or compile the parallel region and entire call tree beneath it using the same compiler.
Use dynamic libraries for OpenMP.
For performance analysis of your program, you can use the Intel® VTune™ Performance Analyzer or the Intel® Threading Tools. These tools show which portions of the code require the largest amount of execution time and where parallel performance problems are located.
When parallelizing a loop, the Intel compiler's OpenMP loop parallelizer tries to determine the optimal configuration for a given processor. At run time, a check determines which processor the program is running on, so that the appropriate optimized version of a given loop is selected. See Processor-specific Runtime Checks for IA-32 Systems for detailed information.