Parallelization with OpenMP* Overview

The Intel® compiler supports the OpenMP* version 2.5 API specification and an automatic parallelization capability. OpenMP provides symmetric multiprocessing (SMP) with the following major features:

Note

For information on Hyper-Threading Technology (HT Technology), refer to the IA-32 Intel® Architecture Optimization Reference Manual (http://developer.intel.com/design/pentium4/manuals/index_new.htm).

The compiler performs transformations to generate multithreaded code based on where the user places OpenMP directives in the source program, which makes it easy to add threading to existing software. The Intel compiler supports all of the current industry-standard OpenMP directives except WORKSHARE, and compiles parallel programs annotated with OpenMP directives.
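As a minimal sketch of what "placing a directive" looks like, the function below adds one array to another and marks the loop with `#pragma omp parallel for`, so the compiler can divide the iterations among threads. The function name `vector_add` is hypothetical. The loop index is a signed `int` because the OpenMP 2.5 C/C++ specification requires a signed integer iteration variable for a worksharing loop.

```c
/* Adds b[i] to a[i] for i = 0 .. n-1.
   The directive asks the compiler to split the loop iterations among the
   threads of a team. Each iteration touches a different element, so no
   synchronization is needed. If OpenMP is not enabled, the pragma is
   ignored and the loop runs serially with the same result. */
void vector_add(double *a, const double *b, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] += b[i];
    }
}
```

Because the directive is a pragma, the same source still compiles and runs correctly as a serial program when OpenMP support is switched off.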

Note

As with many advanced compiler features, you must understand the functionality of the OpenMP directives to use them effectively and avoid unwanted program behavior. See the parallelization options summary for all of the OpenMP options in the Intel C++ Compiler.
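One common example of unwanted behavior is a data race on a shared accumulator. The sketch below (the function name `parallel_sum` is hypothetical) shows the standard fix with the `reduction` clause; the comment describes the incorrect variant instead of running it.

```c
/* Returns the sum of x[0] .. x[n-1], computed in parallel. */
double parallel_sum(const double *x, int n)
{
    double sum = 0.0;
    int i;
    /* Incorrect variant: "#pragma omp parallel for" without a reduction
       clause lets every thread read and write the shared variable sum at
       the same time (a data race), so the result is unpredictable.
       reduction(+:sum) gives each thread a private copy of sum,
       initialized to 0, and adds the copies together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}
```

Because a reduction may combine partial sums in a different order than the serial loop, floating-point results can differ in the last bits from a serial run for general data.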

In addition, the compiler provides Intel-specific extensions to the OpenMP C/C++ version 2.5 specification, including run-time library routines and environment variables.
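The Intel-specific extensions are not shown here; as a point of reference, the sketch below uses only standard OpenMP 2.5 run-time routines (`omp_set_num_threads`, `omp_get_num_threads`) from `<omp.h>`. The standard `OMP_NUM_THREADS` environment variable sets the default team size, and `omp_set_num_threads` overrides it for later parallel regions. The helper name `team_size` is hypothetical. The `#ifdef _OPENMP` guard keeps the code buildable when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `requested` threads and returns the size of the team that
   actually executes the next parallel region. The run time may create
   fewer threads than requested (for example, if dynamic adjustment is
   enabled), but never more. Without OpenMP, the answer is always 1. */
int team_size(int requested)
{
    int size = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested); /* overrides OMP_NUM_THREADS */
    #pragma omp parallel
    {
        /* One thread records the team size; single ends with a barrier. */
        #pragma omp single
        size = omp_get_num_threads();
    }
#else
    (void)requested; /* unused in a serial build */
#endif
    return size;
}
```

To set the default from the shell instead, run the program with, for example, `OMP_NUM_THREADS=4` in its environment.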

For complete information on the OpenMP standard, visit the OpenMP* (http://www.openmp.org) web site. For the complete C/C++ language specification, see the OpenMP C/C++ version 2.5 specification (http://www.openmp.org/specs).

Parallel Processing with OpenMP

To compile with OpenMP, you need to prepare your program by annotating the code with OpenMP directives. The Intel compiler first processes the application and produces a multithreaded version of the code, which is then compiled. The output is an executable whose parallelism is implemented by threads that execute the parallel regions or constructs.
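The sketch below illustrates what "threads that execute parallel regions" means: every thread in the team runs the structured block after `#pragma omp parallel` once, and the `reduction` clause totals how many times it ran. The function name `count_region_threads` is hypothetical. The OpenMP switch for Intel compilers of this generation is typically `-openmp` on Linux* and `/Qopenmp` on Windows*; confirm the exact option for your version in the parallelization options summary.

```c
/* Returns the number of threads that executed the parallel region. */
int count_region_threads(void)
{
    int count = 0;
    /* Each thread in the team executes this block once and adds 1 to its
       private copy of count; the copies are summed at the end of the
       region. In a build without OpenMP, the block runs once. */
    #pragma omp parallel reduction(+:count)
    {
        count += 1;
    }
    return count;
}
```

Compiled with OpenMP enabled, the function returns the team size chosen by the run time; compiled without it, it returns 1.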

Windows* Considerations

The OpenMP specification does not define interoperability of multiple implementations; therefore, the OpenMP implementation supported by other compilers and OpenMP support in Intel compilers for Windows might not be interoperable. To avoid possible linking or run-time problems, keep the following guidelines in mind:

Performance Analysis

For performance analysis of your program, you can use the Intel® VTune™ Performance Analyzer and/or the Intel® Threading Tools. They show which portions of the code take the most time to execute and where parallel performance problems are located.

Targeting a Processor Run-time Check

When parallelizing a loop, the Intel compiler's OpenMP loop parallelizer tries to determine the optimal set of configurations for a given processor. At run time, a check determines the processor for which OpenMP should optimize a given loop. See detailed information in Processor-specific Runtime Checks for IA-32 Systems.