HLO Overview

High-level optimizations (HLO) exploit the properties of source code constructs (for example, loops and arrays) in applications developed in high-level programming languages, such as Fortran and C++. The high-level optimizations include loop interchange, loop fusion, loop unrolling, loop distribution, unroll-and-jam, blocking, data prefetch, scalar replacement, and data layout optimizations.
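
As an illustration of one of these transformations, the following C++ sketch writes out loop interchange by hand. The function names and the flat row-major storage are assumptions made for this example only; with -O3 the compiler may perform an equivalent interchange itself when its dependence analysis allows it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: an n x m matrix stored row-major in a flat vector.

// Original loop order: the inner loop walks down a column, so consecutive
// iterations touch elements that are m doubles apart in memory.
void scale_column_order(std::vector<double>& a, std::size_t n, std::size_t m,
                        double s) {
    assert(a.size() == n * m);
    for (std::size_t j = 0; j < m; ++j)
        for (std::size_t i = 0; i < n; ++i)
            a[i * m + j] *= s;
}

// After loop interchange: the inner loop walks along a row, so consecutive
// iterations touch adjacent elements (unit stride).
void scale_row_order(std::vector<double>& a, std::size_t n, std::size_t m,
                     double s) {
    assert(a.size() == n * m);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            a[i * m + j] *= s;
}
```

Both functions produce the same result; only the memory access order differs. The interchange is legal here because each iteration updates a distinct element and no iteration reads a value written by another.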

The -O3 option turns on the high-level optimizations. The scope of the optimizations that -O3 enables differs between IA-32 and Itanium®-based applications. See Setting Optimization Levels.

IA-32 and Itanium®-based Applications

The -O3 option enables all -O2 optimizations and adds more aggressive ones, such as loop transformations and prefetching. -O3 optimizes for maximum speed, but may not improve performance for some programs.

IA-32 Applications

In conjunction with the vectorization options, -ax{K|W|N|B|P} and -x{K|W|N|B|P}, the -O3 option causes the compiler to perform more aggressive data dependency analysis than at the default -O2 level. This may result in longer compilation times.

Itanium-based Applications

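The -ivdep_parallel option asserts that there is no loop-carried dependency in the loop where the IVDEP directive is specified. This is useful for sparse matrix applications, where indirect array indexing often prevents the compiler from proving that iterations are independent.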
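
The kind of loop this applies to can be sketched in C++ as follows. The function name and data layout are hypothetical, and `#pragma ivdep` is the Intel C++ spelling of the directive (a Fortran source would use the IVDEP directive instead); compilers that do not recognize the pragma ignore it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scatter-add of the sort found in sparse matrix kernels.
// The compiler cannot see the contents of idx, so it must assume that two
// iterations might update the same element of y (a loop-carried dependency).
void scatter_add(std::vector<double>& y,
                 const std::vector<std::size_t>& idx,
                 const std::vector<double>& val) {
    assert(idx.size() == val.size());
    // The programmer asserts that the iterations are independent. That is
    // only true if idx contains no repeated entries; if it does, the
    // assertion is false and an optimized loop may produce wrong results.
#pragma ivdep
    for (std::size_t k = 0; k < idx.size(); ++k)
        y[idx[k]] += val[k];
}
```

The directive is a promise from the programmer, not something the compiler verifies, so it belongs only on loops whose index arrays are known to be free of duplicates.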

Key Techniques to Tune Your Itanium-based Applications

Follow these steps to tune applications on Itanium-based systems:

  1. Compile your program with -O3 and -ipo. Use profile-guided optimization whenever possible.

  2. Identify hot spots in your code.  

  3. Turn on optimization reporting.

  4. Check why loops are not software pipelined.

  5. Check that the prefetch distance is correct. Use CDEC$ prefetch to override the distance when it is needed.
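
To make the term "prefetch distance" in the last step concrete, the following C++ sketch issues prefetches by hand. It is an illustration under stated assumptions, not the compiler's own mechanism: the function name and the `distance` parameter are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin used here in place of the CDEC$ prefetch directive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the array while requesting the element `distance` iterations ahead.
// The distance is the number of iterations between issuing a prefetch and
// using the data: too small and the data has not arrived when it is needed,
// too large and it may be evicted from cache before it is used.
double sum_with_prefetch(const std::vector<double>& a, std::size_t distance) {
    assert(distance > 0);  // a zero distance would target the current element
    double total = 0.0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + distance < n)
            __builtin_prefetch(&a[i + distance]);  // hint only, no side effects
#endif
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same for any distance; what changes is how well memory latency is hidden, which is why the distance is worth checking in the optimization report.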