High-level optimizations exploit the properties of source code constructs (for example, loops and arrays) in applications written in high-level programming languages, such as Fortran and C++. The high-level optimizations include loop interchange, loop fusion, loop unrolling, loop distribution, unroll-and-jam, blocking, data prefetch, scalar replacement, and data layout optimizations.
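As an illustration of one of these transformations, the following minimal sketch writes out loop interchange by hand. It is not taken from the compiler; the program name INTCHG, the arrays A, B1 and B2, and the size N are hypothetical names chosen for this example. Both loop nests compute the same result, but the second makes the inner loop run down a column, which matches the column-major storage order of Fortran arrays.

```fortran
C     Loop interchange written out by hand (fixed-form source).
C     All names here (INTCHG, A, B1, B2, N) are hypothetical.
      PROGRAM INTCHG
      INTEGER N
      PARAMETER (N = 256)
      REAL A(N, N), B1(N, N), B2(N, N)
      INTEGER I, J

C     Fill the input array.
      DO J = 1, N
         DO I = 1, N
            A(I, J) = REAL(I + J)
         END DO
      END DO

C     Before interchange: the inner loop runs over J, so successive
C     iterations touch elements that are N words apart in memory.
      DO I = 1, N
         DO J = 1, N
            B1(I, J) = 2.0 * A(I, J)
         END DO
      END DO

C     After interchange: the inner loop runs over I, so successive
C     iterations touch adjacent elements of the same column.
      DO J = 1, N
         DO I = 1, N
            B2(I, J) = 2.0 * A(I, J)
         END DO
      END DO

C     The two nests are equivalent, so the difference is zero.
      PRINT *, 'Largest difference:', MAXVAL(ABS(B1 - B2))
      END
```

The interchange is legal here because each iteration writes a different element and reads only from A, so no iteration depends on another. A loop nest with dependencies between iterations may not be interchangeable.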
The option that turns on the high-level optimizations is -O3. The scope of optimizations turned on by -O3 is different for IA-32 and Itanium®-based applications. See Setting Optimization Levels.
The -O3 option enables the -O2 option and adds more aggressive optimizations; for example, loop transformation and prefetching. -O3 optimizes for maximum speed, but may not improve performance for some programs.
In conjunction with the vectorization options, -ax{K|W|N|B|P} and -x{K|W|N|B|P}, the -O3 option causes the compiler to perform more aggressive data dependency analysis than it does at the default -O2 level. This may result in longer compilation times.
The -ivdep_parallel option asserts that there is no loop-carried dependency in the loop where the IVDEP directive is specified. This is useful for sparse matrix applications.
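The kind of loop this applies to can be sketched as follows. This is a minimal example under stated assumptions, not a definitive usage: the names IVDEMO, SCATAD, IDX, X and Y are hypothetical. The store goes through an index array, as is typical of sparse matrix code, so the compiler cannot prove on its own that two iterations never update the same element.

```fortran
C     Sketch of an indirectly indexed update (fixed-form source).
C     All names here (IVDEMO, SCATAD, IDX, X, Y) are hypothetical.
      PROGRAM IVDEMO
      INTEGER N
      PARAMETER (N = 8)
      INTEGER IDX(N), I
      REAL X(N), Y(N)
      DO I = 1, N
C        A reversal is a permutation, so every index is distinct.
         IDX(I) = N + 1 - I
         X(I) = REAL(I)
         Y(I) = 0.0
      END DO
      CALL SCATAD(N, IDX, X, Y)
      PRINT *, Y
      END

      SUBROUTINE SCATAD(N, IDX, X, Y)
      INTEGER N, IDX(N), I
      REAL X(N), Y(*)
C     Without help, the compiler must assume IDX may repeat a value,
C     which would make one iteration depend on another. The directive
C     applies to the DO loop that follows it. The assertion is only
C     safe because the caller passes distinct indices.
CDEC$ IVDEP
      DO I = 1, N
         Y(IDX(I)) = Y(IDX(I)) + X(I)
      END DO
      END
```

The directive is an assertion, not a check. If IDX did contain repeated values, compiling with -ivdep_parallel could produce wrong results, so the guarantee has to come from how the index array is built.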
Follow these steps to tune applications on Itanium-based systems:
Compile your program with -O3 and -ipo. Use profile-guided optimization whenever possible.
Identify hot spots in your code.
Turn on optimization reporting.
Check why loops are not software pipelined.
Use CDEC$ ivdep to tell the compiler there is no dependency. You may also need the option -ivdep_parallel to indicate there is no loop-carried dependency.
Use CDEC$ swp to enable software pipelining (useful for lopsided control and unknown loop count).
Use CDEC$ loop count(n) when needed.
If Cray pointers are used, use -safe_cray_ptr to indicate there is no aliasing.
Use CDEC$ distribute point to split large loops (normally, this is automatically done).
Check that the prefetch distance is correct. Use CDEC$ prefetch to override the distance when it is needed.
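Several of the directives named in the steps above can be sketched together as follows. This is an illustration under stated assumptions rather than a recommended recipe: the names TUNE, UPDATE, IDX, A, B, P and Q are hypothetical, the trip count of 1000 is arbitrary, and placing several directives in front of one loop is assumed to be accepted by the compiler version in use. The prefetch directive is left out because the text above does not give its argument syntax.

```fortran
C     Sketch of directive placement (fixed-form source).
C     All names here (TUNE, UPDATE, IDX, A, B, P, Q) are hypothetical.
C     Possible build line, assuming the compiler driver is invoked
C     as ifort and this file is saved as tune.f:
C        ifort -O3 -ipo -ivdep_parallel tune.f
      PROGRAM TUNE
      INTEGER N
      PARAMETER (N = 1000)
      INTEGER IDX(N), I
      REAL A(N), B(N), P(N), Q(N)
      DO I = 1, N
C        Distinct indices, so the IVDEP assertion below holds.
         IDX(I) = N + 1 - I
         A(I) = 0.0
         B(I) = REAL(I)
      END DO
      CALL UPDATE(N, IDX, A, B, P, Q)
      PRINT *, A(1), P(N), Q(N)
      END

      SUBROUTINE UPDATE(N, IDX, A, B, P, Q)
      INTEGER N, IDX(N), I
      REAL A(*), B(N), P(N), Q(N)
C     Assert there is no dependency through the indirect store,
C     request software pipelining, and give a typical trip count.
CDEC$ IVDEP
CDEC$ SWP
CDEC$ LOOP COUNT(1000)
      DO I = 1, N
         A(IDX(I)) = A(IDX(I)) + B(I)
      END DO
C     Mark the point at which the loop body may be split in two.
      DO I = 1, N
         P(I) = 2.0 * B(I)
CDEC$ DISTRIBUTE POINT
         Q(I) = B(I) + 1.0
      END DO
      END
```

Because the directive lines begin with C in column 1, a compiler that does not recognize them treats them as comments, so the source stays portable. After adding directives, the optimization report should be checked again to confirm that the loops are now software pipelined.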