Optimization Options

The optimization options let you specify how to optimize your applications for speed, particular processors, code size, and so forth.

For more information about optimization, see "Compiler Optimizations Overview" and related sections in the Intel® Fortran User's Guide for Linux Volume II: Optimizing Applications.

Descriptions of Optimization Options

-arch keyword (IA-32 systems only)

Default: -arch pn4

Determines the version of the architecture for which the compiler generates instructions.

The following are -arch options:

-arch pn1
Optimizes for the Intel® Pentium® processor.
-arch pn2
Optimizes for the Intel® Pentium® Pro, Intel® Pentium® II, and Intel® Pentium® III processors.
-arch pn3
Optimizes for the Intel® Pentium® Pro, Intel® Pentium® II, and Intel® Pentium® III processors. This is the same as specifying the -arch pn2 option.
-arch pn4
Optimizes for the Intel® Pentium® 4 processor.
-arch SSE
Optimizes for Intel® Pentium® 4 processors with Streaming SIMD Extensions (SSE).
-arch SSE2
Optimizes for Intel® Pentium® 4 processors with Streaming SIMD Extensions 2 (SSE2).

-assume [no]buffered_io

Default: -assume nobuffered_io (buffer is flushed as each record is written)

Specifies whether each record is written (flushed) to disk as soon as it is output, or whether records accumulate in the buffer. If you specify -assume buffered_io, records accumulate in the buffer.

For disk devices, -assume buffered_io (or the equivalent OPEN statement BUFFERED='YES' specifier or the FORT_BUFFERED run-time environment variable) requests that the internal buffer be filled, possibly by many record output statements (WRITE), before the Fortran run-time system writes it to disk. If a file is opened for direct access, the buffering request is ignored.

Using buffered writes usually makes disk I/O more efficient by writing larger blocks of data to the disk less often. However, if you request buffered writes, records not yet written to disk may be lost in the event of a system failure.

Unless you set the FORT_BUFFERED environment variable to true, the default is BUFFERED='NO' and -assume nobuffered_io for all I/O, in which case the Fortran run-time system empties its internal buffer for each WRITE (or similar record output statements).

The OPEN statement BUFFERED specifier applies to a specific logical unit. In contrast, the -assume [no]buffered_io option and the FORT_BUFFERED environment variable apply to all Fortran units.
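
The trade-off described above can be pictured with a small model. The following Python sketch is an illustration only and is not the Fortran run-time system: the class name RecordWriter, the buffer_records capacity, and the idea of counting separate "disk writes" are all assumptions made for the example.

```python
class RecordWriter:
    """Toy model of a record-oriented output unit (hypothetical, for illustration)."""

    def __init__(self, buffered, buffer_records=4):
        self.buffered = buffered
        self.buffer_records = buffer_records  # assumed buffer capacity, in records
        self.buffer = []       # records still held in memory
        self.disk = []         # records that have reached "disk"
        self.disk_writes = 0   # number of separate write operations to "disk"

    def write(self, record):
        self.buffer.append(record)
        # Unbuffered: flush after every record. Buffered: flush only when full.
        if not self.buffered or len(self.buffer) >= self.buffer_records:
            self.flush()

    def flush(self):
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.disk_writes += 1


def run(buffered, n_records=10):
    """Write n_records and report (disk writes, records still only in memory)."""
    writer = RecordWriter(buffered)
    for i in range(n_records):
        writer.write(f"record {i}")
    # Anything still in writer.buffer here is what a system failure would lose.
    return writer.disk_writes, len(writer.buffer)
```

In this model, ten records cost ten disk writes when unbuffered, but only two when buffered, at the price of two records that exist only in memory until the next flush.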

-auto_ilp32 (Itanium-based and Intel® EM64T systems only)

Default: Off

Allows the compiler to use 32-bit pointers whenever possible as long as the application does not exceed a 32-bit address space.

Because this optimization requires interprocedural analysis over the whole program, you must use this option with the -ipo option.

Using this option on programs that exceed 32-bit address space may cause unpredictable results during program execution.

On Intel® EM64T systems, -auto_ilp32 has no effect unless -xP or -axP is also specified.

-ax{K|W|N|B|P} (IA-32 and Intel® EM64T systems only)

Default: None.

Directs the compiler to find opportunities to generate separate versions of functions that take advantage of features that are specific to the specified Intel® processor.

If the compiler finds such an opportunity, it first checks whether generating a processor-specific version of a function is likely to result in a performance gain. If this is the case, the compiler generates both a processor-specific version of a function and a generic version of the function. The generic version will run on any IA-32 processor.

At run time, one of the versions is chosen to execute, depending on the Intel processor in use. In this way, the program can benefit from performance gains on more advanced Intel processors, while still working properly on older IA-32 processors.

Possible values and the processors the code is optimized for are:

-axK Intel Pentium® III and compatible Intel processors
-axW Intel Pentium 4 and compatible Intel processors
-axN Intel Pentium 4 and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-axB Intel Pentium M and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-axP Intel® Pentium® 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.

On Intel® EM64T systems, -axW and -axP are the only valid options.
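
The idea of keeping a generic version next to a processor-specific version and choosing between them at run time can be sketched as follows. This Python fragment only illustrates the concept. This section does not describe how the compiler implements the selection, so the function names and the cpu_features set are hypothetical.

```python
def dot_generic(x, y):
    """Generic version: assumed to run on any processor."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_sse3(x, y):
    """Stand-in for a processor-specific version (same result, different code path)."""
    return sum(a * b for a, b in zip(x, y))


def select_version(cpu_features):
    """Choose a version based on the features the processor reports (assumed input)."""
    if "sse3" in cpu_features:
        return dot_sse3
    return dot_generic


def dot(x, y, cpu_features):
    """Entry point seen by callers: dispatches to whichever version was selected."""
    return select_version(cpu_features)(x, y)
```

Callers see a single entry point (dot); only the selection step depends on the processor in use.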

-complex_limited_range[-]

Default: Off (-complex_limited_range-)

Enables the use of basic algebraic expansions of some arithmetic operations involving data of type COMPLEX. This can result in performance improvements in programs that use a lot of COMPLEX arithmetic. However, values at the extremes of the exponent range might not compute correctly.
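
To see why values at the extremes of the exponent range can go wrong, consider complex division written as its textbook algebraic expansion. The Python sketch below is an illustration under that assumption. This section does not list the exact expansions the compiler uses, and divide_expanded is a hypothetical name.

```python
def divide_expanded(x, y):
    """Complex division by the textbook algebraic expansion.

    (a+bi)/(c+di) = ((a*c + b*d) + (b*c - a*d)i) / (c*c + d*d)
    """
    a, b, c, d = x.real, x.imag, y.real, y.imag
    denom = c * c + d * d  # overflows to infinity when c or d is very large
    return complex((a * c + b * d) / denom, (b * c - a * d) / denom)


# Moderate operands: the expansion gives the expected quotient.
moderate = divide_expanded(complex(1.0, 2.0), complex(3.0, 4.0))

# Operands near the top of the exponent range: intermediate products overflow,
# so the result is not a number even though the true quotient is exactly 1.
extreme = divide_expanded(complex(1e200, 1e200), complex(1e200, 1e200))

# Python's built-in complex division scales its operands and avoids the overflow.
reference = complex(1e200, 1e200) / complex(1e200, 1e200)
```

The expanded form needs fewer operations than a range-safe division, which is where the performance gain comes from. The overflow in the intermediate products is the cost.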

-f[no-]alias

Default: -falias

Specifies that aliasing should be assumed in the program.
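
Whether aliasing is assumed determines which transformations are safe. The Python sketch below is a conceptual illustration, not a description of the compiler's analysis, and both function names are hypothetical. It shows one routine written two ways: the first stays correct when its two arguments refer to the same array, and the second is valid only if they never overlap.

```python
def add_first_rereading(dst, src):
    """Correct even if dst and src are the same array: src[0] is re-read each time."""
    for i in range(len(dst)):
        dst[i] = dst[i] + src[0]


def add_first_cached(dst, src):
    """Valid only if dst and src never overlap: src[0] is read once and reused."""
    first = src[0]
    for i in range(len(dst)):
        dst[i] = dst[i] + first


def demo():
    """Call both versions with aliased arguments and return the two results."""
    a = [1, 2, 3]
    add_first_rereading(a, a)  # dst and src alias each other
    b = [1, 2, 3]
    add_first_cached(b, b)
    return a, b
```

With aliased arguments the two versions produce different arrays ([2, 4, 5] and [2, 3, 4]). An optimizer may therefore hoist the read out of the loop only when it is allowed to assume that no aliasing occurs.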

-f[no-]fnalias

Default: -ffnalias

Specifies that aliasing should be assumed within functions. The -fno-fnalias option specifies that aliasing should not be assumed within functions, but should be assumed across calls.

-fast

Default: Off

Provides a shortcut method to enable several optimizations for run-time performance.

On Itanium®-based systems, the -fast option sets the following options to improve performance:

-O3 (optimizes for maximum speed and high-level optimizations)
-ipo (enables interprocedural optimizations across files)
-static (prevents linking with shared libraries)

On IA-32 and Intel® EM64T systems, -fast also sets -xP. Therefore, -fast sets -O3 -ipo -static -xP. Setting -xP on IA-32 and Intel® EM64T systems causes the compiler to detect non-compatible processors and generate an error message during execution.

To get the best possible performance, you might need to use the option in conjunction with an architecture-specific option such as -xN.

To override one of the options set by -fast, specify that option after the -fast option on the command line.
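
The expansion and override rules above can be modeled in a few lines. The Python sketch below is one reading of those rules, not the compiler driver's implementation. The helper names are hypothetical, and the model handles only the -O family of options.

```python
# Options that -fast sets, as listed above.
FAST_ITANIUM = ["-O3", "-ipo", "-static"]
FAST_IA32_EM64T = FAST_ITANIUM + ["-xP"]


def expand_fast(args, itanium=False):
    """Replace each -fast with the options it stands for, keeping command-line order."""
    expansion = FAST_ITANIUM if itanium else FAST_IA32_EM64T
    out = []
    for arg in args:
        out.extend(expansion if arg == "-fast" else [arg])
    return out


def effective_opt_level(args):
    """Return the last -O<n> option seen, since a later option overrides an earlier one."""
    levels = [a for a in args if a in ("-O0", "-O1", "-O2", "-O3")]
    return levels[-1] if levels else None
```

For example, the argument list ["-fast", "-O2"] expands to -O3 -ipo -static -xP -O2 in this model, so -O2 is the effective optimization level because it follows -fast.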

Note

The set of options enabled by -fast may change from release to release.

-fnsplit[-] (Itanium-based systems only)

Default: On if -prof_use is specified; Off otherwise.

Enables function splitting when -prof_use is also specified; the option has no effect otherwise. Function splitting is enabled automatically when you use -prof_use.

To turn off function splitting, use -fnsplit-. (However, function grouping will continue to be enabled.)

See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization

-fp (IA-32 and Intel® EM64T systems only)

Default: On

Disables the use of ebp as a general-purpose register.

Most debuggers expect ebp to be used as a stack frame pointer and cannot produce a stack backtrace otherwise. This option keeps ebp as the frame pointer by excluding it from use in optimizations, which lets the debugger produce a stack backtrace.

-gp

Default: Off

Alternate syntax: -p

Compiles and links for function profiling with the gprof tool.

-ip

Default: Off

Enables single-file interprocedural optimizations.

Enhances inline function expansion.

See also this topic in Volume II: "Using -ip with -Qoption Specifiers."

-ip_no_inlining

Default: Off

Disables interprocedural inlining that results from the -ip or -ipo interprocedural optimizations, but has no effect on other interprocedural optimizations. Requires -ip or -ipo.

-ip_no_pinlining

Default: Off

Disables partial inlining. Requires -ip or -ipo[n].

-ipo[n]

Default: Off

Enables multifile interprocedural optimizations, or multifile IPO. When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

Optionally, you can specify an n value (an integer greater than or equal to 0), which indicates the number of object files that the compiler should create.

If n is equal to 0, the compiler decides whether to create one or more object files based on an estimate of the size of the object file. It generates one object file for small applications and two or more object files for large applications.

The default value for n is 1 (generate a single object file).

See also these topics in Volume II:
IPO Compilation Model
Creating a Multifile IPO Executable with xilink
Using -ip with -Qoption Specifiers

-ipo_c

Default: Off

Optimizes across files and produces a multifile object file. Stops prior to the final link stage, leaving an optimized object file.

See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."

-ipo_obj

Default: Off

Forces the generation of real object files. Requires -ipo[n]. For example, specifying -ipo_obj -ipo2 creates ipo_obj.o and ipo_obj1.o.

See also this topic in Volume II: "Compilation with Real Object Files."

-ipo_S

Default: Off

Optimizes across files and produces multifile assembly files. Performs the same optimizations as -ipo[n], but stops prior to the final link stage, leaving an optimized assembly file. The default listing name is ipo_out.s.

See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."

-ipo_separate

Default: Off

Creates one object file per source file. This option overrides any value that was set with -ipo[n].

-ivdep_parallel (Itanium®-based systems only)

Default: Off

Specifies that there is no loop-carried memory dependency in the loop where an IVDEP directive is specified. This technique is useful for some sparse matrix applications.

See also this topic in Volume II: "Memory Dependency with the IVDEP Directive."
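
The sparse-matrix case mentioned above typically involves indirect indexing, where the compiler cannot see whether two iterations touch the same element. The Python sketch below illustrates what "no loop-carried memory dependency" means for such a loop. The function names are hypothetical and the code only models the dependency question, not the compiler's behavior.

```python
def scatter_add(a, index, b):
    """The loop in question: iteration i updates a[index[i]]."""
    for i in range(len(index)):
        a[index[i]] = a[index[i]] + b[i]
    return a


def has_loop_carried_dependency(index):
    """True if two iterations of scatter_add would touch the same element of a."""
    return len(set(index)) != len(index)
```

If index contains no repeated values, every iteration works on a different element and the iterations are independent. That is the property the programmer asserts for a loop marked with the IVDEP directive when this option is used.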

-nolib_inline

Default: On

Disables inline expansion of intrinsic functions.

-On

Default: -O2 unless you specify -debug, in which case the default is -O0

Specifies the code optimization for application types. Possible values are:

-O0
Disables all optimizations.
This is the default if you specify -debug (with no keyword).
Specifying this option causes certain -warn options to be ignored.
-O1
Alternate syntax on IA-32 systems: -O2 or -O
Minimizes size; optimizes for speed, but disables some optimizations that increase code size for a small speed benefit. For the Itanium® compiler, -O1 turns off software pipelining to reduce code size. This option enables local optimizations within the source program unit, recognition of common subexpressions, and expansion of integer multiplication and division using shifts.
Note that, on IA-32 systems, -O1 and -O2 are equivalent.
-O2
Alternate syntax on Itanium-based systems: -O
Maximizes speed. This option enables global optimization, which includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling. Specifying -O2 includes the optimizations performed by -O1.
-O3
Maximizes speed and uses higher-level optimizations, including loop transformation, software pipelining, and (IA-32 only) prefetching. This option may not improve performance for some programs. Specifying -O3 includes the optimizations performed by -O2. This option enables additional global optimizations that improve speed (at the cost of extra code size). These optimizations include:
o   Loop unrolling, including instruction scheduling
o   Code replication to eliminate branches
o   Padding the size of certain power-of-two arrays to allow more efficient cache use. (See also this topic in Volume II: "Using Arrays Efficiently.")
Setting -O3 sets -fp.
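
Loop unrolling, one of the -O3 transformations listed above, trades code size for fewer loop-control steps. The Python sketch below shows the transformation by hand on a simple summation. It illustrates the idea only, and the unroll factor of 4 is an arbitrary assumption rather than a value the compiler is known to use.

```python
def total_rolled(values):
    """Original loop: one addition per iteration."""
    s = 0
    for v in values:
        s += v
    return s


def total_unrolled_by_4(values):
    """Same sum, with the loop body replicated four times per iteration."""
    s = 0
    n = len(values)
    main = n - n % 4           # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        s += values[i]
        s += values[i + 1]
        s += values[i + 2]
        s += values[i + 3]
    for i in range(main, n):   # remainder iterations
        s += values[i]
    return s
```

The unrolled version has a larger body and needs a separate remainder loop, which is the extra code size mentioned above. In exchange, it evaluates the loop condition about a quarter as often.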

On IA-32 systems, -O1, -O2, and -O are equivalent.

On Itanium-based systems, -O2 and -O are equivalent.

Note

The last -On option specified on the command line takes precedence over any others.