The optimization options let you specify how to optimize your applications for speed, particular processors, code size, and so forth.
For more information about optimization, see "Compiler Optimizations Overview" and related sections in the Intel® Fortran User's Guide for Linux Volume II: Optimizing Applications.
See also Floating-Point Options.
Default: -arch pn4
Determines the version of the architecture for which the compiler generates instructions.
The following are the -arch options:
-arch pn1 Optimizes for the Intel® Pentium® processor.
-arch pn2 Optimizes for the Intel® Pentium® Pro, Intel® Pentium® II, and Intel® Pentium® III processors.
-arch pn3 Optimizes for the Intel® Pentium® Pro, Intel® Pentium® II, and Intel® Pentium® III processors. This is the same as specifying the -arch pn2 option.
-arch pn4 Optimizes for the Intel® Pentium® 4 processor.
-arch SSE Optimizes for Intel® Pentium® 4 processors with Streaming SIMD Extensions (SSE).
-arch SSE2 Optimizes for Intel® Pentium® 4 processors with Streaming SIMD Extensions 2 (SSE2).
Default: -assume nobuffered_io (buffer is flushed as each record is written)
Specifies whether records are written (flushed) to disk as each is written or are accumulated in the buffer. If you specify -assume buffered_io, records accumulate in the buffer.
For disk devices, -assume buffered_io (or the equivalent OPEN statement BUFFERED='YES' specifier or the FORT_BUFFERED run-time environment variable) requests that the internal buffer be filled, possibly by many record output statements (WRITE), before the Fortran run-time system writes it to disk. If a file is opened for direct access, I/O buffering is ignored.
Using buffered writes usually makes disk I/O more efficient by writing larger blocks of data to the disk less often. However, if you request buffered writes, records not yet written to disk may be lost in the event of a system failure.
Unless you set the FORT_BUFFERED environment variable to true, the default is BUFFERED='NO' and -assume nobuffered_io for all I/O, in which case the Fortran run-time system empties its internal buffer for each WRITE (or similar record output statements).
The OPEN statement BUFFERED specifier applies to a specific logical unit. In contrast, the -assume [no]buffered_io option and the FORT_BUFFERED environment variable apply to all Fortran units.
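To illustrate why buffering reduces disk traffic, the following Python sketch models a logical unit that either empties its buffer after every record (nobuffered_io) or only when the buffer is full (buffered_io). The class, the 1024-byte buffer size, and the "physical write" counter are assumptions for illustration; this is not how the Fortran run-time system is implemented.

```python
class RecordUnit:
    """Hypothetical model of one Fortran logical unit's output buffer."""

    def __init__(self, buffered: bool, buffer_size: int = 1024):
        self.buffered = buffered
        self.buffer_size = buffer_size  # assumed buffer capacity in bytes
        self.pending = bytearray()      # records not yet on "disk" (lost on a crash)
        self.disk = bytearray()         # stands in for the file on disk
        self.physical_writes = 0        # number of simulated disk writes

    def _flush(self) -> None:
        if self.pending:
            self.disk += self.pending
            self.pending.clear()
            self.physical_writes += 1

    def write_record(self, record: bytes) -> None:
        self.pending += record
        # nobuffered_io: empty the buffer after every record.
        # buffered_io: empty it only once it has filled up.
        if not self.buffered or len(self.pending) >= self.buffer_size:
            self._flush()

    def close(self) -> None:
        self._flush()


def run(buffered: bool) -> RecordUnit:
    unit = RecordUnit(buffered=buffered)
    for i in range(1000):
        unit.write_record(f"{i:07d}\n".encode())  # 8-byte records
    unit.close()
    return unit
```

With 1000 eight-byte records, the unbuffered unit performs 1000 simulated writes, while the buffered unit performs 8 (seven full 1024-byte buffers plus the remainder at close). The file contents are identical; only the number and size of writes differ.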
Default: Off
Allows the compiler to use 32-bit pointers whenever possible as long as the application does not exceed a 32-bit address space.
Because this optimization requires interprocedural analysis over the whole program, you must use this option with the -ipo option.
Using this option on programs that exceed a 32-bit address space may cause unpredictable results during program execution.
On Intel® EM64T systems, -auto_ilp32 has no effect unless -xP or -axP is also specified.
Default: None.
Directs the compiler to find opportunities to generate separate versions of functions that take advantage of features that are specific to the specified Intel® processor.
If the compiler finds such an opportunity, it first checks whether generating a processor-specific version of a function is likely to result in a performance gain. If this is the case, the compiler generates both a processor-specific version of a function and a generic version of the function. The generic version will run on any IA-32 processor.
At run time, one of the versions is chosen to execute, depending on the Intel processor in use. In this way, the program can benefit from performance gains on more advanced Intel processors, while still working properly on older IA-32 processors.
Possible values and the processors the code is optimized for are:
-axK Intel Pentium® III and compatible Intel processors
-axW Intel Pentium 4 and compatible Intel processors
-axN Intel Pentium 4 and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-axB Intel Pentium M and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-axP Intel® Pentium® 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
On Intel® EM64T systems, -axW and -axP are the only valid options.
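The run-time choice between a processor-specific and a generic version can be pictured with the Python sketch below. The function names, the feature table, and the idea of passing the CPU's feature set in as an argument are all hypothetical; the compiler generates its own dispatch code and performs the processor check itself.

```python
from typing import Callable, Dict, List, Set


def dot_generic(x: List[float], y: List[float]) -> float:
    """Generic version: runs on any IA-32 processor."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_specialized(x: List[float], y: List[float]) -> float:
    """Stand-in for a version that would use processor-specific instructions
    (for example, SSE3). Here it only differs in form, not in result."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical table: required processor feature -> specialized version.
SPECIALIZED: Dict[str, Callable[[List[float], List[float]], float]] = {
    "sse3": dot_specialized,
}


def select_dot(cpu_features: Set[str]) -> Callable[[List[float], List[float]], float]:
    """Pick the specialized version if the processor supports it, else the generic one."""
    for feature, impl in SPECIALIZED.items():
        if feature in cpu_features:
            return impl
    return dot_generic
```

Both versions compute the same result, so the program runs correctly either way; only the code path differs.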
Default: Off (-complex_limited_range-)
Enables the use of basic algebraic expansions of some arithmetic operations involving data of type COMPLEX. This can result in performance improvements in programs that use a lot of COMPLEX arithmetic. However, values at the extremes of the exponent range might not compute correctly.
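The range limitation can be seen in complex division. The Python sketch below compares the textbook expansion, which is the kind of "basic algebraic expansion" this option may enable (the compiler's exact expansion is an assumption here), with Smith's algorithm, a well-known range-safe method that scales by the larger component of the divisor.

```python
def div_naive(a: complex, b: complex) -> complex:
    """Textbook expansion: (x+iy)/(u+iv) = ((xu+yv) + i(yu-xv)) / (u^2+v^2)."""
    x, y, u, v = a.real, a.imag, b.real, b.imag
    denom = u * u + v * v  # overflows to inf when |u| or |v| is large
    return complex((x * u + y * v) / denom, (y * u - x * v) / denom)


def div_smith(a: complex, b: complex) -> complex:
    """Smith's algorithm: divides through by the larger divisor component first."""
    x, y, u, v = a.real, a.imag, b.real, b.imag
    if abs(u) >= abs(v):
        r = v / u
        d = u + v * r
        return complex((x + y * r) / d, (y - x * r) / d)
    r = u / v
    d = u * r + v
    return complex((x * r + y) / d, (y * r - x) / d)
```

For ordinary operands both agree, for example (3+4i)/(1+2i) = 2.2-0.4i. For (1+0i)/(1e200+1e200i), however, u*u + v*v overflows to infinity, so div_naive returns 0, whereas div_smith returns approximately 5e-201-5e-201i, the correct answer.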
Default: -falias
Specifies that aliasing should be assumed in the program.
See also -f[no-]fnalias.
Default: -ffnalias
Specifies that aliasing should be assumed within functions. The -fno-fnalias option specifies that aliasing should not be assumed within functions, but should be assumed across calls.
See also -f[no-]alias.
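Why the aliasing assumption matters can be shown with a small Python sketch in which lists stand in for memory. The two function names are hypothetical: the first follows the literal loop semantics, and the second performs the transformation a compiler may apply when it assumes the two arrays do not overlap (loading b[0] once, outside the loop).

```python
from typing import List


def add_first_each(a: List[float], b: List[float]) -> None:
    """Literal semantics: re-read b[0] on every iteration."""
    for i in range(len(a)):
        a[i] += b[0]


def add_first_hoisted(a: List[float], b: List[float]) -> None:
    """Transformation valid only if a and b do not alias: load b[0] once."""
    t = b[0]
    for i in range(len(a)):
        a[i] += t
```

When a and b are distinct, both versions produce the same result. When they are the same array, the first iteration changes b[0], so the literal version yields [2, 4, 5] for the input [1, 2, 3], while the hoisted version yields [2, 3, 4]. This is why a compiler must assume aliasing unless told otherwise.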
Default: Off
Provides a shortcut method to enable several optimizations for run-time performance.
On Itanium®-based systems, the -fast option sets the following options to improve performance:
-O3 (optimizes for maximum speed and high-level optimizations)
-ipo (enables interprocedural optimizations across files)
-static (prevents linking with shared libraries)
On IA-32 and Intel® EM64T systems, -fast also sets -xP. Therefore, -fast sets -O3 -ipo -static -xP. Because -xP is set on IA-32 and Intel® EM64T systems, the compiled program detects non-compatible processors and generates an error message during execution.
To get the best possible performance, you might need to use the option in conjunction with an architecture-specific option such as -xN.
To override one of the options set by -fast, specify that option after the -fast option on the command line.
Note
The options set by -fast may change from release to release.
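The "last option wins" rule for overriding -fast can be modeled with the Python sketch below. The grouping of options into mutually overriding families is a simplified assumption made for illustration, not a description of the compiler's actual command-line parser; the expansion list is the IA-32 and Intel® EM64T one given above.

```python
from typing import Dict, List

# Expansion of -fast on IA-32 and Intel EM64T systems, as listed above.
FAST_EXPANSION = ["-O3", "-ipo", "-static", "-xP"]


def option_group(opt: str) -> str:
    """Hypothetical grouping: options that override each other share a key."""
    if opt.startswith("-O"):
        return "opt_level"
    if opt.startswith("-x"):
        return "target"
    return opt


def effective_options(argv: List[str]) -> Dict[str, str]:
    """Expand -fast in place, then let the last option in each group win."""
    result: Dict[str, str] = {}
    for opt in argv:
        expanded = FAST_EXPANSION if opt == "-fast" else [opt]
        for o in expanded:
            result[option_group(o)] = o
    return result
```

Under this model, "-fast -O2" ends up at -O2, whereas "-O2 -fast" ends up at -O3 because the -O3 implied by -fast comes later on the command line.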
Default: On if -prof_use is specified; Off otherwise.
Enables function splitting if -prof_use is also enabled. (This option has no effect if -prof_use is not enabled.)
This option is automatically enabled if you use -prof_use.
To turn off function splitting, use -fnsplit-. (However, function grouping will continue to be enabled.)
See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization
Default: On
Disables the use of ebp as a general-purpose register.
Most debuggers expect ebp to be used as a stack frame pointer and cannot produce a stack backtrace unless it is. This option keeps frame pointers, disables the use of the ebp register in optimizations, and so lets the debugger produce a stack backtrace.
Default: Off
Alternate syntax: -p
Compiles and links for function profiling with the gprof tool.
Default: Off
Enables single-file interprocedural optimizations.
Enhances inline function expansion.
See also this topic in Volume II: "Using -ip with -Qoption Specifiers."
Default: Off
Disables interprocedural inlining that results from the -ip or -ipo interprocedural optimizations, but has no effect on other interprocedural optimizations. Requires -ip or -ipo.
Default: Off
Disables partial inlining. Requires -ip or -ipo[n].
Default: Off
Enables multifile interprocedural optimizations, or multifile IPO. When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.
Optionally, you can specify an n value (an integer greater than or equal to 0), which indicates the number of object files that the compiler should create.
If n is equal to 0, the compiler decides whether to create one or more object files based on an estimate of the size of the object file. It generates one object file for small applications and two or more object files for large applications.
The default value for n is 1 (generate a single object file).
See also these topics in Volume II:
IPO Compilation Model
Creating a Multifile IPO Executable with xilink
Using -ip with -Qoption Specifiers
Default: Off
Optimizes across files and produces a multifile object file. Stops prior to the final link stage, leaving an optimized object file.
See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."
Default: Off
Forces the generation of real object files. Requires -ipo[n]. Specifying -ipo_obj -ipo2 creates ipo_obj.o and ipo_obj1.o. See also this topic in Volume II: "Compilation with Real Object Files."
Default: Off
Optimizes across files and produces multifile assembly files. Performs the same optimizations as -ipo[n], but stops prior to the final link stage, leaving an optimized assembly file. The default listing name is ipo_out.s.
See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."
Default: Off
Creates one object file per source file. This option overrides any value that was set with -ipo[n].
Default: Off
Specifies that there is no loop-carried memory dependency in the loop where an IVDEP directive is specified. This technique is useful for some sparse matrix applications.
See also this topic in Volume II: "Memory Dependency with the IVDEP Directive."
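A loop has no loop-carried memory dependency when no iteration reads a location that another iteration writes, so the result does not depend on iteration order. The hypothetical Python check below runs an indirectly indexed loop, a[i] = a[src[i]] + 1, forward and backward and compares the results. A difference proves a dependency exists; agreement on one input does not prove its absence, so it cannot replace the programmer's knowledge that an IVDEP assertion relies on.

```python
from typing import List


def run_loop(a: List[int], src: List[int], forward: bool) -> List[int]:
    """Executes a[i] = a[src[i]] + 1 for i over range(len(src)) in the given order."""
    a = list(a)  # work on a copy so both runs start from the same data
    order = range(len(src)) if forward else range(len(src) - 1, -1, -1)
    for i in order:
        a[i] = a[src[i]] + 1
    return a


def order_independent(a: List[int], src: List[int]) -> bool:
    """True if forward and backward execution give the same result for this input."""
    return run_loop(a, src, True) == run_loop(a, src, False)
```

With a = [0, 0, 0, 0, 10, 20, 30, 40] and src = [4, 5, 6, 7], the loop writes only a[0..3] and reads only a[4..7], so both orders give the same result. With src = [0, 0, 1, 2], each iteration reads the element written by the previous one, and reversing the order changes the result.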
Default: On
Disables inline expansion of intrinsic functions.
Default: -O2 unless you specify -debug, in which case the default is -O0
Specifies the code optimization for application types. Possible values are:
-O0 Disables all optimizations. This is the default if you specify -debug (with no keyword). Specifying this option causes certain -warn options to be ignored.
-O1 (alternate syntax on IA-32 systems: -O2 or -O) Minimizes size; optimizes for speed, but disables some optimizations that increase code size for a small speed benefit. For the Itanium® compiler, -O1 turns off software pipelining to reduce code size. This option enables local optimizations within the source program unit, recognition of common subexpressions, and expansion of integer multiplication and division using shifts.
-O2 (alternate syntax on Itanium-based systems: -O) Maximizes speed; disables some optimizations that increase code size for a small speed benefit. This option enables global optimization, which includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling. Specifying -O2 includes the optimizations performed by -O1. Note that, on IA-32 systems, -O1 and -O2 are equivalent.
-O3 Maximizes speed and uses higher-level optimizations, including loop transformation, software pipelining, and (IA-32 only) prefetching; this option may not improve performance for some programs. Specifying -O3 includes the optimizations performed by -O2. This option enables additional global optimizations that improve speed (at the cost of extra code size). These optimizations include:
o Loop unrolling, including instruction scheduling
o Code replication to eliminate branches
o Padding the size of certain power-of-two arrays to allow more efficient cache use. (See also this topic in Volume II: "Using Arrays Efficiently.")
Setting -O3 sets -fp.
On IA-32 systems, -O1, -O2, and -O are equivalent.
On Itanium-based systems, -O2 and -O are equivalent.
Note
The last -On option specified on the command line takes precedence over any others.
Default: Off
Generates an optimization report to stderr.
See also this topic in Volume II: "Optimizer Report Generation."
Default: Off
Generates an optimization report and specifies the file name for the report. You do not need to specify -opt_report if you use this option.
See also this topic in Volume II: "Optimizer Report Generation."
Default: Off
Displays the optimization phases available for reporting.
See also this topic in Volume II: "Optimizer Report Generation."
Default: -opt_report_level min
Specifies the detail level of the optimization report.
See also this topic in Volume II: "Optimizer Report Generation."
Default: Off
Specifies the optimization phase to generate the report for. This option can be specified multiple times on the command line to report on multiple optimization phases.
See also this topic in Volume II: "Optimizer Report Generation."
Default: Off
Generates reports from all routines whose names contain the string routine.
If the optional routine is not specified, reports from all routines are generated.
See also this topic in Volume II: "Optimizer Report Generation."
Default: -par_threshold100
Sets a threshold for the auto-parallelization of loops based on the probability of profitable parallel execution. n can be from 0 through 100.
n=0: loops get auto-parallelized regardless of computation work volume, that is, always.
n=100: loops get auto-parallelized only if profitable parallel execution is almost certain.
See also these topics in Volume II:
Auto-Parallelization Threshold Control and Diagnostics
Auto-Parallelization Overview
Auto-Parallelization: Enabling, Options, Directives, and Environment Variables
Default: Off
Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. To use this option, you must also specify -O2 or -O3.
See also these topics in Volume II:
Auto-Parallelization Overview
Auto-Parallelization: Enabling, Options, Directives, and Environment Variables
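Conceptually, the auto-parallelizer splits the iteration space of a safe loop into chunks and runs each chunk on its own thread. The Python sketch below does that partitioning by hand for a loop whose iterations are independent. The function names and the worker count are assumptions for illustration; because of CPython's global interpreter lock, this sketch shows the structure of the transformation, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scale_chunk(x: List[float], out: List[float], lo: int, hi: int, s: float) -> None:
    # Each iteration reads only x[i] and writes only out[i]: no loop-carried dependency.
    for i in range(lo, hi):
        out[i] = s * x[i]


def parallel_scale(x: List[float], s: float, workers: int = 4) -> List[float]:
    """Computes out[i] = s * x[i], splitting the iterations into contiguous chunks."""
    n = len(x)
    if n == 0:
        return []
    out = [0.0] * n
    chunk = (n + workers - 1) // workers  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scale_chunk, x, out, lo, min(lo + chunk, n), s)
                   for lo in range(0, n, chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return out
```

Because no chunk touches another chunk's elements, the result is the same as the serial loop regardless of how the threads are scheduled; that independence is exactly what the compiler must prove before parallelizing a loop.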
Default: -prefetch (on)
Enables prefetch insertion optimization. The goal of prefetching is to reduce cache misses by providing hints to the processor about when data should be loaded into the cache. Note that -O3 must be specified for this option to work.
To disable the prefetch insertion optimization, use -prefetch-.
Default: The directory where the program is compiled.
Specifies the directory in which the profiling output files (.dyn and .dpi) are to be created. The specified directory must already exist.
See also these topics in Volume II:
Advanced PGO Options
Specific Coding Guidelines for IA-32 Architecture
Default: Source file name with extension .dyn and .dpi
Specifies the file name for the profiling summary file.
See also these topics in Volume II:
Advanced PGO Options
Specific Coding Guidelines for IA-32 Architecture
Default: Off
Instruments a program for profiling to get the execution count of each basic block.
See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization
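The idea behind basic-block instrumentation is that each straight-line block of code increments its own counter whenever it executes. The Python sketch below instruments a routine by hand; the routine and block labels are hypothetical, and the sketch does not model the .dyn file format or how the compiler inserts its counters.

```python
from collections import Counter
from typing import List


def classify(values: List[int], counts: Counter) -> List[str]:
    """Hypothetical hand-instrumented routine: each labeled block bumps its counter."""
    counts["entry"] += 1
    result = []
    for v in values:
        counts["loop_body"] += 1
        if v < 0:
            counts["negative"] += 1
            result.append("neg")
        else:
            counts["non_negative"] += 1
            result.append("non-neg")
    counts["exit"] += 1
    return result
```

Calling classify([3, -1, 0, -7, 5], counts) with a fresh Counter records the loop body 5 times, the negative branch 2 times, and the non-negative branch 3 times. Counts like these are what a later -prof_use compilation uses to decide, for example, which code is hot.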
Default: Off
Enables use of profiling information (including function splitting and function grouping) during optimization. Instructs the compiler to produce a profile-optimized executable and merges available profiling output files into a pgopti.dpi file.
If you use this option, it automatically enables -fnsplit.
Note that there is no way to turn off function grouping if you enable it using this option.
See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization
Default: -scalar_rep (on)
Enables scalar replacement performed during loop transformation. Requires -O3.
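Scalar replacement keeps a repeatedly accessed array element in a local temporary (in compiled code, a register) instead of loading and storing it on every iteration. The before-and-after Python sketch below shows the shape of the transformation on a row-sum loop; the function names are hypothetical, and the compiler performs this rewrite internally rather than on source code.

```python
from typing import List


def row_sums_memory(a: List[List[float]], s: List[float]) -> None:
    """Before: s[i] is read and written in memory on every inner iteration."""
    for i in range(len(a)):
        s[i] = 0.0
        for j in range(len(a[i])):
            s[i] = s[i] + a[i][j]


def row_sums_scalar(a: List[List[float]], s: List[float]) -> None:
    """After scalar replacement: s[i] lives in a local and is stored once per row."""
    for i in range(len(a)):
        t = 0.0
        for j in range(len(a[i])):
            t = t + a[i][j]
        s[i] = t
```

The additions happen in the same order in both versions, so the results are identical; only the number of memory accesses to s changes.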
Default value for IA-32 and Intel® EM64T systems: -tpp7
Default value for Itanium®-based systems: -tpp2
Optimizes for a particular Intel® processor. The executable will run on other processors, but is optimized for processors noted below. Possible choices for n are:
1 Optimize for Itanium processors (Itanium®-based systems only)
2 Optimize for Itanium 2 processors (Itanium®-based systems only)
5 Optimize for Intel Pentium® and Pentium® with MMX™ technology processors (IA-32 systems only)
6 Optimize for Intel Pentium® Pro, Pentium® II and Pentium® III processors (IA-32 systems only)
7 Optimize for Intel Pentium® 4, Intel® Xeon™, Intel Pentium® M processors, and Intel® Pentium® 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support (IA-32 systems only)
For Intel® EM64T systems, the only available option is -tpp7.
Default: -unroll (lets the compiler decide)
Specifies the maximum number of times to unroll a loop.
Possible values are:
-unroll Lets the compiler decide.
-unroll0 Disables loop unrolling. (Note: This is the only value allowed on Itanium-based systems; all other values are ignored.)
-unrolln Sets n as the maximum number of times a loop can be unrolled.
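The effect of unrolling by 4 is sketched below in Python: the loop body is replicated four times per trip, and a remainder loop handles iteration counts that are not a multiple of 4. The function names are hypothetical, and the compiler applies this transformation to generated code rather than to source.

```python
from typing import List


def sum_rolled(x: List[float]) -> float:
    total = 0.0
    for i in range(len(x)):
        total += x[i]
    return total


def sum_unrolled4(x: List[float]) -> float:
    """Body replicated 4 times per trip, plus a remainder loop."""
    n = len(x)
    total = 0.0
    i = 0
    while i + 4 <= n:
        total += x[i]
        total += x[i + 1]
        total += x[i + 2]
        total += x[i + 3]
        i += 4
    while i < n:  # remainder loop when n is not a multiple of 4
        total += x[i]
        i += 1
    return total
```

This sketch keeps a single accumulator so the additions happen in the original order and the floating-point result is bit-for-bit identical. Splitting the sum across several accumulators would expose more parallelism but could change rounding.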
Default: None.
Lets you target your program to run on a specific Intel processor. The resulting code might contain unconditional use of features that are not supported on other processors.
Possible values and the processors the code is optimized for are:
-xK Intel Pentium III and compatible Intel processors
-xW Intel Pentium 4 and compatible Intel processors
-xN Intel Pentium 4 and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-xB Intel Pentium M and compatible Intel processors. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
-xP Intel Pentium 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support. Programs compiled with this option will detect non-compatible processors and generate an error message during execution. This option also enables new optimizations in addition to Intel processor-specific optimizations.
On Intel® EM64T systems, -xW and -xP are the only valid options.
To execute the program on x86 processors not provided by Intel Corporation, do not specify this option.
Caution
If a program compiled with this option is executed on a processor that lacks the specified set of instructions, it can fail with an illegal instruction exception, or display other unexpected behavior. In particular, programs compiled with -xN, -xB, or -xP will emit run-time errors if they are executed on unsupported processors.