Optimization Options

The optimization options let you specify how to optimize your applications for speed, particular processors, code size, and so forth.

For more information about optimization, see "Compiler Optimizations Overview" and related sections in the Intel® Fortran User's Guide for Linux Volume II: Optimizing Applications.

See also Floating-Point Options.

Descriptions of Optimization Options

-arch keyword (IA-32 systems only)

Default: -arch pn4

Determines the version of the architecture for which the compiler generates instructions.

The following are -arch options:

-assume [no]buffered_io

Default: -assume nobuffered_io (buffer is flushed as each record is written)

Specifies whether records are flushed to disk as each one is written, or are accumulated in the buffer. If you specify -assume buffered_io, records accumulate in the buffer.

For disk devices, -assume buffered_io (or the equivalent OPEN statement BUFFERED='YES' specifier or the FORT_BUFFERED run-time environment variable) requests that the internal buffer be filled, possibly by many record output statements (WRITE), before the Fortran run-time system writes it to disk. If a file is opened for direct access, I/O buffering is ignored.

Using buffered writes usually makes disk I/O more efficient by writing larger blocks of data to the disk less often. However, if you request buffered writes, records not yet written to disk may be lost in the event of a system failure.

Unless you set the FORT_BUFFERED environment variable to true, the default is BUFFERED='NO' and -assume nobuffered_io for all I/O; in this case, the Fortran run-time system empties its internal buffer after each WRITE (or similar record output statement).

The OPEN statement BUFFERED specifier applies to a specific logical unit. In contrast, the -assume [no]buffered_io option and the FORT_BUFFERED environment variable apply to all Fortran units.
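The trade-off between efficiency and safety described above can be illustrated with a toy model. The Python sketch below is not the Fortran run-time library; the RecordWriter class, its 8192-byte capacity, and the 80-byte records are assumptions chosen for illustration. It counts how many times data reaches the simulated disk, and how many records still sit only in the buffer (and so would be lost in a system failure) just before the file is closed.

```python
class RecordWriter:
    """Toy model of a run-time output buffer (hypothetical, not the Fortran RTL)."""

    def __init__(self, buffered: bool, capacity: int = 8192) -> None:
        self.buffered = buffered
        self.capacity = capacity  # assumed buffer size in bytes
        self.buffer = bytearray()
        self.disk = bytearray()  # stands in for the file on disk
        self.physical_writes = 0  # times the buffer was emptied to "disk"

    def write_record(self, record: bytes) -> None:
        self.buffer += record
        # nobuffered_io: empty the buffer after every record.
        # buffered_io: empty it only once it is full.
        if not self.buffered or len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.disk += self.buffer
            self.buffer.clear()
            self.physical_writes += 1

    def close(self) -> None:
        self.flush()


def simulate(buffered: bool, n_records: int = 1000, record_len: int = 80):
    """Return (physical writes, records not yet on disk just before close)."""
    w = RecordWriter(buffered)
    for _ in range(n_records):
        w.write_record(b"x" * record_len)
    at_risk = len(w.buffer) // record_len
    w.close()
    return w.physical_writes, at_risk


print(simulate(buffered=False))  # (1000, 0)
print(simulate(buffered=True))   # (10, 73)
```

With these assumed sizes, buffering reduces the number of disk writes from 1000 to 10, but 73 records exist only in memory until the file is closed.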

-auto_ilp32 (Itanium-based and Intel® EM64T systems only)

Default: Off

Allows the compiler to use 32-bit pointers whenever possible as long as the application does not exceed a 32-bit address space.

Because this optimization requires interprocedural analysis over the whole program, you must use this option with the -ipo option.

Using this option on programs that exceed a 32-bit address space may cause unpredictable results during program execution.

On Intel® EM64T systems, -auto_ilp32 has no effect unless -xP or -axP is also specified.
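One way the caution about exceeding a 32-bit address space can play out is sketched below. This Python model assumes that a 64-bit address stored in a 32-bit pointer keeps only its low 32 bits; that is an illustration of one possible failure, not a description of the code the compiler actually generates, and the helper names are hypothetical.

```python
MASK_32 = 0xFFFF_FFFF  # largest address representable in 32 bits


def fits_in_32_bits(address: int) -> bool:
    """True if the address can be held in a 32-bit pointer without loss."""
    return 0 <= address <= MASK_32


def as_32bit_pointer(address: int) -> int:
    """Hypothetical model: keep only the low 32 bits of the address."""
    return address & MASK_32


low = 0x7FFF_0000     # below 4 GiB: survives unchanged
high = 0x1_0000_1000  # above 4 GiB: silently turns into a different address

print(fits_in_32_bits(low), hex(as_32bit_pointer(low)))    # True 0x7fff0000
print(fits_in_32_bits(high), hex(as_32bit_pointer(high)))  # False 0x1000
```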

-ax{K|W|N|B|P} (IA-32 and Intel® EM64T systems only)

Default: None.

Directs the compiler to find opportunities to generate separate versions of functions that take advantage of features that are specific to the specified Intel® processor.

If the compiler finds such an opportunity, it first checks whether generating a processor-specific version of a function is likely to result in a performance gain. If this is the case, the compiler generates both a processor-specific version of a function and a generic version of the function. The generic version will run on any IA-32 processor.

At run time, one of the versions is chosen to execute, depending on the Intel processor in use. In this way, the program can benefit from performance gains on more advanced Intel processors, while still working properly on older IA-32 processors.

Possible values and the processors the code is optimized for are:

On Intel® EM64T systems, -axW and -axP are the only valid options.
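The run-time selection described for -ax can be sketched as a dispatcher that picks a processor-specific version when the required feature is present and otherwise falls back to the generic version. In the Python sketch below, the feature label "sse3", the dot_* functions, and the cpu_features argument are hypothetical; code generated with -ax performs its own processor check and does not expose an interface like this.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
DotFn = Callable[[Vector, Vector], float]


def dot_generic(a: Vector, b: Vector) -> float:
    """Generic version: runs on any processor."""
    return sum(x * y for x, y in zip(a, b))


def dot_specialized(a: Vector, b: Vector) -> float:
    """Stand-in for a processor-specific version (here: two elements per step).
    It must compute the same result as the generic version."""
    total = 0.0
    n = len(a) - len(a) % 2
    for i in range(0, n, 2):
        total += a[i] * b[i] + a[i + 1] * b[i + 1]
    if n < len(a):
        total += a[n] * b[n]
    return total


def select_version(versions: list[tuple[str, DotFn]], generic: DotFn,
                   cpu_features: set[str]) -> DotFn:
    """Return the first version whose required feature is present, else the generic one."""
    for feature, fn in versions:
        if feature in cpu_features:
            return fn
    return generic


versions = [("sse3", dot_specialized)]  # hypothetical feature label
dot = select_version(versions, dot_generic, cpu_features={"sse2", "sse3"})
print(dot.__name__, dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # dot_specialized 32.0
```

The key design point is the fallback: on a processor without the feature, the generic version is still available, which is what lets -ax code run on older IA-32 processors.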

-complex_limited_range[-]

Default: Off (-complex_limited_range-)

Enables the use of basic algebraic expansions of some arithmetic operations involving data of type COMPLEX. This can result in performance improvements in programs that use a lot of COMPLEX arithmetic. However, values at the extremes of the exponent range might not compute correctly.
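Complex division is one operation where the loss of range shows. The Python sketch below contrasts the textbook expansion ((ac + bd) + (bc - ad)i) / (c*c + d*d) with Smith's scaled algorithm, a well-known full-range alternative. It is an illustration only; the exact expansions the compiler uses are not specified here. With c = d = 1e200, c*c + d*d overflows the double-precision range, so the expansion returns 0 instead of roughly 1e-200.

```python
def naive_div(x: complex, y: complex) -> complex:
    """Textbook expansion: cheap, but c*c + d*d can overflow or underflow."""
    a, b, c, d = x.real, x.imag, y.real, y.imag
    den = c * c + d * d
    return complex((a * c + b * d) / den, (b * c - a * d) / den)


def smith_div(x: complex, y: complex) -> complex:
    """Smith's algorithm: divides through by the larger of |c| and |d| to avoid overflow."""
    a, b, c, d = x.real, x.imag, y.real, y.imag
    if abs(c) >= abs(d):
        r = d / c
        den = c + d * r
        return complex((a + b * r) / den, (b - a * r) / den)
    r = c / d
    den = c * r + d
    return complex((a * r + b) / den, (b * r - a) / den)


x, y = 1 + 1j, 1e200 + 1e200j
print(naive_div(x, y))  # 0j, because c*c + d*d overflowed to inf
print(smith_div(x, y))  # approximately 1e-200 (imaginary part 0)
```

For operands well inside the exponent range, for example (3+4j)/(1+2j), both functions return 2.2-0.4j (up to rounding), which is why the faster expansion is usually acceptable.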

-f[no-]alias

Default: -falias

Specifies that aliasing should be assumed in the program. The -fno-alias option specifies that aliasing should not be assumed in the program.

See also -f[no-]fnalias.

-f[no-]fnalias

Default: -ffnalias

Specifies that aliasing should be assumed within functions. The -fno-fnalias option specifies that aliasing should not be assumed within functions, but should be assumed across calls.

See also -f[no-]alias.
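Why the aliasing assumption matters can be sketched as follows. If a compiler may assume that two arrays do not overlap, it is free to load src[0] once before the loop instead of reloading it on every iteration. The Python functions below are hypothetical models of the loop as written and of that optimized form; they agree for separate arrays but disagree when both arguments refer to the same array, which is why -fno-alias and -fno-fnalias are safe only when no such overlap occurs.

```python
def scale_as_written(dst: list[float], src: list[float]) -> None:
    """Reload src[0] on every iteration, as the source code specifies."""
    for i in range(len(dst)):
        dst[i] = src[i] * src[0]


def scale_assuming_no_alias(dst: list[float], src: list[float]) -> None:
    """Optimized form: src[0] hoisted out of the loop (valid only if dst and src do not overlap)."""
    first = src[0]
    for i in range(len(dst)):
        dst[i] = src[i] * first


# Separate arrays: both versions agree.
src = [2.0, 3.0, 4.0]
d1, d2 = [0.0] * 3, [0.0] * 3
scale_as_written(d1, src)
scale_assuming_no_alias(d2, src)
print(d1, d2)  # [4.0, 6.0, 8.0] [4.0, 6.0, 8.0]

# Aliased arrays: writing dst[0] also changes src[0], so the results differ.
a1, a2 = [2.0, 3.0, 4.0], [2.0, 3.0, 4.0]
scale_as_written(a1, a1)
scale_assuming_no_alias(a2, a2)
print(a1, a2)  # [4.0, 12.0, 16.0] [4.0, 6.0, 8.0]
```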

-fast

Default: Off

Provides a shortcut method to enable several optimizations for run-time performance.

On Itanium®-based systems, the -fast option sets the following options to improve performance: -O3, -ipo, and -static.

On IA-32 and Intel® EM64T systems, -fast also sets -xP; that is, -fast sets -O3 -ipo -static -xP. Setting -xP on IA-32 and Intel® EM64T systems causes the compiled program to detect non-compatible processors and display an error message during execution.

To get the best possible performance, you might need to use -fast in conjunction with an architecture-specific option such as -xN.

To override one of the options set by -fast, specify that option after the -fast option on the command line.

Note

The set of options enabled by -fast may change from release to release.

-fnsplit[-] (Itanium-based systems only)

Default: On if -prof_use is specified; Off otherwise.

Enables function splitting if -prof_use is also enabled. (This option has no effect if -prof_use is not enabled.)

This option is automatically enabled if you use -prof_use.

To turn off function splitting, use -fnsplit-. (However, function grouping will continue to be enabled.)

See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization

-fp (IA-32 and Intel® EM64T systems only)

Default: On

Disables the use of ebp as a general-purpose register.

Most debuggers expect ebp to be used as a stack frame pointer and cannot produce a stack backtrace unless it is. This option reserves ebp for frame pointers and excludes it from use in optimizations, so that the debugger can produce a stack backtrace.

-gp

Default: Off

Alternate syntax: -p

Compiles and links for function profiling with the gprof tool.

-ip

Default: Off

Enables single-file interprocedural optimizations.

Enhances inline function expansion.

See also this topic in Volume II: "Using -ip with -Qoption Specifiers."

-ip_no_inlining

Default: Off

Disables interprocedural inlining that results from the -ip or -ipo interprocedural optimizations, but has no effect on other interprocedural optimizations. Requires -ip or -ipo.

-ip_no_pinlining  

Default: Off

Disables partial inlining. Requires -ip or -ipo[n].

-ipo[n]

Default: Off

Enables multifile interprocedural optimizations, or multifile IPO. When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

Optionally, you can specify an n value (an integer greater than or equal to 0), which indicates the number of object files that the compiler should create.

If n is equal to 0, the compiler decides whether to create one or more object files based on an estimate of the size of the object file. It generates one object file for small applications and two or more object files for large applications.

The default value for n is 1 (generate a single object file).

See also these topics in Volume II:
IPO Compilation Model
Creating a Multifile IPO Executable with xilink
Using -ip with -Qoption Specifiers

-ipo_c

Default: Off

Optimizes across files and produces a multifile object file. Stops prior to the final link stage, leaving an optimized object file.

See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."

-ipo_obj

Default: Off

Forces the generation of real object files. Requires -ipo[n]. For example, specifying -ipo_obj -ipo2 creates ipo_obj.o and ipo_obj1.o.

See also this topic in Volume II: "Compilation with Real Object Files."

-ipo_S

Default: Off

Optimizes across files and produces multifile assembly files. Performs the same optimizations as -ipo[n], but stops prior to the final link stage, leaving an optimized assembly file. The default listing name is ipo_out.s.

See also this topic in Volume II: "Capturing Intermediate Outputs of IPO."

-ipo_separate

Default: Off

Creates one object file per source file. This option overrides any value that was set with -ipo[n].  

-ivdep_parallel (Itanium®-based systems only)

Default: Off

Specifies that there is no loop-carried memory dependency in the loop where an IVDEP directive is specified. This technique is useful for some sparse matrix applications.

See also this topic in Volume II: "Memory Dependency with the IVDEP Directive."
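What an IVDEP assertion means can be illustrated with a sparse update of the form a(idx(i)) = v(i). If the index array contains no repeated entries, no iteration writes an element that another iteration also writes, so the iterations can be executed in any order; with a repeated index, the order decides which value survives. The Python sketch below demonstrates this with hypothetical helper names. The directive itself is an assertion made by the programmer; the sketch only shows what that assertion claims.

```python
from collections.abc import Iterable


def scatter(a: list[float], idx: list[int], v: list[float],
            order: Iterable[int]) -> list[float]:
    """Execute a(idx(i)) = v(i) for the iterations i, in the given order."""
    out = list(a)
    for i in order:
        out[idx[i]] = v[i]
    return out


def indices_are_distinct(idx: list[int]) -> bool:
    """True if no two iterations write the same element (no loop-carried dependency)."""
    return len(set(idx)) == len(idx)


a = [0.0] * 5
v = [10.0, 20.0, 30.0]

for idx in ([0, 2, 4], [0, 2, 0]):
    n = len(idx)
    forward = scatter(a, idx, v, range(n))
    backward = scatter(a, idx, v, reversed(range(n)))
    print(idx, indices_are_distinct(idx), forward == backward)
# [0, 2, 4] True True
# [0, 2, 0] False False
```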

-nolib_inline

Default: On

Disables inline expansion of intrinsic functions.

-On

Default: -O2 unless you specify -debug, in which case the default is -O0

Specifies the code optimization for application types. Possible values are:

On IA-32 systems, -O1, -O2, and -O are equivalent.

On Itanium-based systems, -O2 and -O are equivalent.

Note

The last -On option specified on the command line takes precedence over any others.

-opt_report

Default: Off

Generates an optimization report to stderr.

See also this topic in Volume II: "Optimizer Report Generation."

-opt_report_file file

Default: Off

Generates an optimization report and specifies the file name for the report. You do not need to specify -opt_report if you use this option.

See also this topic in Volume II: "Optimizer Report Generation."

-opt_report_help

Default: Off

Displays the optimization phases available for reporting.

See also this topic in Volume II: "Optimizer Report Generation."

-opt_report_level {min|med|max}

Default: -opt_report_level min

Specifies the detail level of the optimization report.

See also this topic in Volume II: "Optimizer Report Generation."

-opt_report_phase phase

Default: Off

Specifies the optimization phase for which to generate the report. You can specify this option multiple times on the command line to report on multiple optimization phases.

See also this topic in Volume II: "Optimizer Report Generation."

-opt_report_routine [routine]

Default: Off

Generates reports for all routines whose names contain the string routine.

If routine is omitted, reports are generated for all routines.

See also this topic in Volume II: "Optimizer Report Generation."

-par_threshold[n]

Default: -par_threshold100

Sets a threshold for the auto-parallelization of loops based on the probability of profitable parallel execution. n can be from 0 through 100.

n=0: loops are always auto-parallelized, regardless of the amount of computational work.

n=100: loops are auto-parallelized only if profitable parallel execution is almost certain.

See also these topics in Volume II:

Auto-Parallelization Threshold Control and Diagnostics
Auto-Parallelization Overview
Auto-Parallelization: Enabling, Options, Directives, and Environment Variables
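The role of n can be pictured as a cutoff applied to the compiler's estimate. In the Python sketch below, the estimated probability of profitable parallel execution is passed in as an input, whereas the real compiler computes it internally by its own analysis; treating n as a plain percentage cutoff and the function name are assumptions, and the sketch only shows how the two documented end points behave.

```python
def should_parallelize(profit_probability: float, n: int = 100) -> bool:
    """Toy model of -par_threshold[n] (assumption): parallelize when the
    estimated probability (0.0 to 1.0) of profitable execution reaches n percent."""
    if not 0 <= n <= 100:
        raise ValueError("n must be in the range 0 through 100")
    return profit_probability * 100 >= n


print(should_parallelize(0.0, n=0))    # True: n=0 parallelizes every candidate loop
print(should_parallelize(0.6, n=100))  # False: 60% is not "almost certain"
print(should_parallelize(1.0, n=100))  # True
print(should_parallelize(0.6, n=50))   # True
```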

-parallel

Default: Off

Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. To use this option, you must also specify -O2 or -O3.

See also these topics in Volume II:
Auto-Parallelization Overview
Auto-Parallelization: Enabling, Options, Directives, and Environment Variables

-prefetch[-] (IA-32 systems only)

Default: -prefetch (on)

Enables prefetch insertion optimization. The goal of prefetching is to reduce cache misses by providing hints to the processor about when data should be loaded into the cache. Note that -O3 must be specified for this option to work.

To disable the prefetch insertion optimization, use -prefetch-.

-prof_dir dir

Default: The directory where the program is compiled.

Specifies the directory in which the profiling output files (.dyn and .dpi) are to be created. The specified directory must already exist.

See also these topics in Volume II:
Advanced PGO Options
Specific Coding Guidelines for IA-32 Architecture

-prof_file file

Default: Source file name with extension .dyn and .dpi

Specifies the file name for the profiling summary file.

See also these topics in Volume II:
Advanced PGO Options
Specific Coding Guidelines for IA-32 Architecture

-prof_gen

Default: Off

Instruments a program for profiling to get the execution count of each basic block.

See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization
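What -prof_gen instrumentation collects can be pictured with a hand-instrumented function. The Python sketch below is a toy model: the block labels and the block_counts counter are hypothetical, whereas the compiler inserts its own counters into the generated code and writes the counts to .dyn files when the instrumented program runs. -prof_use then uses counts like these during optimization (including function splitting and function grouping).

```python
from collections import Counter

block_counts = Counter()  # hypothetical stand-in for the .dyn profile data


def classify(x: int) -> str:
    block_counts["entry"] += 1
    if x < 0:
        block_counts["negative"] += 1
        result = "negative"
    else:
        block_counts["non_negative"] += 1
        result = "non-negative"
    block_counts["exit"] += 1
    return result


for value in [-1, 2, 3, 5]:  # a "training run" on representative input
    classify(value)

print(sorted(block_counts.items()))
# [('entry', 4), ('exit', 4), ('negative', 1), ('non_negative', 3)]
```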

-prof_use

Default: Off

Enables use of profiling information (including function splitting and function grouping) during optimization. Instructs the compiler to produce a profile-optimized executable and to merge available profiling output files into a pgopti.dpi file.

On Itanium-based systems, this option automatically enables -fnsplit.

Note that there is no way to turn off function grouping if you enable it using this option.

See also these topics in Volume II:
Basic PGO Options
Example of Profile-Guided Optimization

-scalar_rep[-] (IA-32 systems only)

Default: -scalar_rep (on)

Enables scalar replacement performed during loop transformation. Requires -O3.
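Scalar replacement keeps a repeatedly referenced array element in a scalar temporary (typically a register) for the duration of a loop, instead of loading and storing the element on every iteration. The Python sketch below applies the transformation by hand to a matrix-vector product; the function names are hypothetical, and Python will not show the speed benefit, only that both forms compute the same result when c does not overlap a or b.

```python
def matvec_original(a: list[list[float]], b: list[float], c: list[float]) -> None:
    """c[i] is read and written in every inner iteration."""
    for i in range(len(a)):
        for j in range(len(b)):
            c[i] = c[i] + a[i][j] * b[j]


def matvec_scalar_replaced(a: list[list[float]], b: list[float], c: list[float]) -> None:
    """c[i] is held in the scalar s for the whole inner loop."""
    for i in range(len(a)):
        s = c[i]  # load once
        for j in range(len(b)):
            s = s + a[i][j] * b[j]
        c[i] = s  # store once


a = [[1.0, 2.0], [3.0, 4.0]]
b = [5.0, 6.0]
c1, c2 = [0.0, 0.0], [0.0, 0.0]
matvec_original(a, b, c1)
matvec_scalar_replaced(a, b, c2)
print(c1, c2)  # [17.0, 39.0] [17.0, 39.0]
```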

-tppn

Default value for IA-32 and Intel® EM64T systems: -tpp7

Default value for Itanium®-based systems: -tpp2

Optimizes for a particular Intel® processor. The executable will run on other processors, but is optimized for processors noted below. Possible choices for n are:

For Intel® EM64T systems, the only available option is -tpp7.

-unroll[n]

Default: -unroll (lets the compiler decide)

Specifies the maximum number of times to unroll a loop.
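Loop unrolling replicates the loop body so that each pass through the loop does the work of several iterations, reducing loop-control overhead; a short remainder loop handles iteration counts that are not a multiple of the unroll factor. The Python sketch below performs a four-way unroll by hand; the factor of 4 and the function names are illustrative assumptions, not what the compiler chooses for any particular loop.

```python
def sum_simple(x: list[float]) -> float:
    total = 0.0
    for i in range(len(x)):
        total += x[i]
    return total


def sum_unrolled_by_4(x: list[float]) -> float:
    total = 0.0
    n = len(x)
    main = n - n % 4  # iterations covered by the unrolled body
    for i in range(0, main, 4):
        # Four copies of the original loop body per pass.
        total += x[i]
        total += x[i + 1]
        total += x[i + 2]
        total += x[i + 3]
    for i in range(main, n):  # remainder loop
        total += x[i]
    return total


data = [float(k) for k in range(10)]
print(sum_simple(data), sum_unrolled_by_4(data))  # 45.0 45.0
```

Because the additions happen in the same order, the unrolled version gives bit-identical floating-point results here; unrolling that also reorders operations (for example, using several partial sums) may not.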

Possible values are:

-x{K|W|N|B|P} (IA-32 and Intel® EM64T systems only)

Default: None.

Lets you target your program to run on a specific Intel processor. The resulting code might contain unconditional use of features that are not supported on other processors.

Possible values and the processors the code is optimized for are:

On Intel® EM64T systems, -xW and -xP are the only valid options.

To execute the program on x86 processors not provided by Intel Corporation, do not specify this option.

Caution

If a program compiled with this option is executed on a processor that lacks the specified set of instructions, it can fail with an illegal instruction exception, or display other unexpected behavior. In particular, programs compiled with -xN, -xB, or -xP will emit run-time errors if they are executed on unsupported processors.