Use the -unroll[n] option as follows:
-unrolln specifies the maximum number of times you want to unroll a loop. The following example unrolls a loop at most four times:
ifort -unroll4 a.f
To disable loop unrolling, specify n as 0. On IA-32 systems, specifying 0 also disables the vectorizer's unroller, except for the unrolling required to resolve cache line split penalties. The following example disables loop unrolling:
ifort -unroll0 a.f
-unroll (n omitted) lets the compiler decide whether to unroll. This is the default: the compiler applies its default heuristics and chooses n itself.
-unroll0 (n = 0) disables the unroller.
The Itanium® compiler currently honors only n = 0; any other value is ignored (a no-op).
The benefits of loop unrolling are:
Unrolling eliminates branches and some of the loop-control code (counter updates and exit tests).
Unrolling enables the loop to be scheduled (or pipelined) aggressively to hide latencies, provided there are enough free registers to keep variables live.
Intel® Pentium® 4 and Intel® Xeon(TM) processors can correctly predict the exit branch for an inner loop that has 16 or fewer iterations, if that number of iterations is predictable and there are no conditional branches in the loop. Therefore, if the loop body size is not excessive and the probable number of iterations is known, unroll inner loops until they have a maximum of:
- 16 iterations on Pentium 4 or Intel Xeon processors
- 4 iterations on Pentium III or Pentium II processors
The potential cost is larger code: excessive unrolling, or unrolling very large loops, can significantly increase code size.
For more information on how to optimize with -unroll[n], refer to the Intel® Pentium® 4 and Intel® Xeon(TM) Processor Optimization Reference Manual.