The goal of prefetch insertion is to reduce cache misses by giving the processor hints about when data should be loaded into the cache. Prefetch insertion is controlled by the following option:
-prefetch[-] | Enables (-prefetch) or disables (-prefetch-) prefetch insertion. This option requires that -O3 be specified. The default with -O3 is -prefetch.
To facilitate compiler optimization:
Minimize use of global variables and pointers.
Minimize use of complex control flow.
Choose data types carefully and avoid type casting.
For more information on how to optimize with -prefetch[-], refer to the Intel® Pentium® 4 and Intel® Xeon(TM) Processor Optimization Reference Manual.
In addition to the -prefetch option, the intrinsic subroutine MM_PREFETCH and the compiler directive PREFETCH are available. MM_PREFETCH prefetches the data at the specified address into one memory cache line. The PREFETCH directive enables data prefetching from memory for the arrays it names, and the NOPREFETCH directive disables it.
The following example is for Itanium®-based systems only:
      do j=1,lastrow-firstrow+1
        i = rowstr(j)
        iresidue = mod( rowstr(j+1)-i, 8 )
        sum = 0.d0
CDEC$ NOPREFETCH a,p,colidx
        do k=i,i+iresidue-1
          sum = sum + a(k)*p(colidx(k))
        enddo
CDEC$ NOPREFETCH colidx
CDEC$ PREFETCH a:1:40
CDEC$ PREFETCH p:1:20
        do k=i+iresidue, rowstr(j+1)-8, 8
          sum = sum + a(k  )*p(colidx(k  ))
     &        + a(k+1)*p(colidx(k+1)) + a(k+2)*p(colidx(k+2))
     &        + a(k+3)*p(colidx(k+3)) + a(k+4)*p(colidx(k+4))
     &        + a(k+5)*p(colidx(k+5)) + a(k+6)*p(colidx(k+6))
     &        + a(k+7)*p(colidx(k+7))
        enddo
        q(j) = sum
      enddo
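The directive example above leaves the placement of prefetches to the compiler. As a complement, the following is a minimal sketch of calling the MM_PREFETCH subroutine explicitly. It is an illustration only, not a tuned implementation. The program name prefetch_sketch, the array size n, and the prefetch distance dist are hypothetical. The hint value 1 is assumed to be an acceptable value for the optional second argument. Because MM_PREFETCH is an Intel Fortran intrinsic, the sketch is not expected to compile with other compilers.

```fortran
      program prefetch_sketch
      implicit none
c     n and dist are hypothetical values chosen for illustration;
c     a useful prefetch distance has to be found by measurement.
      integer, parameter :: n = 1024
      integer, parameter :: dist = 16
      real*8 a(n), b(n), total
      integer i

c     Fill the arrays with known values.
      do i = 1, n
        a(i) = 1.d0
        b(i) = 2.d0
      enddo

      total = 0.d0
      do i = 1, n
c       Ask for the cache lines holding the elements that will be
c       used dist iterations from now. The bounds check keeps the
c       address argument inside the arrays.
        if (i + dist .le. n) then
          call mm_prefetch(a(i+dist), 1)
          call mm_prefetch(b(i+dist), 1)
        endif
        total = total + a(i)*b(i)
      enddo

      print *, total
      end
```

Issuing a prefetch on every iteration keeps the sketch short. Since each call covers a whole cache line, real code would more likely issue one call per line of data and compare timings with and without the calls.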
For details, refer to the Intel® Fortran Language Reference.