Prefetching

The goal of prefetch insertion is to reduce cache misses by giving the processor hints about when data should be loaded into the cache. Prefetch insertion is controlled by the following option:

-prefetch[-]

Enables or disables (-prefetch-) prefetch insertion. This option requires that -O3 be specified. The default with -O3 is -prefetch.

To facilitate compiler optimization, and for more information on how to optimize with -prefetch[-], refer to the Intel® Pentium® 4 and Intel® Xeon™ Processor Optimization Reference Manual.

In addition to the -prefetch option, the intrinsic subroutine MM_PREFETCH and the compiler directive PREFETCH are also available. The subroutine MM_PREFETCH prefetches one memory cache line of data from the specified address. The compiler directive PREFETCH enables data prefetching from memory for the variables it names; its counterpart, NOPREFETCH, disables it.
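As a rough illustration of the MM_PREFETCH subroutine described above, the following sketch prefetches an element a fixed distance ahead of the one being processed. The routine name scale_pf, the distance of 16 elements, and the choice of hint value 1 are assumptions made for this illustration and are not taken from this guide; the sketch also assumes a compiler that provides the MM_PREFETCH intrinsic. Check the argument list against the Language Reference before relying on it.

```fortran
C     Hypothetical routine: a(i) = 2*b(i), prefetching b ahead.
      subroutine scale_pf(a, b, n)
      integer n, i
      integer dist
C     Prefetch distance in elements. The value 16 is illustrative
C     only; it is not a tuned or recommended setting.
      parameter (dist = 16)
      real*8 a(n), b(n)
      do i = 1, n
C        Request the cache line holding an element dist iterations
C        ahead. The hint value 1 is an assumption; min() keeps the
C        referenced element inside the array bounds.
         call mm_prefetch(b(min(i+dist, n)), 1)
         a(i) = 2.0d0 * b(i)
      enddo
      end
```

A prefetch is only a hint, so the values stored in a are the same with or without the call. Whether the hint helps depends on the access pattern and the prefetch distance, which would need to be measured on the target processor.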

The following example is for Itanium®-based systems only:

      do j=1,lastrow-firstrow+1
         i = rowstr(j)
         iresidue = mod( rowstr(j+1)-i, 8 )
         sum = 0.d0
C     Residue loop: handles the (row length mod 8) leftover
C     elements, with prefetching disabled for a, p, and colidx.
CDEC$ NOPREFETCH a,p,colidx
         do k=i,i+iresidue-1
            sum = sum +  a(k)*p(colidx(k))
         enddo
C     Main loop, unrolled by 8: prefetching is requested for a
C     and p, and remains disabled for colidx.
CDEC$ NOPREFETCH colidx
CDEC$ PREFETCH a:1:40
CDEC$ PREFETCH p:1:20
         do k=i+iresidue, rowstr(j+1)-8, 8
            sum = sum + a(k  )*p(colidx(k  ))
     &         + a(k+1)*p(colidx(k+1)) + a(k+2)*p(colidx(k+2))
     &         + a(k+3)*p(colidx(k+3)) + a(k+4)*p(colidx(k+4))
     &         + a(k+5)*p(colidx(k+5)) + a(k+6)*p(colidx(k+6))
     &         + a(k+7)*p(colidx(k+7))
         enddo
         q(j) = sum
      enddo

For details, refer to the Intel® Fortran Language Reference.