Engineering and Scientific Subroutine Library for AIX Version 3 Release 3: Guide and Reference
This section describes how you can achieve the best possible performance
from the ESSL subroutines.
There are many ways in which you can improve the performance of your
program. Here are some of them:
- Use the basic linear algebra subprograms and matrix operations in the
order of optimum performance: matrix-matrix computations, matrix-vector
computations, and vector-scalar computations. When data is presented in
matrices or vectors, rather than vectors or scalars, multiple operations can
be performed by a single ESSL subroutine.
- Where possible, use subroutines that do multiple computations, such as
SNDOT and SNAXPY, rather than individual computations, such as SDOT and
SAXPY.
- Use a stride of 1 for the data in your computations. Accessing
vector elements that are not consecutive in storage can degrade
performance. The closer the vector elements are to each other in
storage, the better your performance. For an explanation of stride, see
How Stride Is Used for Vectors.
- Do not specify the size of the leading dimension of an array
(lda) or stride of a vector (inc) equal to or near a
multiple of:
  - 128 for a long-precision array
  - 256 for a short-precision array
- Do not specify the individual sizes of your one-dimensional
arrays as multiples of 128. This is especially important when you are
passing several one-dimensional arrays to an ESSL subroutine. (The
multiplicity can cause a performance problem that otherwise might not
occur.)
- For small problems, avoid using a large leading dimension (lda)
for your matrix.
- In general, align your arrays on doubleword boundaries, regardless of the
type of data; however, when running on a POWER2 processor, it is best to
align your long-precision arrays on a quadword boundary. For
information on how your programming language aligns data, see your programming
language manuals.
- One subroutine may do scaling while another does not. If scaling is
not necessary for your data, you get better performance by using the
subroutine without scaling. For example, SNORM2 and DNORM2 do not do
scaling, whereas SNRM2 and DNRM2 do.
- Use the STRIDE subroutine to calculate the optimal stride values for your
input or output data when using any of the Fourier transform subroutines,
except _RCFT and _CRFT. Using these stride values for your data allows
the Fourier transform subroutines to achieve maximum performance. You
first obtain the optimal stride values from STRIDE, calling it once for each
stride value desired. You then arrange your data using these stride
values. After the data is set up, you call the Fourier transform
subroutine. For details on the STRIDE subroutine and how to use it for
each Fourier transform subroutine, see STRIDE--Determine the Stride Value for Optimal Performance in Specified Fourier Transform Subroutines. For additional information, see Setting Up Your Data.
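The leading-dimension and alignment guidelines above can be sketched in C as follows. This is a minimal illustration, not part of ESSL: the helper names pad_lda and is_aligned are hypothetical, and the margin parameter is an assumption, because this section does not quantify how close "near a multiple" is. The sketch also assumes the usual POWER sizes of 8 bytes for a doubleword and 16 bytes for a quadword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distance from n to the nearest multiple of m (m must be > 0). */
size_t dist_to_multiple(size_t n, size_t m)
{
    size_t r = n % m;
    return (r < m - r) ? r : m - r;
}

/*
 * Hypothetical helper (not an ESSL routine): return the smallest
 * leading dimension >= rows whose distance to the nearest multiple
 * of bad_multiple is greater than margin.
 *
 * bad_multiple: 128 for long-precision, 256 for short-precision arrays.
 * margin:       assumed meaning of "near"; choose it for your system.
 */
size_t pad_lda(size_t rows, size_t bad_multiple, size_t margin)
{
    /* The search below terminates only if some value is far enough
       from every multiple, that is, if margin < bad_multiple / 2. */
    assert(bad_multiple > 0 && margin < bad_multiple / 2);

    size_t lda = rows;
    while (dist_to_multiple(lda, bad_multiple) <= margin)
        lda++;
    return lda;
}

/*
 * Hypothetical helper: nonzero if p lies on the given byte boundary
 * (8 for a doubleword, 16 for a quadword under the assumption above).
 */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}
```

With a margin of 4, for example, pad_lda(512, 128, 4) returns 517, while pad_lda(500, 128, 4) leaves the value at 500 because 500 is already 12 away from the nearest multiple of 128. The padding stays small, which is consistent with the advice to avoid a large leading dimension for small problems.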
Information about performance can be found in the following places:
- Many of the techniques ESSL uses to achieve the best possible performance
are described in High Performance of ESSL.
- Migration considerations concerning performance are described in Migrating ESSL Version 2 Programs to Version 3.
- Specific information on performance for each area of ESSL is given in
"Performance and Accuracy Considerations" in each chapter introduction
in Part 2.
- Detailed performance information for selected subroutines can be found in
references [30], [41], and [42].