Engineering and Scientific Subroutine Library for AIX Version 3 Release 3: Guide and Reference

Getting the Best Performance

This section describes how you can achieve the best possible performance from the ESSL subroutines.

What General Coding Techniques Can You Use to Improve Performance?

There are many ways in which you can improve the performance of your program. Here are some of them:

Use the basic linear algebra subprograms and matrix operations in the order of optimum performance: matrix-matrix computations, matrix-vector computations, and vector-scalar computations. When data is presented in matrices or vectors, rather than vectors or scalars, multiple operations can be performed by a single ESSL subroutine.
Where possible, use subroutines that do multiple computations, such as SNDOT and SNAXPY, rather than individual computations, such as SDOT and SAXPY.
Use a stride of 1 for the data in your computations. Not having vector elements consecutively accessed in storage can degrade your performance. The closer the vector elements are to each other in storage, the better your performance. For an explanation of stride, see How Stride Is Used for Vectors.
Do not specify the size of the leading dimension of an array (lda) or stride of a vector (inc) equal to or near a multiple of:
- 128 for a long-precision array
- 256 for a short-precision array
Do not specify the individual sizes of your one-dimensional arrays as multiples of 128. This is especially important when you are passing several one-dimensional arrays to an ESSL subroutine. (The multiplicity can cause a performance problem that otherwise might not occur.)
For small problems, avoid using a large leading dimension (lda) for your matrix.
In general, align your arrays on doubleword boundaries, regardless of the type of data; however, when running on a POWER2 processor, it is best to align your long-precision arrays on a quadword boundary. For information on how your programming language aligns data, see your programming language manuals.
Some subroutines scale their data and others do not. If scaling is not necessary for your data, you get better performance by using the subroutine without scaling. For example, SNORM2 and DNORM2 do not do scaling, whereas SNRM2 and DNRM2 do.
Use the STRIDE subroutine to calculate the optimal stride values for your input or output data when using any of the Fourier transform subroutines, except _RCFT and _CRFT. Using these stride values for your data allows the Fourier transform subroutines to achieve maximum performance. You first obtain the optimal stride values from STRIDE, calling it once for each stride value desired. You then arrange your data using these stride values. After the data is set up, you call the Fourier transform subroutine. For details on the STRIDE subroutine and how to use it for each Fourier transform subroutine, see STRIDE--Determine the Stride Value for Optimal Performance in Specified Fourier Transform Subroutines. For additional information, see Setting Up Your Data.

Where Can You Find More Information on Performance?

Information about performance can be found in the following places:

Many of the techniques ESSL uses to achieve the best possible performance are described in High Performance of ESSL.
Migration considerations concerning performance are described in Migrating ESSL Version 2 Programs to Version 3.
Specific information on performance for each area of ESSL is given in "Performance and Accuracy Considerations" in each chapter introduction in Part 2.
Detailed performance information for selected subroutines can be found in references [30], [41], and [42].
