Your choice of ESSL subroutine is based mainly on the functional needs of your program. However, many of the subroutines come in several variations, and in some instances certain subroutines cannot be used. This section describes these variations and limitations. Read the answers to the questions below that apply to you.
ESSL provides two run-time libraries: the ESSL SERIAL Library and the ESSL SMP Library.
The number of threads you choose to use depends on the problem size, the specific subroutine being called, and the number of physical processors you are running on. To achieve optimal performance, experimentation is necessary; however, setting the number of threads equal to the number of online processors generally provides good performance. In a few cases, performance may increase if you choose fewer threads than the number of online processors. For more information about thread concepts, see AIX General Programming Concepts: Writing and Debugging Programs.
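The starting point described above can be sketched as a small helper that lists thread counts worth benchmarking. This is only an illustration: `candidate_thread_counts` is a hypothetical name, the halving schedule is one arbitrary way to explore counts below the number of online processors, and `os.cpu_count()` is assumed to be an acceptable stand-in for the online processor count.

```python
import os


def candidate_thread_counts(online_processors=None):
    """Return thread counts to benchmark, most likely winner first.

    The first entry equals the number of online processors, which is the
    recommended starting point. The remaining entries halve that count so
    the "fewer threads than processors" cases are also tried.
    """
    # os.cpu_count() reports logical CPUs and may return None; fall back to 1.
    n = online_processors or os.cpu_count() or 1
    counts = [n]
    k = n // 2
    while k >= 1:
        counts.append(k)
        k //= 2
    return counts


if __name__ == "__main__":
    for threads in candidate_thread_counts():
        print("benchmark with", threads, "thread(s)")
```

Each candidate would then be timed against the actual subroutine and problem size, since the best value depends on both.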
The ESSL SERIAL Library and the ESSL SMP Library both support applications in 32-bit and 64-bit environments. For details, see Chapter 4, Coding Your Program, and Chapter 5, Processing Your Program.
Table 21. Multithreaded ESSL SMP Subroutines
Subroutine Names |
---|
Vector-Scalar Linear Algebra Subprograms: SASUM, DASUM, SCASUM, DZASUM SAXPY, DAXPY, CAXPY, ZAXPY SCOPY, DCOPY, CCOPY, ZCOPY SDOT, DDOT, CDOTU, ZDOTU, CDOTC, ZDOTC SNDOT, DNDOT SNORM2, DNORM2, CNORM2, ZNORM2 SROT, DROT, CROT, ZROT, CSROT, ZDROT SSCAL, DSCAL, CSCAL, ZSCAL, CSSCAL, ZDSCAL SSWAP, DSWAP, CSWAP, ZSWAP SVEA, DVEA, CVEA, ZVEA SVES, DVES, CVES, ZVES SVEM, DVEM, CVEM, ZVEM SYAX, DYAX, CYAX, ZYAX, CSYAX, ZDYAX SZAXPY, DZAXPY, CZAXPY, ZZAXPY |
Matrix-Vector Linear Algebra Subprograms: SGEMV, DGEMV, CGEMV, ZGEMV SGER, DGER, CGERU, ZGERU, CGERC, ZGERC SSPMV, DSPMV, CHPMV, ZHPMV SSYMV, DSYMV, CHEMV, ZHEMV SSPR, DSPR, CHPR, ZHPR SSYR, DSYR, CHER, ZHER SSPR2, DSPR2, CHPR2, ZHPR2 SSYR2, DSYR2, CHER2, ZHER2 SGBMV¢, DGBMV¢ CGBMV¢, ZGBMV¢ SSBMV¢, DSBMV¢ CHBMV¢, ZHBMV¢ STRMV, DTRMV, CTRMV, ZTRMV STPMV, DTPMV, CTPMV, ZTPMV STBMV¢, DTBMV¢ CTBMV¢, ZTBMV¢ |
Matrix Operations: SGEADD, DGEADD, CGEADD, ZGEADD SGESUB, DGESUB, CGESUB, ZGESUB SGEMUL, DGEMUL, CGEMUL, ZGEMUL SGEMM, DGEMM, CGEMM, ZGEMM SSYMM, DSYMM, CSYMM, ZSYMM, CHEMM, ZHEMM STRMM, DTRMM, CTRMM, ZTRMM SSYRK, DSYRK, CSYRK, ZSYRK, CHERK, ZHERK SSYR2K, DSYR2K, CSYR2K, ZSYR2K, CHER2K, ZHER2K SGETMI, DGETMI, CGETMI, ZGETMI SGETMO, DGETMO, CGETMO, ZGETMO |
Dense Linear Algebraic Equations: SGEF, DGEF, CGEF, ZGEF SGETRF, DGETRF, CGETRF, ZGETRF SPPF, DPPF, DPOF, DPOTRF SPPFCD*, DPPFCD*, DPOFCD* SPPICD*, DPPICD*, DPOICD*, DPOTRI* STRSV, DTRSV, CTRSV, ZTRSV STPSV, DTPSV, CTPSV, ZTPSV STRSM, DTRSM, CTRSM, ZTRSM STRI, DTRI, STRTRI, DTRTRI |
Sparse Linear Algebraic Equations: DSRIS& |
Linear Least Squares: DGEQRF |
Fourier Transforms: SCFT, DCFT SRCFT, DRCFT SCRFT, DCRFT SCFT2, DCFT2 SRCFT2, DRCFT2 SCRFT2, DCRFT2 SCFT3, DCFT3 SRCFT3, DRCFT3 SCRFT3, DCRFT3 |
Convolution and Correlation: SCOND, SCORD SDCON, SDCOR, DDCON, DDCOR |
Many of the dense linear algebraic equations and eigensystem analysis
subroutines make one or more calls to the multithreaded versions of the
matrix-vector linear algebra and matrix operation subroutines shown in this
table. SCOSF, DCOSF, SSINF, and DSINF make one or more calls to the
multithreaded versions of the Fourier Transform subroutines shown in this
table. These subroutines benefit from the increased performance of the
multithreaded versions of the ESSL SMP subroutines.
Your performance may be improved by setting the following environment variables: export MALLOCMULTIHEAP=true and export XLSMPOPTS="spins=0:yields=0". For additional information, see the AIX Performance Management Guide and the XLF manuals.
& DSRIS uses multiple threads only when IPARM(4) = 1 or 2.
¢ The Level 2 Banded BLAS use multiple threads only when the bandwidth is sufficiently large.
* Multiple threads are used for the factor or inverse computation. |
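The two environment settings named in the table notes can also be applied from a launcher script instead of the shell. The sketch below is a minimal illustration; `essl_smp_env` is a hypothetical helper, and the program path in the comment is a placeholder, not a real ESSL tool.

```python
import os


def essl_smp_env(base=None):
    """Return a copy of an environment with the two suggested settings added.

    The caller's mapping is never modified. Any existing XLSMPOPTS value is
    overwritten, so merge suboptions by hand if others are already in use.
    """
    env = dict(os.environ if base is None else base)
    env["MALLOCMULTIHEAP"] = "true"
    env["XLSMPOPTS"] = "spins=0:yields=0"
    return env


if __name__ == "__main__":
    env = essl_smp_env()
    print(env["MALLOCMULTIHEAP"], env["XLSMPOPTS"])
    # A program linked with the SMP library could then be started with, e.g.:
    #   subprocess.run(["./my_essl_program"], env=env)   # hypothetical path
```

Building a fresh dictionary keeps the settings local to the launched program rather than changing the environment of the launcher itself.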
The version of the ESSL subroutine you select should agree with the data you are using. ESSL provides a short- and long-precision version of most of its subroutines processing short- and long-precision data, respectively. In a few cases, it also provides an integer version processing integer data or returning just integer data. The subroutine names are distinguished by a one- or two-letter prefix based on the following letters:
The precision of your data affects the accuracy of your results. This is discussed in "Getting the Best Accuracy". For a description of these data types, see "How Do You Set Up Your Scalar Data?"
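As a rough illustration of selecting a version by data type, the sketch below maps a leading letter of a subroutine name to a data type. The letter meanings used here (S, D, C, Z, I) are an assumption based on the usual BLAS-style naming convention and should be checked against the individual subroutine descriptions. Two-letter prefixes, such as those in SCASUM or ZDSCAL, are deliberately not interpreted; only the first letter is examined.

```python
# Assumed meanings of the leading letter; verify against the subroutine docs.
PREFIX_MEANING = {
    "S": "short-precision real",
    "D": "long-precision real",
    "C": "short-precision complex",
    "Z": "long-precision complex",
    "I": "integer",
}


def data_type_for(name):
    """Return the assumed data type implied by the first letter of a name."""
    return PREFIX_MEANING.get(name[:1].upper(), "unknown")


if __name__ == "__main__":
    for routine in ("SGEMM", "DGEMM", "CGEMM", "ZGEMM"):
        print(routine, "->", data_type_for(routine))
```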
Some subroutines process specific data structures, such as sparse vectors and matrices or dense and banded matrices. In addition, these data structures can be stored using various storage techniques. You should select the proper subroutine on the basis of the type of data structure you have and the storage technique you want to use. If possible, you should use a storage technique that conserves storage and potentially improves performance. For more about storage techniques, see Setting Up Your Data.
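To make the storage trade-off concrete, the sketch below packs the upper triangle of a symmetric matrix into a one-dimensional array, which needs n(n+1)/2 elements instead of n*n. This is a generic illustration of a packed layout, assuming column-by-column order and 0-based indices; the function names are hypothetical, and the exact layouts ESSL expects are those described in Setting Up Your Data.

```python
def pack_upper(a):
    """Pack the upper triangle of a square matrix column by column."""
    n = len(a)
    return [a[i][j] for j in range(n) for i in range(j + 1)]


def packed_get(ap, i, j):
    """Read element (i, j) of the symmetric matrix from its packed form."""
    if i > j:
        i, j = j, i  # symmetry: (i, j) and (j, i) hold the same value
    # Column j starts after the j*(j+1)/2 elements of the earlier columns.
    return ap[j * (j + 1) // 2 + i]


if __name__ == "__main__":
    a = [[1, 2, 4],
         [2, 3, 5],
         [4, 5, 6]]
    ap = pack_upper(a)
    print(len(ap), "stored elements instead of", len(a) ** 2)
    print(packed_get(ap, 2, 1))
```

A subroutine written for packed storage reads and writes only the stored triangle, which is why the choice of storage technique and the choice of subroutine go together.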
ESSL provides variations among some of its subroutines. You should consider performance and accuracy when deciding which subroutine is the best to use. Study the "Function" section in each subroutine description. It helps you understand exactly what each subroutine does, and helps you determine which subroutine is best for you. For example, some subroutines perform multiple computations of a certain type. This might give you better performance than a subroutine that does each computation individually. In other cases, one subroutine may do scaling while another does not. If scaling is not necessary for your data, you get better performance by using the subroutine without scaling.
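The scaling trade-off can be illustrated with a Euclidean norm computed two ways. This is a generic sketch, not ESSL's implementation: the function names are hypothetical, and the point is only that the scaled variant does extra work per element in exchange for avoiding overflow in the intermediate squares.

```python
import math


def norm2_unscaled(x):
    """Euclidean norm without scaling: less work, but squares may overflow."""
    return math.sqrt(sum(v * v for v in x))


def norm2_scaled(x):
    """Euclidean norm with scaling by the largest magnitude in x."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    # Every ratio has magnitude <= 1, so the squares cannot overflow.
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))


if __name__ == "__main__":
    small = [3.0, 4.0]
    huge = [1e200, 1e200]
    print(norm2_unscaled(small), norm2_scaled(small))
    print(norm2_unscaled(huge), norm2_scaled(huge))
```

For data known to be well within range, the unscaled form gives the same result with one pass less over the vector; for data near the limits of the floating-point format, only the scaled form returns a finite answer.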