Engineering and Scientific Subroutine Library for AIX Version 3 Release 3: Guide and Reference

SGEMMS, DGEMMS, CGEMMS, and ZGEMMS--Matrix Multiplication for General Matrices, Their Transposes, or Conjugate Transposes Using Winograd's Variation of Strassen's Algorithm

These subroutines use Winograd's variation of Strassen's algorithm to perform matrix multiplication for both real and complex matrices. SGEMMS and DGEMMS can perform any one of the following matrix multiplications, using matrices A and B or their transposes, and matrix C:

C<--AB	C<--AB^T
C<--A^TB	C<--A^TB^T

CGEMMS and ZGEMMS can perform any one of the following matrix multiplications, using matrices A and B, their transposes or their conjugate transposes, and matrix C:

C<--AB	C<--AB^T	C<--AB^H
C<--A^TB	C<--A^TB^T	C<--A^TB^H
C<--A^HB	C<--A^HB^T	C<--A^HB^H

Table 75. Data Types

A, B, C	`aux`	Subroutine
Short-precision real	Short-precision real	SGEMMS
Long-precision real	Long-precision real	DGEMMS
Short-precision complex	Short-precision real	CGEMMS
Long-precision complex	Long-precision real	ZGEMMS

Syntax

Fortran	CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`)
C and C++	sgemms | dgemms | cgemms | zgemms (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`);
PL/I	CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`);

On Entry

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', A^T is used in the computation, and A has m rows and l columns.

If transa = 'C', A^H is used in the computation, and A has m rows and l columns.

Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= l.

If transa = 'T' or 'C', lda >= m.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Specified as: a single character; transa = 'N' or 'T' for SGEMMS and DGEMMS; transa = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.
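The storage rule above (A is always stored untransposed, and transa only changes how it is read) can be illustrated with a small C sketch. The helper name op_elem is hypothetical and is not part of ESSL. It assumes Fortran-style column-major storage with leading dimension lda, uses zero-based indices, and covers only the real cases 'N' and 'T'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not an ESSL routine): returns element (i, k),
   zero-based, of op(A), where A is stored untransposed in column-major
   order with leading dimension lda.
   trans = 'N' reads A(i, k); trans = 'T' reads A(k, i), so no data is
   moved to form the transpose. */
double op_elem(const double *a, int lda, char trans, int i, int k)
{
    int t = toupper((unsigned char)trans);
    assert(t == 'N' || t == 'T');
    assert(lda > 0 && i >= 0 && k >= 0);
    return (t == 'N') ? a[(size_t)i + (size_t)k * (size_t)lda]
                      : a[(size_t)k + (size_t)i * (size_t)lda];
}
```

With a 2 by 2 matrix held in an array whose leading dimension is 3, the same array serves both forms; only the index calculation changes.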

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.

If transb = 'C', B^H is used in the computation, and B has n rows and m columns.

Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T' or 'C', ldb >= n.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Specified as: a single character; transb = 'N' or 'T' for SGEMMS and DGEMMS; transb = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.

c

See On Return.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= l.

l

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= l <= ldc.

m

has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: a fullword integer; m >= 0.

n

is the number of columns in matrix C. Specified as: a fullword integer; n >= 0.

aux

has the following meaning:

If naux = 0 and error 2015 is unrecoverable, aux is ignored.

Otherwise, it is the storage work area used by this subroutine. Its size is specified by naux.

Specified as: an area of storage containing numbers of the data type indicated in Table 75.

naux

is the size of the work area specified by aux--that is, the number of elements in aux.

Specified as: a fullword integer, where:

If naux = 0 and error 2015 is unrecoverable, SGEMMS, DGEMMS, CGEMMS, and ZGEMMS dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program.

Otherwise,

When this subroutine uses Strassen's algorithm:

For SGEMMS and DGEMMS:

Use naux = max[(n)(l), 0.7m(l+n)].

For CGEMMS and ZGEMMS:

Use naux = max[(n)(l), 0.7m(l+n)]+nb1+nb2, where:

If l >= n, then nb1 >= (l)(n+20) and nb2 >= max[(n)(l), (m)(n+20)].
If l < n, then nb1 >= (m)(n+20) and nb2 >= max[(n)(l), (l)(m+20)].

When this subroutine uses the direct method (_GEMUL), use naux >= 0.

Notes:

In most cases, these formulas provide an overestimate.
For an explanation of when this subroutine uses the direct method versus Strassen's algorithm, see Notes.
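As an illustration of the SGEMMS and DGEMMS formula above, the following C sketch evaluates max[(n)(l), 0.7m(l+n)]. The function name gemms_naux_real is hypothetical and is not part of ESSL. Rounding the 0.7m(l+n) term up to a whole element is an assumption, since the formula does not say how a fractional value is treated. The CGEMMS and ZGEMMS terms nb1 and nb2 are not covered.

```c
#include <assert.h>

/* Hypothetical helper (not an ESSL routine): work-area size given above
   for SGEMMS and DGEMMS when Strassen's algorithm is used, that is,
   max[(n)(l), 0.7m(l+n)].
   Assumption: the 0.7m(l+n) term is rounded up to a whole element. */
long long gemms_naux_real(int l, int m, int n)
{
    assert(l >= 0 && m >= 0 && n >= 0);
    long long nl = (long long)n * l;
    /* ceil(0.7 * m * (l + n)), computed in exact integer arithmetic */
    long long strassen = (7LL * m * ((long long)l + n) + 9) / 10;
    return nl > strassen ? nl : strassen;
}
```

For l = m = n = 1000, the second term dominates and the sketch yields 1400000 elements. When m is small compared with l and n, the (n)(l) term dominates instead.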

On Return

c

is the l by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 75.

Notes

There are two instances when these subroutines use the direct method (_GEMUL), rather than using Strassen's algorithm:
- When either or both of the input matrices are small
- For CGEMMS and ZGEMMS, when input matrices A and B overlap
In both instances, the subroutine does not use auxiliary storage, and you can specify naux = 0.
For CGEMMS and ZGEMMS, one of the input matrices, A or B, is rearranged during the computation and restored to its original form on return. Keep this in mind when diagnosing an abnormal termination.
All subroutines accept lowercase letters for the transa and transb arguments.
Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See Concepts.
You have the option of having the minimum required value for naux dynamically returned to your program. For details, see Using Auxiliary Storage in ESSL.

Function

The matrix multiplications performed by these subroutines are functionally equivalent to those performed by SGEMUL, DGEMUL, CGEMUL, and ZGEMUL. For details on the computations performed, see Function.
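For orientation, the result that these subroutines produce in the real case can be described by a plain triple loop. The following C sketch is a hypothetical reference routine named ref_gemm, not the ESSL implementation and not Strassen's algorithm. It assumes long-precision real data, column-major storage, zero-based indexing, and the same argument order as the calling sequence with aux and naux omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of ESSL): computes
   C <- op(A) op(B) for real data, where op(X) is X for 'N' and X^T for
   'T'. C is l by n; the inner dimension is m. All arrays are column-major
   and are read in their untransposed, stored form. */
void ref_gemm(const double *a, int lda, char transa,
              const double *b, int ldb, char transb,
              double *c, int ldc, int l, int m, int n)
{
    int ta = toupper((unsigned char)transa) != 'N';
    int tb = toupper((unsigned char)transb) != 'N';
    assert(l >= 0 && m >= 0 && n >= 0);
    assert(lda > 0 && ldb > 0 && ldc > 0 && ldc >= l);

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < l; i++) {
            double sum = 0.0;
            for (int k = 0; k < m; k++) {
                double aik = ta ? a[k + (size_t)i * lda]
                                : a[i + (size_t)k * lda];
                double bkj = tb ? b[j + (size_t)k * ldb]
                                : b[k + (size_t)j * ldb];
                sum += aik * bkj;
            }
            c[i + (size_t)j * ldc] = sum;
        }
    }
}
```

A routine of this kind can serve as a check on small cases, because the result of the Strassen-based computation should agree with it to within rounding error.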

SGEMMS, DGEMMS, CGEMMS, and ZGEMMS use Winograd's variation of Strassen's algorithm with minor changes for tuning purposes. (See pages 45 and 46 in reference [11].) The subroutines compute matrix multiplication for both real and complex matrices of large sizes. Complex matrix multiplication uses a special technique, using three real matrix multiplications and five real matrix additions. Each of these three resulting matrix multiplications then uses Strassen's algorithm.

Strassen's Algorithm

The steps of Strassen's algorithm can be repeated up to four times by these subroutines, with each step reducing the dimensions of the matrix by a factor of two. The number of steps used depends on the size of the input matrices. Each step reduces the number of operations by about 10% compared with normal matrix multiplication. On the other hand, if the matrix is small, a normal matrix multiplication is performed without using Strassen's algorithm, and no improvement is gained. For details about small matrices, see Notes.
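A single step of the kind described above can be sketched in C on a 2 by 2 problem. This is the commonly published form of Winograd's variation (seven multiplications and fifteen additions), shown on scalars. It is an assumption that this matches the structure used by the subroutines, which apply their own tuning changes. In the actual algorithm each entry is a submatrix block, and each of the seven products is a smaller matrix multiplication that can be split again. The function name winograd_2x2 is hypothetical.

```c
#include <assert.h>

/* Hypothetical illustration (not part of ESSL): one Strassen-Winograd
   step for C = A*B on a 2 by 2 problem. Each array holds its entries in
   the order {x11, x12, x21, x22}. */
void winograd_2x2(const double a[4], const double b[4], double c[4])
{
    assert(a && b && c);
    const double a11 = a[0], a12 = a[1], a21 = a[2], a22 = a[3];
    const double b11 = b[0], b12 = b[1], b21 = b[2], b22 = b[3];

    /* Eight additions on the inputs */
    const double s1 = a21 + a22, s2 = s1 - a11;
    const double s3 = a11 - a21, s4 = a12 - s2;
    const double t1 = b12 - b11, t2 = b22 - t1;
    const double t3 = b22 - b12, t4 = t2 - b21;

    /* Seven multiplications (instead of the usual eight) */
    const double p1 = a11 * b11, p2 = a12 * b21, p3 = s4 * b22;
    const double p4 = a22 * t4,  p5 = s1 * t1,   p6 = s2 * t2;
    const double p7 = s3 * t3;

    /* Seven additions to assemble the result */
    const double u2 = p1 + p6;
    const double u3 = u2 + p7;
    const double u4 = u2 + p5;
    c[0] = p1 + p2;   /* c11 */
    c[1] = u4 + p3;   /* c12 */
    c[2] = u3 - p4;   /* c21 */
    c[3] = u3 + p5;   /* c22 */
}
```

The saving comes from replacing one of the eight block multiplications with extra block additions, which are cheaper than a multiplication once the blocks are large.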

Complex Matrix Multiplication

The complex multiplication is performed by forming the real and imaginary parts of the input matrices. These subroutines use three real matrix multiplications and five real matrix additions, instead of the normal four real matrix multiplications and two real matrix additions. Using only three real matrix multiplications allows the subroutines to achieve up to a 25% reduction in matrix operations, which can result in significant savings in computing time for large matrices.
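One well-known way to obtain a complex product from three real multiplications and five real additions is sketched below in C. It is an assumption that the subroutines use this particular arrangement; the text above states only the operation counts. The names complex_matmul_3m and real_matmul are hypothetical, the matrices are square and column-major for brevity, and each real product here uses plain loops, whereas the subroutines apply Strassen's algorithm to each one.

```c
#include <assert.h>
#include <stdlib.h>

/* C = A*B for n by n real matrices stored column-major (plain loops). */
static void real_matmul(int n, const double *a, const double *b, double *c)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
}

/* Hypothetical illustration (not part of ESSL):
   (Ar + i Ai)(Br + i Bi) = Cr + i Ci, with
     P1 = Ar*Br, P2 = Ai*Bi, P3 = (Ar + Ai)*(Br + Bi),
     Cr = P1 - P2, Ci = P3 - P1 - P2.
   Returns 0 on success, or -1 if the work area cannot be allocated. */
int complex_matmul_3m(int n,
                      const double *ar, const double *ai,
                      const double *br, const double *bi,
                      double *cr, double *ci)
{
    assert(n > 0);
    size_t nn = (size_t)n * (size_t)n;
    double *work = malloc(5 * nn * sizeof *work);
    if (work == NULL)
        return -1;
    double *sa = work, *sb = work + nn;
    double *p1 = work + 2 * nn, *p2 = work + 3 * nn, *p3 = work + 4 * nn;

    for (size_t k = 0; k < nn; k++) {
        sa[k] = ar[k] + ai[k];            /* matrix addition 1 */
        sb[k] = br[k] + bi[k];            /* matrix addition 2 */
    }
    real_matmul(n, ar, br, p1);           /* real multiplication 1 */
    real_matmul(n, ai, bi, p2);           /* real multiplication 2 */
    real_matmul(n, sa, sb, p3);           /* real multiplication 3 */
    for (size_t k = 0; k < nn; k++) {
        cr[k] = p1[k] - p2[k];            /* matrix addition 3 */
        ci[k] = (p3[k] - p1[k]) - p2[k];  /* matrix additions 4 and 5 */
    }
    free(work);
    return 0;
}
```

The identity holds for matrices as well as scalars because it does not rely on the products commuting. The extra additions are the price of dropping the fourth multiplication.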

Accuracy Considerations

Strassen's method is not stable for certain row or column scalings of the input matrices A and B. Therefore, for matrices A and B with divergent exponent values, Strassen's method may give inaccurate results. For these cases, you should use the _GEMUL or _GEMM subroutines.

Special Usage

The equivalence rules, defined for matrix multiplication of A and B in Special Usage, also apply to these subroutines. You should use the equivalence rules when you want to transpose or conjugate transpose the result of the multiplication computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving C^T or C^H, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See Example 2 and Example 4.

Error Conditions

Resource Errors

Error 2015 is unrecoverable, naux = 0, and unable to allocate work area.

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
l, m, n < 0
l > ldc
transa, transb <> 'N' or 'T' for SGEMMS and DGEMMS
transa, transb <> 'N', 'T', or 'C' for CGEMMS and ZGEMMS
transa = 'N' and l > lda
transa = 'T' or 'C' and m > lda
transb = 'N' and m > ldb
transb = 'T' or 'C' and n > ldb
Error 2015 is recoverable or naux <> 0, and naux is too small--that is, less than the minimum required value. Return code 1 is returned if error 2015 is recoverable.
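The dimension and option checks listed above can be mirrored in a calling program before the subroutine is invoked. The following C sketch is a hypothetical helper named gemms_check_args. Its return codes are invented for illustration and are not ESSL error numbers. The naux condition is left out, because the minimum required value depends on the method the subroutine selects.

```c
#include <ctype.h>

/* Nonzero if t is an accepted transa/transb value: 'N' or 'T' for the
   real subroutines, plus 'C' for the complex ones. */
static int valid_trans(int t, int is_complex)
{
    return t == 'N' || t == 'T' || (is_complex && t == 'C');
}

/* Hypothetical pre-call check (not part of ESSL). Pass is_complex = 0 for
   SGEMMS and DGEMMS, and 1 for CGEMMS and ZGEMMS. Returns 0 if none of
   the listed input-argument conditions applies; otherwise returns an
   illustrative code for the first condition found. */
int gemms_check_args(int is_complex, int lda, char transa,
                     int ldb, char transb, int ldc,
                     int l, int m, int n)
{
    /* Lowercase letters are accepted for transa and transb. */
    int ta = toupper((unsigned char)transa);
    int tb = toupper((unsigned char)transb);

    if (lda <= 0 || ldb <= 0 || ldc <= 0) return 1;
    if (l < 0 || m < 0 || n < 0)          return 2;
    if (l > ldc)                          return 3;
    if (!valid_trans(ta, is_complex) ||
        !valid_trans(tb, is_complex))     return 4;
    if (ta == 'N' ? l > lda : m > lda)    return 5;
    if (tb == 'N' ? m > ldb : n > ldb)    return 6;
    return 0;
}
```

The argument values used in Example 1 and Example 2 pass all of these checks.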

Example 1

This example shows the computation C<--AB, where A, B, and C are contained in larger arrays A, B, and C, respectively. It shows how to code the calling sequence for SGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, are the same as in Example 1 for SGEMUL.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N   AUX  NAUX
             |   |    |     |   |    |     |   |   |   |   |    |    |
CALL SGEMMS( A , 8 , 'N'  , B , 6 , 'N'  , C , 7 , 6 , 5 , 4 , AUX , 0  )

Example 2

This example shows the computation C<--AB^H, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained. The arrays contain complex data. This example shows how to code the calling sequence for CGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, are the same as in Example 8 for CGEMUL.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N   AUX  NAUX
             |   |    |     |   |    |     |   |   |   |   |    |    |
CALL CGEMMS( A , 4 , 'N'  , B , 3 , 'C'  , C , 4 , 3 , 2 , 3 , AUX , 0 )
