These subroutines use Winograd's variation of the Strassen's
algorithm to perform the matrix multiplication for both real and complex
matrices. SGEMMS and DGEMMS can perform any one of the following matrix
multiplications, using matrices A and B or their
transposes, and matrix C:
C<--AB | C<--ABT |
|
C<--ATB | C<--ATBT |
|
CGEMMS and ZGEMMS can perform any one of the following matrix
multiplications, using matrices A and B, their
transposes or their conjugate transposes, and matrix C:
C<--AB | C<--ABT | C<--ABH |
C<--ATB | C<--ATBT | C<--ATBH |
C<--AHB | C<--AHBT | C<--AHBH |
A, B, C | aux | Subroutine |
Short-precision real | Short-precision real | SGEMMS |
Long-precision real | Long-precision real | DGEMMS |
Short-precision complex | Short-precision real | CGEMMS |
Long-precision complex | Long-precision real | ZGEMMS |
Fortran | CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux) |
C and C++ | sgemms | dgemms | cgemms | zgemms (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux); |
PL/I | CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux); |
If transa = 'N', A is used in the computation, and A has l rows and m columns.
If transa = 'T', AT is used in the computation, and A has m rows and l columns.
If transa = 'C', AH is used in the computation, and A has m rows and l columns.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:
If transa = 'N', its size must be lda by (at least) m.
If transa = 'T' or 'C', its size must be lda by (at least) l.
If transa = 'N', lda >= l.
If transa = 'T' or 'C', lda >= m.
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
If transa = 'C', AH is used in the computation.
Specified as: a single character; transa = 'N' or 'T' for SGEMMS and DGEMMS; transa = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
If transb = 'C', BH is used in the computation, and B has n rows and m columns.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T' or 'C', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T' or 'C', ldb >= n.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
If transb = 'C', BH is used in the computation.
Specified as: a single character; transb = 'N' or 'T' for SGEMMS and DGEMMS; transb = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.
If transa = 'N', it is the number of columns in matrix A.
If transa = 'T' or 'C', it is the number of rows in matrix A.
In addition:
If transb = 'N', it is the number of rows in matrix B.
If transb = 'T' or 'C', it is the number of columns in matrix B.
Specified as: a fullword integer; m >= 0.
If naux = 0 and error 2015 is unrecoverable, aux is ignored.
Otherwise, is the storage work area used by this subroutine. Its size is specified by naux.
Specified as: an area of storage containing numbers of the data type indicated in Table 75.
Specified as: a fullword integer, where:
If naux = 0 and error 2015 is unrecoverable, SGEMMS, DGEMMS, CGEMMS, and ZGEMMS dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program.
Otherwise,
When this subroutine uses Strassen's algorithm:
When this subroutine uses the direct method (_GEMUL), use naux >= 0.
Notes:
In these instances when the direct method is used, the subroutine does not use auxiliary storage, and you can specify naux = 0.
The matrix multiplications performed by these subroutines are functionally equivalent to those performed by SGEMUL, DGEMUL, CGEMUL, and ZGEMUL. For details on the computations performed, see Function.
SGEMMS, DGEMMS, CGEMMS, and ZGEMMS use Winograd's variation of the Strassen's algorithm with minor changes for tuning purposes. (See pages 45 and 46 in reference [11].) The subroutines compute matrix multiplication for both real and complex matrices of large sizes. Complex matrix multiplication uses a special technique, using three real matrix multiplications and five real matrix additions. Each of these three resulting matrix multiplications then uses Strassen's algorithm.
The steps of Strassen's algorithm can be repeated up to four times by these subroutines, with each step reducing the dimensions of the matrix by a factor of two. The number of steps used by this subroutine depends on the size of the input matrices. Each step reduces the number of operations by about 10% from the normal matrix multiplication. On the other hand, if the matrix is small, a normal matrix multiplication is performed without using the Strassen's algorithm, and no improvement is gained. For details about small matrices, see Notes.
The complex multiplication is performed by forming the real and imaginary parts of the input matrices. These subroutines uses three real matrix multiplications and five real matrix additions, instead of the normal four real matrix multiplications and two real matrix additions. Using only three real matrix multiplications allows the subroutine to achieve up to a 25% reduction in matrix operations, which can result in a significant savings in computing time for large matrices.
Strassen's method is not stable for certain row or column scalings of the input matrices A and B. Therefore, for matrices A and B with divergent exponent values Strassen's method may give inaccurate results. For these cases, you should use the _GEMUL or _GEMM subroutines.
The equivalence rules, defined for matrix multiplication of A and B in Special Usage, also apply to these subroutines. You should use the equivalence rules when you want to transpose or conjugate transpose the result of the multiplication computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving CT or CH, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See Example 2 and Example 4.
Error 2015 is unrecoverable, naux = 0, and unable to allocate work area.
None
This example shows the computation C<--AB, where A, B, and C are contained in larger arrays A, B, and C, respectively. It shows how to code the calling sequence for SGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in Example 1 for SGEMUL.
A LDA TRANSA B LDB TRANSB C LDC L M N AUX NAUX | | | | | | | | | | | | | CALL SGEMMS( A , 8 , 'N' , B , 6 , 'N' , C , 7 , 6 , 5 , 4 , AUX , 0 )
This example shows the computation C<--ABH, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained. The arrays contain complex data. This example shows how to code the calling sequence for CGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in Example 8 for CGEMUL.
A LDA TRANSA B LDB TRANSB C LDC L M N AUX NAUX | | | | | | | | | | | | | CALL CGEMMS( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 , AUX , 0 )