Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDTRMM and PZTRMM--Triangular Matrix-Matrix Product

PDTRMM computes one of the following matrix-matrix products:

1. B <-- alpha A B
2. B <-- alpha A^T B
3. B <-- alpha B A
4. B <-- alpha B A^T

PZTRMM computes one of the following matrix-matrix products:

1. B <-- alpha A B
2. B <-- alpha A^T B
3. B <-- alpha B A
4. B <-- alpha B A^T
5. B <-- alpha A^H B
6. B <-- alpha B A^H

where, in the formulas above:

A represents the global triangular submatrix:

For side = 'L', it is A_{ia:ia+m-1, ja:ja+m-1}.
For side = 'R', it is A_{ia:ia+n-1, ja:ja+n-1}.

B represents the global general submatrix B_{ib:ib+m-1, jb:jb+n-1}.

alpha is a scalar.

Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [14] and [15].
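As a reading aid, the six products can be modeled sequentially on a single process. The following Python sketch is not part of Parallel ESSL: the name `trmm_reference` and the use of nested lists holding whole (undistributed) matrices are assumptions made for illustration. It follows the argument meanings given under On Entry: only the triangle selected by uplo is read, the diagonal is taken as one when diag = 'U', and A^T or A^H is formed by swapping indices rather than by moving data.

```python
def trmm_reference(side, uplo, transa, diag, alpha, a, b):
    """Sequential model of B <-- alpha*op(A)*B (side='L') or alpha*B*op(A) (side='R').

    a and b are whole matrices stored as lists of rows; a new matrix is returned.
    """
    side, uplo, transa, diag = (c.upper() for c in (side, uplo, transa, diag))
    m = len(b)
    n = len(b[0]) if m else 0
    if m == 0 or n == 0:
        return [row[:] for row in b]  # no computation is performed
    numa = m if side == "L" else n

    def tri(i, j):
        # Element (i, j) of A as the routine interprets it.
        if i == j:
            return 1 if diag == "U" else a[i][j]
        referenced = (j > i) if uplo == "U" else (j < i)
        return a[i][j] if referenced else 0  # the other triangle is never read

    def op(i, j):
        # Element (i, j) of A, A^T, or A^H, formed by index swapping only.
        if transa == "N":
            return tri(i, j)
        v = tri(j, i)
        return v.conjugate() if transa == "C" else v

    if side == "L":
        return [[alpha * sum(op(i, k) * b[k][j] for k in range(numa))
                 for j in range(n)] for i in range(m)]
    return [[alpha * sum(b[i][k] * op(k, j) for k in range(numa))
             for j in range(n)] for i in range(m)]


# Input of Example 1; the 99.0 entries sit in the unreferenced lower triangle.
A = [[3.0, -1.0, 2.0, 2.0, 1.0],
     [99.0, -2.0, 4.0, -1.0, 3.0],
     [99.0, 99.0, -3.0, 0.0, 2.0],
     [99.0, 99.0, 99.0, 4.0, -2.0],
     [99.0, 99.0, 99.0, 99.0, 1.0]]
B = [[2.0, 3.0, 1.0],
     [5.0, 5.0, 4.0],
     [0.0, 1.0, 2.0],
     [3.0, 1.0, -3.0],
     [-1.0, 2.0, 1.0]]
for row in trmm_reference("L", "U", "N", "N", 1.0, A, B):
    print(row)
```

With the Example 1 input, the printed rows agree with the global matrix B shown under Output in Example 1. For real data, transa = 'C' gives the same result as 'T', which matches the PDTRMM rule in Notes and Coding Rules.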

Table 49. Data Types

alpha, A, B	Subprogram
Long-precision real	PDTRMM
Long-precision complex	PZTRMM

Syntax

Fortran	CALL PDTRMM | PZTRMM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b)
C and C++	pdtrmm | pztrmm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b);

On Entry

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B.

If side = 'R', A is to the right of B.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Scope: global

Specified as: a single character; transa = 'N', 'T', or 'C'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m

is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 49.

a

is the local part of the global triangular matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

If uplo = 'U', the leading numa × numa upper triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading numa × numa lower triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 49. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If (`m` = 0 and `side` = 'L') or (`n` = 0 and `side` = 'R'): M_A >= 0; otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If (`m` = 0 and `side` = 'L') or (`n` = 0 and `side` = 'R'): N_A >= 0; otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_B >= 0; otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_B >= 0; otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.
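Both descriptor tables use the same nine-entry layout. The Python sketch below assembles such an array and checks the limits that do not depend on the other call arguments. It is illustrative only: `make_desc` is a hypothetical helper, not a Parallel ESSL or BLACS routine, and the caller is assumed to supply the grid dimensions p and q together with the value of LOCp(M_) for the calling process (for example, as obtained from NUMROC).

```python
def make_desc(ctxt, m, n, mb, nb, rsrc, csrc, lld, p, q, locp_m):
    """Return a 9-entry type-1 array descriptor after checking its limits.

    locp_m is LOCp(M_) for the calling process, supplied by the caller.
    """
    if m < 0 or n < 0:
        raise ValueError("M_ and N_ must not be negative")
    if mb < 1 or nb < 1:
        raise ValueError("MB_ and NB_ must be at least 1")
    if not 0 <= rsrc < p:
        raise ValueError("RSRC_ must satisfy 0 <= RSRC_ < p")
    if not 0 <= csrc < q:
        raise ValueError("CSRC_ must satisfy 0 <= CSRC_ < q")
    if lld < max(1, locp_m):
        raise ValueError("LLD_ must be at least max(1, LOCp(M_))")
    # Order follows the descriptor tables: entries 1 through 9.
    return [1, ctxt, m, n, mb, nb, rsrc, csrc, lld]


# Descriptor for B in Example 1 as seen by a process in row 0 of the
# 2 x 2 grid, where LOCp(M_B) = 3. The context value 0 is a placeholder
# for the value returned by the BLACS.
desc_b = make_desc(ctxt=0, m=5, n=3, mb=2, nb=2, rsrc=0, csrc=0,
                   lld=3, p=2, q=2, locp_m=3)
print(desc_b)
```

The sketch does not check the stricter M_ >= 1 and N_ >= 1 limits, because those depend on m, n, and side as given in the tables.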

On Return

b

is the updated local part of the global matrix B, containing the results of the computation.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49.

Notes and Coding Rules

These subroutines accept lowercase letters for the side, uplo, transa, and diag arguments.
For PDTRMM, if you specify 'C' for transa, it is interpreted as though you specified 'T'.
The matrices must have no common elements; otherwise, results are unpredictable.
PDTRMM and PZTRMM assume certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see Determining the Number of Rows and Columns in Your Local Arrays and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see Coding Tips for Optimizing Parallel Performance.
The following values must be equal: CTXT_A = CTXT_B.
If A is not contained within a single block, that is, either of the following is true:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:

If side = 'L', numa = m
If side = 'R', numa = n

then:
- The global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
- The global triangular matrix A must be aligned on a block boundary, that is:
  
  ia-1 must be a multiple of MB_A.
  ja-1 must be a multiple of NB_A.
If side = 'L':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: MB_B = NB_A.
  - The global matrix B must be aligned on a block row boundary; that is, ib-1 must be a multiple of MB_B.
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix B; that is, iarow = ibrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
- If A is contained within a single block, then B must be a block row matrix; that is, if p > 1:
  
  m+mod(ib-1, MB_B) <= MB_B
If side = 'R':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: NB_B = MB_A.
  - The global matrix B must be aligned on a block column boundary; that is, jb-1 must be a multiple of NB_B.
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix B, that is, iacol = ibcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
- If A is contained within a single block, then B must be a block column matrix; that is, if q > 1:
  
  n+mod(jb-1, NB_B) <= NB_B
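
The block-size and alignment rules above can be checked before the call is made. The following Python sketch is one possible encoding of those rules and is not part of Parallel ESSL; `alignment_problems` and its flat argument list are hypothetical. Indices are 1-based as in the argument descriptions, and the requirement CTXT_A = CTXT_B is not covered.

```python
def alignment_problems(side, m, n, ia, ja, ib, jb,
                       mb_a, nb_a, rsrc_a, csrc_a,
                       mb_b, nb_b, rsrc_b, csrc_b, p, q):
    """Return the alignment rules (as strings) that the arguments violate."""
    left = side.upper() == "L"
    numa = m if left else n
    # A is contained within a single block when neither inequality holds.
    single_block = (numa + (ia - 1) % mb_a <= mb_a and
                    numa + (ja - 1) % nb_a <= nb_a)
    problems = []
    if not single_block:
        if mb_a != nb_a:
            problems.append("MB_A <> NB_A")
        if (ia - 1) % mb_a != 0:
            problems.append("ia-1 is not a multiple of MB_A")
        if (ja - 1) % nb_a != 0:
            problems.append("ja-1 is not a multiple of NB_A")
    if left:
        if not single_block:
            if mb_b != nb_a:
                problems.append("MB_B <> NB_A")
            if (ib - 1) % mb_b != 0:
                problems.append("ib-1 is not a multiple of MB_B")
        iarow = ((ia - 1) // mb_a + rsrc_a) % p
        ibrow = ((ib - 1) // mb_b + rsrc_b) % p
        if iarow != ibrow:
            problems.append("iarow <> ibrow")
        if single_block and p > 1 and m + (ib - 1) % mb_b > mb_b:
            problems.append("B is not a block row matrix")
    else:
        if not single_block:
            if nb_b != mb_a:
                problems.append("NB_B <> MB_A")
            if (jb - 1) % nb_b != 0:
                problems.append("jb-1 is not a multiple of NB_B")
        iacol = ((ja - 1) // nb_a + csrc_a) % q
        ibcol = ((jb - 1) // nb_b + csrc_b) % q
        if iacol != ibcol:
            problems.append("iacol <> ibcol")
        if single_block and q > 1 and n + (jb - 1) % nb_b > nb_b:
            problems.append("B is not a block column matrix")
    return problems


# Arguments of Example 1: side = 'L', m = 5, n = 3, all indices 1,
# 2 x 2 blocks, and a 2 x 2 process grid.
print(alignment_problems("L", 5, 3, 1, 1, 1, 1, 2, 2, 0, 0, 2, 2, 0, 0, 2, 2))
```

An empty list means that none of the rules encoded here is violated. Changing ib to 2 in the call above reports that ib-1 is not a multiple of MB_B.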

Error Conditions

Input-Argument and Miscellaneous Errors

Stage 1:

DTYPE_A is invalid.
DTYPE_B is invalid.

Stage 2:

CTXT_A is invalid.

Stage 3:

This subroutine was called from outside the process grid.

Stage 4:

side <> 'L' or 'R'
uplo <> 'U' or 'L'
transa <> 'N', 'T', or 'C'
diag <> 'N' or 'U'
m < 0
n < 0
M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
MB_A < 1
NB_A < 1
M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
MB_B < 1
NB_B < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
RSRC_B < 0 or RSRC_B >= p
CSRC_B < 0 or CSRC_B >= q
ia < 1
ja < 1
ib < 1
jb < 1
CTXT_A <> CTXT_B

Stage 5:

MB_A <> NB_A

If A is not contained within a single block, that is, either of the following is true:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:

If side = 'L', numa = m
If side = 'R', numa = n

and:
side = 'L' and MB_B <> NB_A
side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

ia > M_A
ja > N_A
ia+numa-1 > M_A
ja+numa-1 > N_A
where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

ib > M_B
jb > N_B
ib+m-1 > M_B
jb+n-1 > N_B

If A is not contained in a single block:

mod(ia-1, MB_A) <> 0
mod(ja-1, NB_A) <> 0
side = 'L' and mod(ib-1, MB_B) <> 0
side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6:

LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))

If side = 'L':
In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
If A is contained in a single block:

p > 1 and m+mod(ib-1, MB_B) > MB_B

If side = 'R':
In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If A is contained in a single block:

q > 1 and n+mod(jb-1, NB_B) > NB_B

Example 1

This example computes B = alpha A B using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA  DIAG  M   N    ALPHA    A  IA  JA   DESC_A
               |     |      |      |    |   |      |      |   |   |     |
 CALL PDTRMM( 'L' , 'U'  , 'N'  , 'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 , DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )

	Desc_A	Desc_B
DTYPE_	1	1
CTXT_	`icontxt` (see note 1)	`icontxt` (see note 1)
M_	5	5
N_	5	3
MB_	2	2
NB_	2	2
RSRC_	0	0
CSRC_	0	0
LLD_	See note 2	See note 2

Notes:
1. `icontxt` is the output of the BLACS_GRIDINIT call.
2. Each process should set the LLD_ as follows:

   LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
   LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

   In this example, LLD_A = LLD_B = 3 on P₀₀ and P₀₁, and LLD_A = LLD_B = 2 on P₁₀ and P₁₁.
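
The LLD_ values quoted for this example can be reproduced with a small model of the NUMROC arithmetic. The Python function below is a sketch that follows the commonly published ScaLAPACK definition of NUMROC, with the same argument order as the calls shown above. It is not the Parallel ESSL utility itself, and the lowercase name `numroc` is used only for this illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of an n-long, nb-blocked dimension owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # whole rounds of the block cycle
    extra = nblocks % nprocs                       # whole blocks left after those rounds
    if mydist < extra:
        count += nb          # this process receives one more whole block
    elif mydist == extra:
        count += n % nb      # this process receives the trailing partial block
    return count


# M_A = M_B = 5, MB_A = MB_B = 2, RSRC_A = RSRC_B = 0, NPROW = 2
for myrow in range(2):
    lld = max(1, numroc(5, 2, myrow, 0, 2))
    print("process row", myrow, "LLD_A = LLD_B =", lld)
```

For process row 0 this gives 3 and for process row 1 it gives 2, in agreement with the values stated for this example.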

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D  |  0 2  | 1 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D  |   0   | 1 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0
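
The local arrays shown for B follow from the block-cyclic mapping of global rows and columns to the process grid. The Python sketch below derives them from the global matrix. It is an illustration only: `local_part` is a hypothetical helper, it uses 0-based global indices, and it holds the whole global matrix in one place, which a real distributed program would not do.

```python
def local_part(global_matrix, mb, nb, myrow, mycol, nprow, npcol, rsrc=0, csrc=0):
    """Return the local array that process (myrow, mycol) holds."""
    nrows = len(global_matrix)
    ncols = len(global_matrix[0]) if nrows else 0
    # Global row i lies in block row i // mb, which is dealt cyclically
    # to the process rows starting at rsrc; columns work the same way.
    rows = [i for i in range(nrows) if (i // mb + rsrc) % nprow == myrow]
    cols = [j for j in range(ncols) if (j // nb + csrc) % npcol == mycol]
    return [[global_matrix[i][j] for j in cols] for i in rows]


# Global 5 x 3 matrix B of Example 1, 2 x 2 blocks, 2 x 2 process grid.
B = [[2.0, 3.0, 1.0],
     [5.0, 5.0, 4.0],
     [0.0, 1.0, 2.0],
     [3.0, 1.0, -3.0],
     [-1.0, 2.0, 1.0]]
for myrow in range(2):
    for mycol in range(2):
        print((myrow, mycol), local_part(B, 2, 2, myrow, mycol, 2, 2))
```

Process (0, 0) receives global rows 1, 2, and 5 and global columns 1 and 2 (1-based), which is the 3 × 2 local array shown above for p = 0, q = 0.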

Output:

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |   -2.0 |
     | -16.0  -1.0  |    6.0 |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D  |   0   | 1 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0

Example 2

This example computes B = alpha A^H B using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA  DIAG  M   N    ALPHA    A  IA  JA   DESC_A
               |     |      |      |    |   |      |      |   |   |     |
 CALL PZTRMM( 'L' , 'U'  , 'C'  , 'N' , 5 , 1 ,  ALPHA  , A , 1 , 1 , DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )
 
              ALPHA = (1.0, 0.0)

	Desc_A	Desc_B
DTYPE_	1	1
CTXT_	`icontxt` (see note 1)	`icontxt` (see note 1)
M_	5	5
N_	5	1
MB_	2	2
NB_	2	2
RSRC_	0	0
CSRC_	0	0
LLD_	See note 2	See note 2

Notes:
1. `icontxt` is the output of the BLACS_GRIDINIT call.
2. Each process should set the LLD_ as follows:

   LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
   LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

   In this example, LLD_A = LLD_B = 3 on P₀₀ and P₀₁, and LLD_A = LLD_B = 2 on P₁₀ and P₁₁.

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D               0                         1                   2
     *                                                                 *
 0   | (-4.0, 1.0) ( 4.0,-3.0) | (-1.0, 3.0) ( 0.0, 0.0) | (-1.0, 0.0) |
     |      .      (-2.0, 0.0) | (-3.0,-1.0) (-2.0,-1.0) | ( 4.0, 3.0) |
     | ------------------------|-------------------------|------------ |
 1   |      .           .      | (-5.0, 3.0) (-3.0,-3.0) | (-5.0,-5.0) |
     |      .           .      |      .      ( 4.0,-4.0) | ( 2.0, 0.0) |
     | ------------------------|-------------------------|------------ |
 2   |      .           .      |      .           .      | ( 2.0,-1.0) |
     *                                                                 *

The following is the 2 × 2 process grid:

B,D  |  0 2  | 1 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for A:

p,q  |                  0                  |            1
-----|-------------------------------------|-------------------------
     | (-4.0, 1.0) ( 4.0,-3.0) (-1.0, 0.0) | (-1.0, 3.0) ( 0.0, 0.0)
 0   |      .      (-2.0, 0.0) ( 4.0, 3.0) | (-3.0,-1.0) (-2.0,-1.0)
     |      .           .      ( 2.0,-1.0) |      .           .
-----|-------------------------------------|-------------------------
 1   |      .           .      (-5.0,-5.0) | (-5.0, 3.0) (-3.0,-3.0)
     |      .           .      ( 2.0, 0.0) |      .      ( 4.0,-4.0)

Global rectangular 5 × 1 matrix B with block size 2 × 2:

B,D         0        
     *             *
 0   | ( 3.0, 4.0) |
     | (-4.0, 2.0) | 
     | ----------- |
 1   | (-5.0, 0.0) |
     | ( 1.0, 3.0) |
     | ----------- |
 2   | ( 3.0, 1.0) |
     *             *

The following is the 2 × 2 process grid:

B,D  |   0   |-- 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for B:

p,q  |      0      |      1
-----|-------------|------------
     | ( 3.0, 4.0) |      .
 0   | (-4.0, 2.0) |      .
     | ( 3.0, 1.0) |      . 
-----|-------------|------------
 1   | (-5.0, 0.0) |      .
     | ( 1.0, 3.0) |      .

Output:

Global rectangular 5 × 1 matrix B with block size 2 × 2:

B,D         0             
     *              *
 0   | (-8.0,-19.0) |         
     | ( 8.0, 21.0) |         
     | -------------|         
 1   | (44.0, -8.0) | 
     | (13.0, -7.0) |         
     | -------------|
 2   | (19.0,  2.0) |         
     *              *

The following is the 2 × 2 process grid:

B,D  |   0   |-- 
-----|-------|-----
0    |   P₀₀   |  P₀₁
2    |       |
-----|-------|-----
1    |   P₁₀   |  P₁₁

Local arrays for B:

p,q  |      0       |      1
-----|--------------|------------
     | (-8.0,-19.0) |      .
 0   | ( 8.0, 21.0) |      .
     | (19.0,  2.0) |      . 
-----|--------------|------------
 1   | (44.0, -8.0) |      .
     | (13.0, -7.0) |      .
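
The output of this example can be checked by hand or with a few lines of code. The Python sketch below recomputes the global result on one process using transa = 'C', that is, the conjugate transpose of the upper triangular A. The function name `conj_transpose_times` and the use of None for unreferenced entries are assumptions of this illustration; nothing here calls Parallel ESSL.

```python
# Upper triangle of A from this example; entries below the diagonal are
# placeholders (None) because they are never read.
A = [
    [complex(-4, 1), complex(4, -3), complex(-1, 3), complex(0, 0), complex(-1, 0)],
    [None, complex(-2, 0), complex(-3, -1), complex(-2, -1), complex(4, 3)],
    [None, None, complex(-5, 3), complex(-3, -3), complex(-5, -5)],
    [None, None, None, complex(4, -4), complex(2, 0)],
    [None, None, None, None, complex(2, -1)],
]
B = [complex(3, 4), complex(-4, 2), complex(-5, 0), complex(1, 3), complex(3, 1)]
alpha = complex(1.0, 0.0)


def conj_transpose_times(a, b, alpha):
    """Return alpha * A^H * b for upper triangular a without forming A^H."""
    size = len(b)
    # Row i of A^H is the conjugate of column i of A; only k <= i contributes.
    return [alpha * sum(a[k][i].conjugate() * b[k] for k in range(i + 1))
            for i in range(size)]


result = conj_transpose_times(A, B, alpha)
for value in result:
    print(value)
```

The five values agree with the global output matrix B shown above: (-8, -19), (8, 21), (44, -8), (13, -7), and (19, 2). Because A is read as a[k][i] and conjugated on the fly, the stored matrix stays in its untransposed form, as the note on A^T and A^H requires.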
