IBM Books

Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDTRSM and PZTRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

PDTRSM perform one of the following solves for a triangular system of equations with multiple right-hand sides, using scalar alpha, rectangular matrix B, and triangular matrix A or its transpose:

Solution Equation
1. B<--alpha(A-1)B AX = alphaB
2. B<--alpha(A-T)B ATX = alphaB
3. B<--alphaB(A-1) XA = alphaB
4. B<--alphaB(A-T) XAT = alphaB

PZTRSM performs one of the following solves for a triangular system of equations with multiple right-hand sides, using scalar alpha, rectangular matrix B, and triangular matrix A, its transpose, or its conjugate transpose:

Solution Equation
1. B<--alpha(A-1)B AX = alphaB
2. B<--alpha(A-T)B ATX = alphaB
3. B<--alphaB(A-1) XA = alphaB
4. B<--alphaB(A-T) XAT = alphaB
5. B<--alpha(A-H)B AHX = alphaB
6. B<--alphaB(A-H) XAH = alphaB

where, in the formulas above:

A represents the global triangular submatrix:
B represents the global general submatrix Bib:ib+m-1, jb:jb+n-1.
alpha is a scalar.

Notes:

  1. The term X used in the systems of equations listed above represents the output solution matrix. It is important to note that, in this subroutine, the solution matrix is actually returned in the input-output argument b.

  2. No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [14] and [15].

Table 50. Data Types

alpha, A, B Subprogram
Long-precision real PDTRSM
Long-precision complex PZTRSM

Syntax

Fortran CALL PDTRSM | PZTRSM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b)
C and C++ pdtrsm | pztrsm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b);

On Entry

side
indicates whether A is located to the left or right of B in the system of equations, where:

If side = 'L', A is to the left of B.

If side = 'R', A is to the right of B.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo
indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa
indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used.

If transa = 'T', AT is used.

If transa = 'C', AH is used.

Scope: global

Specified as: a single character; transa = 'N', 'T', or 'C'.

diag
indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m
is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n
is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha
is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 50.

a
is the local part of the global triangular matrix A, used in the system of equations. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

Note:
No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia
is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja
is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a
is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix If side = 'L' and m = 0:
M_A >= 0
If side = 'R' and n = 0:
M_A >= 0
Otherwise:
M_A >= 1
Global
4 N_A Number of columns in the global matrix If side = 'L' and m = 0:
N_A >= 0
If side = 'R' and n = 0:
N_A >= 0
Otherwise:
N_A >= 1
Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

b
is the local part of the global general matrix B, containing the right-hand sides of the triangular system to be solved. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib
is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb
is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b
is the array descriptor for global matrix B, described in the following table:
desc_b Name Description Limits Scope
1 DTYPE_B Descriptor type DTYPE_B=1 Global
2 CTXT_B BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_B Number of rows in the global matrix If side = 'L' and m = 0:
M_B >= 0
If side = 'R' and n = 0:
M_B >= 0
Otherwise:
M_B >= 1
Global
4 N_B Number of columns in the global matrix N_B >= 1 Global
5 MB_B Row block size MB_B >= 1 Global
6 NB_B Column block size If side = 'L' and m = 0:
N_B >= 0
If side = 'R' and n = 0:
N_B >= 0
Otherwise:
N_B >= 1
Global
7 RSRC_B The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_B < p Global
8 CSRC_B The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_B < q Global
9 LLD_B The leading dimension of the local array LLD_B >= max(1,LOCp(M_B)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

b
is the updated local part of the global matrix B, containing the n solution vectors of length m.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50.

Notes and Coding Rules
  1. These subroutines accept lowercase letters for the side, uplo, transa, and diag arguments.
  2. For PDTRSM, if you specify 'C' for transa, it is interpreted as though you specified 'T'.
  3. The matrices must have no common elements; otherwise, results are unpredictable.
  4. PDTRSM and PZTRSM assume certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.
  5. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see Determining the Number of Rows and Columns in Your Local Arrays and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
  6. For suggested block sizes, see Coding Tips for Optimizing Parallel Performance.
  7. The following values must be equal: CTXT_A = CTXT_B.
  8. If A is not contained within a single block, that is, either of the following is true:
    numa+mod(ia-1, MB_A) > MB_A
    numa+mod(ja-1, NB_A) > NB_A
    where:
    If side = 'L', numa = m
    If side = 'R', numa = n

    then the global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.

  9. If A is not contained within a single block, then the global triangular matrix A must be aligned on a block boundary, that is:
    ia-1 must be a multiple of MB_A.
    ja-1 must be a multiple of NB_A.
  10. If side = 'L':
  11. If side = 'R':
  12. If A is contained within a single block, then:

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1 

  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.

Stage 2 

  1. CTXT_A is invalid.

Stage 3 

  1. This subroutine was called from outside the process grid.

Stage 4 

  1. side <> 'L' or 'R'
  2. uplo <> 'U' or 'L'
  3. transa <> 'N', 'T', or 'C'
  4. diag <> 'N' or 'U'
  5. m < 0
  6. n < 0
  7. M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
  8. N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
  9. MB_A < 1
  10. NB_A < 1
  11. M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
  12. N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
  13. MB_B < 1
  14. NB_B < 1
  15. RSRC_A < 0 or RSRC_A >= p
  16. CSRC_A < 0 or CSRC_A >= q
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ia < 1
  20. ja < 1
  21. ib < 1
  22. jb < 1
  23. CTXT_A <> CTXT_B

Stage 5  If A is not contained within a single block, that is, either of the following is true:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:
If side = 'L', numa = m
If side = 'R', numa = n

then:

  1. MB_A <> NB_A
  2. side = 'L' and MB_B <> NB_A
  3. side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

  1. ia > M_A
  2. ja > N_A
  3. ia+numa-1 > M_A
  4. ja+numa-1 > N_A

    where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

  1. ib > M_B
  2. jb > N_B
  3. ib+m-1 > M_B
  4. jb+n-1 > N_B

If A is not contained in a single block:

  1. mod(ia-1, MB_A) <> 0
  2. mod(ja-1, NB_A) <> 0
  3. side = 'L' and mod(ib-1, MB_B) <> 0
  4. side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6 

  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))

If side = 'L':

  1. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  2. If A is contained in a single block:
    p > 1 and m+mod(ib-1, MB_B) > MB_B

If side = 'R':

  1. In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  2. If A is contained in a single block:
    q > 1 and n+mod(jb-1, NB_B) > NB_B

Example 1

This example shows the solution B<--alpha(A-1)B using a 2 × 2 process grid.

Call Statements and Input


  ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA   DIAG   M   N    ALPHA    A  IA  JA    DESC_A
               |      |      |       |    |   |      |      |   |   |      |
 CALL PDTRSM( 'L'  , 'U' ,  'N'  ,  'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 ,  DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )


Desc_A Desc_B
DTYPE_ 1 1
CTXT_ icontxt(IOBG24) icontxt(IOBG24)
M_ 5 5
N_ 5 3
MB_ 2 2
NB_ 2 2
RSRC_ 0 0
CSRC_ 0 0
LLD_ See below(EPSSL24) See below(EPSSL24)

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
    LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
    

    In this example, LLD_A = LLD_B = 3 on P00 and P01, and LLD_A = LLD_B = 2 on P10 and P11.

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D  |   0 2   |  1  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global general 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |  -2.0  |
     | -16.0  -1.0  |   6.0  |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D  |    0    |  1  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0

Output:

Global general 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D  |    0    |  1  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0

Example 2

This example shows the solution B<--alpha(A-H)B using a 2 × 2 process grid.

Call Statements and Input


  ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA   DIAG   M   N    ALPHA    A  IA  JA    DESC_A
               |      |      |       |    |   |      |      |   |   |      |
 CALL PZTRSM( 'L'  , 'U' ,  'C'  ,  'N' , 5 , 2 ,  ALPHA  , A , 1 , 1 ,  DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )
 
              ALPHA = (1.0D0, -1.0D0)


Desc_A Desc_B
DTYPE_ 1 1
CTXT_ icontxt(IOBG25) icontxt(IOBG25)
M_ 5 5
N_ 5 2
MB_ 2 2
NB_ 2 2
RSRC_ 0 0
CSRC_ 0 0
LLD_ See below(EPSSL25) See below(EPSSL25)

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
    LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
    

    In this example, LLD_A = LLD_B = 3 on P00 and P01, and LLD_A = LLD_B = 2 on P10 and P11.

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D               0                         1                   2
     *                                                                 *
 0   | (-4.0, 1.0) ( 4.0,-3.0) | (-1.0, 3.0) ( 0.0, 0.0) | (-1.0, 0.0) |
     |      .      (-2.0, 0.0) | (-3.0,-1.0) (-2.0,-1.0) | ( 4.0, 3.0) |
     | ------------------------|-------------------------|------------ |
 1   |      .           .      | (-5.0, 3.0) (-3.0,-3.0) | (-5.0,-5.0) |
     |      .           .      |      .      ( 4.0,-4.0) | ( 2.0, 0.0) |
     | ------------------------|-------------------------|------------ |
 2   |      .           .      |      .           .      | ( 2.0,-1.0) |
     *                                                                 *

The following is the 2 × 2 process grid:

B,D  |   0 2   |  1  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for A:

p,q  |                  0                  |      1
-----|-------------------------------------|-------------------------------
     | (-4.0, 1.0) ( 4.0,-3.0) (-1.0, 0.0) | (-1.0, 3.0) ( 0.0, 0.0)     .
 0   |      .      (-2.0, 0.0) ( 4.0, 3.0) | (-3.0,-1.0) (-2.0,-1.0)     .
     |      .           .      ( 2.0,-1.0) |      .           .          .
-----|-------------------------------------|-------------------------------
 1   |      .           .      (-5.0,-5.0) | (-5.0, 3.0) (-3.0,-3.0)     . 
     |      .           .      ( 2.0, 0.0) |      .      ( 4.0,-4.0)     .

Global general 5 × 2 matrix B with block size 2 × 2:

B,D                0      
     *                           *
 0   | ( 5.5,-13.5) (-3.0, -5.0) |
     | (-6.5, 14.5) (-3.0,  5.0) |
     | ------------------------- |
 1   | (26.0, 18.0) ( 4.0, -3.0) |
     | (10.0,  3.0) ( 6.0, -6.0) |
     | ------------------------- |
 2   | ( 8.5, 10.5) (13.0,-12.0) |
     *                           *

The following is the 2 × 2 process grid:

B,D  |    0    | --  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for B:

p,q  |             0             |             1
-----|---------------------------|------------------------
     | ( 5.5,-13.5) (-3.0, -5.0) |             .
 0   | (-6.5, 14.5) (-3.0,  5.0) |             .
     | ( 8.5, 10.5) (13.0,-12.0) |             .
-----|---------------------------|------------------------
     | (26.0, 18.0) ( 4.0, -3.0) |             .
 1   | (10.0,  3.0) ( 6.0, -6.0) |             .

Output:

Global general 5 × 2 matrix B with block size 2 × 2:

B,D              0          
     *                        *
 0   | ( 3.0,4.0) ( 2.0, 0.0) |
     | (-4.0,2.0) ( 3.0,-1.0) |
     | ---------------------- |
 1   | (-5.0,0.0) (-1.0, 2.0) |
     | ( 1.0,3.0) ( 0.0,-2.0) |
     | ---------------------- |
 2   | ( 3.0,1.0) ( 1.0, 3.0) |
     *                        *

The following is the 2 × 2 process grid:

B,D  |    0    | --  
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for B:

p,q  |            0            |           1
-----|-------------------------|-----------------------
     | ( 3.0, 4.0) ( 2.0, 0.0) |           .
 0   | (-4.0, 2.0) ( 3.0,-1.0) |           .
     | ( 3.0, 1.0) ( 1.0, 3.0) |           .
-----|-------------------------|-----------------------
 1   | (-5.0, 0.0) (-1.0, 2.0) |           .
     | ( 1.0, 3.0) ( 0.0,-2.0) |           .


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]