Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDPTTRS--Positive Definite Symmetric Tridiagonal Matrix Solve

This subroutine solves the following tridiagonal systems of linear equations for multiple right-hand sides, using the positive definite symmetric tridiagonal matrix A, where A is stored in parallel-symmetric-tridiagonal storage mode:

AX = B

In this subroutine:

This subroutine uses the results of the factorization of matrix A, produced by a preceding call to PDPTTRF. The output from PDPTTRF should be used only as input to this solve subroutine.

If n = 0 or nrhs = 0, no computation is performed and the subroutine returns after doing some parameter checking. See reference [51].

Table 96. Data Types

d, e, B, af, work | Subroutine
Long-precision real | PDPTTRS

Syntax

Fortran | CALL PDPTTRS (n, nrhs, d, e, ia, desc_a, b, ib, desc_b, af, laf, work, lwork, info)
C and C++ | pdpttrs (n, nrhs, d, e, ia, desc_a, b, ib, desc_b, af, laf, work, lwork, info);

On Entry

n
is the order of the positive definite symmetric tridiagonal submatrix A and the number of rows in the general submatrix B, which contains the multiple right-hand sides.

Scope: global

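Specified as: a fullword integer, where:

If (the process grid is 1 × p and DTYPE_A = 1) or DTYPE_A = 501, 0 <= n <= (NB_A)(p)-mod(ia-1,NB_A).

If (the process grid is p × 1 and DTYPE_A = 1) or DTYPE_A = 502, 0 <= n <= (MB_A)(p)-mod(ia-1,MB_A).

where p is the number of processes in a process grid.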

nrhs
is the number of right-hand sides; that is, the number of columns in submatrix B used in the computation.

Scope: global

Specified as: a fullword integer; nrhs >= 0.

d
is the local part of the global vector d, containing part of the factorization produced from a preceding call to PDPTTRF. This identifies the first element of the local array D. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, and p; therefore, the leading LOCp(ia+n-1) part of the local array D contains the local pieces of the leading ia+n-1 part of the global vector.

Scope: local

Specified as: a one-dimensional array of (at least) length LOCp(ia+n-1), containing numbers of the data type indicated in Table 96. Details about block-cyclic data distribution of global matrix A are stored in desc_a.

e
is the local part of the global vector e, containing part of the factorization produced from a preceding call to PDPTTRF. This identifies the first element of the local array E. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, and p; therefore, the leading LOCp(ia+n-1) part of the local array E contains the local pieces of the leading ia+n-1 part of the global vector.

Scope: local

Specified as: a one-dimensional array of (at least) length LOCp(ia+n-1), containing numbers of the data type indicated in Table 96. Details about block-cyclic data distribution of global matrix A are stored in desc_a.
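
The local lengths LOCp(ia+n-1) quoted for d and e can be estimated with a short helper. The following is a minimal sketch in C, assuming that LOCp follows the same block-cyclic counting rule as the ScaLAPACK NUMROC tool function; the name loc_p and its argument names are hypothetical and are not part of Parallel ESSL.

```c
#include <assert.h>

/* Number of rows (or columns) of a block-cyclically distributed dimension
 * that land on one process. The counting rule mirrors ScaLAPACK's NUMROC;
 * that LOCp follows the same rule is an assumption of this sketch.
 *   n      - global extent, for example ia+n-1
 *   nb     - block size (MB_A or NB_A)
 *   iproc  - coordinate of this process, 0 <= iproc < nprocs
 *   isrc   - process that owns the first block (RSRC_A or CSRC_A)
 *   nprocs - number of processes p in the one-dimensional grid
 */
int loc_p(int n, int nb, int iproc, int isrc, int nprocs)
{
    assert(n >= 0 && nb >= 1 && nprocs >= 1);
    int mydist  = (nprocs + iproc - isrc) % nprocs; /* distance from source */
    int nblocks = n / nb;                           /* whole blocks overall */
    int count   = (nblocks / nprocs) * nb;          /* full rounds of blocks */
    int extra   = nblocks % nprocs;                 /* leftover whole blocks */

    if (mydist < extra)
        count += nb;          /* this process gets one more whole block */
    else if (mydist == extra)
        count += n % nb;      /* this process gets the trailing partial block */
    return count;
}
```

With the values used in the Example section (12 rows, block size 4, three processes, source process 0), loc_p returns 4 for every process, which agrees with the four-element local arrays shown there.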

ia
is the row or column index of the global matrix A, identifying the first row or column of the submatrix A.

Scope: global

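Specified as: a fullword integer, where:

If (the process grid is 1 × p and DTYPE_A = 1) or DTYPE_A = 501, 1 <= ia <= N_A and ia+n-1 <= N_A.

If (the process grid is p × 1 and DTYPE_A = 1) or DTYPE_A = 502, 1 <= ia <= M_A and ia+n-1 <= M_A.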

desc_a
is the array descriptor for global matrix A. Because vectors are one-dimensional data structures, you may use a type-502, type-501, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p. For a type-502 array descriptor, the process grid is used as if it is a p × 1 process grid. For a type-501 array descriptor, the process grid is used as if it is a 1 × p process grid. For a type-1 array descriptor, the process grid is used as if it is either a p × 1 process grid or a 1 × p process grid. The following tables describe three types of array descriptors. For rules on using array descriptors, see Notes and Coding Rules.

Table 97. Type-502 Array Descriptor

desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 502 for p × 1 or 1 × p, where p is the number of processes in a process grid | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_A | Number of rows in the global matrix | If n = 0: M_A >= 0; otherwise: M_A >= 1 | Global
4 | MB_A | Row block size | MB_A >= 1 and 0 <= n <= (MB_A)(p)-mod(ia-1,MB_A) | Global
5 | RSRC_A | The process row over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global
6 | -- | Not used by this subroutine. | -- | --
7 | -- | Reserved | -- | --

Specified as: an array of (at least) length 7, containing fullword integers.
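
The layout in Table 97 maps directly onto a seven-element integer array. The following C sketch fills such an array; the helper name fill_desc502 is hypothetical, the ctxt argument stands in for a context value that would really come from BLACS_GRIDINIT or BLACS_GRIDMAP, and writing 0 into the unused and reserved entries is a choice made here, not a documented requirement.

```c
#include <assert.h>

enum { DESC502_LEN = 7 };

/* Fill a type-502 descriptor as laid out in Table 97.
 * C arrays are 0-based, so desc[k] holds table entry k+1.
 * ctxt is a placeholder for a real BLACS context handle. */
void fill_desc502(int desc[DESC502_LEN], int ctxt, int m, int mb, int rsrc)
{
    assert(m >= 0 && mb >= 1 && rsrc >= 0);
    desc[0] = 502;   /* entry 1: DTYPE_A */
    desc[1] = ctxt;  /* entry 2: CTXT_A  */
    desc[2] = m;     /* entry 3: M_A     */
    desc[3] = mb;    /* entry 4: MB_A    */
    desc[4] = rsrc;  /* entry 5: RSRC_A  */
    desc[5] = 0;     /* entry 6: not used by this subroutine for A */
    desc[6] = 0;     /* entry 7: reserved */
}
```

For the matrix in the Example section, a call such as fill_desc502(desc_a, icontxt, 12, 4, 0) would produce the Desc_A values listed there.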

Table 98. Type-1 Array Descriptor (p × 1 Process Grid)

desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 1 for p × 1, where p is the number of processes in a process grid | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_A | Number of rows in the global matrix | If n = 0: M_A >= 0; otherwise: M_A >= 1 | Global
4 | N_A | Number of columns in the global matrix | N_A = 1 | Global
5 | MB_A | Row block size | MB_A >= 1 and 0 <= n <= (MB_A)(p)-mod(ia-1,MB_A) | Global
6 | NB_A | Column block size | NB_A >= 1 | Global
7 | RSRC_A | The process row over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global
8 | CSRC_A | The process column over which the first column of the global matrix is distributed | CSRC_A = 0 | Global
9 | -- | Not used by this subroutine. | -- | --

Specified as: an array of (at least) length 9, containing fullword integers.

Table 99. Type-501 Array Descriptor

desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 501 for 1 × p or p × 1, where p is the number of processes in a process grid | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | N_A | Number of columns in the global matrix | If n = 0: N_A >= 0; otherwise: N_A >= 1 | Global
4 | NB_A | Column block size | NB_A >= 1 and 0 <= n <= (NB_A)(p)-mod(ia-1,NB_A) | Global
5 | CSRC_A | The process column over which the first column of the global matrix is distributed | 0 <= CSRC_A < p | Global
6 | -- | Not used by this subroutine. | -- | --
7 | -- | Reserved | -- | --

Specified as: an array of (at least) length 7, containing fullword integers.

Table 100. Type-1 Array Descriptor (1 × p Process Grid)

desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 1 for 1 × p, where p is the number of processes in a process grid | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_A | Number of rows in the global matrix | M_A = 1 | Global
4 | N_A | Number of columns in the global matrix | If n = 0: N_A >= 0; otherwise: N_A >= 1 | Global
5 | MB_A | Row block size | MB_A >= 1 | Global
6 | NB_A | Column block size | NB_A >= 1 and 0 <= n <= (NB_A)(p)-mod(ia-1,NB_A) | Global
7 | RSRC_A | The process row over which the first row of the global matrix is distributed | RSRC_A = 0 | Global
8 | CSRC_A | The process column over which the first column of the global matrix is distributed | 0 <= CSRC_A < p | Global
9 | -- | Not used by this subroutine. | -- | --

Specified as: an array of (at least) length 9, containing fullword integers.

b
is the local part of the global general matrix B, containing the multiple right-hand sides of the system. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, desc_b, and p; therefore, the leading LOCp(ib+n-1) by nrhs part of the local array B must contain the local pieces of the leading ib+n-1 by nrhs part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) nrhs array, containing numbers of the data type indicated in Table 96. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib
is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+n-1 <= M_B.

desc_b
is the array descriptor for global matrix B, which may be type 502 or type 1, as described in the following tables. For a type-502 array descriptor, the process grid is used as if it is a p × 1 process grid. For rules on using array descriptors, see Notes and Coding Rules.

desc_b | Name | Description | Limits | Scope
1 | DTYPE_B | Descriptor type | DTYPE_B = 502 for p × 1 or 1 × p, where p is the number of processes in a process grid | Global
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_B | Number of rows in the global matrix | If n = 0: M_B >= 0; otherwise: M_B >= 1 | Global
4 | MB_B | Row block size | MB_B >= 1 and 0 <= n <= (MB_B)(p)-mod(ib-1,MB_B) | Global
5 | RSRC_B | The process row over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global
6 | LLD_B | Leading dimension | LLD_B >= max(1, LOCp(M_B)) | Local
7 | -- | Reserved | -- | --

Specified as: an array of (at least) length 7, containing fullword integers.

desc_b | Name | Description | Limits | Scope
1 | DTYPE_B | Descriptor type | DTYPE_B = 1 for p × 1, where p is the number of processes in a process grid | Global
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_B | Number of rows in the global matrix | If n = 0: M_B >= 0; otherwise: M_B >= 1 | Global
4 | N_B | Number of columns in the global matrix | N_B >= nrhs | Global
5 | MB_B | Row block size | MB_B >= 1 and 0 <= n <= (MB_B)(p)-mod(ib-1,MB_B) | Global
6 | NB_B | Column block size | NB_B >= 1 | Global
7 | RSRC_B | The process row over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global
8 | CSRC_B | The process column over which the first column of the global matrix is distributed | CSRC_B = 0 | Global
9 | LLD_B | Leading dimension | LLD_B >= max(1, LOCp(M_B)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

af
is a work area used by this subroutine and contains part of the factorization produced on a preceding call to PDPTTRF. Its size is specified by laf.

Scope: local

Specified as: a one-dimensional array of (at least) length laf, containing numbers of the data type indicated in Table 96.

laf
is the number of elements in array AF.

Scope: local

Specified as: a fullword integer, where:

where, in the above formulas, P is the actual number of processes containing data.

work
has the following meaning:

If lwork = 0, work is ignored.

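If lwork <> 0, work is the work area used by this subroutine, where:

If lwork <> 0 and lwork <> -1, the size of work is (at least) of length lwork.

If lwork = -1, the size of work is (at least) of length 1.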

Scope: local

Specified as: an area of storage containing numbers of data type indicated in Table 96.

lwork
is the number of elements in array WORK.

Scope:

Specified as: a fullword integer; where:

info
See On Return.

On Return

b
is the updated local part of the global matrix B, containing the solution vectors.

Scope: local

Returned as: an LLD_B by (at least) nrhs array, containing numbers of the data type indicated in Table 96.

work
is the work area used by this subroutine if lwork <> 0, where:

If lwork <> 0 and lwork <> -1, the size of work is (at least) of length lwork.

If lwork = -1, the size of work is (at least) of length 1.

Scope: local

Returned as: an area of storage, containing numbers of data type indicated in Table 96, where:

Except for work1, the contents of work are overwritten on return.

info
indicates that a successful computation or work area query occurred.

Scope: global

Returned as: a fullword integer; info = 0.

Notes and Coding Rules
  1. In your C program, argument info must be passed by reference.
  2. The output from the PDPTTRF subroutine should be used only as input to the solve subroutine PDPTTRS.

    The factored matrix A is stored in an internal format that depends on the number of processes.

    The scalar data specified for input argument n must be the same for both PDPTTRF and PDPTTRS.

    The global vectors for d, e, and af input to PDPTTRS must be the same as the corresponding output arguments for PDPTTRF; and thus, the scalar data specified for ia, desc_a, and laf must also be the same.

  3. In all cases, follow these rules:
  4. To determine the values of LOCp(n) used in the argument descriptions, see Determining the Number of Rows and Columns in Your Local Arrays for descriptor type-1 or Determining the Number of Rows or Columns in Your Local Arrays for descriptor type-501 and type-502.
  5. d, e, af and work must have no common elements; otherwise, results are unpredictable.
  6. The global positive definite symmetric tridiagonal matrix A must be stored in parallel-symmetric-tridiagonal storage mode and distributed over a one-dimensional process grid, using block-cyclic data distribution. See the section on block-cyclically distributing a tridiagonal matrix in Matrices.

    For more information on using block-cyclic data distribution, see Specifying Block-Cyclically-Distributed Matrices for the Banded Linear Algebraic Equations.

  7. Matrix B must be distributed over a one-dimensional process grid, using block-cyclic data distribution. For more information on using block-cyclic data distribution, see Specifying Block-Cyclically-Distributed Matrices for the Banded Linear Algebraic Equations. Also, see the section on distributing the right-hand side matrix in Matrices.
  8. If lwork = -1 on any process, it must equal -1 on all processes. That is, if a subset of the processes specifies -1 for the work area size, they must all specify -1.
  9. Although global matrices A and B may be block-cyclically distributed on a 1 × p or p × 1 process grid, the values of n, ia, ib, MB_A (if (the process grid is p × 1 and DTYPE_A = 1) or DTYPE_A = 502), and NB_A (if (the process grid is 1 × p and DTYPE_A = 1) or DTYPE_A = 501) must be chosen so that each process has at most one full or partial block of each of the global submatrices A and B.
  10. For global tridiagonal matrix A, use of the type-1 array descriptor is an extension to ScaLAPACK 1.5. If your application needs to run with both Parallel ESSL and ScaLAPACK 1.5, it is suggested that you use either a type-501 or a type-502 array descriptor for the matrix A.
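
The rule that each process holds at most one full or partial block corresponds to the limit n <= (MB_A)(p)-mod(ia-1,MB_A) given in the descriptor tables (with NB_A in place of MB_A for a 1 × p layout). A minimal C sketch of that test follows; the function name is hypothetical and the check is only an illustration of the inequality, not the library's own validation code.

```c
#include <assert.h>

/* Returns 1 when 0 <= n <= nb*p - mod(ia-1, nb), and 0 otherwise.
 *   n  - order of the submatrix A
 *   ia - first row or column of the submatrix (1-based)
 *   nb - block size (MB_A or NB_A, depending on the layout)
 *   p  - number of processes in the one-dimensional grid */
int fits_one_block_per_process(int n, int ia, int nb, int p)
{
    assert(ia >= 1 && nb >= 1 && p >= 1);
    int limit = nb * p - (ia - 1) % nb;
    return n >= 0 && n <= limit;
}
```

For example, with block size 4 on three processes, n = 12 is accepted when ia = 1, but rejected when ia = 2, because the submatrix would then spill into a fourth block.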

Error Conditions

Computational Errors

None

Note:
If the factorization performed by PDPTTRF failed because of a nonpositive definite matrix A, the results returned by this subroutine are unpredictable. For details, see the info output argument for PDPTTRF.

Resource Errors

Unable to allocate workspace

Input-Argument and Miscellaneous Errors

Stage 1 

  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.

Stage 2 

  1. CTXT_A is invalid.

Stage 3 

  1. This subroutine was called from outside the process grid.

Stage 4 

Note:
In the following error conditions:
  1. The process grid is not 1 × p or p × 1.
  2. CTXT_A <> CTXT_B
  3. n < 0
  4. ia < 1
  5. DTYPE_A = 1 and M_A <> 1 and N_A <> 1

    If (the process grid is 1 × p and DTYPE_A = 1) or DTYPE_A = 501:

  6. N_A < 0 and (n = 0); N_A < 1 otherwise
  7. NB_A < 1
  8. n > (NB_A)(p)-mod(ia-1,NB_A)
  9. ia > N_A and (n > 0)
  10. ia+n-1 > N_A and (n > 0)
  11. CSRC_A < 0 or CSRC_A >= p
  12. NB_A <> MB_B
  13. CSRC_A <> RSRC_B

    If the process grid is 1 × p and DTYPE_A = 1:

  14. M_A <> 1
  15. MB_A < 1
  16. RSRC_A <> 0

    If (the process grid is p × 1 and DTYPE_A = 1) or DTYPE_A = 502:

  17. M_A < 0 and (n = 0); M_A < 1 otherwise
  18. MB_A < 1
  19. n > (MB_A)(p)-mod(ia-1,MB_A)
  20. ia > M_A and (n > 0)
  21. ia+n-1 > M_A and (n > 0)
  22. RSRC_A < 0 or RSRC_A >= p
  23. MB_A <> MB_B
  24. RSRC_A <> RSRC_B

    If the process grid is p × 1 and DTYPE_A = 1:

  25. N_A <> 1
  26. NB_A < 1
  27. CSRC_A <> 0

    In all cases:

  28. ia <> ib
  29. DTYPE_B = 1 and the process grid is 1 × p and p > 1
  30. nrhs < 0
  31. ib < 1
  32. M_B < 0 and (n = 0); M_B < 1 otherwise
  33. MB_B < 1
  34. ib > M_B and (n > 0)
  35. ib+n-1 > M_B and (n > 0)
  36. RSRC_B < 0 or RSRC_B >= p
  37. LLD_B < max(1,LOCp(M_B))

    If DTYPE_B = 1:

  38. N_B < 0 and (nrhs = 0); N_B < 1 otherwise
  39. N_B < nrhs
  40. NB_B < 1
  41. CSRC_B <> 0

    In all cases:

  42. laf < (minimum value) (For the minimum value, see the laf argument description.)
  43. lwork <> 0, lwork <> -1, and lwork < (minimum value) (For the minimum value, see the lwork argument description.)
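
Several Stage 4 conditions compare the descriptor for A with the descriptor for B. The following C sketch checks four of them for the case where both descriptors are type 502. The helper name is hypothetical, and the returned values are simply the item numbers from the Stage 4 list above; they are not error codes issued by the library.

```c
#include <assert.h>

/* 0-based positions of the fields used here inside a type-502 descriptor. */
enum { DTYPE = 0, CTXT = 1, MB = 3, RSRC = 4 };

/* Returns 0 when the pair passes these four checks; otherwise returns the
 * Stage 4 item number of the first condition that is violated.
 * Illustration only: this is not how the library reports errors. */
int check_502_pair(const int desc_a[7], const int desc_b[7], int ia, int ib)
{
    assert(desc_a[DTYPE] == 502 && desc_b[DTYPE] == 502);
    if (desc_a[CTXT] != desc_b[CTXT]) return 2;   /* CTXT_A <> CTXT_B */
    if (desc_a[MB]   != desc_b[MB])   return 23;  /* MB_A <> MB_B     */
    if (desc_a[RSRC] != desc_b[RSRC]) return 24;  /* RSRC_A <> RSRC_B */
    if (ia != ib)                     return 28;  /* ia <> ib         */
    return 0;
}
```

Running such a check before the call can make a mismatch between the two descriptors easier to locate during development.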

Stage 5 

    Each of the following global input arguments is checked to determine whether its value is the same on all processes in the process grid:

  1. n differs.
  2. nrhs differs.
  3. ia differs.
  4. ib differs.
  5. DTYPE_A differs.

    If DTYPE_A = 1 on all processes:

  6. M_A differs.
  7. N_A differs.
  8. MB_A differs.
  9. NB_A differs.
  10. RSRC_A differs.
  11. CSRC_A differs.

    If DTYPE_A = 501 on all processes:

  12. N_A differs.
  13. NB_A differs.
  14. CSRC_A differs.

    If DTYPE_A = 502 on all processes:

  15. M_A differs.
  16. MB_A differs.
  17. RSRC_A differs.

    In all cases:

  18. DTYPE_B differs.

    If DTYPE_B = 1 on all processes:

  19. M_B differs.
  20. N_B differs.
  21. MB_B differs.
  22. NB_B differs.
  23. RSRC_B differs.
  24. CSRC_B differs.

    If DTYPE_B = 502 on all processes:

  25. M_B differs.
  26. MB_B differs.
  27. RSRC_B differs.

    Also:

  28. lwork = -1 on a subset of processes.

Example

This example shows how to solve the system AX=B, where matrix A is the same positive definite symmetric tridiagonal matrix factored in Example for PDPTTRF.

Notes:

  1. The vectors d and e, output from PDPTTRF, are stored in an internal format that depends on the number of processes. These vectors are passed, unchanged, to the solve subroutine PDPTTRS.

  2. The contents of the af vector, output from PDPTTRF, are not shown. This vector is passed, unchanged, to the solve subroutine PDPTTRS.

  3. Because lwork = 0, this subroutine dynamically allocates the work area it uses.

Call Statements and Input


ORDER = 'R'
NPROW = 3
NPCOL = 1
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              N  NRHS  D  E  IA   DESC_A   B  IB   DESC_B  AF   LAF  WORK LWORK INFO
              |    |   |  |   |      |     |   |      |    |     |    |     |     |
CALL PDPTTRS( 12 , 3 , D, E , 1 , DESC_A , B , 1 , DESC_B, AF , 48 , WORK , 0 , INFO)


Desc_A
DTYPE_ 502
CTXT_ icontxt (see note 1)
M_ 12
MB_ 4
RSRC_ 0
Not used --
Reserved --

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.



Desc_B
DTYPE_ 502
CTXT_ icontxt (see note 1)
M_ 12
MB_ 4
RSRC_ 0
LLD_B 4
Reserved --

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

Global vector d with block size of 4:

B,D     0
     *      *
     |  .25 |
     |  .25 |
 0   |  .25 |
     | 4.0  |
     | ---- |
     |  .2  |
     |  .24 |
 1   |  .25 |
     | 4.01 |
     | ---- |
     | 4.01 |
     |  .25 |
 2   |  .24 |
     |  .2  |
     *      *

Global vector e with block size of 4:

B,D     0
     *      *
     | 2.0  |
     | 2.0  |
 0   | 2.0  |
     | 2.0  |
     | ---- |
     | 2.0  |
     | 2.0  |
 1   | 2.0  |
     | 2.0  |
     | ---- |
     |  .49 |
     |  .48 |
 2   |  .4  |
     |  .   |
     *      *

The following is the 3 × 1 process grid:

B,D  |    0    
-----| -------  
0    |   P00
-----| -------  
1    |   P10
-----| -------  
2    |   P20

Local array D with block size of 4:

p,q  |  0
-----|------
     |  .25
     |  .25
 0   |  .25
     | 4.0
-----|------
     |  .2
     |  .24
 1   |  .25
     | 4.01
-----|------
     | 4.01
     |  .25
 2   |  .24
     |  .2

Local array E with block size of 4:

p,q  |  0
-----|------
     | 2.0
     | 2.0
 0   | 2.0
     | 2.0
-----|------
     | 2.0
     | 2.0
 1   | 2.0
     | 2.0
-----|------
     |  .49
     |  .48
 2   |  .4
     |  .

Global matrix B with a block size of 4:

p,q  |       0
-----|----------------
     | 70.0  8.0  6.0
     | 99.0 18.0  9.0
 0   | 90.0 27.0  9.0
     | 81.0 36.0  9.0
-----|----------------
     | 72.0 45.0  9.0
     | 63.0 54.0  9.0
 1   | 54.0 63.0  9.0
     | 45.0 72.0  9.0
-----|----------------
     | 36.0 81.0  9.0
     | 27.0 90.0  9.0
 2   | 18.0 99.0  9.0
     |  9.0 82.0  7.0

The following is the 3 × 1 process grid:

B,D  |    0    
-----| -------  
0    |   P00
-----| -------  
1    |   P10
-----| -------  
2    |   P20

Local matrix B with block size of 4:

p,q  |       0
-----|----------------
     | 70.0  8.0  6.0
     | 99.0 18.0  9.0
 0   | 90.0 27.0  9.0
     | 81.0 36.0  9.0
-----|----------------
     | 72.0 45.0  9.0
     | 63.0 54.0  9.0
 1   | 54.0 63.0  9.0
     | 45.0 72.0  9.0
-----|----------------
     | 36.0 81.0  9.0
     | 27.0 90.0  9.0
 2   | 18.0 99.0  9.0
     |  9.0 82.0  7.0
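
The correspondence between the global and local views of B shown above can be expressed as an index mapping. The following C sketch uses the usual block-cyclic formulas for a one-dimensional p × 1 grid; the type and function names are hypothetical and are introduced only for this illustration.

```c
#include <assert.h>

/* Where one global row lives: the process row and the 1-based local row. */
typedef struct {
    int prow;
    int lrow;
} row_home;

/* Map 1-based global row g to its owner and local position.
 *   mb   - row block size (MB_B)
 *   rsrc - process row that owns the first block (RSRC_B)
 *   p    - number of process rows */
row_home locate_row(int g, int mb, int rsrc, int p)
{
    assert(g >= 1 && mb >= 1 && p >= 1 && rsrc >= 0);
    int block = (g - 1) / mb;                     /* 0-based global block */
    row_home h;
    h.prow = (rsrc + block) % p;                  /* owner of that block */
    h.lrow = (block / p) * mb + (g - 1) % mb + 1; /* position inside owner */
    return h;
}
```

With MB_B = 4, RSRC_B = 0, and p = 3, global row 5 maps to local row 1 on process row 1, and global row 12 maps to local row 4 on process row 2, which matches the local matrix B shown above.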

Output:

Global matrix B with block size of 4:

B,D           0
     *                 *
     | 12.0  1.0  1.0  |
     | 11.0  2.0  1.0  |
 0   | 10.0  3.0  1.0  |
     |  9.0  4.0  1.0  |
     | --------------- |
     |  8.0  5.0  1.0  |
     |  7.0  6.0  1.0  |
 1   |  6.0  7.0  1.0  |
     |  5.0  8.0  1.0  |
     | --------------- |
     |  4.0   9.0  1.0 |
     |  3.0  10.0  1.0 |
 2   |  2.0  11.0  1.0 |
     |  1.0  12.0  1.0 |
     *                 *

The following is the 3 × 1 process grid:

B,D  |    0    
-----| -------  
0    |   P00
-----| -------  
1    |   P10
-----| -------  
2    |   P20

Local matrix B with block size of 4:

p,q  |        0
-----|-----------------
     | 12.0  1.0  1.0
     | 11.0  2.0  1.0
 0   | 10.0  3.0  1.0
     |  9.0  4.0  1.0
-----|-----------------
     |  8.0  5.0  1.0
     |  7.0  6.0  1.0
 1   |  6.0  7.0  1.0
     |  5.0  8.0  1.0
-----|-----------------
     |  4.0   9.0  1.0
     |  3.0  10.0  1.0
 2   |  2.0  11.0  1.0
     |  1.0  12.0  1.0

The value of info is 0 on all processes.
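
The output above can be cross-checked on a single process with a textbook L*D*L^T tridiagonal solve. The C sketch below is such a serial check. It is not the parallel algorithm used by PDPTTRS, and it does not read the internal d, e, or af format. The function name is hypothetical. The matrix entries used in the usage note (diagonal 4, 5, 5, ..., 5 and off-diagonal 2) are an assumption inferred from the input and output values of B shown in this example, since the original matrix is listed only in the PDPTTRF example.

```c
#include <assert.h>

/* Solve A x = b in place for a symmetric positive definite tridiagonal A.
 *   d[0..n-1]  diagonal of A; overwritten with the pivots of L*D*L^T
 *   e[0..n-2]  off-diagonal of A; overwritten with the multipliers of L
 *   b[0..n-1]  right-hand side on entry, solution on return
 * Serial textbook method, unrelated to the library's parallel storage. */
void spd_tridiag_solve(int n, double *d, double *e, double *b)
{
    assert(n >= 1);

    /* Factor: l_i = e_i / D_i and D_{i+1} = d_{i+1} - l_i * e_i. */
    for (int i = 0; i < n - 1; i++) {
        double ei = e[i];
        e[i] = ei / d[i];
        d[i + 1] -= e[i] * ei;
    }

    /* Forward substitution: solve L z = b. */
    for (int i = 1; i < n; i++)
        b[i] -= e[i - 1] * b[i - 1];

    /* Back substitution: solve D L^T x = z. */
    b[n - 1] /= d[n - 1];
    for (int i = n - 2; i >= 0; i--)
        b[i] = b[i] / d[i] - e[i] * b[i + 1];
}
```

Under the stated assumption about A, passing the first column of the input matrix B (70.0, 99.0, 90.0, ..., 9.0) to this routine reproduces the first column of the output (12.0, 11.0, ..., 1.0).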

