IBM Books

Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDGELS and PZGELS--General Matrix Least Squares Solution

|PDGELS solves overdetermined or underdetermined real linear systems involving a real general matrix A or its transpose, using a QR or LQ factorization. It is assumed that A has full rank.

|PZGELS solves overdetermined or underdetermined complex linear |systems involving a complex general matrix A or its conjugate |transpose, using a QR or LQ factorization. It is |assumed that A has full rank.

The following options are provided:

In the formulas above:

A represents the global general submatrix Aia:ia+m-1, ja:ja+n-1
If transa = 'N':
If transa <> 'N':
Note:
No data should be moved to form AT |or AH; that is, the matrix A should always be stored in its untransposed form.

If (m = 0 and n = 0) or nrhs = 0, then the subroutine returns after doing some parameter checking.

See references [13] and [37].

Table 64. Data Types

A, B, work Subroutine
Long-precision real PDGELS
Long-precision complex PZGELS

Syntax

Fortran CALL PDGELS|PZGELS (transa, m, n, nrhs, a, ia, ja, desc_a, b, ib, jb, desc_b, work, lwork, info)
C and C++ pdgels|pzgels (transa, m, n, nrhs, a, ia, ja, desc_a, b, ib, jb, desc_b, work, lwork, info);

On Entry

transa
indicates the form of matrix A used in the system of equations, where:

If transa = 'N', matrix A is used.

If transa = 'T', matrix AT is used.

|If transa = 'C', matrix |AH is used.

Scope: global

Specified as: a single character; transa = 'N', 'T', |or 'C'.

m
is the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n
is the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

nrhs
is the number of right-hand sides; that is the number of columns in submatricies used in the computation.

Scope: global

Specified as: a fullword integer; nrhs >= 0.

a
is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+m-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+n-1 part of the global matrix.
Note:
No data should be moved to form AT |or AH; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 64. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia
is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.

ja
is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a
is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix If m = 0 or n = 0: M_A >= 0

Otherwise: M_A >= 1

Global
4 N_A Number of columns in the global matrix If m = 0 or n = 0: N_A >= 0

Otherwise: N_A >= 1

Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

b
is the local part of the global general matrix B, containing the right-hand sides of the system. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol.

If transa = 'N', the leading LOCp(ib+m-1) by LOCq(jb+nrhs-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+nrhs-1 part of the global matrix; otherwise, the leading LOCp(ib+n-1) by LOCq(jb+nrhs-1) part of the local array B must contain the local pieces of the leading ib+n-1 by jb+nrhs-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 64. Details about the square block-cyclic data distribution of global matrix B are stored in desc_b.

ib
is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib + max(m, n)-1 <= M_B.

jb
is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+nrhs-1 <= N_B.

desc_b
is the array descriptor for global matrix B, described in the following table:
desc_b Name Description Limits Scope
1 DTYPE_B Descriptor type DTYPE_B=1 Global
2 CTXT_B BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_B Number of rows in the global matrix If (m = 0 and n = 0) or (nrhs = 0): M_B >= 0

Otherwise: M_B >= 1

Global
4 N_B Number of columns in the global matrix If (m = 0 and n = 0) or (nrhs = 0): N_B >= 0

Otherwise: N_B >= 1

Global
5 MB_B Row block size MB_B >= 1 Global
6 NB_B Column block size NB_B >= 1 Global
7 RSRC_B The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_B < p Global
8 CSRC_B The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_B < q Global
9 LLD_B The leading dimension of the local array LLD_B >= max(1,LOCp(M_B)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

work
has the following meaning:

If lwork = 0, work is ignored.

If lwork <> 0, work is the work area used by this subroutine, where:

Scope: local

Specified as: an area of storage containing numbers of data type indicated in Table 64.

lwork
is the number of elements in array WORK.

Scope:

Specified as: a fullword integer; where:

info
See On Return.

On Return

a
is the updated local part of the global general matrix A. Matrix A is overwritten; the original input is not preserved.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 64. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

b
is the updated local part of the global general matrix B, overwritten by the solution vectors stored columnwise.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 64. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

work
is the work area used by this subroutine if lwork <> 0.

Scope: local

Returned as: an area of storage, where:

If lwork >= 1 or lwork = -1, then work1 is set to the minimum lwork value and contains numbers of the data type indicated in Table 64. Except for work1, the contents of work are overwritten on return.

info
indicates that a successful computation occurred.

Scope: global

Returned as: a fullword integer; info = 0.

Notes and Coding Rules
  1. This subroutine accepts lowercase letters for the transa argument.
  2. In your C program, argument info must be passed by reference.
  3. Matrices A, B, and work must have no common elements; otherwise, results are unpredictable.
  4. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see Determining the Number of Rows and Columns in Your Local Arrays and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
  5. For suggested block sizes, see Coding Tips for Optimizing Parallel Performance.
  6. The following values must be equal: CTXT_A = CTXT_B
  7. If m >= n:
  8. If m < n:
  9. If m < n and m <> 0, n <> 0, and nrhs <> 0:
  10. If m <> 0, n <> 0, and nrhs <> 0 and if Aia:ia+min(m, n)-1, ja:ja+min(m, n)-1 is not contained in a single block, that is, either of the following is true:
    min(m, n) + mod(ia-1, MB_A) > MB_A
    min(m, n) + mod(ja-1, NB_A) > NB_A
    then:
  11. If lwork = -1 on any process, it must equal -1 on all processes. That is, if a subset of the processes specifies -1 for the work area size, they must all specify -1.

Function

|PDGELS solves overdetermined or underdetermined real linear systems involving a real general rectangular matrix A, or its transpose, using a QR or LQ factorization. It is assumed that A has full rank.

|PZGELS solves overdetermined or underdetermined complex linear |systems involving a complex general matrix A or its conjugate |transpose, using a QR or LQ factorization. It is |assumed that A has full rank.

The following options are provided:

In the formulas above:

A represents the global general submatrix Aia:ia+m-1, ja:ja+n-1
If transa = 'N':
If transa <> 'N':
Note:
No data should be moved to form AT |or AH; that is, the matrix A should always be stored in its untransposed form.

If (m = 0 and n = 0) or nrhs = 0, then the subroutine returns after doing some parameter checking.

See references [13] and [37].

Error Conditions

Computational Errors

None

Resource Errors
  1. lwork = 0 and unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1 

  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.

Stage 2 

  1. CTXT_A is invalid.

Stage 3 

  1. |This subroutine has been called from outside the process grid.

Stage 4 

  1. transa <> 
  2. m < 0
  3. n < 0
  4. nrhs < 0
  5. M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
  6. N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
  7. MB_A < 1
  8. NB_A < 1
  9. RSRC_A < 0 or RSRC_A >= p
  10. CSRC_A < 0 or CSRC_A >= q
  11. ia < 1
  12. ja < 1
  13. M_B < 0 and ((m = 0 and n = 0) or nrhs = 0); M_B < 1 otherwise
  14. N_B < 0 and ((m = 0 and n = 0) or nrhs = 0); N_B < 1 otherwise
  15. MB_B < 1
  16. NB_B < 1
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ib < 1
  20. jb < 1
  21. CTXT_A <> CTXT_B

Stage 5  If m <> 0, n <> 0, and nrhs <> 0 and if Aia:ia+min(m, n)-1, ja:ja+min(m, n)-1 is not contained in a single block, that is, either of the following is true:

min(m, n) + mod(ia-1, MB_A) > MB_A
min(m, n) + mod(ja-1, NB_A) > NB_A
  1. MB_A <> NB_A
  2. MB_B <> NB_A

If m <> 0 and n <> 0:

  1. ia > M_A
  2. ja > M_A
  3. ia+m-1 > M_A
  4. ja+n-1 > N_A

If (m <> 0 or n <> 0) and nrhs <> 0:

  1. ib > M_B
  2. jb > M_B
  3. ib+m-1 > M_B and m >= n; ib+n-1 > M_B and m < n
  4. jb+nrhs-1 > N_B

If Aia:ia+min(m, n)-1, ja:ja+min(m, n)-1 is not contained in a single block and (m <> 0, n <> 0, and nrhs <> 0):

  1. mod(ia-1, MB_A) <> 0
  2. mod(ja-1, NB_A) <> 0
  3. mod(ib-1, MB_B) <> 0

If m >= n:

  1. MB_A <> MB_B
  2. mod(ia-1, MB_A) <> mod(ib-1, MB_B)

If m < n:

  1. NB_A <> MB_B
  2. mod(ja-1, NB_A) <> mod(ib-1, MB_B)

Stage 6 

  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))

If m >= n:

  1. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow =  mod(RSRC_A+(ia-1)/MB_A, p)
    ibrow =  mod(RSRC_B+(ib-1)/MB_B, p)

If m < n and (m <> 0, n <> 0, and nrhs <> 0) :

  1. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow =  mod(RSRC_A+(ia-1)/MB_A, p)
    ibrow =  mod(RSRC_B+(ib-1)/MB_B, p)

In all cases:

  1. lwork <> 0, lwork <> -1, and lwork < (minimum value) (For minimum value, see lwork parameter description.)

Stage 7 

    Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:

  1. transa differs.
  2. m differs.
  3. n differs.
  4. nrhs differs.
  5. ia differs.
  6. ja differs.
  7. DTYPE_A differs.
  8. M_A differs.
  9. N_A differs.
  10. MB_A differs.
  11. NB_A differs.
  12. RSRC_A differs.
  13. CSRC_A differs.
  14. ib differs.
  15. jb differs.
  16. DTYPE_B differs.
  17. M_B differs.
  18. N_B differs.
  19. MB_B differs.
  20. NB_B differs.
  21. RSRC_B differs.
  22. CSRC_B differs.

    Also:

  23. lwork = -1 on a subset of processes.

|Example 1

This example illustrates the least squares solution of an overdetermined |real general system of size 4 × 3 with 5 right hand sides, using a 2 × 2 process grid.

Note:
Because lwork = 0, PDGELS dynamically allocates the work area used by this subroutine.

Call Statements and Input


ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
             TRANSA  M   N  NRHS  A  IA  JA   DESC_A   B  IB  JB   DESC_B   WORK  LWORK  INFO
               |     |   |   |    |   |   |     |      |   |   |     |       |      |     |
CALL PDGELS ( 'N' ,  4 , 3 , 5  , A , 1 , 1 , DESC_A , B , 1 , 1 , DESC_B , WORK ,  0  , INFO )


DESC_A DESC_B
DTYPE_ 1 1
CTXT_ icontxt(IOBG57) icontxt(IOBG57)
M_ 4 4
N_ 3 5
MB_ 1 1
NB_ 1 2
RSRC_ 0 0
CSRC_ 0 0
LLD_ See below(EPSSL57) See below(EPSSL57)

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
    LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
    

    In this example, LLD_A and LLD_B = 2 on all processes.

Global general matrix A of size 4 by 3, with block sizes 1 × 1:

B,D      0        1        2 
     *                         *
 0   |  1.00 |  -2.00 |  -1.00 |
     | ------|--------|--------|
 1   |  2.00 |    .00 |   1.00 |
     | ------|--------|--------|
 2   |  2.00 |  -4.00 |   2.00 |
     | ------|--------|--------|
 3   |  4.00 |    .00 |    .00 |
     *                         *

Global general matrix B of size 4 by 5, with block sizes 1 × 2:

B,D          0              1           2 
     *                                      *
 0   | -1.00  -2.00 | -7.00   0.00 |  -5.00 |
     | -------------|--------------|--------|
 1   |  1.00   3.00 |  4.00   3.00 |   5.00 |
     | -------------|--------------|--------|
 2   |  1.00   0.00 |  4.00   2.00 |   2.00 |
     | -------------|--------------|--------|
 3   | -2.00   4.00 |  4.00   0.00 |   4.00 |
     *                                      *

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1 
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |

Local arrays for A:

p,q  |       0      |    1
-----|--------------|--------
 0   |  1.00  -1.00 |  -2.00 
     |  2.00   2.00 |  -4.00 
-----|--------------|--------
 1   |  2.00   1.00 |   0.00 
     |  4.00   0.00 |   0.00

Local arrays for B:

p,q  |           0         |        1
-----|---------------------|---------------
 0   | -1.00  -2.00  -5.00 |  -7.00   0.00 
     |  1.00   0.00   2.00 |   4.00   2.00 
-----|---------------------|---------------
 1   |  1.00   3.00   5.00 |   4.00   3.00 
     | -2.00   4.00   4.00 |   4.00   0.00

Output:

Global general matrix B of size 4 by 5, with block sizes 1 × 2:

B,D          0              1           2 
     *                                      *
 0   | -0.40   1.00 |  0.80   0.20 |   1.00 |
     | -------------|--------------|--------|
 1   |  0.00   1.00 |  1.50   0.00 |   1.50 |
     | -------------|--------------|--------|
 2   |  1.00   1.00 |  4.00   1.00 |   3.00 |
     | -------------|--------------|--------|
 3   | -1.00   0.00 |  2.00  -2.00 |   0.00 |
     *                                      *

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1 
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |

Local arrays for B:

p,q  |           0         |        1
-----|---------------------|---------------
 0   | -0.40   1.00   1.00 |   0.80   0.20 
     |  1.00   1.00   3.00 |   4.00   1.00 
-----|---------------------|---------------
 1   |  0.00   1.00   1.50 |   1.50   0.00 
     | -1.00   0.00   0.00 |   2.00  -2.00

The value of info is 0 on all processes.

|Example 2

|This example illustrates the least squares solution of an underdetermined |complex general system of size 3 × 4 with 3 right hand sides, |using a 2 × 2 process grid.

|Note:
Because lwork = 0, PZGELS dynamically allocates the work |area used by this subroutine. |

|Call Statements and Input


|

|ORDER = 'R'
|NPROW = 2
|NPCOL = 2
|CALL BLACS_GET (0, 0, ICONTXT)
|CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
|CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
| 
|             TRANSA  M   N  NRHS  A  IA  JA   DESC_A   B  IB  JB   DESC_B   WORK  LWORK  INFO
|               |     |   |   |    |   |   |     |      |   |   |     |       |      |     |
|CALL PZGELS ( 'N' ,  3 , 4 , 3  , A , 1 , 1 , DESC_A , B , 1 , 1 , DESC_B , WORK ,  0  , INFO )

|

DESC_A DESC_B
DTYPE_ 1 1
CTXT_ icontxt(NT1GELS) icontxt(NT1GELS)
M_ 3 4
N_ 4 3
MB_ 1 1
NB_ 1 1
RSRC_ 0 0
CSRC_ 0 0
LLD_ See below(NT2GELS) See below(NT2GELS)

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
          = 2 on P00 and P01 and 1 on P10 and P11
    LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
          = 2 on all processes      
    

|Global general matrix A of size 3 by 4, with block sizes |1 × 1:

|B,D          0               1               2               3 
|     *                                                               *
| 0   | ( 1.00, 0.00) | (-2.00, 1.00) | (-3.00,-1.00) | ( 4.00,-3.00) |
|     | --------------|---------------|-------------- |-------------- |
| 1   | ( 1.00,-1.00) | ( 2.00, 2.00) | (-3.00, 0.00) | (-4.00,-2.00) |
|     | --------------|---------------|-------------- |-------------- |
| 2   | ( 1.00,-2.00) | (-2.00, 3.00) | (-3.00, 1.00) | ( 4.00,-1.00) |
|     | --------------|---------------|-------------- |-------------- |
|     *                                                               *
| 

|Global general matrix B of size 4 by 3, with block sizes |1 × 1:

|B,D          0              1              2 
|     *                                             *
| 0   | ( 1.00, .00) | ( .00, 1.00) | ( 1.00, 1.00) |
|     | -------------|--------------|---------------|
| 1   | (-1.00,1.00) | (1.00,-1.00) | (  .00,  .00) |
|     | -------------|--------------|---------------|
| 2   |   2.00,1.00  | (1.00, 2.00) | (-1.00,-1.00) |
|     | -------------|--------------|---------------|
| 3   | (  .  , .  ) | ( .  ,  .  ) | (   . ,  .  ) |
|     *                                             * 

|The following is the 2 × 2 process grid:

|B,D  |   0 2   |   1 3 
|-----| ------- |-----
|0    |   P00   |  P01
|2    |         |
|-----| ------- |-----
|1    |   P10   |  P11
|3    |         |

|Local arrays for A:

|p,q  |              0              |                1
|-----|-----------------------------|-----------------------------
| 0   |  (1.00, 0.00) (-3.00,-1.00) |  (-2.00, 1.00) ( 4.00,-3.00)
|     |  (1.00,-2.00) (-3.00, 1.00) |  (-2.00, 3.00) ( 4.00,-1.00)
|-----|-----------------------------|-----------------------------
| 1   |  (1.00,-1.00) (-3.00, 0.00) |  ( 2.00, 2.00) (-4.00,-2.00)
| 

|Local arrays for B:

|p,q  |              0              |        1
|-----|-----------------------------|---------------
| 0   | ( 1.00,  .00) ( 1.00, 1.00) |  ( 1.00,-1.00)
|     | ( 2.00, 1.00) (-1.00,-1.00) |  ( 1.00, 2.00)
|-----|-----------------------------|---------------
| 1   | (-1.00, 1.00) (  .00,  .00) |  ( 1.00,-1.00) 
|     | (  .  ,  .  ) (  .  ,  .  ) |  (  .  ,  .  )

|Output:

|Global general matrix B of size 4 by 3, with block sizes |1 × 1:

|B,D           0               1               2 
|     *                                               *
| 0   | ( -.16,  .15) | ( -.08,  .18) | (  .16, -.31) |
|     | --------------|---------------|---------------|
| 1   | (  .11,  .02) | (  .21, -.50) | ( -.38,  .65) |
|     | --------------|---------------|---------------|
| 2   | ( -.13, -.32) | (  .16,  .12) | ( -.27, -.28) |
|     | --------------|---------------|---------------|
| 3   | (  .37, -.05) | (  .04,  .06) | ( -.19,  .32) |
|     *                                               *     

|The following is the 2 × 2 process grid:

|B,D  |   0 2   |   1 
|-----| ------- |-----
|0    |   P00   |  P01
|2    |         |
|-----| ------- |-----
|1    |   P10   |  P11
|3    |         |

|Local arrays for B:

|p,q  |              0              |        1
|-----|-----------------------------|--------------
| 0   | ( -.16,  .15) (  .16, -.31) | ( -.08,  .18)
|     | ( -.13, -.32) ( -.27, -.28) | (  .16,  .12)
|-----|-----------------------------|--------------
| 1   | (  .11,  .02) ( -.38,  .65) | (  .21, -.50)
|     | (  .37, -.05) ( -.19,  .32) | (  .04,  .06)

|The value of info is 0 on all processes.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]