IBM Books

Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDGEQRF and PZGEQRF--General Matrix QR Factorization

These subroutines compute the QR factorization of a general matrix A, where, in this description:

A represents the global general submatrix A(ia:ia+m-1, ja:ja+n-1) to be factored.
For PDGEQRF, Q is an orthogonal matrix.
For PZGEQRF, Q is a unitary matrix.
For m >= n, R is an upper triangular matrix.
For m < n, R is an upper trapezoidal matrix.

If m = 0 or n = 0, no computation is performed and the subroutine returns after doing some parameter checking.

See references [23] and [37].

Table 63. Data Types

A, tau, work              Subroutine
Long-precision real       PDGEQRF
Long-precision complex    PZGEQRF

Syntax

Fortran CALL PDGEQRF | PZGEQRF (m, n, a, ia, ja, desc_a, tau, work, lwork, info)
C and C++ pdgeqrf | pzgeqrf (m, n, a, ia, ja, desc_a, tau, work, lwork, info);

On Entry

m
is the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n
is the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

a
is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+m-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 63. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia
is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.

ja
is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a
is the array descriptor for global matrix A, described in the following table:
desc_a  Name     Description                               Limits                         Scope
1       DTYPE_A  Descriptor type                           DTYPE_A = 1                    Global
2       CTXT_A   BLACS context                             Valid value, as returned by    Global
                                                           BLACS_GRIDINIT or BLACS_GRIDMAP
3       M_A      Number of rows in the global matrix       If m = 0 or n = 0: M_A >= 0    Global
                                                           Otherwise: M_A >= 1
4       N_A      Number of columns in the global matrix    If m = 0 or n = 0: N_A >= 0    Global
                                                           Otherwise: N_A >= 1
5       MB_A     Row block size                            MB_A >= 1                      Global
6       NB_A     Column block size                         NB_A >= 1                      Global
7       RSRC_A   The process row of the p × q grid over    0 <= RSRC_A < p                Global
                 which the first row of the global matrix
                 is distributed
8       CSRC_A   The process column of the p × q grid      0 <= CSRC_A < q                Global
                 over which the first column of the
                 global matrix is distributed
9       LLD_A    The leading dimension of the local array  LLD_A >= max(1, LOCp(M_A))     Local

Specified as: an array of (at least) length 9, containing fullword integers.
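For C callers, the nine descriptor entries are desc_a[0] through desc_a[8]. The following is a minimal sketch, not a PESSL routine, of the checks implied by the desc_a limits and by the ia and ja rules (Stages 1 and 4 to 6 under Error Conditions), as seen by one process. The enum names, the function name check_qr_args, and the loc_rows parameter (LOCp(M_A) on the calling process, for example from NUMROC) are hypothetical; the BLACS context and lwork are not validated.

```c
#include <stdbool.h>

/* Hypothetical zero-based C positions of the nine desc_a entries
   (Fortran positions 1 through 9). */
enum { DTYPE_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ };

/* Sketch: returns true if m, n, ia, ja, and desc satisfy the documented
   limits on a p x q process grid. ia and ja are 1-based, as in Fortran.
   loc_rows is LOCp(M_A) for the calling process. CTXT_ is not checked. */
bool check_qr_args(int m, int n, int ia, int ja, const int desc[9],
                   int p, int q, int loc_rows)
{
    bool empty = (m == 0 || n == 0);
    int min_dim = empty ? 0 : 1;
    int min_lld = loc_rows > 1 ? loc_rows : 1;

    /* Stage 1 and Stage 4 checks */
    if (desc[DTYPE_] != 1) return false;
    if (m < 0 || n < 0 || ia < 1 || ja < 1) return false;
    if (desc[M_] < min_dim || desc[N_] < min_dim) return false;
    if (desc[MB_] < 1 || desc[NB_] < 1) return false;
    if (desc[RSRC_] < 0 || desc[RSRC_] >= p) return false;
    if (desc[CSRC_] < 0 || desc[CSRC_] >= q) return false;

    /* Stage 5: the submatrix must fit inside the global matrix */
    if (!empty && (ia + m - 1 > desc[M_] || ja + n - 1 > desc[N_]))
        return false;

    /* Stage 6: leading dimension of the local array */
    return desc[LLD_] >= min_lld;
}
```

With the Example 1 descriptor (M_A = 4, N_A = 3, MB_A = NB_A = 1, RSRC_A = CSRC_A = 0, LLD_A = 2) on a 2 × 2 grid, check_qr_args(4, 3, 1, 1, desc, 2, 2, 2) returns true.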

tau
See On Return.

work
has the following meaning:

If lwork = 0, work is ignored.

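If lwork <> 0, work is the work area used by this subroutine, where:

If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.

If lwork = -1, its size is (at least) of length 1.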

Scope: local

Specified as: an area of storage containing numbers of data type indicated in Table 63.

lwork
is the number of elements in array WORK.

Scope: local

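Specified as: a fullword integer; where:

If lwork = 0, work is ignored, and the work area used by this subroutine is dynamically allocated.

If lwork = -1, a work area query is performed, and the minimum lwork value is returned in work(1).

Otherwise, lwork must be at least the minimum value given in Stage 6 under Input-Argument and Miscellaneous Errors.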

info
See On Return.

On Return

a
is the updated local part of the global general matrix A, containing the results of the computation.

The elements on and above the diagonal of A(ia:ia+m-1, ja:ja+n-1) contain the min(m, n) × n upper trapezoidal matrix R (R is upper triangular if m >= n). The elements below the diagonal, together with tau, represent the matrix Q as a product of min(m, n) elementary reflectors.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 63. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
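To see how the factors stored in a and tau fit together, the following serial sketch (not a PESSL routine) multiplies R by Q = H(1)H(2)...H(k), applying the reflectors from last to first, which should reproduce the original matrix. It assumes the LAPACK/ScaLAPACK convention H(i) = I - tau(i) v v^T, with v(i) = 1 implicit and the rest of v stored below the diagonal. It handles only the real (PDGEQRF) case on a single column-major array, not the block-cyclically distributed local pieces; the function name apply_q_times_r is hypothetical.

```c
/* Sketch (real case, serial, column-major): given the m x n output of a
   QR factorization stored LAPACK-style in qr (R on and above the diagonal,
   reflector vectors below it) and the k = min(m, n) scalars in tau,
   write Q*R into out (m x n, leading dimension m). */
void apply_q_times_r(int m, int n, const double *qr, int ldqr,
                     const double *tau, double *out)
{
    int k = m < n ? m : n;

    /* Start from R: copy the upper trapezoid, zero everything below it. */
    for (int c = 0; c < n; c++)
        for (int r = 0; r < m; r++)
            out[r + c * m] = (r <= c) ? qr[r + c * ldqr] : 0.0;

    /* Q*R = H(1)(H(2)(...(H(k) R))): apply the reflectors last to first. */
    for (int j = k - 1; j >= 0; j--) {
        for (int c = 0; c < n; c++) {
            double *col = &out[c * m];
            /* w = tau * v^T col, with v(j) = 1 and v(j+1:m-1) below the diagonal */
            double w = col[j];
            for (int r = j + 1; r < m; r++) w += qr[r + j * ldqr] * col[r];
            w *= tau[j];
            /* col = col - w * v */
            col[j] -= w;
            for (int r = j + 1; r < m; r++) col[r] -= w * qr[r + j * ldqr];
        }
    }
}
```

Fed the unrounded values behind the Example 1 output (0.33 is 1/3, 1.33 is 4/3, and so on), this returns the original 4 × 3 input matrix.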

tau
is the updated local part of the global matrix tau, where tau(ja:ja+min(m, n)-1) contains the scalar factors of the elementary reflectors.

This identifies the first element of the local array tau. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+min(m, n)-1) part of the local array tau must contain the local pieces of the leading 1 by ja+min(m, n)-1 part of the global matrix tau.

A copy of the vector tau, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of tau is distributed is CSRC_A.

Scope: local

Returned as: a 1 by (at least) LOCq(ja+min(m, n)-1) array, containing numbers of the data type indicated in Table 63.

work
is the work area used by this subroutine if lwork <> 0, where:

If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.

If lwork = -1, its size is (at least) of length 1.

Scope: local

Returned as: an area of storage, where:

If lwork >= 1 or lwork = -1, then work(1) is set to the minimum lwork value and contains numbers of the data type indicated in Table 63. Except for work(1), the contents of work are overwritten on return.

info
indicates that a successful computation occurred.

Scope: global

Returned as: a fullword integer; info = 0.

Notes and Coding Rules
  1. In your C program, argument info must be passed by reference.
  2. Matrix A, tau, and work must have no common elements; otherwise, results are unpredictable.
  3. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see Determining the Number of Rows and Columns in Your Local Arrays and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
  4. There is no array descriptor for tau. tau is a row-distributed vector with block size NB_A, local array of dimension 1 by LOCq(ja+min(m, n)-1), and global index ja. A copy of tau exists on each row of the process grid, and the process column over which the first column of tau is distributed is CSRC_A.
  5. For suggested block sizes, see Coding Tips for Optimizing Parallel Performance.
  6. If lwork = -1 on any process, it must equal -1 on all processes. That is, if a subset of the processes specifies -1 for the work area size, they must all specify -1.
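NUMROC returns the number of rows or columns of a block-cyclically distributed matrix that a given process owns. The sketch below is a C transcription of the well-known ScaLAPACK NUMROC rule, offered as an illustration under the assumption that PESSL's NUMROC follows the same rule; the name numroc_sketch is hypothetical, and process coordinates are zero-based, as returned by BLACS_GRIDINFO.

```c
/* Sketch of the block-cyclic count computed by NUMROC:
   n      - global number of rows (or columns)
   nb     - block size
   iproc  - this process's row (or column) coordinate
   isrc   - process row (or column) holding the first block (RSRC_ / CSRC_)
   nprocs - number of process rows (or columns) in the grid */
int numroc_sketch(int n, int nb, int iproc, int isrc, int nprocs)
{
    /* Distance of this process from the one that owns the first block. */
    int mydist = (nprocs + iproc - isrc) % nprocs;
    int nblocks = n / nb;                 /* number of whole blocks */
    int count = (nblocks / nprocs) * nb;  /* whole rounds of blocks */
    int extra = nblocks % nprocs;         /* leftover whole blocks */

    if (mydist < extra)
        count += nb;                      /* gets one more whole block */
    else if (mydist == extra)
        count += n % nb;                  /* gets the trailing partial block */
    return count;
}
```

For Example 1, numroc_sketch(4, 1, MYROW, 0, 2) is 2 on both process rows, which is why LLD_A = 2 everywhere, and numroc_sketch(3, 1, MYCOL, 0, 2) gives 2 columns on process column 0 and 1 column on process column 1.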

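Function

These subroutines compute the QR factorization of a general matrix A:

A = QR

where Q is an orthogonal matrix (PDGEQRF) or a unitary matrix (PZGEQRF), and R is an upper triangular matrix (m >= n) or an upper trapezoidal matrix (m < n).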
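These subroutines return Q only in factored form. Assuming the LAPACK/ScaLAPACK convention, with which the Example 1 output is consistent, Q is the product of k = min(m, n) elementary reflectors H(i), each built from one column below the diagonal of the output matrix and one element of tau (v^H is the conjugate transpose, which reduces to v^T for PDGEQRF). The last two lines check the first reflector against Example 1, where v_1 is 1 followed by the values shown rounded as 0.33, 0.33, 0.67.

```latex
\begin{aligned}
Q &= H(1)\,H(2)\cdots H(k), \qquad k = \min(m, n),\\
H(i) &= I - \tau_i\, v_i v_i^{H},\\
&\quad v_i(1{:}i-1) = 0,\quad v_i(i) = 1,\quad
v_i(i+1{:}m)\ \text{stored in } A(ia+i{:}ia+m-1,\ ja+i-1),\quad
\tau_i = \mathrm{tau}(ja+i-1).\\[4pt]
\text{Example 1, } i = 1:\quad
&v_1 = \bigl(1,\ \tfrac13,\ \tfrac13,\ \tfrac23\bigr)^{T},\quad \tau_1 = 1.2,\quad
a_1 = (1,\ 2,\ 2,\ 4)^{T},\\
&H(1)\,a_1 = a_1 - \tau_1\,(v_1^{T} a_1)\,v_1 = a_1 - 1.2 \cdot 5 \cdot v_1 = (-5,\ 0,\ 0,\ 0)^{T}.
\end{aligned}
```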

Error Conditions

Computational Errors

None

Resource Errors
  1. lwork = 0 and unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1 

  1. DTYPE_A is invalid.

Stage 2 

  1. CTXT_A is invalid.

Stage 3 

  1. This subroutine has been called from outside the process grid.

Stage 4 

  1. m < 0
  2. n < 0
  3. M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
  4. N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
  5. ia < 1
  6. ja < 1
  7. MB_A < 1
  8. NB_A < 1
  9. RSRC_A < 0 or RSRC_A >= p
  10. CSRC_A < 0 or CSRC_A >= q

Stage 5  If m <> 0 and n <> 0:

  1. ia > M_A
  2. ja > N_A
  3. ia+m-1 > M_A
  4. ja+n-1 > N_A

Stage 6 

  1. LLD_A < max(1, LOCp(M_A))
  2. lwork <> 0, lwork <> -1, and lwork < nb × (mp0 + nq0 + nb)

    where:

    mb = MB_A
    nb = NB_A
    iroff = mod(ia-1, mb)
    icoff = mod(ja-1, nb)
    iarow = mod(RSRC_A + (ia-1)/mb, nprow)
    iacol = mod(CSRC_A + (ja-1)/nb, npcol)
    mp0 = NUMROC(m+iroff, mb, myrow, iarow, nprow)
    nq0 = NUMROC(n+icoff, nb, mycol, iacol, npcol)
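The Stage 6 bound can be evaluated on each process before the call, for example to size work when a nonzero lwork is preferred over dynamic allocation. The sketch below transcribes the formula above into C. The function names are hypothetical, numroc_local repeats the ScaLAPACK NUMROC rule (an assumption about how PESSL's NUMROC behaves), and myrow, mycol, nprow, and npcol are the values returned by BLACS_GRIDINFO.

```c
/* Block-cyclic local count, following the ScaLAPACK NUMROC rule. */
static int numroc_local(int n, int nb, int iproc, int isrc, int nprocs)
{
    int mydist = (nprocs + iproc - isrc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extra = nblocks % nprocs;
    if (mydist < extra) count += nb;
    else if (mydist == extra) count += n % nb;
    return count;
}

/* Sketch: minimum lwork on this process, following Stage 6:
   nb * (mp0 + nq0 + nb). ia and ja are 1-based, as in Fortran. */
int min_lwork_geqrf(int m, int n, int ia, int ja,
                    int mb, int nb, int rsrc, int csrc,
                    int myrow, int mycol, int nprow, int npcol)
{
    int iroff = (ia - 1) % mb;                   /* row offset within block */
    int icoff = (ja - 1) % nb;                   /* column offset within block */
    int iarow = (rsrc + (ia - 1) / mb) % nprow;  /* process row owning row ia */
    int iacol = (csrc + (ja - 1) / nb) % npcol;  /* process column owning column ja */
    int mp0 = numroc_local(m + iroff, mb, myrow, iarow, nprow);
    int nq0 = numroc_local(n + icoff, nb, mycol, iacol, npcol);
    return nb * (mp0 + nq0 + nb);
}
```

With the Example 1 arguments, the bound is 5 on process column 0 and 4 on process column 1.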

Stage 7 

    Each of the following global input arguments is checked to determine whether its value differs from the value specified on process P00:

  1. m differs.
  2. n differs.
  3. ia differs.
  4. ja differs.
  5. DTYPE_A differs.
  6. M_A differs.
  7. N_A differs.
  8. MB_A differs.
  9. NB_A differs.
  10. RSRC_A differs.
  11. CSRC_A differs.

    Also:

  12. lwork = -1 on a subset of processes.

Example 1

This example shows the QR factorization of a real general matrix of size 4 × 3, using a 2 × 2 process grid.

Note:
Because lwork = 0, PDGEQRF dynamically allocates the work area used by this subroutine.

Call Statements and Input


ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M    N     A  IA  JA   DESC_A   TAU   WORK   LWORK   INFO
              |    |     |   |   |     |       |     |       |      |
CALL PDGEQRF( 4 ,  3  ,  A , 1 , 1 , DESC_A , TAU , WORK  ,  0   , INFO)


DESC_A
DTYPE_   1
CTXT_    icontxt (see note 1)
M_       4
N_       3
MB_      1
NB_      1
RSRC_    0
CSRC_    0
LLD_     See note 2

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
    

    In this example, LLD_A = 2 on all processes.

Global general matrix A of size 4 × 3 with block sizes 1 × 1:

B,D      0         1         2
     *                           *
 0   |  1.00  |  -2.00  |  -1.00 |
     | -------|---------|------- |
 1   |  2.00  |    .00  |   1.00 |
     | -------|---------|------- |
 2   |  2.00  |  -4.00  |   2.00 |
     | -------|---------|------- |
 3   |  4.00  |    .00  |    .00 |
     *                           *

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1 
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |

Local arrays for A:

p,q  |      0       |    1
-----|--------------|--------
 0   |  1.00 -1.00  |  -2.00
     |  2.00  2.00  |  -4.00
-----|--------------|--------
 1   |  2.00  1.00  |   0.00
     |  4.00  0.00  |   0.00
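The local arrays above follow from the block-cyclic mapping of global indices. The sketch below shows that mapping for one dimension (rows or columns) with zero-based indices; it mirrors the ScaLAPACK tool functions INDXG2P, INDXG2L, and INDXL2G in spirit, but the names, the struct, and the zero-based convention are assumptions made for this illustration.

```c
/* Owner and local position of a global index under a block-cyclic
   distribution. All indices are zero-based in this sketch. */
typedef struct {
    int proc;   /* process row (or column) that owns the index */
    int local;  /* zero-based position in that process's local array */
} block_cyclic_loc;

block_cyclic_loc global_to_local(int g, int nb, int src, int nprocs)
{
    int block = g / nb;                       /* global block number */
    block_cyclic_loc loc;
    loc.proc = (src + block) % nprocs;        /* blocks are dealt round-robin */
    loc.local = (block / nprocs) * nb + g % nb;
    return loc;
}

/* Inverse mapping: global index of local position l on process proc. */
int local_to_global(int l, int proc, int nb, int src, int nprocs)
{
    int mydist = (nprocs + proc - src) % nprocs;
    return ((l / nb) * nprocs + mydist) * nb + l % nb;
}
```

For example, global row 2 of Example 1 (the row holding 2.00, -4.00, 2.00) maps to process row 0, local row 1, which is where those values appear in P00 and P01.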

Output:

Global general matrix A of size 4 × 3 with block sizes 1 × 1:

B,D      0         1         2
     *                           *
 0   | -5.00  |   2.00  |  -1.00 |
     | -------|---------|------- |
 1   |  0.33  |  -4.00  |   1.00 |
     | -------|---------|------- |
 2   |  0.33  |  -0.50  |  -2.00 |
     | -------|---------|------- |
 3   |  0.67  |   0.50  |   0.00 |
     *                           *

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1 
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |

Local arrays for A:

p,q  |       0        |    1
-----|----------------|---------
 0   |  -5.00  -1.00  |   2.00
     |   0.33  -2.00  |  -0.50
-----|----------------|---------
 1   |   0.33   1.00  |  -4.00
     |   0.67   0.00  |   0.50

Global row vector tau of length 3 with block size of 1:

B,D      0         1         2
     *                           *
 0   |  1.20  |   1.33  |   2.00 |
     *                           *

Note:
A copy of tau is distributed across each row of the process grid.

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1 
-----| ------- |-----
     |   P00   |  P01
     |         |
-----| ------- |-----
     |   P10   |  P11
     |         |

Local arrays for tau:

p,q  |       0        |    1
-----|----------------|---------
 0   |   1.20   2.00  |   1.33
-----|----------------|---------
 1   |   1.20   2.00  |   1.33

The value of info is 0 on all processes.

Example 2

This example shows the QR factorization of a complex general matrix of size 3 × 4, using a 2 × 2 process grid.

Note:
Because lwork = 0, PZGEQRF dynamically allocates the work area used by this subroutine.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

              M    N     A  IA  JA   DESC_A   TAU   WORK   LWORK   INFO
              |    |     |   |   |     |       |     |       |      |
CALL PZGEQRF( 3 ,  4  ,  A , 1 , 1 , DESC_A , TAU , WORK  ,  0   , INFO)

DESC_A
DTYPE_   1
CTXT_    icontxt (see note 1)
M_       3
N_       4
MB_      1
NB_      1
RSRC_    0
CSRC_    0
LLD_     See note 2

Notes:

  1. icontxt is the output of the BLACS_GRIDINIT call.

  2. Each process should set the LLD_ as follows:
    LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
          = 2 on P00 and P01 and 1 on P10 and P11
    

Global general matrix A of size 3 × 4 with block sizes 1 × 1:

B,D          0               1               2               3
     *                                                               *
     | --------------|---------------|-------------- |-------------- |
 0   | ( 1.00, 0.00) | (-2.00, 1.00) | (-3.00,-1.00) | ( 4.00,-3.00) |
     | --------------|---------------|-------------- |-------------- |
 1   | ( 1.00,-1.00) | ( 2.00, 2.00) | (-3.00, 0.00) | (-4.00,-2.00) |
     | --------------|---------------|-------------- |-------------- |
 2   | ( 1.00,-2.00) | (-2.00, 3.00) | (-3.00, 1.00) | ( 4.00,-1.00) |
     | --------------|---------------|-------------- |-------------- |
     *                                                               *

The following is the 2 × 2 process grid:

B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for A:

p,q  |               0                |               1
-----|--------------------------------|-----------------------------
 0   |  ( 1.00, 0.00) (-3.00,-1.00)   |  (-2.00, 1.00) ( 4.00,-3.00)
     |  ( 1.00,-2.00) (-3.00, 1.00)   |  (-2.00, 3.00) ( 4.00,-1.00)
-----|--------------------------------|-----------------------------
 1   |  ( 1.00,-1.00) (-3.00, 0.00)   |  ( 2.00, 2.00) (-4.00,-2.00)

Output:

Global general matrix A of size 3 × 4 with block sizes 1 × 1:

B,D          0               1               2               3
     *                                                               *
     | --------------|---------------|-------------- |-------------- |
 0   | (-2.83, 0.00) | ( 3.54,-1.41) | ( 3.89, 3.18) | (-2.83, 0.71) |
     | --------------|---------------|-------------- |-------------- |
 1   | ( 0.26,-0.26) | (-3.39, 0.00) | ( 0.37,-0.37) | ( 6.78, 0.74) |
     | --------------|---------------|-------------- |-------------- |
 2   | ( 0.26,-0.52) | (-0.29,-0.09) | (-1.87, 0.00) | ( 1.87,-1.87) |
     | --------------|---------------|-------------- |-------------- |
     *                                                               *

The following is the 2 × 2 process grid:

B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11

Local arrays for A:

p,q  |               0                |               1
-----|--------------------------------|-----------------------------
 0   |  (-2.83, 0.00) ( 3.89, 3.18)   |  ( 3.54,-1.41) (-2.83, 0.71)
     |  ( 0.26,-0.52) (-1.87, 0.00)   |  (-0.29,-0.09) ( 1.87,-1.87)
-----|--------------------------------|-----------------------------
 1   |  ( 0.26,-0.26) ( 0.37,-0.37)   |  (-3.39, 0.00) ( 6.78, 0.74)

Global row vector tau of length 3 with block size of 1:

B,D          0               1               2
     *                                               *
     | --------------|---------------|-------------- |
 0   | ( 1.35, 0.00) | ( 1.83,-0.02) | ( 1.47,-0.88) |
     | --------------|---------------|-------------- |
     *                                               *

Note:
A copy of tau is distributed across each row of the process grid.

The following is the 2 × 2 process grid:

B,D  |   0 2   |   1
-----| ------- |-----
     |   P00   |  P01
-----| ------- |-----
     |   P10   |  P11

Local arrays for tau:

p,q  |               0                |        1
-----|--------------------------------|---------------
 0   |  ( 1.35, 0.00) ( 1.47,-0.88)   |  ( 1.83,-0.02)
-----|--------------------------------|---------------
 1   |  ( 1.35, 0.00) ( 1.47,-0.88)   |  ( 1.83,-0.02)

The value of info is 0 on all processes.

