Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

PDGETRF and PZGETRF--General Matrix Factorization

These subroutines factor general matrix A using Gaussian elimination with partial pivoting to compute the LU factorization of A, recording the pivots in ipvt, where, in this description:

A represents the global general submatrix A_{ia:ia+m-1, ja:ja+n-1} to be factored.

ipvt represents the global vector ipvt_{ia:ia+m-1} containing the pivoting indices.

L is a lower triangular matrix.

U is an upper triangular matrix.

On output, the transformed matrix A contains U in the upper triangle (if m >= n) or upper trapezoid (if m < n) and L in the strict lower triangle (if m <= n) or lower trapezoid (if m > n). ipvt contains the pivots representing permutation P, such that A = PLU.
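The pivot vector is easiest to read as a sequence of row interchanges rather than as a permutation. The following serial Python sketch is an illustration only, not PESSL code. It assumes the LAPACK-style convention that, at step i, row i was interchanged with row ipvt(i) (1-based), which is consistent with the ipvt values shown in Example 1. The function name pivots_to_row_order is hypothetical.

```python
def pivots_to_row_order(ipvt):
    """Return the original (1-based) row numbers in their final order.

    Assumption: ipvt[i-1] = r means "at step i, rows i and r were swapped".
    """
    order = list(range(1, len(ipvt) + 1))
    for i, r in enumerate(ipvt):  # i is 0-based, r is 1-based
        order[i], order[r - 1] = order[r - 1], order[i]
    return order


# ipvt from Example 1: every step swaps the current row with row 9.
print(pivots_to_row_order([9] * 9))  # [9, 1, 2, 3, 4, 5, 6, 7, 8]
```

Under that assumption, the first row of the factored matrix comes from original row 9, which agrees with the first row of the transformed matrix in Example 1.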

To solve the system of equations with any number of right-hand sides, follow the call to these subroutines with one or more calls to PDGETRS or PZGETRS, respectively.

If m = 0 or n = 0, no computation is performed and the subroutine returns after doing some parameter checking. See references [16], [18], [22], [36], and [37].

Table 59. Data Types

A	*ipvt*	Subroutine
Long-precision real	Integer	PDGETRF
Long-precision complex	Integer	PZGETRF

Syntax

Fortran	CALL PDGETRF | PZGETRF (`m`, `n`, `a`, `ia`, `ja`, `desc_a`, `ipvt`, `info`)
C and C++	pdgetrf | pzgetrf (`m`, `n`, `a`, `ia`, `ja`, `desc_a`, `ipvt`, `info`);

On Entry

m

is the number of rows in submatrix A and the number of elements in vector ipvt used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

a

is the local part of the global general matrix A, used in the system of equations. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+m-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 59. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_A >= 0; otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_A >= 0; otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

ipvt

See On Return.

info

See On Return.

On Return

a

is the updated local part of the global matrix A, containing the results of the factorization.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 59.

ipvt

is the local part of the global vector ipvt, containing the pivot indices. This identifies the first element of the local array IPVT. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, p, and myrow; therefore, the leading LOCp(ia+m-1) part of the local array IPVT must contain the local pieces of the leading ia+m-1 part of the global vector.

A copy of the vector ipvt, with a block size of MB_A and global index ia, is returned to each column of the process grid. The process row over which the first row of ipvt is distributed is RSRC_A.

Scope: local

Returned as: an array of (at least) length LOCp(ia+m-1), containing fullword integers, where ia <= (pivoting indices) <= ia+m-1. Details about the block-cyclic data distribution of global vector ipvt are stored in desc_a.

info

has the following meaning:

If info = 0, global submatrix A is not singular, and the factorization completed normally.

If info > 0, global submatrix A is singular; that is, one or more columns of L and the corresponding diagonal of U contain all zeros. All columns of L are checked. info is set equal to i, the first column of L with a corresponding zero diagonal element, encountered at A_{ia+i-1, ja+i-1}. The factorization is completed; however, if you call PDGETRS/PZGETRS or PDGETRI/PZGETRI with these factors, results are unpredictable.

Scope: global

Returned as: a fullword integer; info >= 0.

Notes and Coding Rules

In your C program, argument info must be passed by reference.
The matrix and vector must have no common elements; otherwise, results are unpredictable.
The scalar data specified for input argument n must be the same for both PDGETRF/PZGETRF and PDGETRS/PZGETRS. In addition, the scalar data specified for input argument m in PDGETRF/PZGETRF must be the same as input argument n in both PDGETRF/PZGETRF and PDGETRS/PZGETRS.
If, however, you do not plan to call PDGETRS/PZGETRS after calling PDGETRF/PZGETRF, then input arguments m and n in PDGETRF/PZGETRF do not need to be equal.
The global submatrices for A and ipvt input to PDGETRS/PZGETRS must be the same as for the corresponding output arguments for PDGETRF/PZGETRF; and thus, the scalar data specified for ia, ja, and the contents of desc_a must also be the same.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see Determining the Number of Rows and Columns in Your Local Arrays and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
The way these subroutines handle singularity differs from ScaLAPACK. These subroutines use the info argument to provide information about the singularity of A, like ScaLAPACK, but also provide an error message.
On both input and output, matrix A conforms to ScaLAPACK format.
The global general matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The global general matrix A must be aligned on a block row boundary; that is, ia-1 must be a multiple of MB_A.
The block row offset of A must be equal to the block column offset of A; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
There is no array descriptor for ipvt. It is a column-distributed vector with block size MB_A, local arrays of dimension LOCp(ia+m-1) by 1, and global index ia. A copy of this vector exists on each column of the process grid, and the process row over which the first row of ipvt is distributed is RSRC_A.
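The local extents LOCp(M_) and LOCq(N_) mentioned in these notes follow directly from the block-cyclic layout. The following Python sketch mirrors the arithmetic of the ScaLAPACK NUMROC tool routine for illustration only; in a real program, call NUMROC itself. The function name numroc_sketch is hypothetical.

```python
def numroc_sketch(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-element dimension owned by iproc.

    n        : global number of rows or columns
    nb       : block size
    iproc    : coordinate of this process in the grid dimension
    isrcproc : coordinate of the process that owns the first block
    nprocs   : number of processes in the grid dimension
    """
    # Distance of this process from the source process, going around the ring.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # whole rounds every process gets
    extra = nblocks % nprocs               # leftover whole blocks
    if mydist < extra:
        count += nb                        # one extra whole block
    elif mydist == extra:
        count += n % nb                    # the trailing partial block, if any
    return count


# Values from Example 1: M_A = 9, MB_A = 3, RSRC_A = 1, NPROW = 2.
print(max(1, numroc_sketch(9, 3, 0, 1, 2)))  # process row 0 -> 3
print(max(1, numroc_sketch(9, 3, 1, 1, 2)))  # process row 1 -> 6
```

These results match the LLD_A values quoted in the examples: 3 on process row 0 and 6 on process row 1.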

Performance Considerations

For suggested block sizes, see Coding Tips for Optimizing Parallel Performance.
Pivoting imposes additional communication requirements over the process grid columns; therefore, you achieve optimal performance by using a process grid with p < q. On the other hand, a p × 1 grid provides the worst possible configuration.
For optimal performance, take the following items into consideration when choosing the NB_A (= MB_A) value:
- The cache size of the computational nodes. NB_A determines the granularity of the most expensive part of the computation, which tends to increase the optimal value of NB_A.
- The communication and synchronization overhead. This has two aspects, the cost of internal synchronization points and the cost of broadcasts. These tend to slightly decrease the optimal value of NB_A.
- The model of communication adapter you are using.
- Load balancing. For the best processor utilization, it is necessary for the processor nodes to be active for as long as possible; therefore, each one should have as many blocks as possible. For a given problem size, this tends to decrease the optimal value of NB_A (best load balancing: 1) and is most relevant at very small problem sizes.
- If NB_A is equal to a power of 2, performance may be degraded.
- Use the following rules of thumb for reasonably-sized problems:
  - For the SERIAL processors, choose NB_A in the following range:
    - For PDGETRF, use [30, 50], avoiding 32.
    - For PZGETRF, use [10, 25], avoiding 16.
  - For the SMP processors, choose NB_A in the following range:
    - For PDGETRF, use [70, 100].
    - For PZGETRF, use [30, 50], avoiding 32.

Error Conditions

Computational Errors

Matrix A is a singular matrix. For details, see the description of the info argument.

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1:

DTYPE_A is invalid.

Stage 2:

CTXT_A is invalid.

Stage 3:

This subroutine was called from outside the process grid.

Stage 4:

m < 0
n < 0
M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
ia < 1
ja < 1
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q

Stage 5:

If m <> 0 and n <> 0:

ia > M_A
ja > N_A
ia+m-1 > M_A
ja+n-1 > N_A

In all cases:
MB_A <> NB_A
mod(ia-1, MB_A) <> mod(ja-1, NB_A)
mod(ia-1, MB_A) <> 0

Stage 6:

LLD_A < max(1, LOCp(M_A))

Each of the following global input arguments is checked to determine whether its value differs from the value specified on process P₀₀:
m differs.
n differs.
ia differs.
ja differs.
DTYPE_A differs.
M_A differs.
N_A differs.
MB_A differs.
NB_A differs.
RSRC_A differs.
CSRC_A differs.
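The grid-independent conditions from Stage 4 and Stage 5 can be checked before the call. The following Python sketch is one possible pre-check, not the library's own logic: it collects every violated condition in a single list, which may differ from how the subroutines report errors stage by stage. The function name check_submatrix_args is hypothetical, and the conditions involving p, q, LLD_A, and cross-process consistency are omitted.

```python
def check_submatrix_args(m, n, ia, ja, m_a, n_a, mb_a, nb_a):
    """Return the violated conditions as a list of strings (empty if none)."""
    errors = []
    empty = (m == 0 or n == 0)
    min_dim = 0 if empty else 1

    # Stage 4 conditions that do not depend on the process grid.
    if m < 0:
        errors.append("m < 0")
    if n < 0:
        errors.append("n < 0")
    if m_a < min_dim:
        errors.append("M_A too small")
    if n_a < min_dim:
        errors.append("N_A too small")
    if ia < 1:
        errors.append("ia < 1")
    if ja < 1:
        errors.append("ja < 1")
    if mb_a < 1:
        errors.append("MB_A < 1")
    if nb_a < 1:
        errors.append("NB_A < 1")

    # Stage 5 range conditions apply only to a nonempty submatrix.
    if m != 0 and n != 0:
        if ia > m_a or ia + m - 1 > m_a:
            errors.append("row range exceeds M_A")
        if ja > n_a or ja + n - 1 > n_a:
            errors.append("column range exceeds N_A")

    # Stage 5 conditions listed under "In all cases".
    if mb_a != nb_a:
        errors.append("MB_A <> NB_A")
    if mb_a >= 1 and nb_a >= 1:  # guard against division by zero
        if (ia - 1) % mb_a != (ja - 1) % nb_a:
            errors.append("mod(ia-1, MB_A) <> mod(ja-1, NB_A)")
        if (ia - 1) % mb_a != 0:
            errors.append("mod(ia-1, MB_A) <> 0")
    return errors


print(check_submatrix_args(9, 9, 1, 1, 9, 9, 3, 3))  # arguments of Example 1
print(check_submatrix_args(6, 6, 2, 2, 9, 9, 3, 3))  # not block aligned
```

The first call returns an empty list. The second reports only the block row alignment condition, because ia-1 = 1 is not a multiple of MB_A = 3.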

Example 1

This example factors a 9 × 9 real general matrix using a 2 × 2 process grid. By specifying RSRC_A = 1, the rows of global matrix A and the elements of global vector ipvt are distributed over the process grid starting in the second row of the process grid.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M    N   A  IA  JA    DESC_A   IPVT   INFO
              |    |   |   |   |      |       |      |
CALL PDGETRF( 9  , 9 , A , 1 , 1 ,  DESC_A , IPVT , INFO )

	Desc_A
DTYPE_	1
CTXT_	`icontxt` (see Note 1)
M_	9
N_	9
MB_	3
NB_	3
RSRC_	1
CSRC_	0
LLD_	See Note 2
Notes:
1. `icontxt` is the output of the BLACS_GRIDINIT call.
2. Each process should set the LLD_ as follows: LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW)). In this example, LLD_A = 3 on P₀₀ and P₀₁, and LLD_A = 6 on P₁₀ and P₁₁.

Global general 9 × 9 matrix A with block size 3 × 3:

B,D          0                  1                  2
     *                                                      *
     |  1.0  1.2  1.4  |   1.6  1.8  2.0  |   2.2  2.4  2.6 |
 0   |  1.2  1.0  1.2  |   1.4  1.6  1.8  |   2.0  2.2  2.4 |
     |  1.4  1.2  1.0  |   1.2  1.4  1.6  |   1.8  2.0  2.2 |
     | ----------------|------------------|---------------- |
     |  1.6  1.4  1.2  |   1.0  1.2  1.4  |   1.6  1.8  2.0 |
 1   |  1.8  1.6  1.4  |   1.2  1.0  1.2  |   1.4  1.6  1.8 |
     |  2.0  1.8  1.6  |   1.4  1.2  1.0  |   1.2  1.4  1.6 |
     | ----------------|------------------|---------------- |
     |  2.2  2.0  1.8  |   1.6  1.4  1.2  |   1.0  1.2  1.4 |
 2   |  2.4  2.2  2.0  |   1.8  1.6  1.4  |   1.2  1.0  1.2 |
     |  2.6  2.4  2.2  |   2.0  1.8  1.6  |   1.4  1.2  1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |               0                |        1
-----|--------------------------------|-----------------
     |  1.6  1.4  1.2  1.6  1.8  2.0  |   1.0  1.2  1.4
 0   |  1.8  1.6  1.4  1.4  1.6  1.8  |   1.2  1.0  1.2
     |  2.0  1.8  1.6  1.2  1.4  1.6  |   1.4  1.2  1.0
-----|--------------------------------|-----------------
     |  1.0  1.2  1.4  2.2  2.4  2.6  |   1.6  1.8  2.0
     |  1.2  1.0  1.2  2.0  2.2  2.4  |   1.4  1.6  1.8
     |  1.4  1.2  1.0  1.8  2.0  2.2  |   1.2  1.4  1.6
 1   |  2.2  2.0  1.8  1.0  1.2  1.4  |   1.6  1.4  1.2
     |  2.4  2.2  2.0  1.2  1.0  1.2  |   1.8  1.6  1.4
     |  2.6  2.4  2.2  1.4  1.2  1.0  |   2.0  1.8  1.6
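The local arrays above can be reproduced from the global matrix with the usual block-cyclic index mapping. The following Python sketch is a serial illustration under that assumption, not PESSL code; it uses 0-based indices, and the helper names owner_and_local and distribute are hypothetical.

```python
def owner_and_local(g, nb, src, nprocs):
    """Map a 0-based global index to (owning process coordinate, local index)."""
    block = g // nb                          # global block number
    proc = (block + src) % nprocs            # blocks are dealt out cyclically
    local = (block // nprocs) * nb + g % nb  # position inside the local array
    return proc, local


def distribute(a, mb, nb, rsrc, csrc, nprow, npcol):
    """Return {(prow, pcol): {(local_row, local_col): value}} for matrix a."""
    local = {}
    for i, row in enumerate(a):
        pr, li = owner_and_local(i, mb, rsrc, nprow)
        for j, value in enumerate(row):
            pc, lj = owner_and_local(j, nb, csrc, npcol)
            local.setdefault((pr, pc), {})[(li, lj)] = value
    return local


# Global matrix of Example 1: a(i,j) = 1.0 + 0.2*|i-j|, rounded for display.
a = [[round(1.0 + 0.2 * abs(i - j), 1) for j in range(9)] for i in range(9)]
local = distribute(a, mb=3, nb=3, rsrc=1, csrc=0, nprow=2, npcol=2)

# First local row on process (0,0).
print([local[(0, 0)][(0, lj)] for lj in range(6)])
```

The printed row, 1.6 1.4 1.2 1.6 1.8 2.0, is the first row of the local array shown for p = 0, q = 0. It comes from global row 4 because RSRC_A = 1 places the first block row on the second row of the process grid.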

Output:

Global general 9 × 9 transformed matrix A with block size 3 × 3:

B,D          0                  1                  2
     *                                                      *
     |  2.6  2.4  2.2  |   2.0  1.8  1.6  |   1.4  1.2  1.0 |
 0   |  0.4  0.3  0.6  |   0.8  1.1  1.4  |   1.7  1.9  2.2 |
     |  0.5 -0.4  0.4  |   0.8  1.2  1.6  |   2.0  2.4  2.8 |
     | ----------------|------------------|---------------- |
     |  0.5 -0.3  0.0  |   0.4  0.8  1.2  |   1.6  2.0  2.4 |
 1   |  0.6 -0.3  0.0  |   0.0  0.4  0.8  |   1.2  1.6  2.0 |
     |  0.7 -0.2  0.0  |   0.0  0.0  0.4  |   0.8  1.2  1.6 |
     | ----------------|------------------|---------------- |
     |  0.8 -0.2  0.0  |   0.0  0.0  0.0  |   0.4  0.8  1.2 |
 2   |  0.8 -0.1  0.0  |   0.0  0.0  0.0  |   0.0  0.4  0.8 |
     |  0.9 -0.1  0.0  |   0.0  0.0  0.0  |   0.0  0.0  0.4 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |               0                |        1
-----|--------------------------------|-----------------
     |  0.5 -0.3  0.0  1.6  2.0  2.4  |   0.4  0.8  1.2
 0   |  0.6 -0.3  0.0  1.2  1.6  2.0  |   0.0  0.4  0.8
     |  0.7 -0.2  0.0  0.8  1.2  1.6  |   0.0  0.0  0.4
-----|--------------------------------|-----------------
     |  2.6  2.4  2.2  1.4  1.2  1.0  |   2.0  1.8  1.6
     |  0.4  0.3  0.6  1.7  1.9  2.2  |   0.8  1.1  1.4
     |  0.5 -0.4  0.4  2.0  2.4  2.8  |   0.8  1.2  1.6
 1   |  0.8 -0.2  0.0  0.4  0.8  1.2  |   0.0  0.0  0.0
     |  0.8 -0.1  0.0  0.0  0.4  0.8  |   0.0  0.0  0.0
     |  0.9 -0.1  0.0  0.0  0.0  0.4  |   0.0  0.0  0.0

Global vector ipvt of length 9 with block size 3:

B,D    0
     *    *
     |  9 |
 0   |  9 |
     |  9 |
     | -- |
     |  9 |
 1   |  9 |
     |  9 |
     | -- |
     |  9 |
 2   |  9 |
     |  9 |
     *    *

Note: A copy of ipvt is distributed across each column of the process grid.

The following is the 2 × 2 process grid:

B,D  |       |   
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of ipvt begins in the second row of the process grid.

Local arrays for ipvt:

p,q  |  0  |   1
-----|-----|-----
     |  9  |   9
 0   |  9  |   9
     |  9  |   9
-----|-----|-----
     |  9  |   9
     |  9  |   9
     |  9  |   9
 1   |  9  |   9
     |  9  |   9
     |  9  |   9

The value of info is 0 on all processes.
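The results of this example can be cross-checked on a single process. The following Python sketch is a plain serial Gaussian elimination with partial pivoting, written for illustration; it is not the algorithm PESSL uses internally, and the function name lu_partial_pivot is hypothetical. It stores L below the diagonal and U on and above it, as described for the transformed matrix.

```python
def lu_partial_pivot(a):
    """Factor a square matrix in place; return the 1-based pivot indices."""
    n = len(a)
    ipvt = []
    for k in range(n):
        # Choose the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        ipvt.append(p + 1)
        a[k], a[p] = a[p], a[k]
        if a[k][k] == 0.0:
            continue  # singular column: nothing to eliminate
        for r in range(k + 1, n):
            a[r][k] /= a[k][k]               # multiplier, stored as part of L
            for c in range(k + 1, n):
                a[r][c] -= a[r][k] * a[k][c]
    return ipvt


# Global matrix of Example 1: a(i,j) = 1.0 + 0.2*|i-j|.
a = [[1.0 + 0.2 * abs(i - j) for j in range(9)] for i in range(9)]
ipvt = lu_partial_pivot(a)

print(ipvt)
print([round(x, 1) for x in a[0]])
```

The pivot indices come out as nine 9s, and the first row of the factored matrix is 2.6 2.4 2.2 2.0 1.8 1.6 1.4 1.2 1.0, in agreement with the global vector ipvt and the transformed matrix shown above.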

Example 2

This example factors a 9 × 9 complex matrix using a 2 × 2 process grid. By specifying RSRC_A = 1, the rows of global matrix A and the elements of global vector ipvt are distributed over the process grid starting in the second row of the process grid.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M    N   A  IA  JA    DESC_A   IPVT   INFO
              |    |   |   |   |      |       |      |
CALL PZGETRF( 9  , 9 , A , 1 , 1 ,  DESC_A , IPVT , INFO )

	Desc_A
DTYPE_	1
CTXT_	`icontxt` (see Note 1)
M_	9
N_	9
MB_	3
NB_	3
RSRC_	1
CSRC_	0
LLD_	See Note 2
Notes:
1. `icontxt` is the output of the BLACS_GRIDINIT call.
2. Each process should set the LLD_ as follows: LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW)). In this example, LLD_A = 3 on P₀₀ and P₀₁, and LLD_A = 6 on P₁₀ and P₁₁.

Global general 9 × 9 matrix A with block size 3 × 3:

B,D                     0                                       1                                       2
     *                                                                                                                     *
     |  (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0)  |   (3.2,-1.0)  (3.6,-1.0)  (4.0,-1.0)  |   (4.4,-1.0)  (4.8,-1.0)  (5.2,-1.0) |
 0   |  (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0)  |   (2.8,-1.0)  (3.2,-1.0)  (3.6,-1.0)  |   (4.0,-1.0)  (4.4,-1.0)  (4.8,-1.0) |
     |  (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)  |   (2.4,-1.0)  (2.8,-1.0)  (3.2,-1.0)  |   (3.6,-1.0)  (4.0,-1.0)  (4.4,-1.0) |
     | -------------------------------------|---------------------------------------|------------------------------------- |
     |  (3.2, 1.0)  (2.8, 1.0)  (2.4, 1.0)  |   (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0)  |   (3.2,-1.0)  (3.6,-1.0)  (4.0,-1.0) |
 1   |  (3.6, 1.0)  (3.2, 1.0)  (2.8, 1.0)  |   (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0)  |   (2.8,-1.0)  (3.2,-1.0)  (3.6,-1.0) |
     |  (4.0, 1.0)  (3.6, 1.0)  (3.2, 1.0)  |   (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)  |   (2.4,-1.0)  (2.8,-1.0)  (3.2,-1.0) |
     | -------------------------------------|---------------------------------------|------------------------------------- |
     |  (4.4, 1.0)  (4.0, 1.0)  (3.6, 1.0)  |   (3.2, 1.0)  (2.8, 1.0)  (2.4, 1.0)  |   (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0) |
 2   |  (4.8, 1.0)  (4.4, 1.0)  (4.0, 1.0)  |   (3.6, 1.0)  (3.2, 1.0)  (2.8, 1.0)  |   (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0) |
     |  (5.2, 1.0)  (4.8, 1.0)  (4.4, 1.0)  |   (4.0, 1.0)  (3.6, 1.0)  (3.2, 1.0)  |   (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0) |
     *                                                                                                                     *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |                                    0                                     |                   1
-----|--------------------------------------------------------------------------|--------------------------------------
     |  (3.2, 1.0)  (2.8, 1.0)  (2.4, 1.0)  (3.2,-1.0)  (3.6,-1.0)  (4.0,-1.0)  |   (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0)
 0   |  (3.6, 1.0)  (3.2, 1.0)  (2.8, 1.0)  (2.8,-1.0)  (3.2,-1.0)  (3.6,-1.0)  |   (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0)
     |  (4.0, 1.0)  (3.6, 1.0)  (3.2, 1.0)  (2.4,-1.0)  (2.8,-1.0)  (3.2,-1.0)  |   (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)
-----|--------------------------------------------------------------------------|--------------------------------------
     |  (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0)  (4.4,-1.0)  (4.8,-1.0)  (5.2,-1.0)  |   (3.2,-1.0)  (3.6,-1.0)  (4.0,-1.0)
     |  (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0)  (4.0,-1.0)  (4.4,-1.0)  (4.8,-1.0)  |   (2.8,-1.0)  (3.2,-1.0)  (3.6,-1.0)
     |  (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)  (3.6,-1.0)  (4.0,-1.0)  (4.4,-1.0)  |   (2.4,-1.0)  (2.8,-1.0)  (3.2,-1.0)
 1   |  (4.4, 1.0)  (4.0, 1.0)  (3.6, 1.0)  (2.0, 1.0)  (2.4,-1.0)  (2.8,-1.0)  |   (3.2, 1.0)  (2.8, 1.0)  (2.4, 1.0)
     |  (4.8, 1.0)  (4.4, 1.0)  (4.0, 1.0)  (2.4, 1.0)  (2.0, 1.0)  (2.4,-1.0)  |   (3.6, 1.0)  (3.2, 1.0)  (2.8, 1.0)
     |  (5.2, 1.0)  (4.8, 1.0)  (4.4, 1.0)  (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)  |   (4.0, 1.0)  (3.6, 1.0)  (3.2, 1.0)

Output:

Global general 9 × 9 transformed matrix A with block size 3 × 3:

B,D                     0                                         1                                        2
     *                                                                                                                        *
     |  (5.2, 1.0)  (4.8, 1.0)   (4.4, 1.0)  |    (4.0, 1.0)   (3.6, 1.0)  (3.2, 1.0)  |   (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0) |
 0   |  (0.4, 0.1)  (0.6,-2.0)   (1.1,-1.9)  |    (1.7,-1.9)   (2.3,-1.8)  (2.8,-1.8)  |   (3.4,-1.7)  (3.9,-1.7)  (4.5,-1.6) |
     |  (0.5, 0.1)  (0.0,-0.1)   (0.6,-1.9)  |    (1.2,-1.8)   (1.8,-1.7)  (2.5,-1.6)  |   (3.1,-1.5)  (3.7,-1.4)  (4.3,-1.3) |
     | --------------------------------------|-----------------------------------------|------------------------------------- |
     |  (0.6, 0.1)  (0.0,-0.1)  (-0.1,-0.1)  |    (0.7,-1.9)   (1.3,-1.7)  (2.0,-1.6)  |   (2.7,-1.5)  (3.4,-1.4)  (4.0,-1.2) |
 1   |  (0.6, 0.1)  (0.0,-0.1)  (-0.1,-0.1)  |   (-0.1, 0.0)   (0.7,-1.9)  (1.5,-1.7)  |   (2.2,-1.6)  (2.9,-1.5)  (3.7,-1.3) |
     |  (0.7, 0.1)  (0.0,-0.1)   (0.0, 0.0)  |   (-0.1, 0.0)  (-0.1, 0.0)  (0.8,-1.9)  |   (1.6,-1.8)  (2.4,-1.6)  (3.2,-1.5) |
     | --------------------------------------|-----------------------------------------|------------------------------------- |
     |  (0.8, 0.0)  (0.0, 0.0)   (0.0, 0.0)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)  |   (0.8,-1.9)  (1.7,-1.8)  (2.5,-1.8) |
 2   |  (0.9, 0.0)  (0.0, 0.0)   (0.0, 0.0)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)  |   (0.0, 0.0)  (0.8,-2.0)  (1.7,-1.9) |
     |  (0.9, 0.0)  (0.0, 0.0)   (0.0, 0.0)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)  |   (0.0, 0.0)  (0.0, 0.0)  (0.8,-2.0) |
     *                                                                                                                        *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |                                    0                                      |                    1
-----|---------------------------------------------------------------------------|----------------------------------------
     |  (0.6, 0.1)  (0.0,-0.1)  (-0.1,-0.1)  (2.7,-1.5)  (3.4,-1.4)  (4.0,-1.2)  |    (0.7,-1.9)   (1.3,-1.7)  (2.0,-1.6)
 0   |  (0.6, 0.1)  (0.0,-0.1)  (-0.1,-0.1)  (2.2,-1.6)  (2.9,-1.5)  (3.7,-1.3)  |   (-0.1, 0.0)   (0.7,-1.9)  (1.5,-1.7)
     |  (0.7, 0.1)  (0.0,-0.1)   (0.0, 0.0)  (1.6,-1.8)  (2.4,-1.6)  (3.2,-1.5)  |   (-0.1, 0.0)  (-0.1, 0.0)  (0.8,-1.9)
-----|---------------------------------------------------------------------------|----------------------------------------
     |  (5.2, 1.0)  (4.8, 1.0)   (4.4, 1.0)  (2.8, 1.0)  (2.4, 1.0)  (2.0, 1.0)  |    (4.0, 1.0)   (3.6, 1.0)  (3.2, 1.0)
     |  (0.4, 0.1)  (0.6,-2.0)   (1.1,-1.9)  (3.4,-1.7)  (3.9,-1.7)  (4.5,-1.6)  |    (1.7,-1.9)   (2.3,-1.8)  (2.8,-1.8)
     |  (0.5, 0.1)  (0.0,-0.1)   (0.6,-1.9)  (3.1,-1.5)  (3.7,-1.4)  (4.3,-1.3)  |    (1.2,-1.8)   (1.8,-1.7)  (2.5,-1.6)
 1   |  (0.8, 0.0)  (0.0, 0.0)   (0.0, 0.0)  (0.8,-1.9)  (1.7,-1.8)  (2.5,-1.8)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)
     |  (0.9, 0.0)  (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)  (0.8,-2.0)  (1.7,-1.9)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)
     |  (0.9, 0.0)  (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)  (0.8,-2.0)  |    (0.0, 0.0)   (0.0, 0.0)  (0.0, 0.0)

Global vector ipvt of length 9 with block size 3:

B,D    0
     *    *
     |  9 |
 0   |  9 |
     |  9 |
     | -- |
     |  9 |
 1   |  9 |
     |  9 |
     | -- |
     |  9 |
 2   |  9 |
     |  9 |
     *    *

Note: A copy of ipvt is distributed across each column of the process grid.

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of ipvt begins in the second row of the process grid.

Local arrays for ipvt:

p,q  |  0  |   1
-----|-----|-----
     |  9  |   9
 0   |  9  |   9
     |  9  |   9
-----|-----|-----
     |  9  |   9
     |  9  |   9
     |  9  |   9
 1   |  9  |   9
     |  9  |   9
     |  9  |   9

The value of info is 0 on all processes.
