These subroutines compute one of the following matrix-matrix products:
1. C ← αAB + βC
2. C ← αBA + βC
where, in the formulas above:
and:
In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:
alpha, beta, A, B, C | Subprogram |
---|---|
Long-precision real | PDSYMM |
Long-precision complex | PZSYMM and PZHEMM |

Language | Syntax |
---|---|
Fortran | CALL PDSYMM, PZSYMM, or PZHEMM (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c) |
C and C++ | pdsymm, pzsymm, or pzhemm (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c); |
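side
indicates whether matrix A is located to the left or right of matrix B in the equation used for this computation:
If side = 'L', A is to the left of B, resulting in equation 1.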
If side = 'R', A is to the right of B, resulting in equation 2.
Scope: global
Specified as: a single character; side = 'L' or 'R'.
uplo
indicates whether the upper or lower triangular part of the global submatrix A is referenced:
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
m
is the number of rows in submatrices B and C. If side = 'L', it is also the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; m >= 0.
n
is the number of columns in submatrices B and C. If side = 'R', it is also the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
alpha
is the scalar α.
Scope: global
Specified as: a number of the data type indicated in Table 48.
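a
is the local part of the global symmetric or Hermitian matrix A. With numa defined as:
If side = 'L', numa = m
If side = 'R', numa = n
the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix. Only the triangle of the submatrix selected by uplo is referenced.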
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
ia
is the row index of the global matrix A, identifying the first row of the submatrix A.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.
ja
is the column index of the global matrix A, identifying the first column of the submatrix A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.
desc_a
is the array descriptor for global matrix A, described in the following table:

desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 and side = 'L' or n = 0 and side = 'R': M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 and side = 'L' or n = 0 and side = 'R': N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
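The global limits in the table can be checked before the call. The following is a minimal Python sketch, not part of the library: `make_desc` and `check_desc` are hypothetical helper names, the context value 0 is a placeholder for whatever BLACS_GRIDINIT returns, and the LLD_A limit is left out because it depends on LOCp(M_A), which is local to each process.

```python
def make_desc(ctxt, m, n, mb, nb, rsrc, csrc, lld):
    """Build a 9-element type-1 array descriptor as a plain list (hypothetical helper)."""
    return [1, ctxt, m, n, mb, nb, rsrc, csrc, lld]


def check_desc(desc, p, q, empty_ok=False):
    """Return the violated global limits, in table order; an empty list means none."""
    dtype, _ctxt, m, n, mb, nb, rsrc, csrc, _lld = desc
    min_dim = 0 if empty_ok else 1  # M and N may be 0 only for an empty submatrix
    problems = []
    if dtype != 1:
        problems.append("DTYPE must be 1")
    if m < min_dim:
        problems.append("M too small")
    if n < min_dim:
        problems.append("N too small")
    if mb < 1:
        problems.append("MB must be >= 1")
    if nb < 1:
        problems.append("NB must be >= 1")
    if not 0 <= rsrc < p:
        problems.append("RSRC out of range")
    if not 0 <= csrc < q:
        problems.append("CSRC out of range")
    return problems


# desc_a values used by the examples in this section:
# an 8 x 8 matrix in 2 x 2 blocks on a 2 x 2 process grid.
desc_a = make_desc(ctxt=0, m=8, n=8, mb=2, nb=2, rsrc=0, csrc=0, lld=4)
print(check_desc(desc_a, p=2, q=2))
```

The same check applies to desc_b and desc_c, with `empty_ok` set when m = 0 or n = 0.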
b
is the local part of the global general matrix B.
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
ib
is the row index of the global matrix B, identifying the first row of the submatrix B.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.
jb
is the column index of the global matrix B, identifying the first column of the submatrix B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.
desc_b
is the array descriptor for global matrix B, described in the following table:

desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If m = 0 or n = 0: M_B >= 0. Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If m = 0 or n = 0: N_B >= 0. Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
beta
is the scalar β.
Scope: global
Specified as: a number of the data type indicated in Table 48.
c
is the local part of the global general matrix C. When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
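One way to read the statement that C need not be set when beta is zero is that the routine skips reading C instead of multiplying it by zero. The Python sketch below illustrates that reading on a small serial array; `scale_c` is a hypothetical name, and whether the library is implemented this way is an assumption.

```python
import math


def scale_c(beta, c):
    """Return beta*C, but never read C when beta is zero (assumed behavior)."""
    if beta == 0:
        return [[0.0] * len(row) for row in c]
    return [[beta * x for x in row] for row in c]


# Unset storage is modeled with NaN: a plain multiply would propagate it.
c_unset = [[math.nan, math.nan]]
print(scale_c(0.0, c_unset))  # C is not read on this path
print(0.0 * math.nan)         # what a naive beta*C would compute instead
```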
ic
is the row index of the global matrix C, identifying the first row of the submatrix C.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.
jc
is the column index of the global matrix C, identifying the first column of the submatrix C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
desc_c
is the array descriptor for global matrix C, described in the following table:

desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If m = 0 or n = 0: M_C >= 0. Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If m = 0 or n = 0: N_C >= 0. Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
c
is the updated local part of the global matrix C, containing the results of the computation.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48.
then:
then:
then you must follow these rules:
or if all the following are true:
then you must follow these rules:
Computational errors: None
Resource errors: Unable to allocate work space
Stage 5: If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):
where numa = m if side = 'L' and numa = n if side = 'R'.
If m <> 0 and n <> 0:
Stage 6: If A is contained within a single block, that is:
and:
then:
If A is not contained within a single block, or if A is contained within a single block and:
then:
If side = 'L':
If side = 'R':
In all cases:
If side = 'L' and looping is required--that is, either of the following is true:
then:
If side = 'L':
If side = 'R' and looping is required--that is, either of the following is true:
then:
If side = 'R':
This example computes C = βC + αBA using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO  M    N   ALPHA   A   IA  JA  DESC_A   B   IB  JB
              |     |    |    |     |     |   |   |     |      |   |   |
CALL PDSYMM( 'R' , 'U' , 16 , 8 , 1.0D0 , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B   BETA    C   IC  JC  DESC_C
               |        |     |   |   |     |
             DESC_B , 0.0D0 , C , 1 , 1 , DESC_C )
| Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt | icontxt | icontxt |
M_ | 8 | 16 | 16 |
N_ | 8 | 8 | 8 |
MB_ | 2 | 4 | 4 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
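LLD_ | See below | See below | See below |
Notes:
icontxt is the BLACS context set by the BLACS_GRIDINIT call above.
Each process sets LLD_ to at least max(1, LOCp(M_)), the limit given in the descriptor tables for desc_a, desc_b, and desc_c.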
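The LLD_ entries depend on how many rows of each global matrix a process owns. As a sketch, the Python function below mirrors the counting rule of ScaLAPACK's NUMROC utility; it is an illustrative stand-in written for this example, not a call into the library, and it takes LOCp(M_) to be that count.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-long, nb-blocked dimension owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # rows from full rounds of blocks
    extra = nblocks % nprocs                       # complete blocks left after full rounds
    if mydist < extra:
        count += nb                                # one more complete block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block, if any
    return count


NPROW = 2
for name, m, mb in (("A", 8, 2), ("B", 16, 4), ("C", 16, 4)):
    for myrow in range(NPROW):
        lld = max(1, numroc(m, mb, myrow, 0, NPROW))
        print(f"LLD_{name} on process row {myrow}: {lld}")
```

With the values in the table this gives 4 for A and 8 for B and C on both process rows, which matches the row counts of the local arrays shown for this example.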
Global real symmetric matrix A of order 8 with block size 2 × 2:
B,D         0             1             2             3
     *                                                       *
0    |   0.0  -1.0 |  -1.0   0.0 |   0.0   0.0 |   0.0   0.0 |
     |    .    1.0 |   0.0   1.0 |   0.0   1.0 |   0.0   1.0 |
     | ------------|-------------|-------------|------------ |
1    |    .     .  |  -1.0  -1.0 |   0.0   0.0 |   1.0   0.0 |
     |    .     .  |    .   -1.0 |   1.0   1.0 |   0.0   1.0 |
     | ------------|-------------|-------------|------------ |
2    |    .     .  |    .     .  |  -1.0   0.0 |   0.0   0.0 |
     |    .     .  |    .     .  |    .    1.0 |   0.0   0.0 |
     | ------------|-------------|-------------|------------ |
3    |    .     .  |    .     .  |    .     .  |   0.0   0.0 |
     |    .     .  |    .     .  |    .     .  |    .    0.0 |
     *                                                       *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for A:
p,q  |            0            |            1
-----|-------------------------|------------------------
     |   0.0  -1.0   0.0   0.0 |  -1.0   0.0   0.0   0.0
     |    .    1.0   0.0   1.0 |   0.0   1.0   0.0   1.0
0    |    .     .   -1.0   0.0 |    .     .    0.0   0.0
     |    .     .     .    1.0 |    .     .    0.0   0.0
-----|-------------------------|------------------------
     |    .     .    0.0   0.0 |  -1.0  -1.0   1.0   0.0
     |    .     .    1.0   1.0 |    .   -1.0   0.0   1.0
1    |    .     .     .     .  |    .     .    0.0   0.0
     |    .     .     .     .  |    .     .     .    0.0
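The local arrays follow from the block-cyclic rule: block row i of the global matrix goes to process row i mod 2, and block column j to process column j mod 2. The Python sketch below reproduces the local array of one process from the global matrix A of this example. `local_part` is a hypothetical helper, it assumes RSRC_A = CSRC_A = 0 as in the descriptor table, and `X` marks the lower-triangle entries that are not referenced.

```python
def local_part(glob, mb, nb, myrow, mycol, nprow, npcol):
    """Return the local array of process (myrow, mycol), assuming RSRC = CSRC = 0."""
    rows = [i for i in range(len(glob)) if (i // mb) % nprow == myrow]
    cols = [j for j in range(len(glob[0])) if (j // nb) % npcol == mycol]
    return [[glob[i][j] for j in cols] for i in rows]


X = None  # not referenced (uplo = 'U')
A_GLOBAL = [
    [0.0, -1.0, -1.0,  0.0,  0.0, 0.0, 0.0, 0.0],
    [X,    1.0,  0.0,  1.0,  0.0, 1.0, 0.0, 1.0],
    [X,    X,   -1.0, -1.0,  0.0, 0.0, 1.0, 0.0],
    [X,    X,    X,   -1.0,  1.0, 1.0, 0.0, 1.0],
    [X,    X,    X,    X,   -1.0, 0.0, 0.0, 0.0],
    [X,    X,    X,    X,    X,   1.0, 0.0, 0.0],
    [X,    X,    X,    X,    X,   X,   0.0, 0.0],
    [X,    X,    X,    X,    X,   X,   X,   0.0],
]

# Local array of P00: block rows 0 and 2, block columns 0 and 2.
for row in local_part(A_GLOBAL, 2, 2, 0, 0, 2, 2):
    print(row)
```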
Global general 16 × 8 matrix B with block size 4 × 2:
B,D         0             1             2             3
     *                                                       *
     |  -1.0   0.0 |   1.0  -1.0 |   1.0   1.0 |  -1.0  -1.0 |
     |  -1.0  -1.0 |   1.0   0.0 |   1.0  -1.0 |  -1.0   1.0 |
0    |   1.0   1.0 |  -1.0   0.0 |  -1.0   0.0 |   1.0   0.0 |
     |   0.0  -1.0 |   0.0   0.0 |   0.0   0.0 |   0.0  -1.0 |
     | ------------|-------------|-------------|------------ |
     |   0.0   1.0 |   0.0   1.0 |   0.0   1.0 |   1.0   0.0 |
     |   0.0   0.0 |   1.0   0.0 |  -1.0  -1.0 |   0.0   0.0 |
1    |   1.0   1.0 |   0.0   0.0 |   1.0   1.0 |   0.0  -1.0 |
     |   0.0   0.0 |  -1.0   0.0 |   0.0   1.0 |   0.0   1.0 |
     | ------------|-------------|-------------|------------ |
     |   0.0   0.0 |   0.0  -1.0 |   1.0   1.0 |   0.0   1.0 |
     |  -1.0  -1.0 |   1.0   0.0 |   0.0  -1.0 |   0.0   1.0 |
2    |   0.0   0.0 |   0.0   1.0 |   1.0   0.0 |   0.0   0.0 |
     |   0.0   0.0 |   1.0   1.0 |   0.0  -1.0 |   0.0   0.0 |
     | ------------|-------------|-------------|------------ |
     |   1.0   1.0 |  -1.0   0.0 |  -1.0  -1.0 |   1.0   1.0 |
     |   0.0   0.0 |   0.0   0.0 |   1.0   0.0 |   0.0  -1.0 |
3    |   0.0   1.0 |   0.0   0.0 |   0.0   0.0 |   0.0   0.0 |
     |  -1.0   0.0 |  -1.0   0.0 |   0.0   1.0 |   1.0   0.0 |
     *                                                       *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for B:
p,q  |            0            |            1
-----|-------------------------|------------------------
     |  -1.0   0.0   1.0   1.0 |   1.0  -1.0  -1.0  -1.0
     |  -1.0  -1.0   1.0  -1.0 |   1.0   0.0  -1.0   1.0
     |   1.0   1.0  -1.0   0.0 |  -1.0   0.0   1.0   0.0
     |   0.0  -1.0   0.0   0.0 |   0.0   0.0   0.0  -1.0
0    |   0.0   0.0   1.0   1.0 |   0.0  -1.0   0.0   1.0
     |  -1.0  -1.0   0.0  -1.0 |   1.0   0.0   0.0   1.0
     |   0.0   0.0   1.0   0.0 |   0.0   1.0   0.0   0.0
     |   0.0   0.0   0.0  -1.0 |   1.0   1.0   0.0   0.0
-----|-------------------------|------------------------
     |   0.0   1.0   0.0   1.0 |   0.0   1.0   1.0   0.0
     |   0.0   0.0  -1.0  -1.0 |   1.0   0.0   0.0   0.0
     |   1.0   1.0   1.0   1.0 |   0.0   0.0   0.0  -1.0
     |   0.0   0.0   0.0   1.0 |  -1.0   0.0   0.0   1.0
1    |   1.0   1.0  -1.0  -1.0 |  -1.0   0.0   1.0   1.0
     |   0.0   0.0   1.0   0.0 |   0.0   0.0   0.0  -1.0
     |   0.0   1.0   0.0   0.0 |   0.0   0.0   0.0   0.0
     |  -1.0   0.0   0.0   1.0 |  -1.0   0.0   1.0   0.0
Output:
Global general 16 × 8 matrix C with block size 4 × 2:
B,D         0             1             2             3
     *                                                       *
     |  -1.0   0.0 |   0.0   1.0 |  -2.0   0.0 |   1.0  -1.0 |
     |   0.0   0.0 |  -1.0  -1.0 |  -1.0  -2.0 |   1.0  -1.0 |
0    |   0.0   0.0 |   1.0   1.0 |   1.0   1.0 |  -1.0   1.0 |
     |   1.0  -2.0 |   0.0  -2.0 |   0.0  -1.0 |   0.0  -1.0 |
     | ------------|-------------|-------------|------------ |
     |  -1.0   3.0 |   0.0   1.0 |   1.0   3.0 |   0.0   2.0 |
     |  -1.0  -1.0 |  -1.0  -3.0 |   1.0  -1.0 |   1.0   0.0 |
1    |  -1.0   0.0 |  -1.0   2.0 |  -1.0   2.0 |   0.0   1.0 |
     |   1.0   2.0 |   1.0   3.0 |   0.0   1.0 |  -1.0   0.0 |
     | ------------|-------------|-------------|------------ |
     |   0.0   1.0 |   1.0   4.0 |  -2.0   0.0 |   0.0  -1.0 |
     |   0.0   0.0 |   0.0  -2.0 |   0.0  -2.0 |   1.0  -1.0 |
2    |   0.0   1.0 |  -1.0   0.0 |   0.0   1.0 |   0.0   1.0 |
     |  -1.0   0.0 |  -2.0  -3.0 |   1.0   0.0 |   1.0   1.0 |
     | ------------|-------------|-------------|------------ |
     |   0.0   0.0 |   1.0   1.0 |   1.0   0.0 |  -1.0   1.0 |
     |   0.0  -1.0 |   0.0   0.0 |  -1.0   0.0 |   0.0   0.0 |
3    |  -1.0   1.0 |   0.0   1.0 |   0.0   1.0 |   0.0   1.0 |
     |   1.0   2.0 |   3.0   2.0 |   0.0   1.0 |  -1.0   0.0 |
     *                                                       *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for C:
p,q  |            0            |            1
-----|-------------------------|------------------------
     |  -1.0   0.0  -2.0   0.0 |   0.0   1.0   1.0  -1.0
     |   0.0   0.0  -1.0  -2.0 |  -1.0  -1.0   1.0  -1.0
     |   0.0   0.0   1.0   1.0 |   1.0   1.0  -1.0   1.0
     |   1.0  -2.0   0.0  -1.0 |   0.0  -2.0   0.0  -1.0
0    |   0.0   1.0  -2.0   0.0 |   1.0   4.0   0.0  -1.0
     |   0.0   0.0   0.0  -2.0 |   0.0  -2.0   1.0  -1.0
     |   0.0   1.0   0.0   1.0 |  -1.0   0.0   0.0   1.0
     |  -1.0   0.0   1.0   0.0 |  -2.0  -3.0   1.0   1.0
-----|-------------------------|------------------------
     |  -1.0   3.0   1.0   3.0 |   0.0   1.0   0.0   2.0
     |  -1.0  -1.0   1.0  -1.0 |  -1.0  -3.0   1.0   0.0
     |  -1.0   0.0  -1.0   2.0 |  -1.0   2.0   0.0   1.0
     |   1.0   2.0   0.0   1.0 |   1.0   3.0  -1.0   0.0
1    |   0.0   0.0   1.0   0.0 |   1.0   1.0  -1.0   1.0
     |   0.0  -1.0  -1.0   0.0 |   0.0   0.0   0.0   0.0
     |  -1.0   1.0   0.0   1.0 |   0.0   1.0   0.0   1.0
     |   1.0   2.0   0.0   1.0 |   3.0   2.0  -1.0   0.0
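The result of this example can be spot-checked serially. The Python sketch below is a plain reference computation, not the distributed algorithm the subroutine uses: `symm_right` is a hypothetical name, it expands the stored upper triangle of A into the full symmetric matrix, and it evaluates C = alpha*B*A + beta*C for the first two rows of B only.

```python
def symm_right(alpha, a_upper, b, beta, c):
    """Serial reference for C = alpha*B*A + beta*C, A symmetric with its upper triangle stored."""
    n = len(a_upper)
    # Element (i, j) of the full matrix is read from the stored upper triangle.
    a = [[a_upper[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]
    return [
        [alpha * sum(b_row[k] * a[k][j] for k in range(n)) + beta * c_row[j]
         for j in range(n)]
        for b_row, c_row in zip(b, c)
    ]


X = None  # lower-triangle entries are never read
A_UPPER = [
    [0.0, -1.0, -1.0,  0.0,  0.0, 0.0, 0.0, 0.0],
    [X,    1.0,  0.0,  1.0,  0.0, 1.0, 0.0, 1.0],
    [X,    X,   -1.0, -1.0,  0.0, 0.0, 1.0, 0.0],
    [X,    X,    X,   -1.0,  1.0, 1.0, 0.0, 1.0],
    [X,    X,    X,    X,   -1.0, 0.0, 0.0, 0.0],
    [X,    X,    X,    X,    X,   1.0, 0.0, 0.0],
    [X,    X,    X,    X,    X,   X,   0.0, 0.0],
    [X,    X,    X,    X,    X,   X,   X,   0.0],
]
# First two rows of the global matrix B.
B_ROWS = [
    [-1.0,  0.0, 1.0, -1.0, 1.0,  1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0,  0.0, 1.0, -1.0, -1.0,  1.0],
]
C_IN = [[0.0] * 8 for _ in B_ROWS]

result = symm_right(1.0, A_UPPER, B_ROWS, 0.0, C_IN)
for row in result:
    print(row)
```

The two rows printed agree with the first two rows of the global matrix C shown in the output.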
This example computes C = βC + αBA using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO  M    N   ALPHA   A   IA  JA  DESC_A   B   IB  JB
              |     |    |    |     |     |   |   |     |      |   |   |
CALL PZSYMM( 'R' , 'U' , 16 , 8 , ALPHA , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B   BETA   C   IC  JC  DESC_C
               |       |     |   |   |     |
             DESC_B , BETA , C , 1 , 1 , DESC_C )

ALPHA = (1.0, 2.0)
BETA = (0.0, 0.0)
| Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt | icontxt | icontxt |
M_ | 8 | 16 | 16 |
N_ | 8 | 8 | 8 |
MB_ | 2 | 4 | 4 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
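LLD_ | See below | See below | See below |
Notes:
icontxt is the BLACS context set by the BLACS_GRIDINIT call above.
Each process sets LLD_ to at least max(1, LOCp(M_)), the limit given in the descriptor tables for desc_a, desc_b, and desc_c.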
Global complex symmetric matrix A of order 8 with block size 2 × 2:
B,D 0 1 2 3
* *
0 | ( 0.0, 1.0) (-1.0, 0.0) | (-1.0, 0.0) ( 0.0, 1.0) |( 0.0, 1.0) ( 0.0, 1.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| . ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) |( 0.0, 1.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
1 | . . | (-1.0, 0.0) (-1.0, 0.0) |( 0.0, 1.0) ( 0.0, 1.0) | ( 1.0, 2.0) ( 0.0, 1.0) |
| . . | . (-1.0, 0.0) |( 1.0, 2.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
2 | . . | . . |(-1.0, 0.0) ( 0.0, 1.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| . . | . . | . ( 1.0, 2.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
3 | . . | . . | . . | ( 0.0, 1.0) ( 0.0, 1.0) |
| . . | . . | . . | . ( 0.0, 1.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for A:
p,q | 0 | 1
-----|-------------------------------------------------|-------------------------------------------------
| ( 0.0, 1.0) (-1.0, 0.0) ( 0.0, 1.0) ( 0.0, 1.0) | (-1.0, 0.0) ( 0.0, 1.0) ( 0.0, 1.0) ( 0.0, 1.0)
| . ( 1.0, 2.0) ( 0.0, 1.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) ( 0.0, 1.0) ( 1.0, 2.0)
0 | . . (-1.0, 0.0) ( 0.0, 1.0) | . . ( 0.0, 1.0) ( 0.0, 1.0)
| . . . ( 1.0, 2.0) | . . ( 0.0, 1.0) ( 0.0, 1.0)
-----|-------------------------------------------------|-------------------------------------------------
| . . ( 0.0, 1.0) ( 0.0, 1.0) | (-1.0, 0.0) (-1.0, 0.0) ( 1.0, 2.0) ( 0.0, 1.0)
| . . ( 1.0, 2.0) ( 1.0, 2.0) | . (-1.0, 0.0) ( 0.0, 1.0) ( 1.0, 2.0)
1 | . . . . | . . ( 0.0, 1.0) ( 0.0, 1.0)
| . . . . | . . . ( 0.0, 1.0)
Global general 16 × 8 matrix B with block size 4 × 2:
B,D 0 1 2 3
* *
| (-1.0,-3.0) ( 0.0,-2.0) | ( 1.0,-1.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) (-1.0,-3.0) |
| (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 1.0,-1.0) (-1.0,-3.0) | (-1.0,-3.0) ( 1.0,-1.0) |
0 | ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) | (-1.0,-3.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
1 | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
| (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
2 | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) |
3 | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for B:
p,q | 0 | 1
-----|-------------------------------------------------|-------------------------------------------------
| (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 1.0,-1.0) (-1.0,-3.0) (-1.0,-3.0) (-1.0,-3.0)
| (-1.0,-3.0) (-1.0,-3.0) ( 1.0,-1.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) (-1.0,-3.0) ( 1.0,-1.0)
| ( 1.0,-1.0) ( 1.0,-1.0) (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0)
| ( 0.0,-2.0) (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
0 | ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0)
| (-1.0,-3.0) (-1.0,-3.0) ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0)
-----|-------------------------------------------------|-------------------------------------------------
| ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) ( 0.0,-2.0)
| ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0)
| ( 1.0,-1.0) ( 1.0,-1.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0)
1 | ( 1.0,-1.0) ( 1.0,-1.0) (-1.0,-3.0) (-1.0,-3.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
| ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0)
| (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0)
Output:
Global general 16 × 8 matrix C with block size 4 × 2:
B,D 0 1 2 3
* *
| (11.0,27.0) (37.0,39.0) | ( 7.0,29.0) (27.0,39.0) | (27.0,29.0) (37.0,39.0) | (21.0,37.0) (35.0,35.0) |
| ( 7.0,29.0) (37.0,39.0) | (11.0,27.0) (35.0,35.0) | (23.0,31.0) (45.0,35.0) | (21.0,37.0) (35.0,35.0) |
0 | ( 1.0,27.0) (31.0,37.0) | (-3.0,29.0) (21.0,37.0) | ( 9.0,33.0) (27.0,39.0) | (23.0,31.0) (21.0,37.0) |
| ( 6.0,32.0) (48.0,36.0) | (10.0,30.0) (42.0,34.0) | (22.0,34.0) (44.0,38.0) | (28.0,36.0) (38.0,36.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| (-4.0,22.0) (10.0,40.0) | (-8.0,24.0) (12.0,34.0) | ( 0.0,30.0) (10.0,40.0) | (10.0,30.0) ( 8.0,36.0) |
| (11.0,27.0) (41.0,37.0) | (11.0,27.0) (43.0,31.0) | (15.0,35.0) (41.0,37.0) | (21.0,37.0) (31.0,37.0) |
1 | (-1.0,23.0) (25.0,35.0) | (-1.0,23.0) (11.0,37.0) | (11.0,27.0) (17.0,39.0) | (13.0,31.0) (15.0,35.0) |
| (-3.0,29.0) (23.0,41.0) | (-3.0,29.0) (13.0,41.0) | (13.0,31.0) (27.0,39.0) | (23.0,31.0) (25.0,35.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| (-2.0,26.0) (24.0,38.0) | (-6.0,28.0) ( 6.0,42.0) | (18.0,26.0) (28.0,36.0) | (16.0,32.0) (26.0,32.0) |
| ( 7.0,29.0) (37.0,39.0) | ( 7.0,29.0) (39.0,33.0) | (19.0,33.0) (45.0,35.0) | (21.0,37.0) (35.0,35.0) |
2 | (-2.0,26.0) (24.0,38.0) | ( 2.0,24.0) (22.0,34.0) | (10.0,30.0) (24.0,38.0) | (16.0,32.0) (18.0,36.0) |
| ( 5.0,25.0) (31.0,37.0) | ( 9.0,23.0) (37.0,29.0) | ( 9.0,33.0) (31.0,37.0) | (15.0,35.0) (21.0,37.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 1.0,27.0) (31.0,37.0) | (-3.0,29.0) (21.0,37.0) | ( 9.0,33.0) (31.0,37.0) | (23.0,31.0) (21.0,37.0) |
| ( 4.0,28.0) (38.0,36.0) | ( 4.0,28.0) (28.0,36.0) | (20.0,30.0) (34.0,38.0) | (22.0,34.0) (28.0,36.0) |
3 | ( 5.0,25.0) (27.0,39.0) | ( 1.0,27.0) (21.0,37.0) | (13.0,31.0) (27.0,39.0) | (19.0,33.0) (21.0,37.0) |
| ( 0.0,30.0) (26.0,42.0) | (-8.0,34.0) (20.0,40.0) | (16.0,32.0) (30.0,40.0) | (26.0,32.0) (28.0,36.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for C:
p,q | 0 | 1
-----|-------------------------------------------------|-------------------------------------------------
| (11.0,27.0) (37.0,39.0) (27.0,29.0) (37.0,39.0) | ( 7.0,29.0) (27.0,39.0) (21.0,37.0) (35.0,35.0)
| ( 7.0,29.0) (37.0,39.0) (23.0,31.0) (45.0,35.0) | (11.0,27.0) (35.0,35.0) (21.0,37.0) (35.0,35.0)
| ( 1.0,27.0) (31.0,37.0) ( 9.0,33.0) (27.0,39.0) | (-3.0,29.0) (21.0,37.0) (23.0,31.0) (21.0,37.0)
| ( 6.0,32.0) (48.0,36.0) (22.0,34.0) (44.0,38.0) | (10.0,30.0) (42.0,34.0) (28.0,36.0) (38.0,36.0)
0 | (-2.0,26.0) (24.0,38.0) (18.0,26.0) (28.0,36.0) | (-6.0,28.0) ( 6.0,42.0) (16.0,32.0) (26.0,32.0)
| ( 7.0,29.0) (37.0,39.0) (19.0,33.0) (45.0,35.0) | ( 7.0,29.0) (39.0,33.0) (21.0,37.0) (35.0,35.0)
| (-2.0,26.0) (24.0,38.0) (10.0,30.0) (24.0,38.0) | ( 2.0,24.0) (22.0,34.0) (16.0,32.0) (18.0,36.0)
| ( 5.0,25.0) (31.0,37.0) ( 9.0,33.0) (31.0,37.0) | ( 9.0,23.0) (37.0,29.0) (15.0,35.0) (21.0,37.0)
-----|-------------------------------------------------|-------------------------------------------------
| (-4.0,22.0) (10.0,40.0) ( 0.0,30.0) (10.0,40.0) | (-8.0,24.0) (12.0,34.0) (10.0,30.0) ( 8.0,36.0)
| (11.0,27.0) (41.0,37.0) (15.0,35.0) (41.0,37.0) | (11.0,27.0) (43.0,31.0) (21.0,37.0) (31.0,37.0)
| (-1.0,23.0) (25.0,35.0) (11.0,27.0) (17.0,39.0) | (-1.0,23.0) (11.0,37.0) (13.0,31.0) (15.0,35.0)
| (-3.0,29.0) (23.0,41.0) (13.0,31.0) (27.0,39.0) | (-3.0,29.0) (13.0,41.0) (23.0,31.0) (25.0,35.0)
1 | ( 1.0,27.0) (31.0,37.0) ( 9.0,33.0) (31.0,37.0) | (-3.0,29.0) (21.0,37.0) (23.0,31.0) (21.0,37.0)
| ( 4.0,28.0) (38.0,36.0) (20.0,30.0) (34.0,38.0) | ( 4.0,28.0) (28.0,36.0) (22.0,34.0) (28.0,36.0)
| ( 5.0,25.0) (27.0,39.0) (13.0,31.0) (27.0,39.0) | ( 1.0,27.0) (21.0,37.0) (19.0,33.0) (21.0,37.0)
| ( 0.0,30.0) (26.0,42.0) (16.0,32.0) (30.0,40.0) | (-8.0,34.0) (20.0,40.0) (26.0,32.0) (28.0,36.0)
This example computes C = βC + αBA using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO  M    N   ALPHA   A   IA  JA  DESC_A   B   IB  JB
              |     |    |    |     |     |   |   |     |      |   |   |
CALL PZHEMM( 'R' , 'U' , 16 , 8 , ALPHA , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B   BETA   C   IC  JC  DESC_C
               |       |     |   |   |     |
             DESC_B , BETA , C , 1 , 1 , DESC_C )

ALPHA = (1.0, 0.0)
BETA = (0.0, 0.0)
| Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt | icontxt | icontxt |
M_ | 8 | 16 | 16 |
N_ | 8 | 8 | 8 |
MB_ | 2 | 4 | 4 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
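LLD_ | See below | See below | See below |
Notes:
icontxt is the BLACS context set by the BLACS_GRIDINIT call above.
Each process sets LLD_ to at least max(1, LOCp(M_)), the limit given in the descriptor tables for desc_a, desc_b, and desc_c.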
Global Hermitian matrix A of order 8 with block size 2 × 2:
B,D 0 1 2 3
* *
0 | ( 0.0, 0.0) (-1.0, 0.0) | (-1.0, 0.0) ( 0.0, 1.0) |( 0.0, 1.0) ( 0.0, 1.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| . ( 1.0, 0.0) | ( 0.0, 1.0) ( 1.0, 2.0) |( 0.0, 1.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
1 | . . | (-1.0, 0.0) (-1.0, 0.0) |( 0.0, 1.0) ( 0.0, 1.0) | ( 1.0, 2.0) ( 0.0, 1.0) |
| . . | . (-1.0, 0.0) |( 1.0, 2.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
2 | . . | . . |(-1.0, 0.0) ( 0.0, 1.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| . . | . . | . ( 1.0, 0.0) | ( 0.0, 1.0) ( 0.0, 1.0) |
| ------------------------|-------------------------|------------------------|------------------------ |
3 | . . | . . | . . | ( 0.0, 0.0) ( 0.0, 1.0) |
| . . | . . | . . | . ( 0.0, 0.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for A:
p,q | 0 | 1
-----|-------------------------------------------------|-------------------------------------------------
| ( 0.0, . ) (-1.0, 0.0) ( 0.0, 1.0) ( 0.0, 1.0) | (-1.0, 0.0) ( 0.0, 1.0) ( 0.0, 1.0) ( 0.0, 1.0)
| . ( 1.0, . ) ( 0.0, 1.0) ( 1.0, 2.0) | ( 0.0, 1.0) ( 1.0, 2.0) ( 0.0, 1.0) ( 1.0, 2.0)
0 | . . (-1.0, . ) ( 0.0, 1.0) | . . ( 0.0, 1.0) ( 0.0, 1.0)
| . . . ( 1.0, . ) | . . ( 0.0, 1.0) ( 0.0, 1.0)
-----|-------------------------------------------------|-------------------------------------------------
| . . ( 0.0, 1.0) ( 0.0, 1.0) | (-1.0, . ) (-1.0, 0.0) ( 1.0, 2.0) ( 0.0, 1.0)
| . . ( 1.0, 2.0) ( 1.0, 2.0) | . (-1.0, . ) ( 0.0, 1.0) ( 1.0, 2.0)
1 | . . . . | . . ( 0.0, . ) ( 0.0, 1.0)
| . . . . | . . . ( 0.0, . )
Global general 16 × 8 matrix B with block size 4 × 2:
B,D 0 1 2 3
* *
| (-1.0,-3.0) ( 0.0,-2.0) | ( 1.0,-1.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) (-1.0,-3.0) |
| (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 1.0,-1.0) (-1.0,-3.0) | (-1.0,-3.0) ( 1.0,-1.0) |
0 | ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) | (-1.0,-3.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
1 | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
| (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 1.0,-1.0) |
2 | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| ------------------------|-------------------------|-------------------------|------------------------ |
| ( 1.0,-1.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) |
| ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) (-1.0,-3.0) |
3 | ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) |
| (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) | ( 1.0,-1.0) ( 0.0,-2.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for B:
p,q | 0 | 1
-----|-------------------------------------------------|-------------------------------------------------
| (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 1.0,-1.0) (-1.0,-3.0) (-1.0,-3.0) (-1.0,-3.0)
| (-1.0,-3.0) (-1.0,-3.0) ( 1.0,-1.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) (-1.0,-3.0) ( 1.0,-1.0)
| ( 1.0,-1.0) ( 1.0,-1.0) (-1.0,-3.0) ( 0.0,-2.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0)
| ( 0.0,-2.0) (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
0 | ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0)
| (-1.0,-3.0) (-1.0,-3.0) ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0)
-----|-------------------------------------------------|-------------------------------------------------
| ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0) ( 0.0,-2.0)
| ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0) (-1.0,-3.0) | ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0)
| ( 1.0,-1.0) ( 1.0,-1.0) ( 1.0,-1.0) ( 1.0,-1.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0)
1 | ( 1.0,-1.0) ( 1.0,-1.0) (-1.0,-3.0) (-1.0,-3.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 1.0,-1.0)
| ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) (-1.0,-3.0)
| ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0) ( 0.0,-2.0) | ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 0.0,-2.0)
| (-1.0,-3.0) ( 0.0,-2.0) ( 0.0,-2.0) ( 1.0,-1.0) | (-1.0,-3.0) ( 0.0,-2.0) ( 1.0,-1.0) ( 0.0,-2.0)
Output:
Global general 16 × 8 matrix C with block size 4 × 2:
B,D 0 1 2 3
* *
| (-12.0,4.0) (-19.0,-5.0) | (-9.0, 5.0) (-5.0,-5.0) | ( 3.0,-3.0) ( 9.0,-5.0) | (10.0, 2.0) (18.0,-6.0) |
| (-10.0,4.0) (-17.0,-7.0) | (-9.0, 3.0) (-5.0,-9.0) | ( 3.0,-1.0) ( 9.0,-9.0) | (14.0,-2.0) (20.0,-8.0) |
0 | (-10.0,4.0) (-19.0,-5.0) | (-7.0, 5.0) (-11.0,1.0) | ( 5.0, 1.0) (11.0,-5.0) | (12.0,-4.0) (17.0,-1.0) |
| (-10.0,6.0) (-22.0,-6.0) | (-8.0, 4.0) (-10.0,-6.0)| ( 4.0, 0.0) (10.0,-8.0) | (12.0,-2.0) (19.0,-7.0) |
| -------------------------|-------------------------|-------------------------|------------------------ |
| (-8.0, 0.0) (-10.0,-8.0) | (-6.0, 2.0) (-6.0,-4.0) | ( 4.0, 2.0) (10.0, 0.0) | ( 9.0, 1.0) (14.0, 4.0) |
| (-13.0,5.0) (-21.0,-5.0) | (-11.0,5.0) (-15.0,-3.0)| ( 3.0, 3.0) ( 9.0,-7.0) | (13.0,-1.0) (19.0,-5.0) |
1 | (-10.0,2.0) (-17.0,-7.0) | (-9.0, 3.0) (-7.0,-1.0) | ( 1.0, 1.0) ( 7.0, 1.0) | ( 7.0, 3.0) (14.0, 2.0) |
| (-7.0, 3.0) (-13.0,-7.0) | (-5.0, 3.0) (-1.0,-5.0) | ( 7.0,-3.0) (13.0,-7.0) | (13.0,-5.0) (18.0,-4.0) |
| -------------------------|-------------------------|-------------------------|------------------------ |
| (-8.0, 2.0) (-14.0,-8.0) | (-4.0, 2.0) ( 2.0,-6.0) | ( 6.0,-6.0) (12.0,-8.0) | (12.0,-2.0) (17.0,-5.0) |
| (-10.0,4.0) (-17.0,-7.0) | (-7.0, 3.0) (-7.0,-9.0) | ( 5.0,-1.0) (11.0,-11.0)| (15.0,-3.0) (20.0,-8.0) |
2 | (-8.0, 2.0) (-14.0,-8.0) | (-8.0, 2.0) (-6.0,-6.0) | ( 2.0, 2.0) ( 8.0,-2.0) | (10.0, 0.0) (16.0, 0.0) |
| (-11.0,3.0) (-17.0,-7.0) | (-11.0,3.0) (-13.0,-5.0)| ( 1.0, 5.0) ( 7.0,-3.0) | (11.0, 1.0) (17.0,-1.0) |
| -------------------------|-------------------------|-------------------------|------------------------ |
| (-10.0,4.0) (-19.0,-5.0) | (-7.0, 5.0) (-11.0,1.0) | ( 5.0, 1.0) (11.0,-7.0) | (14.0,-6.0) (18.0,-2.0) |
| (-10.0,4.0) (-20.0,-6.0) | (-8.0, 4.0) (-8.0,-4.0) | ( 2.0, 0.0) ( 8.0,-4.0) | (10.0, 0.0) (17.0,-3.0) |
3 | (-11.0,3.0) (-17.0,-5.0) | (-9.0, 5.0) (-9.0,-1.0) | ( 3.0, 1.0) ( 9.0,-3.0) | (11.0,-1.0) (17.0,-1.0) |
| (-7.0, 3.0) (-14.0,-6.0) | (-2.0, 4.0) (-2.0,-6.0) | ( 8.0,-4.0) (14.0,-8.0) | (13.0,-5.0) (18.0,-4.0) |
* *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1 3
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
3    |         |
Local arrays for C:
p,q  |                            0                            |                            1
-----|---------------------------------------------------------|---------------------------------------------------------
     | (-12.0,  4.0) (-19.0, -5.0) (  3.0, -3.0) (  9.0, -5.0) | ( -9.0,  5.0) ( -5.0, -5.0) ( 10.0,  2.0) ( 18.0, -6.0)
     | (-10.0,  4.0) (-17.0, -7.0) (  3.0, -1.0) (  9.0, -9.0) | ( -9.0,  3.0) ( -5.0, -9.0) ( 14.0, -2.0) ( 20.0, -8.0)
     | (-10.0,  4.0) (-19.0, -5.0) (  5.0,  1.0) ( 11.0, -5.0) | ( -7.0,  5.0) (-11.0,  1.0) ( 12.0, -4.0) ( 17.0, -1.0)
     | (-10.0,  6.0) (-22.0, -6.0) (  4.0,  0.0) ( 10.0, -8.0) | ( -8.0,  4.0) (-10.0, -6.0) ( 12.0, -2.0) ( 19.0, -7.0)
0    | ( -8.0,  2.0) (-14.0, -8.0) (  6.0, -6.0) ( 12.0, -8.0) | ( -4.0,  2.0) (  2.0, -6.0) ( 12.0, -2.0) ( 17.0, -5.0)
     | (-10.0,  4.0) (-17.0, -7.0) (  5.0, -1.0) ( 11.0,-11.0) | ( -7.0,  3.0) ( -7.0, -9.0) ( 15.0, -3.0) ( 20.0, -8.0)
     | ( -8.0,  2.0) (-14.0, -8.0) (  2.0,  2.0) (  8.0, -2.0) | ( -8.0,  2.0) ( -6.0, -6.0) ( 10.0,  0.0) ( 16.0,  0.0)
     | (-11.0,  3.0) (-17.0, -7.0) (  1.0,  5.0) (  7.0, -3.0) | (-11.0,  3.0) (-13.0, -5.0) ( 11.0,  1.0) ( 17.0, -1.0)
-----|---------------------------------------------------------|---------------------------------------------------------
     | ( -8.0,  0.0) (-10.0, -8.0) (  4.0,  2.0) ( 10.0,  0.0) | ( -6.0,  2.0) ( -6.0, -4.0) (  9.0,  1.0) ( 14.0,  4.0)
     | (-13.0,  5.0) (-21.0, -5.0) (  3.0,  3.0) (  9.0, -7.0) | (-11.0,  5.0) (-15.0, -3.0) ( 13.0, -1.0) ( 19.0, -5.0)
     | (-10.0,  2.0) (-17.0, -7.0) (  1.0,  1.0) (  7.0,  1.0) | ( -9.0,  3.0) ( -7.0, -1.0) (  7.0,  3.0) ( 14.0,  2.0)
     | ( -7.0,  3.0) (-13.0, -7.0) (  7.0, -3.0) ( 13.0, -7.0) | ( -5.0,  3.0) ( -1.0, -5.0) ( 13.0, -5.0) ( 18.0, -4.0)
1    | (-10.0,  4.0) (-19.0, -5.0) (  5.0,  1.0) ( 11.0, -7.0) | ( -7.0,  5.0) (-11.0,  1.0) ( 14.0, -6.0) ( 18.0, -2.0)
     | (-10.0,  4.0) (-20.0, -6.0) (  2.0,  0.0) (  8.0, -4.0) | ( -8.0,  4.0) ( -8.0, -4.0) ( 10.0,  0.0) ( 17.0, -3.0)
     | (-11.0,  3.0) (-17.0, -5.0) (  3.0,  1.0) (  9.0, -3.0) | ( -9.0,  5.0) ( -9.0, -1.0) ( 11.0, -1.0) ( 17.0, -1.0)
     | ( -7.0,  3.0) (-14.0, -6.0) (  8.0, -4.0) ( 14.0, -8.0) | ( -2.0,  4.0) ( -2.0, -6.0) ( 13.0, -5.0) ( 18.0, -4.0)
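The PZSYMM and PZHEMM examples use nearly the same stored data for A but fill in the unreferenced triangle differently: a symmetric matrix mirrors its entries, while a Hermitian matrix mirrors their complex conjugates and takes the diagonal as real. The Python sketch below recomputes only element (1,1) of C for both examples from the first row of A and the first row of B. `first_column` is a hypothetical helper, and this is a serial check rather than the library's method.

```python
def first_column(row1, hermitian):
    """Column 1 of A rebuilt from its stored first row (uplo = 'U')."""
    if hermitian:
        # Diagonal is taken as real; off-diagonal entries are conjugated.
        return [complex(row1[0].real, 0.0)] + [z.conjugate() for z in row1[1:]]
    return list(row1)


def c11(alpha, a_row1, b_row1, hermitian):
    """Element (1,1) of alpha*B*A, with beta = 0."""
    col = first_column(a_row1, hermitian)
    return alpha * sum(b * a for b, a in zip(b_row1, col))


# First row of B, identical in the PZSYMM and PZHEMM examples.
B_ROW1 = [-1-3j, 0-2j, 1-1j, -1-3j, 1-1j, 1-1j, -1-3j, -1-3j]
# First rows of A: complex symmetric example, then Hermitian example.
A_ROW1_SYM = [0+1j, -1+0j, -1+0j, 0+1j, 0+1j, 0+1j, 0+1j, 0+1j]
A_ROW1_HERM = [0+0j, -1+0j, -1+0j, 0+1j, 0+1j, 0+1j, 0+1j, 0+1j]

c11_sym = c11(1+2j, A_ROW1_SYM, B_ROW1, hermitian=False)
c11_herm = c11(1+0j, A_ROW1_HERM, B_ROW1, hermitian=True)
print(c11_sym, c11_herm)
```

The two values agree with the first elements of C in the PZSYMM output, (11.0,27.0), and in the PZHEMM output, (-12.0,4.0).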