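PDGEMV computes one of the following matrix-vector products:
$y \leftarrow \alpha A x + \beta y$
$y \leftarrow \alpha A^T x + \beta y$
PZGEMV computes one of the following matrix-vector products:
$y \leftarrow \alpha A x + \beta y$
$y \leftarrow \alpha A^T x + \beta y$
$y \leftarrow \alpha A^H x + \beta y$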
where, in the formulas above:
In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:
alpha, beta, A, x, y | Subprogram |
---|---|
Long-precision real | PDGEMV |
Long-precision complex | PZGEMV |
Fortran | CALL PDGEMV | PZGEMV (transa, m, n, alpha, a, ia, ja, desc_a, x, ix, jx, desc_x, incx, beta, y, iy, jy, desc_y, incy) |
C and C++ | pdgemv | pzgemv (transa, m, n, alpha, a, ia, ja, desc_a, x, ix, jx, desc_x, incx, beta, y, iy, jy, desc_y, incy); |
transa
If transa = 'N', $A$ is used in the computation.
If transa = 'T', $A^T$ is used in the computation.
If transa = 'C', $A^H$ is used in the computation.
Scope: global
Specified as: a single character; transa = 'N', 'T', or 'C'.
m
If transa = 'N', it is the number of elements in vector y.
If transa = 'T' or 'C', it is the number of elements in vector x.
Scope: global
Specified as: a fullword integer; m >= 0.
n
If transa = 'N', it is the number of elements in vector x.
If transa = 'T' or 'C', it is the number of elements in vector y.
Scope: global
Specified as: a fullword integer; n >= 0.
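The roles of transa, m, and n described above can be summarized with a small serial model. The following Python sketch is illustrative only: it ignores the process grid and the array descriptors entirely, and the function name `gemv_reference` is hypothetical, not part of the library.

```python
def gemv_reference(transa, m, n, alpha, a, x, beta, y):
    """Serial model of y <- alpha*op(A)*x + beta*y for an m-by-n matrix a.

    a is a list of m rows with n entries each; x and y are plain lists.
    """
    if transa not in ("N", "T", "C"):
        raise ValueError("transa must be 'N', 'T', or 'C'")
    # Vector lengths follow the rules given for m and n.
    len_x, len_y = (n, m) if transa == "N" else (m, n)
    if len(x) != len_x or len(y) != len_y:
        raise ValueError("x or y has the wrong number of elements")

    def op(i, j):
        # Element (i, j) of A, of its transpose, or of its conjugate transpose.
        if transa == "N":
            return a[i][j]
        if transa == "T":
            return a[j][i]
        return a[j][i].conjugate()

    return [
        alpha * sum(op(i, j) * x[j] for j in range(len_x)) + beta * y[i]
        for i in range(len_y)
    ]


a = [[1, 2, 3],
     [4, 5, 6]]
# transa = 'N': x has n = 3 elements and y has m = 2 elements.
print(gemv_reference("N", 2, 3, 2.0, a, [1, 1, 1], 3.0, [1, 1]))
# transa = 'T': x has m = 2 elements and y has n = 3 elements.
print(gemv_reference("T", 2, 3, 1.0, a, [1, 1], 0.0, [0, 0, 0]))
```

For real data, 'T' and 'C' give the same result in this model, because conjugation leaves real numbers unchanged.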
alpha
Scope: global
Specified as: a number of the data type indicated in Table 37.
a
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
ia
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.
ja
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 or n = 0: M_A >= 0; otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 or n = 0: N_A >= 0; otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
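LOCp(M_A) and LOCq(N_A) are the numbers of rows and columns of the global matrix held by a given process row and process column. As a rough illustration, the sketch below counts them the way the ScaLAPACK tool routine NUMROC does. This section does not name that routine, so treating it as the rule behind LOCp and LOCq is an assumption, and the Python function is only a model of it. The sample values are the A descriptor of the first example in this section.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-long dimension held by process iproc.

    nb is the block size, isrcproc is the process holding the first block,
    and nprocs is the number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds that every process receives
    extra = nblocks % nprocs                       # leftover full blocks
    if mydist < extra:
        count += nb                                # one more full block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block, if any
    return count


# M_A = 6, N_A = 5, MB_A = 3, NB_A = 2, RSRC_A = CSRC_A = 0, on a 2 x 2 grid.
loc_rows = [numroc(6, 3, p, 0, 2) for p in range(2)]  # LOCp(M_A) for p = 0, 1
loc_cols = [numroc(5, 2, q, 0, 2) for q in range(2)]  # LOCq(N_A) for q = 0, 1
lld_a = [max(1, rows) for rows in loc_rows]           # smallest LLD_A allowed by the limit
print(loc_rows, loc_cols, lld_a)
```

With these values, process column 0 holds three columns of A and process column 1 holds two, which agrees with the local arrays shown for that example.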
x
If transa = 'N', numx = n
If transa = 'T' or 'C', numx = m
the following must be true:
Scope: local
Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.
ix
If incx = M_X, it indicates which row of global matrix X is used for vector x.
If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.
Scope: global
Specified as: a fullword integer; 1 <= ix <= M_X, and if incx = 1 and incx <> M_X, then:
If transa = 'N', then ix+n-1 <= M_X.
If transa = 'T' or 'C', then ix+m-1 <= M_X.
jx
If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.
If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.
Scope: global
Specified as: a fullword integer; 1 <= jx <= N_X, and if incx = M_X, then:
If transa = 'N', then jx+n-1 <= N_X.
If transa = 'T' or 'C', then jx+m-1 <= N_X.
desc_x | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_X | Descriptor type | DTYPE_X=1 | Global |
2 | CTXT_X | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_X | Number of rows in the global matrix | If transa = 'N' and n = 0: M_X >= 0; if transa = 'T' and m = 0: M_X >= 0; otherwise: M_X >= 1 | Global |
4 | N_X | Number of columns in the global matrix | If transa = 'N' and n = 0: N_X >= 0; if transa = 'T' and m = 0: N_X >= 0; otherwise: N_X >= 1 | Global |
5 | MB_X | Row block size | MB_X >= 1 | Global |
6 | NB_X | Column block size | NB_X >= 1 | Global |
7 | RSRC_X | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_X < p | Global |
8 | CSRC_X | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_X < q | Global |
9 | LLD_X | The leading dimension of the local array | LLD_X >= max(1,LOCp(M_X)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
incx
Scope: global
Specified as: a fullword integer; incx = 1 or incx = M_X, where:
If incx = M_X, then x is a row-distributed vector.
If incx = 1 and incx <> M_X, then x is a column-distributed vector.
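The way ix, jx, and incx together select the elements of vector x inside global matrix X can be sketched as follows. This is a minimal model of the rules stated above, not library code; the function name `vector_elements` and its `count` argument (numx, the number of elements) are hypothetical. Indices are 1-based, as in the Fortran interface.

```python
def vector_elements(ix, jx, incx, m_x, count):
    """Return the global (row, column) indices in X of the elements of x."""
    if incx == m_x:
        # Row-distributed vector: it lies along row ix and starts at column jx.
        return [(ix, jx + k) for k in range(count)]
    if incx == 1:
        # Column-distributed vector: it lies along column jx and starts at row ix.
        return [(ix + k, jx) for k in range(count)]
    raise ValueError("incx must be 1 or M_X")


# Column-distributed x: incx = 1, ix = 1, jx = 2, M_X = 5, five elements.
print(vector_elements(1, 2, 1, 5, 5))
# Row-distributed x: incx = M_X = 5, ix = 4, jx = 2, three elements.
print(vector_elements(4, 2, 5, 5, 3))
```

The test for incx = M_X comes first, so that a one-row matrix X (M_X = 1) is treated as holding a row-distributed vector, in line with the condition "incx = 1 and incx <> M_X" for the column case.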
beta
Scope: global
Specified as: a number of the data type indicated in Table 37.
y
If transa = 'N', numy = m
If transa = 'T' or 'C', numy = n
the following must be true:
When beta is zero, y need not be set on input.
Scope: local
Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.
iy
If incy = M_Y, it indicates which row of global matrix Y is used for vector y.
If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.
Scope: global
Specified as: a fullword integer; 1 <= iy <= M_Y, and if incy = 1 and incy <> M_Y, then:
If transa = 'N', then iy+m-1 <= M_Y.
If transa = 'T' or 'C', then iy+n-1 <= M_Y.
jy
If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.
If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.
Scope: global
Specified as: a fullword integer; 1 <= jy <= N_Y, and if incy = M_Y, then:
If transa = 'N', then jy+m-1 <= N_Y.
If transa = 'T' or 'C', then jy+n-1 <= N_Y.
desc_y | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_Y | Descriptor type | DTYPE_Y=1 | Global |
2 | CTXT_Y | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_Y | Number of rows in the global matrix | If transa = 'N' and m = 0: M_Y >= 0; if transa = 'T' and n = 0: M_Y >= 0; otherwise: M_Y >= 1 | Global |
4 | N_Y | Number of columns in the global matrix | If transa = 'N' and m = 0: N_Y >= 0; if transa = 'T' and n = 0: N_Y >= 0; otherwise: N_Y >= 1 | Global |
5 | MB_Y | Row block size | MB_Y >= 1 | Global |
6 | NB_Y | Column block size | NB_Y >= 1 | Global |
7 | RSRC_Y | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_Y < p | Global |
8 | CSRC_Y | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_Y < q | Global |
9 | LLD_Y | The leading dimension of the local array | LLD_Y >= max(1,LOCp(M_Y)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
incy
Scope: global
Specified as: a fullword integer; incy = 1 or incy = M_Y, where:
If incy = M_Y, then y is a row-distributed vector.
If incy = 1 and incy <> M_Y, then y is a column-distributed vector.
y
Scope: local
Returned as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37.
None
Unable to allocate work space
If (n = 0 and transa = 'N') or (m = 0 and transa = 'T' or 'C'):
Otherwise:
In all cases:
If (m = 0 and transa = 'N') or (n = 0 and transa = 'T' or 'C'):
Otherwise:
In all cases:
If m <> 0 and n <> 0:
If (n <> 0 and transa = 'N') or (m <> 0 and transa = 'T' or 'C'):
If (m <> 0 and transa = 'N') or (n <> 0 and transa = 'T' or 'C'):
If incx = M_X and transa = 'N':
If incx = M_X and transa = 'T' or 'C':
If incx = 1 (<> M_X) and transa = 'N':
If incx = 1 (<> M_X) and transa = 'T' or 'C':
In all cases:
If incy = M_Y and transa = 'N':
If incy = M_Y and transa = 'T' or 'C':
If incy = 1 (<> M_Y) and transa = 'N':
If incy = 1 (<> M_Y) and transa = 'T' or 'C':
In all cases:
If transa = 'N':
If transa = 'T' or 'C':
In all cases:
This example computes $y = \alpha A x + \beta y$ using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 1 for PDGEMM. The updated portion of Y is the same as for C in PDGEMM, as this computation is equivalent to a portion of the PDGEMM computation.
This example uses a global submatrix A within a global matrix A by specifying ia = 3 and ja = 1. It uses vectors x and y, which are column-distributed vectors within a column of X and Y, respectively, by specifying incx = 1, ix = 1, and jx = 2 for x and incy = 1, iy = 3, and jy = 2 for y.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA  M   N   ALPHA   A  IA  JA   DESC_A   X  IX  JX
              |     |   |     |     |   |   |      |     |   |   |
CALL PDGEMV( 'N'  , 4 , 5 , 1.0D0 , A , 3 , 1 , DESC_A , X , 1 , 2 ,

             DESC_X INCX  BETA    Y  IY  JY   DESC_Y INCY
               |      |     |     |   |   |      |     |
             DESC_X , 1 , 2.0D0 , Y , 3 , 2 , DESC_Y , 1 )
| Desc_A | Desc_X | Desc_Y |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
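CTXT_ | icontxt (1) | icontxt (1) | icontxt (1) |
M_ | 6 | 5 | 6 |
N_ | 5 | 4 | 4 |
MB_ | 3 | 2 | 3 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below (2) | See below (2) | See below (2) |
Notes:
1. icontxt is the BLACS context returned by the BLACS_GRIDINIT call shown above.
2. Each process sets LLD_ to the leading dimension of its own local array, which must be at least max(1, LOCp(M_)).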
After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 5 submatrix A, starting at row 3 and column 1 in global general 6 × 5 matrix A with block size 3 × 2:
B,D         0              1            2
     *                                        *
     |    .     .   |    .     .   |    .     |
 0   |    .     .   |    .     .   |    .     |
     |   1.0  -1.0  |  -1.0   1.0  |   2.0    |
     | -------------|--------------|--------- |
     |  -3.0   2.0  |   2.0   2.0  |   0.0    |
 1   |   4.0   0.0  |  -2.0   1.0  |  -1.0    |
     |  -1.0  -1.0  |   1.0  -3.0  |   2.0    |
     *                                        *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for A:
p,q  |        0        |     1
-----|-----------------|------------
     |   .     .    .  |   .     .
 0   |   .     .    .  |   .     .
     |  1.0  -1.0  2.0 | -1.0   1.0
-----|-----------------|------------
     | -3.0   2.0  0.0 |  2.0   2.0
 1   |  4.0   0.0 -1.0 | -2.0   1.0
     | -1.0  -1.0  2.0 |  1.0  -3.0
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 5 × 1, starting at row 1 and column 2 in 5 × 4 global matrix X with block size 2 × 2:
B,D        0           1
     *                       *
 0   |   .  -1.0  |   .   .  |
     |   .   2.0  |   .   .  |
     | -----------|--------- |
 1   |   .   0.0  |   .   .  |
     |   .  -1.0  |   .   .  |
     | -----------|--------- |
 2   |   .   2.0  |   .   .  |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for x:
p,q  |     0      |     1
-----|------------|-----------
     |   .  -1.0  |   .   .
 0   |   .   2.0  |   .   .
     |   .   2.0  |   .   .
-----|------------|-----------
 1   |   .   0.0  |   .   .
     |   .  -1.0  |   .   .
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 3 and column 2 in 6 × 4 global matrix Y with block size 3 × 2:
B,D        0           1
     *                       *
     |   .    .   |   .   .  |
 0   |   .    .   |   .   .  |
     |   .   0.5  |   .   .  |
     | -----------|--------- |
     |   .   0.5  |   .   .  |
 1   |   .   0.5  |   .   .  |
     |   .   0.5  |   .   .  |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |     0      |     1
-----|------------|-----------
     |   .    .   |   .   .
 0   |   .    .   |   .   .
     |   .   0.5  |   .   .
-----|------------|-----------
     |   .   0.5  |   .   .
 1   |   .   0.5  |   .   .
     |   .   0.5  |   .   .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 3 and column 2 in 6 × 4 global matrix Y with block size 3 × 2:
B,D        0           1
     *                       *
     |   .    .   |   .   .  |
 0   |   .    .   |   .   .  |
     |   .   1.0  |   .   .  |
     | -----------|--------- |
     |   .   6.0  |   .   .  |
 1   |   .  -6.0  |   .   .  |
     |   .   7.0  |   .   .  |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |     0      |     1
-----|------------|-----------
     |   .    .   |   .   .
 0   |   .    .   |   .   .
     |   .   1.0  |   .   .
-----|------------|-----------
     |   .   6.0  |   .   .
 1   |   .  -6.0  |   .   .
     |   .   7.0  |   .   .
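The output of this example can be checked with a short serial calculation. The Python sketch below is only a verification aid under the assumption that the distribution can be ignored: it copies the 4 × 5 global submatrix A, the vector x, and the input vector y from the displays above and evaluates alpha*A*x + beta*y directly. The variable names are illustrative.

```python
# Global submatrix A (rows 3-6, columns 1-5 of global matrix A).
a_sub = [
    [ 1.0, -1.0, -1.0,  1.0,  2.0],
    [-3.0,  2.0,  2.0,  2.0,  0.0],
    [ 4.0,  0.0, -2.0,  1.0, -1.0],
    [-1.0, -1.0,  1.0, -3.0,  2.0],
]
x = [-1.0, 2.0, 0.0, -1.0, 2.0]   # column 2 of X, rows 1-5
y = [0.5, 0.5, 0.5, 0.5]          # column 2 of Y, rows 3-6, on input
alpha, beta = 1.0, 2.0

# One dot product per row of A, scaled by alpha, plus beta times the old y.
y_new = [
    alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
    for row, y_i in zip(a_sub, y)
]
print(y_new)
```

The four values are the entries shown in rows 3 through 6 of column 2 of the output matrix Y.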
This example computes $y = \alpha A x + \beta y$ using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 1 for PDGEMM.
This example uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a row-distributed vector within a row of X, by specifying incx = M_X = 5, ix = 4, and jx = 2. It uses vector y, which is a column-distributed vector within a column of Y, by specifying incy = 1, iy = 2, and jy = 3.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA  M   N   ALPHA   A  IA  JA   DESC_A   X  IX  JX
              |     |   |     |     |   |   |      |     |   |   |
CALL PDGEMV( 'N'  , 4 , 3 , 1.0D0 , A , 2 , 2 , DESC_A , X , 4 , 2 ,

             DESC_X INCX  BETA    Y  IY  JY   DESC_Y INCY
               |      |     |     |   |   |      |     |
             DESC_X , 5 , 2.0D0 , Y , 2 , 3 , DESC_Y , 1 )
| Desc_A | Desc_X | Desc_Y |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
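CTXT_ | icontxt (1) | icontxt (1) | icontxt (1) |
M_ | 6 | 5 | 6 |
N_ | 5 | 4 | 4 |
MB_ | 3 | 2 | 3 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below (2) | See below (2) | See below (2) |
Notes:
1. icontxt is the BLACS context returned by the BLACS_GRIDINIT call shown above.
2. Each process sets LLD_ to the leading dimension of its own local array, which must be at least max(1, LOCp(M_)).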
After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 3 submatrix A, starting at row 2 and column 2 in global general 6 × 5 matrix A with block size 3 × 2:
B,D         0              1            2
     *                                        *
     |    .     .   |    .     .   |    .     |
 0   |    .    0.0  |   1.0   1.0  |    .     |
     |    .   -1.0  |  -1.0   1.0  |    .     |
     | -------------|--------------|--------- |
     |    .    2.0  |   2.0   2.0  |    .     |
 1   |    .    0.0  |  -2.0   1.0  |    .     |
     |    .     .   |    .     .   |    .     |
     *                                        *
The following is the 2 × 2 process grid:
B,D  |   0 2   |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for A:
p,q  |        0        |     1
-----|-----------------|------------
     |   .     .    .  |   .     .
 0   |   .    0.0   .  |  1.0   1.0
     |   .   -1.0   .  | -1.0   1.0
-----|-----------------|------------
     |   .    2.0   .  |  2.0   2.0
 1   |   .    0.0   .  | -2.0   1.0
     |   .     .    .  |   .     .
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 3, starting at row 4 and column 2 in 5 × 4 global matrix X with block size 2 × 2:
B,D        0             1
     *                         *
 0   |   .    .   |   .    .   |
     |   .    .   |   .    .   |
     | -----------|----------- |
 1   |   .    .   |   .    .   |
     |   .  -1.0  |  1.0 -1.0  |
     | -----------|----------- |
 2   |   .    .   |   .    .   |
     *                         *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for x:
p,q  |     0      |     1
-----|------------|------------
     |   .    .   |   .    .
 0   |   .    .   |   .    .
     |   .    .   |   .    .
-----|------------|------------
 1   |   .    .   |   .    .
     |   .  -1.0  |  1.0 -1.0
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:
B,D        0             1
     *                         *
     |   .    .   |   .    .   |
 0   |   .    .   |  0.5   .   |
     |   .    .   |  0.5   .   |
     | -----------|----------- |
     |   .    .   |  0.5   .   |
 1   |   .    .   |  0.5   .   |
     |   .    .   |   .    .   |
     *                         *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |     0      |     1
-----|------------|------------
     |   .    .   |   .    .
 0   |   .    .   |  0.5   .
     |   .    .   |  0.5   .
-----|------------|------------
     |   .    .   |  0.5   .
 1   |   .    .   |  0.5   .
     |   .    .   |   .    .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:
B,D        0             1
     *                         *
     |   .    .   |   .    .   |
 0   |   .    .   |  1.0   .   |
     |   .    .   |  0.0   .   |
     | -----------|----------- |
     |   .    .   | -1.0   .   |
 1   |   .    .   | -2.0   .   |
     |   .    .   |   .    .   |
     *                         *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |     0      |     1
-----|------------|------------
     |   .    .   |   .    .
 0   |   .    .   |  1.0   .
     |   .    .   |  0.0   .
-----|------------|------------
     |   .    .   | -1.0   .
 1   |   .    .   | -2.0   .
     |   .    .   |   .    .
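The offsets ia, ja, ix, jx, iy, and jy in this example can be traced with a serial sketch that works directly on global indices. It is an illustration under stated assumptions, not library code: only the matrix entries shown in the displays above are stored (in dictionaries keyed by 1-based global row and column), the distribution over processes is ignored, and all names are hypothetical.

```python
# Known global entries, copied from the displays in this example.
A = {(2, 2): 0.0,  (2, 3): 1.0,  (2, 4): 1.0,
     (3, 2): -1.0, (3, 3): -1.0, (3, 4): 1.0,
     (4, 2): 2.0,  (4, 3): 2.0,  (4, 4): 2.0,
     (5, 2): 0.0,  (5, 3): -2.0, (5, 4): 1.0}
X = {(4, 2): -1.0, (4, 3): 1.0, (4, 4): -1.0}
Y = {(2, 3): 0.5, (3, 3): 0.5, (4, 3): 0.5, (5, 3): 0.5}

m, n = 4, 3
ia, ja = 2, 2            # submatrix A starts at A(2,2)
ix, jx = 4, 2            # x runs along row 4 of X (incx = M_X)
iy, jy = 2, 3            # y runs down column 3 of Y (incy = 1)
alpha, beta = 1.0, 2.0

result = {}
for i in range(m):
    # Row i of the submatrix times x, where x(j) is X(ix, jx + j).
    acc = sum(A[(ia + i, ja + j)] * X[(ix, jx + j)] for j in range(n))
    result[(iy + i, jy)] = alpha * acc + beta * Y[(iy + i, jy)]

for index in sorted(result):
    print(index, result[index])
```

Each key of `result` is the global position in Y that receives the corresponding element of the updated vector y.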
This example computes $y = \alpha A x + \beta y$ using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 2 for PZGEMM. The updated portion of Y is the same as for C in PZGEMM, as this computation is equivalent to a portion of the PZGEMM computation.
This example uses vectors x and y, which are column-distributed vectors within a column of X and Y, respectively, by specifying incx = 1, ix = 1, and jx = 2 for x and incy = 1, iy = 1, and jy = 2 for y.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA  M   N   ALPHA   A  IA  JA   DESC_A   X  IX  JX
              |     |   |     |     |   |   |      |     |   |   |
CALL PZGEMV( 'N'  , 6 , 3 , ALPHA , A , 1 , 1 , DESC_A , X , 1 , 2 ,

             DESC_X INCX  BETA    Y  IY  JY   DESC_Y INCY
               |      |     |     |   |   |      |     |
             DESC_X , 1 , BETA  , Y , 1 , 2 , DESC_Y , 1 )

ALPHA = (1.0,0.0)
BETA = (2.0,0.0)
| Desc_A | Desc_X | Desc_Y |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
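CTXT_ | icontxt (1) | icontxt (1) | icontxt (1) |
M_ | 6 | 3 | 6 |
N_ | 3 | 2 | 2 |
MB_ | 2 | 2 | 2 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below (2) | See below (2) | See below (2) |
Notes:
1. icontxt is the BLACS context returned by the BLACS_GRIDINIT call shown above.
2. Each process sets LLD_ to the leading dimension of its own local array, which must be at least max(1, LOCp(M_)).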
Global general 6 × 3 matrix A with block size 2 × 2:
B,D             0                 1
     *                                  *
 0   | (1.0,5.0)  (9.0,2.0) | (1.0,9.0) |
     | (2.0,4.0)  (8.0,3.0) | (1.0,8.0) |
     | ---------------------|---------- |
 1   | (3.0,3.0)  (7.0,5.0) | (1.0,7.0) |
     | (4.0,2.0)  (4.0,7.0) | (1.0,5.0) |
     | ---------------------|---------- |
 2   | (5.0,1.0)  (5.0,1.0) | (1.0,6.0) |
     | (6.0,6.0)  (3.0,6.0) | (1.0,4.0) |
     *                                  *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for A:
p,q  |          0           |     1
-----|----------------------|------------
     | (1.0,5.0)  (9.0,2.0) | (1.0,9.0)
     | (2.0,4.0)  (8.0,3.0) | (1.0,8.0)
 0   | (5.0,1.0)  (5.0,1.0) | (1.0,6.0)
     | (6.0,6.0)  (3.0,6.0) | (1.0,4.0)
-----|----------------------|------------
 1   | (3.0,3.0)  (7.0,5.0) | (1.0,7.0)
     | (4.0,2.0)  (4.0,7.0) | (1.0,5.0)
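How the global matrix A above ends up in these local arrays can be illustrated with a small block-cyclic scatter. The Python sketch below is a serial model only, assuming RSRC_A = CSRC_A = 0 as in this example; the function name `distribute` is hypothetical and no BLACS or library routine is involved.

```python
def distribute(global_rows, mb, nb, nprow, npcol):
    """Split a global matrix (a list of rows) into local arrays keyed by (p, q).

    Row block k goes to process row k mod nprow and column block k goes to
    process column k mod npcol (first block on process row/column 0).
    """
    local = {(p, q): [] for p in range(nprow) for q in range(npcol)}
    for i, row in enumerate(global_rows):
        p = (i // mb) % nprow                      # owner of this global row
        for q in range(npcol):
            # Keep the entries of this row whose column block belongs to q.
            part = [v for j, v in enumerate(row) if (j // nb) % npcol == q]
            local[(p, q)].append(part)
    return local


# Global 6 x 3 matrix A of this example, with 2 x 2 blocks on a 2 x 2 grid.
a_global = [
    [1+5j, 9+2j, 1+9j],
    [2+4j, 8+3j, 1+8j],
    [3+3j, 7+5j, 1+7j],
    [4+2j, 4+7j, 1+5j],
    [5+1j, 5+1j, 1+6j],
    [6+6j, 3+6j, 1+4j],
]
local_a = distribute(a_global, 2, 2, 2, 2)
for key in sorted(local_a):
    print(key, local_a[key])
```

Process row 0 receives row blocks 0 and 2 (global rows 1, 2, 5, and 6), which is why its local arrays have four rows while those on process row 1 have two.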
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 3 × 1, starting at row 1 and column 2 in 3 × 2 global matrix X with block size 2 × 2:
B,D              0
     *                       *
 0   |     .      (2.0,7.0)  |
     |     .      (6.0,8.0)  |
     | --------------------- |
 1   |     .      (4.0,5.0)  |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for x:
p,q  |           0
-----|-----------------------
 0   |     .      (2.0,7.0)
     |     .      (6.0,8.0)
-----|-----------------------
 1   |     .      (4.0,5.0)
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 2 in 6 × 2 global matrix Y with block size 2 × 2:
B,D              0
     *                       *
 0   |     .      (0.5,0.0)  |
     |     .      (0.5,0.0)  |
     | --------------------- |
 1   |     .      (0.5,0.0)  |
     |     .      (0.5,0.0)  |
     | --------------------- |
 2   |     .      (0.5,0.0)  |
     |     .      (0.5,0.0)  |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |           0
-----|-----------------------
     |     .      (0.5,0.0)
     |     .      (0.5,0.0)
 0   |     .      (0.5,0.0)
     |     .      (0.5,0.0)
-----|-----------------------
 1   |     .      (0.5,0.0)
     |     .      (0.5,0.0)
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 2 in 6 × 2 global matrix Y with block size 2 × 2:
B,D               0
     *                         *
 0   |     .    (-35.0,142.0)  |
     |     .    (-35.0,141.0)  |
     | ----------------------- |
 1   |     .    (-43.0,146.0)  |
     |     .    (-58.0,131.0)  |
     | ----------------------- |
 2   |     .    (  0.0,112.0)  |
     |     .    (-75.0,135.0)  |
     *                         *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |            0
-----|-------------------------
     |     .    (-35.0,142.0)
     |     .    (-35.0,141.0)
 0   |     .    (  0.0,112.0)
     |     .    (-75.0,135.0)
-----|-------------------------
 1   |     .    (-43.0,146.0)
     |     .    (-58.0,131.0)
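The complex result above can be reproduced serially with Python's built-in complex type. The sketch below is a verification aid only, assuming the distribution can be ignored: it copies A, x, and the input y from the displays in this example, evaluates alpha*A*x + beta*y, and prints each element in the (real,imaginary) notation used here. Variable names are illustrative.

```python
a = [
    [1+5j, 9+2j, 1+9j],
    [2+4j, 8+3j, 1+8j],
    [3+3j, 7+5j, 1+7j],
    [4+2j, 4+7j, 1+5j],
    [5+1j, 5+1j, 1+6j],
    [6+6j, 3+6j, 1+4j],
]
x = [2+7j, 6+8j, 4+5j]        # column 2 of X, rows 1-3
y = [0.5+0j] * 6              # column 2 of Y on input
alpha, beta = 1+0j, 2+0j      # ALPHA = (1.0,0.0), BETA = (2.0,0.0)

# transa = 'N', so A is used as is: no transpose and no conjugation.
y_new = [
    alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
    for row, y_i in zip(a, y)
]
for value in y_new:
    print(f"({value.real:.1f},{value.imag:.1f})")
```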
This example computes $y = \alpha A x + \beta y$ using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 2 for PZGEMM.
This example uses vector x, which is a row-distributed vector within a row of X, by specifying incx = M_X = 3, ix = 1, and jx = 1. It uses vector y, which is a column-distributed vector within a column of Y, by specifying incy = 1, iy = 1, and jy = 1.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA  M   N   ALPHA   A  IA  JA   DESC_A   X  IX  JX
              |     |   |     |     |   |   |      |     |   |   |
CALL PZGEMV( 'N'  , 6 , 2 , ALPHA , A , 1 , 1 , DESC_A , X , 1 , 1 ,

             DESC_X INCX  BETA    Y  IY  JY   DESC_Y INCY
               |      |     |     |   |   |      |     |
             DESC_X , 3 , BETA  , Y , 1 , 1 , DESC_Y , 1 )

ALPHA = (1.0,0.0)
BETA = (2.0,0.0)
| Desc_A | Desc_X | Desc_Y |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
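CTXT_ | icontxt (1) | icontxt (1) | icontxt (1) |
M_ | 6 | 3 | 6 |
N_ | 3 | 2 | 2 |
MB_ | 2 | 2 | 2 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below (2) | See below (2) | See below (2) |
Notes:
1. icontxt is the BLACS context returned by the BLACS_GRIDINIT call shown above.
2. Each process sets LLD_ to the leading dimension of its own local array, which must be at least max(1, LOCp(M_)).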
Global general 6 × 3 matrix A with block size 2 × 2:
B,D             0                 1
     *                                  *
 0   | (1.0,5.0)  (9.0,2.0) | (1.0,9.0) |
     | (2.0,4.0)  (8.0,3.0) | (1.0,8.0) |
     | ---------------------|---------- |
 1   | (3.0,3.0)  (7.0,5.0) | (1.0,7.0) |
     | (4.0,2.0)  (4.0,7.0) | (1.0,5.0) |
     | ---------------------|---------- |
 2   | (5.0,1.0)  (5.0,1.0) | (1.0,6.0) |
     | (6.0,6.0)  (3.0,6.0) | (1.0,4.0) |
     *                                  *
The following is the 2 × 2 process grid:
B,D  |    0    |  1
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for A:
p,q  |          0           |     1
-----|----------------------|------------
     | (1.0,5.0)  (9.0,2.0) | (1.0,9.0)
     | (2.0,4.0)  (8.0,3.0) | (1.0,8.0)
 0   | (5.0,1.0)  (5.0,1.0) | (1.0,6.0)
     | (6.0,6.0)  (3.0,6.0) | (1.0,4.0)
-----|----------------------|------------
 1   | (3.0,3.0)  (7.0,5.0) | (1.0,7.0)
     | (4.0,2.0)  (4.0,7.0) | (1.0,5.0)
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 2, starting at row 1 and column 1 in 3 × 2 global matrix X with block size 2 × 2:
B,D              0
     *                       *
 0   | (1.0,8.0)  (2.0,7.0)  |
     |     .          .      |
     | --------------------- |
 1   |     .          .      |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
-----| ------- |-----
1    |   P10   |  P11
Local arrays for x:
p,q  |           0
-----|-----------------------
 0   | (1.0,8.0)  (2.0,7.0)
     |     .          .
-----|-----------------------
 1   |     .          .
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 1 in 6 × 2 global matrix Y with block size 2 × 2:
B,D              0
     *                       *
 0   | (0.5,0.0)      .      |
     | (0.5,0.0)      .      |
     | --------------------- |
 1   | (0.5,0.0)      .      |
     | (0.5,0.0)      .      |
     | --------------------- |
 2   | (0.5,0.0)      .      |
     | (0.5,0.0)      .      |
     *                       *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |           0
-----|-----------------------
     | (0.5,0.0)      .
     | (0.5,0.0)      .
 0   | (0.5,0.0)      .
     | (0.5,0.0)      .
-----|-----------------------
 1   | (0.5,0.0)      .
     | (0.5,0.0)      .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 1 in 6 × 2 global matrix Y with block size 2 × 2:
B,D               0
     *                         *
 0   | (-34.0, 80.0)     .     |
     | (-34.0, 82.0)     .     |
     | ----------------------- |
 1   | (-41.0, 86.0)     .     |
     | (-52.0, 76.0)     .     |
     | ----------------------- |
 2   | (  1.0, 78.0)     .     |
     | (-77.0, 87.0)     .     |
     *                         *
The following is the 2 × 2 process grid:
B,D  |    0    |  --
-----| ------- |-----
0    |   P00   |  P01
2    |         |
-----| ------- |-----
1    |   P10   |  P11
Local arrays for y:
p,q  |            0
-----|-------------------------
     | (-34.0, 80.0)     .
     | (-34.0, 82.0)     .
 0   | (  1.0, 78.0)     .
     | (-77.0, 87.0)     .
-----|-------------------------
 1   | (-41.0, 86.0)     .
     | (-52.0, 76.0)     .
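The order of the elements in the local arrays for y follows from the block-cyclic mapping of global rows to process rows. As a minimal sketch, assuming RSRC_Y = 0 as in this example, the hypothetical helper below maps a 1-based global row index to the owning process row and the 1-based local row, and then regroups the output values shown above by process row.

```python
def owner_and_local_row(i_global, mb, nprow, rsrc=0):
    """Map a 1-based global row index to (process row, 1-based local row)."""
    block = (i_global - 1) // mb                 # 0-based global row block
    p = (block + rsrc) % nprow                   # process row owning that block
    local = (block // nprow) * mb + (i_global - 1) % mb + 1
    return p, local


# Output vector y of this example (column 1 of Y, global rows 1-6).
y_out = [-34+80j, -34+82j, -41+86j, -52+76j, 1+78j, -77+87j]

local_y = {0: [], 1: []}
for i_global, value in enumerate(y_out, start=1):
    p, _ = owner_and_local_row(i_global, mb=2, nprow=2)
    local_y[p].append(value)     # global order within a process equals local order

print([owner_and_local_row(i, 2, 2) for i in range(1, 7)])
print(local_y)
```

Global rows 5 and 6 belong to row block 2, which wraps around to process row 0 and becomes its local rows 3 and 4.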