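PDGEMV computes one of the following matrix-vector products:
y = alpha*A*x + beta*y
y = alpha*A^T*x + beta*y
PZGEMV computes one of the following matrix-vector products:
y = alpha*A*x + beta*y
y = alpha*A^T*x + beta*y
y = alpha*A^H*x + beta*y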
where, in the formulas above:
In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:
Table 37. Data types

| alpha, beta, A, x, y | Subprogram |
|---|---|
| Long-precision real | PDGEMV |
| Long-precision complex | PZGEMV |

| Language | Syntax |
|---|---|
| Fortran | CALL PDGEMV \| PZGEMV (transa, m, n, alpha, a, ia, ja, desc_a, x, ix, jx, desc_x, incx, beta, y, iy, jy, desc_y, incy) |
| C and C++ | pdgemv \| pzgemv (transa, m, n, alpha, a, ia, ja, desc_a, x, ix, jx, desc_x, incx, beta, y, iy, jy, desc_y, incy); |

transa indicates the form of matrix A to use in the computation:
If transa = 'N', A is used in the computation.
If transa = 'T', A^T is used in the computation.
If transa = 'C', A^H is used in the computation.
Scope: global
Specified as: a single character; transa = 'N', 'T', or 'C'.
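As a point of reference, the effect of transa can be sketched serially in plain Python. This is an illustrative, non-distributed sketch and not the Parallel ESSL implementation; the function name `gemv_reference` and the list-of-rows matrix representation are assumptions made for the example.

```python
def gemv_reference(transa, alpha, a, x, beta, y):
    """Serial sketch of y = alpha*op(A)*x + beta*y.

    a is a dense matrix stored as a list of rows; op(A) is A, A^T, or A^H
    for transa = 'N', 'T', or 'C'. Returns the updated y as a new list.
    """
    rows, cols = len(a), len(a[0])
    if transa == 'N':
        op = a
    elif transa == 'T':
        op = [[a[i][j] for i in range(rows)] for j in range(cols)]
    elif transa == 'C':
        # Conjugate transpose; for real data this equals the transpose.
        op = [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]
    else:
        raise ValueError("transa must be 'N', 'T', or 'C'")
    result = []
    for i, row in enumerate(op):
        dot = sum(aij * xj for aij, xj in zip(row, x))
        result.append(alpha * dot + beta * y[i])
    return result


a = [[1, 2], [3, 4], [5, 6]]  # 3 x 2 matrix
print(gemv_reference('N', 2, a, [1, 1], 3, [1, 1, 1]))  # [9, 17, 25]
print(gemv_reference('T', 1, a, [1, 1, 1], 0, [0, 0]))  # [9, 12]
```

With transa = 'N' the input x has as many elements as A has columns; with 'T' or 'C' it has as many elements as A has rows.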
m is the number of rows in submatrix A:
If transa = 'N', it is the number of elements in vector y.
If transa = 'T' or 'C', it is the number of elements in vector x.
Scope: global
Specified as: a fullword integer; m >= 0.
n is the number of columns in submatrix A:
If transa = 'N', it is the number of elements in vector x.
If transa = 'T' or 'C', it is the number of elements in vector y.
Scope: global
Specified as: a fullword integer; n >= 0.
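The relationship between transa, m, n, and the vector lengths can be written down in a few lines. The following is a minimal sketch; the helper name `vector_lengths` is hypothetical, and the names numx and numy follow the definitions used for x and y in this section.

```python
def vector_lengths(transa, m, n):
    """Return (numx, numy), the element counts of x and y.

    m and n are the row and column counts of submatrix A.
    """
    if m < 0 or n < 0:
        raise ValueError("m and n must be >= 0")
    if transa == 'N':
        return n, m
    if transa in ('T', 'C'):
        return m, n
    raise ValueError("transa must be 'N', 'T', or 'C'")


print(vector_lengths('N', 4, 5))  # (5, 4)
print(vector_lengths('T', 4, 5))  # (4, 5)
```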
alpha is the scalar alpha.
Scope: global
Specified as: a number of the data type indicated in Table 37.
a is the local part of the global general matrix A.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
ia is the row index of global matrix A that identifies the first row of submatrix A.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.
ja is the column index of global matrix A that identifies the first column of submatrix A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
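The index limits on ia and ja say that the m by n submatrix must lie entirely inside the M_A by N_A global matrix. A minimal sketch of that check follows; the function name `submatrix_in_bounds` is hypothetical and is not part of the library.

```python
def submatrix_in_bounds(ia, ja, m, n, m_a, n_a):
    """Check 1 <= ia <= M_A, ia+m-1 <= M_A and 1 <= ja <= N_A, ja+n-1 <= N_A."""
    rows_ok = 1 <= ia <= m_a and ia + m - 1 <= m_a
    cols_ok = 1 <= ja <= n_a and ja + n - 1 <= n_a
    return rows_ok and cols_ok


# A 4 x 5 submatrix starting at (3, 1) fits in a 6 x 5 matrix;
# starting at (4, 1) it would run past the last row.
print(submatrix_in_bounds(3, 1, 4, 5, 6, 5))  # True
print(submatrix_in_bounds(4, 1, 4, 5, 6, 5))  # False
```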
desc_a is the array descriptor for global matrix A, described in the following table:

| desc_a | Name | Description | Limits | Scope |
|---|---|---|---|---|
| 1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
| 2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
| 3 | M_A | Number of rows in the global matrix | If m = 0 or n = 0: M_A >= 0. Otherwise: M_A >= 1 | Global |
| 4 | N_A | Number of columns in the global matrix | If m = 0 or n = 0: N_A >= 0. Otherwise: N_A >= 1 | Global |
| 5 | MB_A | Row block size | MB_A >= 1 | Global |
| 6 | NB_A | Column block size | NB_A >= 1 | Global |
| 7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
| 8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
| 9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
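To make the descriptor layout concrete, the following sketch builds a 9-element descriptor in the order given by the table and checks the simpler limits. The helper `make_descriptor` is hypothetical, the context value 0 is a placeholder rather than a real BLACS context, and the LLD_A limit is reduced to LLD_A >= 1 because LOCp(M_A) depends on the calling process.

```python
def make_descriptor(ctxt, m_a, n_a, mb_a, nb_a, rsrc_a, csrc_a, lld_a, p, q):
    """Return a 9-element type-1 descriptor after checking simple limits.

    p and q are the process grid dimensions.
    """
    if m_a < 0 or n_a < 0:
        raise ValueError("M_A and N_A must be non-negative")
    if mb_a < 1 or nb_a < 1:
        raise ValueError("MB_A and NB_A must be >= 1")
    if not 0 <= rsrc_a < p:
        raise ValueError("RSRC_A must satisfy 0 <= RSRC_A < p")
    if not 0 <= csrc_a < q:
        raise ValueError("CSRC_A must satisfy 0 <= CSRC_A < q")
    if lld_a < 1:
        raise ValueError("LLD_A must be >= 1")
    dtype_a = 1  # descriptor type required by the table
    return [dtype_a, ctxt, m_a, n_a, mb_a, nb_a, rsrc_a, csrc_a, lld_a]


# A 6 x 5 matrix with 3 x 2 blocks on a 2 x 2 grid; ctxt=0 is a placeholder.
desc_a = make_descriptor(ctxt=0, m_a=6, n_a=5, mb_a=3, nb_a=2,
                         rsrc_a=0, csrc_a=0, lld_a=3, p=2, q=2)
print(desc_a)  # [1, 0, 6, 5, 3, 2, 0, 0, 3]
```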
x is the local part of global matrix X, which contains vector x.
If transa = 'N', numx = n
If transa = 'T' or 'C', numx = m
the following must be true:
Scope: local
Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.
ix has the following meaning:
If incx = M_X, it indicates which row of global matrix X is used for vector x.
If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.
Scope: global
Specified as: a fullword integer; 1 <= ix <= M_X, and if incx = 1 and incx <> M_X, then:
If transa = 'N', then ix+n-1 <= M_X.
If transa = 'T' or 'C', then ix+m-1 <= M_X.
jx has the following meaning:
If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.
If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.
Scope: global
Specified as: a fullword integer; 1 <= jx <= N_X, and if incx = M_X, then:
If transa = 'N', then jx+n-1 <= N_X.
If transa = 'T' or 'C', then jx+m-1 <= N_X.
desc_x is the array descriptor for global matrix X, described in the following table:

| desc_x | Name | Description | Limits | Scope |
|---|---|---|---|---|
| 1 | DTYPE_X | Descriptor type | DTYPE_X=1 | Global |
| 2 | CTXT_X | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
| 3 | M_X | Number of rows in the global matrix | If transa = 'N' and n = 0: M_X >= 0. If transa = 'T' and m = 0: M_X >= 0. Otherwise: M_X >= 1 | Global |
| 4 | N_X | Number of columns in the global matrix | If transa = 'N' and n = 0: N_X >= 0. If transa = 'T' and m = 0: N_X >= 0. Otherwise: N_X >= 1 | Global |
| 5 | MB_X | Row block size | MB_X >= 1 | Global |
| 6 | NB_X | Column block size | NB_X >= 1 | Global |
| 7 | RSRC_X | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_X < p | Global |
| 8 | CSRC_X | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_X < q | Global |
| 9 | LLD_X | The leading dimension of the local array | LLD_X >= max(1,LOCp(M_X)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
incx is the stride for vector x.
Scope: global
Specified as: a fullword integer; incx = 1 or incx = M_X, where:
If incx = M_X, then x is a row-distributed vector.
If incx = 1 and incx <> M_X, then x is a column-distributed vector.
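The rules for incx, ix, and jx can be combined into one small check. The sketch below classifies x as row-distributed or column-distributed and applies the matching index limits; the function name `vector_layout` and its return strings are assumptions made for illustration.

```python
def vector_layout(transa, m, n, ix, jx, incx, m_x, n_x):
    """Classify vector x and check ix/jx against the limits in this section."""
    numx = n if transa == 'N' else m
    if not (1 <= ix <= m_x and 1 <= jx <= n_x):
        raise ValueError("ix or jx is outside global matrix X")
    if incx == m_x:
        # Row-distributed: numx elements along row ix, starting at column jx.
        if jx + numx - 1 > n_x:
            raise ValueError("vector x runs past the last column of X")
        return 'row'
    if incx == 1:
        # Column-distributed: numx elements down column jx, starting at row ix.
        if ix + numx - 1 > m_x:
            raise ValueError("vector x runs past the last row of X")
        return 'column'
    raise ValueError("incx must be 1 or M_X")


# X is 5 x 4 in both calls.
print(vector_layout('N', 4, 5, 1, 2, 1, 5, 4))  # column
print(vector_layout('N', 4, 3, 4, 2, 5, 5, 4))  # row
```

Testing incx = M_X first matters when M_X = 1: in that case incx = 1 denotes a row-distributed vector.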
beta is the scalar beta.
Scope: global
Specified as: a number of the data type indicated in Table 37.
y is the local part of global matrix Y, which contains vector y.
If transa = 'N', numy = m
If transa = 'T' or 'C', numy = n
the following must be true:
When beta is zero, y need not be set on input.
Scope: local
Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.
iy has the following meaning:
If incy = M_Y, it indicates which row of global matrix Y is used for vector y.
If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.
Scope: global
Specified as: a fullword integer; 1 <= iy <= M_Y, and if incy = 1 and incy <> M_Y, then:
If transa = 'N', then iy+m-1 <= M_Y.
If transa = 'T' or 'C', then iy+n-1 <= M_Y.
jy has the following meaning:
If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.
If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.
Scope: global
Specified as: a fullword integer; 1 <= jy <= N_Y, and if incy = M_Y, then:
If transa = 'N', then jy+m-1 <= N_Y.
If transa = 'T' or 'C', then jy+n-1 <= N_Y.
desc_y is the array descriptor for global matrix Y, described in the following table:

| desc_y | Name | Description | Limits | Scope |
|---|---|---|---|---|
| 1 | DTYPE_Y | Descriptor type | DTYPE_Y=1 | Global |
| 2 | CTXT_Y | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
| 3 | M_Y | Number of rows in the global matrix | If transa = 'N' and m = 0: M_Y >= 0. If transa = 'T' and n = 0: M_Y >= 0. Otherwise: M_Y >= 1 | Global |
| 4 | N_Y | Number of columns in the global matrix | If transa = 'N' and m = 0: N_Y >= 0. If transa = 'T' and n = 0: N_Y >= 0. Otherwise: N_Y >= 1 | Global |
| 5 | MB_Y | Row block size | MB_Y >= 1 | Global |
| 6 | NB_Y | Column block size | NB_Y >= 1 | Global |
| 7 | RSRC_Y | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_Y < p | Global |
| 8 | CSRC_Y | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_Y < q | Global |
| 9 | LLD_Y | The leading dimension of the local array | LLD_Y >= max(1,LOCp(M_Y)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
incy is the stride for vector y.
Scope: global
Specified as: a fullword integer; incy = 1 or incy = M_Y, where:
If incy = M_Y, then y is a row-distributed vector.
If incy = 1 and incy <> M_Y, then y is a column-distributed vector.
y is the updated local part of global matrix Y, containing the result of the computation.
Scope: local
Returned as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37.
None
Unable to allocate work space
If (n = 0 and transa = 'N') or (m = 0 and transa = 'T' or 'C'):
Otherwise:
In all cases:
If (m = 0 and transa = 'N') or (n = 0 and transa = 'T' or 'C'):
Otherwise:
In all cases:
If m <> 0 and n <> 0:
If (n <> 0 and transa = 'N') or (m <> 0 and transa = 'T' or 'C'):
If (m <> 0 and transa = 'N') or (n <> 0 and transa = 'T' or 'C'):
If incx = M_X and transa = 'N':
If incx = M_X and transa = 'T' or 'C':
If incx = 1( <> M_X) and transa = 'N':
If incx = 1( <> M_X) and transa = 'T' or 'C':
In all cases:
If incy = M_Y and transa = 'N':
If incy = M_Y and transa = 'T' or 'C':
If incy = 1( <> M_Y) and transa = 'N':
If incy = 1( <> M_Y) and transa = 'T' or 'C':
In all cases:
If transa = 'N':
If transa = 'T' or 'C':
In all cases:
This example computes y = alpha*A*x + beta*y using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 1 for PDGEMM. The updated portion of Y is the same as for C in PDGEMM, because this computation is equivalent to a portion of the PDGEMM computation.
This example uses a global submatrix A within a global matrix A by specifying ia = 3 and ja = 1. It uses vectors x and y, which are column-distributed vectors within a column of X and Y, respectively, by specifying incx = 1, ix = 1, and jx = 2 for x and incy = 1, iy = 3, and jy = 2 for y.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
CALL PDGEMV( 'N' , 4 , 5 , 1.0D0 , A , 3 , 1 , DESC_A , X , 1 , 2 ,
             DESC_X , 1 , 2.0D0 , Y , 3 , 2 , DESC_Y , 1 )
The arguments are, in order: TRANSA = 'N', M = 4, N = 5, ALPHA = 1.0D0, A, IA = 3, JA = 1, DESC_A, X, IX = 1, JX = 2, DESC_X, INCX = 1, BETA = 2.0D0, Y, IY = 3, JY = 2, DESC_Y, INCY = 1.

| | Desc_A | Desc_X | Desc_Y |
|---|---|---|---|
| DTYPE_ | 1 | 1 | 1 |
| CTXT_ | icontxt (see note 1) | icontxt (see note 1) | icontxt (see note 1) |
| M_ | 6 | 5 | 6 |
| N_ | 5 | 4 | 4 |
| MB_ | 3 | 2 | 3 |
| NB_ | 2 | 2 | 2 |
| RSRC_ | 0 | 0 | 0 |
| CSRC_ | 0 | 0 | 0 |
| LLD_ | See note 2 | See note 2 | See note 2 |

Notes:
1. icontxt is the context value that the BLACS_GRIDINIT call above sets in ICONTXT.
2. Each process sets LLD_ to at least max(1,LOCp(M_)), the limit given in the descriptor tables.
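The LLD_ limit max(1,LOCp(M_)) from the descriptor tables can be evaluated for each process row. The sketch below assumes that LOCp is the usual block-cyclic row count and mirrors the counting logic of the ScaLAPACK NUMROC tool function in plain Python; it is an illustration, not the library routine itself.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of a dimension of size n, distributed in
    blocks of nb over nprocs processes, that land on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                  # number of full blocks
    count = (nblocks // nprocs) * nb   # full rounds shared by every process
    extra = nblocks % nprocs           # full blocks left after those rounds
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb                # the trailing partial block, if any
    return count


# (M_, MB_) for A, X, and Y in this example; NPROW = 2 and RSRC_ = 0.
for name, m_, mb_ in (("A", 6, 3), ("X", 5, 2), ("Y", 6, 3)):
    for myrow in range(2):
        lld = max(1, numroc(m_, mb_, myrow, 0, 2))
        print(f"LLD_{name} on process row {myrow}: at least {lld}")
```

For X this gives 3 on process row 0 and 2 on process row 1, which matches the row counts of the local arrays for x in this example.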
After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 5 submatrix A, starting at row 3 and column 1 in global general 6 × 5 matrix A with block size 3 × 2:
B,D 0 1 2
* *
| . . | . . | . |
0 | . . | . . | . |
| 1.0 -1.0 | -1.0 1.0 | 2.0 |
| -----------|-------------|------ |
| -3.0 2.0 | 2.0 2.0 | 0.0 |
1 | 4.0 0.0 | -2.0 1.0 | -1.0 |
| -1.0 -1.0 | 1.0 -3.0 | 2.0 |
* *
The following is the 2 × 2 process grid:

| B,D | 0 2 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for A:
p,q | 0 | 1
-----|-----------------|------------
| . . . | . .
0 | . . . | . .
| 1.0 -1.0 2.0 | -1.0 1.0
-----|-----------------|------------
| -3.0 2.0 0.0 | 2.0 2.0
1 | 4.0 0.0 -1.0 | -2.0 1.0
| -1.0 -1.0 2.0 | 1.0 -3.0
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 5 × 1, starting at row 1 and column 2 in 5 × 4 global matrix X with block size 2 × 2:
B,D 0 1
* *
0 | . -1.0 | . . |
| . 2.0 | . . |
| -----------|---------- |
1 | . 0.0 | . . |
| . -1.0 | . . |
| -----------|---------- |
2 | . 2.0 | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for x:
p,q | 0 | 1
-----|------------|-----------
| . -1.0 | . .
0 | . 2.0 | . .
| . 2.0 | . .
-----|------------|-----------
1 | . 0.0 | . .
| . -1.0 | . .
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 3 and column 2 in 6 × 4 global matrix Y with block size 3 × 2:
B,D 0 1
* *
| . . | . . |
0 | . . | . . |
| . 0.5 | . . |
| -----------|---------- |
| . 0.5 | . . |
1 | . 0.5 | . . |
| . 0.5 | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0 | 1
-----|------------|-----------
| . . | . .
0 | . . | . .
| . 0.5 | . .
-----|------------|-----------
| . 0.5 | . .
1 | . 0.5 | . .
| . 0.5 | . .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 3 and column 2 in 6 × 4 global matrix Y with block size 3 × 2:
B,D 0 1
* *
| . . | . . |
0 | . . | . . |
| . 1.0 | . . |
| -----------|---------- |
| . 6.0 | . . |
1 | . -6.0 | . . |
| . 7.0 | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0 | 1
-----|------------|-----------
| . . | . .
0 | . . | . .
| . 1.0 | . .
-----|------------|-----------
| . 6.0 | . .
1 | . -6.0 | . .
| . 7.0 | . .
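The output of this example can be checked with a serial computation on the global data. The sketch below copies the 4 × 5 submatrix A, vector x, and the input y from the listings above and evaluates y = alpha*A*x + beta*y directly; it does not model the distribution over the process grid.

```python
# Global 4 x 5 submatrix A (rows 3-6, columns 1-5 of the 6 x 5 matrix A).
a_sub = [
    [ 1.0, -1.0, -1.0,  1.0,  2.0],
    [-3.0,  2.0,  2.0,  2.0,  0.0],
    [ 4.0,  0.0, -2.0,  1.0, -1.0],
    [-1.0, -1.0,  1.0, -3.0,  2.0],
]
x = [-1.0, 2.0, 0.0, -1.0, 2.0]   # column 2 of X, rows 1-5
y = [0.5, 0.5, 0.5, 0.5]          # column 2 of Y, rows 3-6
alpha, beta = 1.0, 2.0

y_out = [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
         for row, yi in zip(a_sub, y)]
print(y_out)  # [1.0, 6.0, -6.0, 7.0]
```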
This example computes y = alpha*A*x + beta*y using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 1 for PDGEMM.
This example uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a row-distributed vector within a row of X, by specifying incx = M_X = 5, ix = 4, and jx = 2. It uses vector y, which is a column-distributed vector within a column of Y, by specifying incy = 1, iy = 2, and jy = 3.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
CALL PDGEMV( 'N' , 4 , 3 , 1.0D0 , A , 2 , 2 , DESC_A , X , 4 , 2 ,
             DESC_X , 5 , 2.0D0 , Y , 2 , 3 , DESC_Y , 1 )
The arguments are, in order: TRANSA = 'N', M = 4, N = 3, ALPHA = 1.0D0, A, IA = 2, JA = 2, DESC_A, X, IX = 4, JX = 2, DESC_X, INCX = 5, BETA = 2.0D0, Y, IY = 2, JY = 3, DESC_Y, INCY = 1.

| | Desc_A | Desc_X | Desc_Y |
|---|---|---|---|
| DTYPE_ | 1 | 1 | 1 |
| CTXT_ | icontxt (see note 1) | icontxt (see note 1) | icontxt (see note 1) |
| M_ | 6 | 5 | 6 |
| N_ | 5 | 4 | 4 |
| MB_ | 3 | 2 | 3 |
| NB_ | 2 | 2 | 2 |
| RSRC_ | 0 | 0 | 0 |
| CSRC_ | 0 | 0 | 0 |
| LLD_ | See note 2 | See note 2 | See note 2 |

Notes:
1. icontxt is the context value that the BLACS_GRIDINIT call above sets in ICONTXT.
2. Each process sets LLD_ to at least max(1,LOCp(M_)), the limit given in the descriptor tables.
After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 3 submatrix A, starting at row 2 and column 2 in global general 6 × 5 matrix A with block size 3 × 2:
B,D 0 1 2
* *
| . . | . . | . |
0 | . 0.0 | 1.0 1.0 | . |
| . -1.0 | -1.0 1.0 | . |
| -----------|-------------|----- |
| . 2.0 | 2.0 2.0 | . |
1 | . 0.0 | -2.0 1.0 | . |
| . . | . . | . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 2 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for A:
p,q | 0 | 1
-----|----------------|------------
| . . . | . .
0 | . 0.0 . | 1.0 1.0
| . -1.0 . | -1.0 1.0
-----|----------------|------------
| . 2.0 . | 2.0 2.0
1 | . 0.0 . | -2.0 1.0
| . . . | . .
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 3, starting at row 4 and column 2 in 5 × 4 global matrix X with block size 2 × 2:
B,D 0 1
* *
0 | . . | . . |
| . . | . . |
| -----------|----------- |
1 | . . | . . |
| . -1.0 | 1.0 -1.0 |
| -----------|----------- |
2 | . . | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for x:
p,q | 0 | 1
-----|------------|------------
| . . | . .
0 | . . | . .
| . . | . .
-----|------------|------------
1 | . . | . .
| . -1.0 | 1.0 -1.0
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:
B,D 0 1
* *
| . . | . . |
0 | . . | 0.5 . |
| . . | 0.5 . |
| ----------|---------- |
| . . | 0.5 . |
1 | . . | 0.5 . |
| . . | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0 | 1
-----|-----------|-----------
| . . | . .
0 | . . | 0.5 .
| . . | 0.5 .
-----|-----------|-----------
| . . | 0.5 .
1 | . . | 0.5 .
| . . | . .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:
B,D 0 1
* *
| . . | . . |
0 | . . | 1.0 . |
| . . | 0.0 . |
| ----------|---------- |
| . . | -1.0 . |
1 | . . | -2.0 . |
| . . | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0 | 1
-----|-----------|-----------
| . . | . .
0 | . . | 1.0 .
| . . | 0.0 .
-----|-----------|-----------
| . . | -1.0 .
1 | . . | -2.0 .
| . . | . .
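Because x is row-distributed in this example, it is read along a row of X rather than down a column. The following serial sketch selects x from the global matrix X using ix, jx, and incx, then checks the output values; entries that the example does not reference are filled with a placeholder, and the distribution over the process grid is not modeled.

```python
NA = None  # placeholder for entries of X that this example does not reference

# 5 x 4 global matrix X; only row 4, columns 2-4 hold vector x.
x_global = [
    [NA, NA, NA, NA],
    [NA, NA, NA, NA],
    [NA, NA, NA, NA],
    [NA, -1.0, 1.0, -1.0],
    [NA, NA, NA, NA],
]
m, n = 4, 3
ix, jx, incx = 4, 2, 5
m_x = len(x_global)

if incx == m_x:
    # Row-distributed: n elements along row ix, starting at column jx.
    x = x_global[ix - 1][jx - 1:jx - 1 + n]
else:
    # Column-distributed: n elements down column jx, starting at row ix.
    x = [x_global[ix - 1 + k][jx - 1] for k in range(n)]

# Global 4 x 3 submatrix A (rows 2-5, columns 2-4 of the 6 x 5 matrix A).
a_sub = [
    [ 0.0,  1.0, 1.0],
    [-1.0, -1.0, 1.0],
    [ 2.0,  2.0, 2.0],
    [ 0.0, -2.0, 1.0],
]
alpha, beta = 1.0, 2.0
y = [0.5] * m
y_out = [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
         for row, yi in zip(a_sub, y)]
print(x)      # [-1.0, 1.0, -1.0]
print(y_out)  # [1.0, 0.0, -1.0, -2.0]
```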
This example computes y = alpha*A*x + beta*y using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 2 for PZGEMM. The updated portion of Y is the same as for C in PZGEMM, because this computation is equivalent to a portion of the PZGEMM computation.
This example uses vectors x and y, which are column-distributed vectors within a column of X and Y, respectively, by specifying incx = 1, ix = 1, and jx = 2 for x and incy = 1, iy = 1, and jy = 2 for y.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
CALL PZGEMV( 'N' , 6 , 3 , ALPHA , A , 1 , 1 , DESC_A , X , 1 , 2 ,
             DESC_X , 1 , BETA , Y , 1 , 2 , DESC_Y , 1 )
The arguments are, in order: TRANSA = 'N', M = 6, N = 3, ALPHA, A, IA = 1, JA = 1, DESC_A, X, IX = 1, JX = 2, DESC_X, INCX = 1, BETA, Y, IY = 1, JY = 2, DESC_Y, INCY = 1, where:
ALPHA = (1.0,0.0)
BETA = (2.0,0.0)

| | Desc_A | Desc_X | Desc_Y |
|---|---|---|---|
| DTYPE_ | 1 | 1 | 1 |
| CTXT_ | icontxt (see note 1) | icontxt (see note 1) | icontxt (see note 1) |
| M_ | 6 | 3 | 6 |
| N_ | 3 | 2 | 2 |
| MB_ | 2 | 2 | 2 |
| NB_ | 2 | 2 | 2 |
| RSRC_ | 0 | 0 | 0 |
| CSRC_ | 0 | 0 | 0 |
| LLD_ | See note 2 | See note 2 | See note 2 |

Notes:
1. icontxt is the context value that the BLACS_GRIDINIT call above sets in ICONTXT.
2. Each process sets LLD_ to at least max(1,LOCp(M_)), the limit given in the descriptor tables.
Global general 6 × 3 matrix A with block size 2 × 2:
B,D 0 1
* *
0 | (1.0,5.0) (9.0,2.0) | (1.0,9.0) |
| (2.0,4.0) (8.0,3.0) | (1.0,8.0) |
| -----------------------|------------ |
1 | (3.0,3.0) (7.0,5.0) | (1.0,7.0) |
| (4.0,2.0) (4.0,7.0) | (1.0,5.0) |
| -----------------------|------------ |
2 | (5.0,1.0) (5.0,1.0) | (1.0,6.0) |
| (6.0,6.0) (3.0,6.0) | (1.0,4.0) |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for A:
p,q | 0 | 1
-----|------------------------|-------------
| (1.0,5.0) (9.0,2.0) | (1.0,9.0)
| (2.0,4.0) (8.0,3.0) | (1.0,8.0)
0 | (5.0,1.0) (5.0,1.0) | (1.0,6.0)
| (6.0,6.0) (3.0,6.0) | (1.0,4.0)
-----|------------------------|-------------
1 | (3.0,3.0) (7.0,5.0) | (1.0,7.0)
| (4.0,2.0) (4.0,7.0) | (1.0,5.0)
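The local arrays for A follow from the block-cyclic rule: row block b goes to process row b mod 2 and column block b goes to process column b mod 2. The sketch below applies that rule to the global 6 × 3 matrix A; the helper name `local_array` is hypothetical, and RSRC_A = CSRC_A = 0 is assumed, as in this example.

```python
def local_array(global_matrix, mb, nb, myrow, mycol, nprow, npcol):
    """Return the entries of a block-cyclically distributed matrix that are
    owned by process (myrow, mycol), assuming RSRC = CSRC = 0."""
    rows = [i for i in range(len(global_matrix))
            if (i // mb) % nprow == myrow]
    cols = [j for j in range(len(global_matrix[0]))
            if (j // nb) % npcol == mycol]
    return [[global_matrix[i][j] for j in cols] for i in rows]


a = [
    [1 + 5j, 9 + 2j, 1 + 9j],
    [2 + 4j, 8 + 3j, 1 + 8j],
    [3 + 3j, 7 + 5j, 1 + 7j],
    [4 + 2j, 4 + 7j, 1 + 5j],
    [5 + 1j, 5 + 1j, 1 + 6j],
    [6 + 6j, 3 + 6j, 1 + 4j],
]
p00 = local_array(a, 2, 2, 0, 0, 2, 2)  # rows 1, 2, 5, 6 and columns 1, 2
p11 = local_array(a, 2, 2, 1, 1, 2, 2)  # rows 3, 4 and column 3
for row in p00:
    print(row)
```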
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 3 × 1, starting at row 1 and column 2 in 3 × 2 global matrix X with block size 2 × 2:
B,D 0
* *
0 | . (2.0,7.0) |
| . (6.0,8.0) |
| --------------------- |
1 | . (4.0,5.0) |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for x:
p,q | 0
-----|-----------------------
0 | . (2.0,7.0)
| . (6.0,8.0)
-----|-----------------------
1 | . (4.0,5.0)
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 2 in 6 × 2 global matrix Y with block size 2 × 2:
B,D 0
* *
0 | . (0.5,0.0) |
| . (0.5,0.0) |
| --------------------- |
1 | . (0.5,0.0) |
| . (0.5,0.0) |
| --------------------- |
2 | . (0.5,0.0) |
| . (0.5,0.0) |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0
-----|-----------------------
| . (0.5,0.0)
| . (0.5,0.0)
0 | . (0.5,0.0)
| . (0.5,0.0)
-----|-----------------------
1 | . (0.5,0.0)
| . (0.5,0.0)
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 2 in 6 × 2 global matrix Y with block size 2 × 2:
B,D 0
* *
0 | . (-35.0,142.0) |
| . (-35.0,141.0) |
| ----------------------------- |
1 | . (-43.0,146.0) |
| . (-58.0,131.0) |
| ----------------------------- |
2 | . ( 0.0,112.0) |
| . (-75.0,135.0) |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0
-----|-------------------------------
| . (-35.0,142.0)
| . (-35.0,141.0)
0 | . ( 0.0,112.0)
| . (-75.0,135.0)
-----|-------------------------------
1 | . (-43.0,146.0)
| . (-58.0,131.0)
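The complex result can be checked the same way as the real case. The following serial sketch uses Python's built-in complex type with the global 6 × 3 matrix A, vector x, and input y from the listings above; it does not model the distribution over the process grid.

```python
a = [
    [1 + 5j, 9 + 2j, 1 + 9j],
    [2 + 4j, 8 + 3j, 1 + 8j],
    [3 + 3j, 7 + 5j, 1 + 7j],
    [4 + 2j, 4 + 7j, 1 + 5j],
    [5 + 1j, 5 + 1j, 1 + 6j],
    [6 + 6j, 3 + 6j, 1 + 4j],
]
x = [2 + 7j, 6 + 8j, 4 + 5j]      # column 2 of X, rows 1-3
y = [0.5 + 0j] * 6                # column 2 of Y, rows 1-6
alpha, beta = 1 + 0j, 2 + 0j

y_out = [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
         for row, yi in zip(a, y)]
for value in y_out:
    print(value)
```

The six values agree with the global vector y in the output listing, including (0.0,112.0) in row 5.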
This example computes y = alpha*A*x + beta*y using a 2 × 2 process grid. The input matrices A, X, and Y used here are the same as A, B, and C used in Example 2 for PZGEMM.
This example uses vector x, which is a row-distributed vector within a row of X, by specifying incx = M_X = 3, ix = 1, and jx = 1. It uses vector y, which is a column-distributed vector within a column of Y, by specifying incy = 1, iy = 1, and jy = 1.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
CALL PZGEMV( 'N' , 6 , 2 , ALPHA , A , 1 , 1 , DESC_A , X , 1 , 1 ,
             DESC_X , 3 , BETA , Y , 1 , 1 , DESC_Y , 1 )
The arguments are, in order: TRANSA = 'N', M = 6, N = 2, ALPHA, A, IA = 1, JA = 1, DESC_A, X, IX = 1, JX = 1, DESC_X, INCX = 3, BETA, Y, IY = 1, JY = 1, DESC_Y, INCY = 1, where:
ALPHA = (1.0,0.0)
BETA = (2.0,0.0)

| | Desc_A | Desc_X | Desc_Y |
|---|---|---|---|
| DTYPE_ | 1 | 1 | 1 |
| CTXT_ | icontxt (see note 1) | icontxt (see note 1) | icontxt (see note 1) |
| M_ | 6 | 3 | 6 |
| N_ | 3 | 2 | 2 |
| MB_ | 2 | 2 | 2 |
| NB_ | 2 | 2 | 2 |
| RSRC_ | 0 | 0 | 0 |
| CSRC_ | 0 | 0 | 0 |
| LLD_ | See note 2 | See note 2 | See note 2 |

Notes:
1. icontxt is the context value that the BLACS_GRIDINIT call above sets in ICONTXT.
2. Each process sets LLD_ to at least max(1,LOCp(M_)), the limit given in the descriptor tables.
Global general 6 × 3 matrix A with block size 2 × 2:
B,D 0 1
* *
0 | (1.0,5.0) (9.0,2.0) | (1.0,9.0) |
| (2.0,4.0) (8.0,3.0) | (1.0,8.0) |
| -----------------------|------------ |
1 | (3.0,3.0) (7.0,5.0) | (1.0,7.0) |
| (4.0,2.0) (4.0,7.0) | (1.0,5.0) |
| -----------------------|------------ |
2 | (5.0,1.0) (5.0,1.0) | (1.0,6.0) |
| (6.0,6.0) (3.0,6.0) | (1.0,4.0) |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | 1 |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for A:
p,q | 0 | 1
-----|------------------------|-------------
| (1.0,5.0) (9.0,2.0) | (1.0,9.0)
| (2.0,4.0) (8.0,3.0) | (1.0,8.0)
0 | (5.0,1.0) (5.0,1.0) | (1.0,6.0)
| (6.0,6.0) (3.0,6.0) | (1.0,4.0)
-----|------------------------|-------------
1 | (3.0,3.0) (7.0,5.0) | (1.0,7.0)
| (4.0,2.0) (4.0,7.0) | (1.0,5.0)
After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 2, starting at row 1 and column 1 in 3 × 2 global matrix X with block size 2 × 2:
B,D 0
* *
0 | (1.0,8.0) (2.0,7.0) |
| . . |
| --------------------- |
1 | . . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for x:
p,q | 0
-----|-----------------------
0 | (1.0,8.0) (2.0,7.0)
| . .
-----|-----------------------
1 | . .
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 1 in 6 × 2 global matrix Y with block size 2 × 2:
B,D 0
* *
0 | (0.5,0.0) . |
| (0.5,0.0) . |
| --------------------- |
1 | (0.5,0.0) . |
| (0.5,0.0) . |
| --------------------- |
2 | (0.5,0.0) . |
| (0.5,0.0) . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0
-----|-----------------------
| (0.5,0.0) .
| (0.5,0.0) .
0 | (0.5,0.0) .
| (0.5,0.0) .
-----|-----------------------
1 | (0.5,0.0) .
| (0.5,0.0) .
Output:
After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 6 × 1, starting at row 1 and column 1 in 6 × 2 global matrix Y with block size 2 × 2:
B,D 0
* *
0 | (-34.0, 80.0) . |
| (-34.0, 82.0) . |
| ----------------------------- |
1 | (-41.0, 86.0) . |
| (-52.0, 76.0) . |
| ----------------------------- |
2 | ( 1.0, 78.0) . |
| (-77.0, 87.0) . |
* *
The following is the 2 × 2 process grid:

| B,D | 0 | -- |
|---|---|---|
| 0 2 | P00 | P01 |
| 1 | P10 | P11 |

Local arrays for y:
p,q | 0
-----|-------------------------------
| (-34.0, 80.0) .
| (-34.0, 82.0) .
0 | ( 1.0, 78.0) .
| (-77.0, 87.0) .
-----|-------------------------------
1 | (-41.0, 86.0) .
| (-52.0, 76.0) .
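The local arrays for y can be mapped back to global order by reversing the block-cyclic rule. The sketch below reassembles column 1 of Y from the pieces held by process rows 0 and 1; the helper name `gather_rows` is hypothetical, and RSRC_Y = 0 is assumed, as in this example.

```python
def gather_rows(local_by_prow, m, mb, nprow):
    """Rebuild a length-m global column from per-process-row local pieces,
    assuming a block-cyclic row distribution that starts on process row 0."""
    next_local = [0] * nprow   # next unread local index for each process row
    result = []
    for i in range(m):
        prow = (i // mb) % nprow          # owner of global row i (0-based)
        result.append(local_by_prow[prow][next_local[prow]])
        next_local[prow] += 1
    return result


# Column 1 of the local arrays for y after the call.
local_y = {
    0: [-34 + 80j, -34 + 82j, 1 + 78j, -77 + 87j],
    1: [-41 + 86j, -52 + 76j],
}
y_global = gather_rows(local_y, m=6, mb=2, nprow=2)
for value in y_global:
    print(value)
```

The reassembled order places the two entries from process row 1 in global rows 3 and 4, which matches the global vector y in the output listing.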