This subroutine determines an optimal stride value for you to use for your input or output data when you are computing large row Fourier transforms in any of the Fourier transform subroutines, except _RCFT and _CRFT. The strides determined by this subroutine allow your arrays to fit comfortably in various levels of storage hierarchy on your particular processor, thus allowing you to improve your run-time performance.
Fortran | CALL STRIDE (n, incd, incr, dt, iopt) |
C and C++ | stride (n, incd, incr, dt, iopt); |
PL/I | CALL STRIDE (n, incd, incr, dt, iopt); |
Specified as: a fullword integer; n > 0.
Specified as: a fullword integer; incd > 0 or incd < 0.
If dt = 'S', the numbers are short-precision real.
If dt = 'D', the numbers are long-precision real.
If dt = 'C', the numbers are short-precision complex.
If dt = 'Z', the numbers are long-precision complex.
Specified as: a single character; dt = 'S', 'D', 'C', or 'Z'.
Returned as: a fullword integer; incr > 0 or incr < 0 and |incr| >= |incd|, where incr has the same sign (+ or -) as incd.
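The relationship between incd and the returned incr can be checked after a call. The following C sketch is illustrative only: stride_result_ok is a hypothetical helper, not an ESSL routine, and it assumes both values are held in plain int variables.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ESSL): checks the documented
   relationship between the input incd and the returned incr. */
bool stride_result_ok(int incd, int incr)
{
    if (incd == 0 || incr == 0)
        return false;                 /* both values must be nonzero */
    if ((incd > 0) != (incr > 0))
        return false;                 /* incr must have the same sign as incd */
    return abs(incr) >= abs(incd);    /* |incr| >= |incd| */
}
```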
This subroutine determines an optimal stride, incr, for you to use for your input or output data when computing large row Fourier transforms. The stride value returned by this subroutine is based on the size and structure of your transform data, as described by the input arguments.
This information is used in determining the optimal stride for the processor you are currently running on. The stride determined by this subroutine allows your arrays to fit comfortably in various levels of storage hierarchy for that processor, thus giving you the ability to improve your run-time performance.
This subroutine returns only one stride value on each invocation. Therefore, in many instances, you may need to invoke it multiple times to obtain the several stride values used in your Fourier transform computation.
Where multiple invocations are necessary, they are explained in the examples, beginning with Example 1--SCFT. The examples also explain how to calculate the incd values for each invocation. There are nine examples to cover the Fourier transform subroutines that can use the STRIDE subroutine.
After calling this subroutine and obtaining the optimal stride value, you then set up your input or output array accordingly. This may involve movement of data for input arrays or increasing the sizes of input or output arrays. To accomplish this, you may want to set up a separate subroutine with the stride values passed into it as arguments. You can then dimension your arrays in that subroutine, depending on the values calculated by STRIDE. For additional information on how to set up your data, see Setting Up Your Data.
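As an illustration of the data movement mentioned above, the following C sketch copies m sequences of length n from a tightly packed array (leading dimension m) into a padded array whose leading dimension is the stride obtained from STRIDE. The name repack_rows, the use of float elements, and the zero-based column-major offsets are assumptions made for this sketch; none of them is part of ESSL.

```c
#include <stddef.h>

/* Hypothetical helper (not part of ESSL). Both arrays are treated as
   Fortran-style column-major storage addressed with zero-based offsets.
   src has leading dimension m; dst has leading dimension inc1x >= m. */
void repack_rows(const float *src, float *dst, int m, int n, int inc1x)
{
    for (int j = 0; j < n; ++j) {        /* element index within a sequence */
        for (int i = 0; i < m; ++i) {    /* sequence index */
            dst[(size_t)i + (size_t)j * (size_t)inc1x] =
                src[(size_t)i + (size_t)j * (size_t)m];
        }
    }
}
```

The destination array must hold at least inc1x * n elements; the padding elements between m and inc1x in each column are left untouched.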
None
This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the SCFT subroutine.
If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N       , M       , INC1X   , 'C'     , 0 )
Here, the arguments refer to the SCFT subroutine. The following table gives values of inc1x (as obtained from the STRIDE subroutine) for some combinations of n and m on POWER3 with a 64KB level 1 cache:
N      M      INC1X
128    64     64
240    32     32
240    64     65
256    256    264
512    60     60
1024   64     65
The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.
In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n can be used.
This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the DCOSF subroutine.
If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N/2+1). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N/2+1   , M       , INC1X   , 'D'     , 0 )
Here, the arguments refer to the DCOSF subroutine. The following table gives values of inc1x (as obtained from the STRIDE subroutine) for some combinations of n and m on POWER3 with a 64KB level 1 cache:
N      M      INC1X
128    64     64
240    32     32
240    64     64
256    256    264
512    60     60
1024   64     65
The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.
In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n/2+1 can be used.
This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the DSINF subroutine.
If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N/2). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N/2     , M       , INC1X   , 'D'     , 0 )
Here, the arguments refer to the DSINF subroutine. The following table gives values of inc1x (as obtained from the STRIDE subroutine) for some combinations of n and m on POWER3 with a 64KB level 1 cache:
N      M      INC1X
128    64     64
240    32     32
240    64     64
256    256    264
512    60     60
1024   64     65
The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.
In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n/2 can be used.
This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SCFT2 subroutine.
If inc1y = 1, the two-dimensional output array is stored in the normal form. In this case, the output array can be declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1      , INC2Y   , 'C'     , 0 )
Here, the arguments refer to the SCFT2 subroutine. The following table gives values of inc2y (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 on POWER3 with a 64KB level 1 cache:
N1     N2     INC2Y
64     64     64
128    128    136
240    240    240
512    512    520
840    840    848
If the input array is stored in the normal form (inc1x = 1), the value of inc2x is not important. However, if you want to use the same array for input and output, you should use inc2x = inc2y.
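For the normal-form output array Y(INC2Y,N2) described above, the padded leading dimension changes both the amount of storage needed and the position of each element. The following C sketch shows the arithmetic under the assumption of zero-based offsets and inc1y = 1; the helper names are hypothetical and are not part of ESSL.

```c
#include <stddef.h>

/* Hypothetical helpers (not part of ESSL). */

/* Number of elements to allocate for an array declared as Y(INC2Y,N2). */
size_t y2_elems(int inc2y, int n2)
{
    return (size_t)inc2y * (size_t)n2;
}

/* Zero-based offset of element y(k1,k2), assuming inc1y = 1. */
size_t y2_offset(int k1, int k2, int inc2y)
{
    return (size_t)k1 + (size_t)k2 * (size_t)inc2y;
}
```

For n1 = n2 = 128 with inc2y = 136, y2_elems(136, 128) gives 17408 elements, compared with 16384 for an unpadded 128 by 128 array.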
If inc2y = 1, the two-dimensional output array is stored in the transposed form. In this case, the output array can be declared as Y(INC1Y,N1), where the required minimum value of inc1y is n2. The STRIDE subroutine helps in picking a good value of inc1y. To find a good value of inc1y, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N1      , N2      , INC1Y   , 'C'     , 0 )
Here, the arguments refer to the SCFT2 subroutine. The following table gives values of inc1y (as obtained from the STRIDE subroutine) for some combinations of n1 and n2 on POWER3 with a 64KB level 1 cache:
N1     N2     INC1Y
60     64     64
120    128    136
256    240    240
512    512    520
840    840    848
If the input array is stored in the transposed form (inc2x = 1), the value of inc1x is also important. The above example can be used to find a good value of inc1x, by replacing inc1y with inc1x. If both arrays are stored in the transposed form, a good value for inc1y is also a good value for inc1x. In that situation, the two arrays can also be made equivalent.
This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SRCFT2 subroutine.
For this subroutine, the output array is declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1/2+1. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1/2 + 1, INC2Y   , 'C'     , 0 )
Here, the arguments refer to the SRCFT2 subroutine. The following table gives values of inc2y (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 on POWER3 with a 64KB level 1 cache:
N1     N2     INC2Y
240    240    121
420    420    211
512    512    257
840    840    421
1024   1024   513
2048   2048   1032
For this subroutine, the leading dimension of the input array (inc2x) is not important. If you want to use the same array for input and output, you should use inc2x >= 2(inc2y).
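The stride condition for sharing one array between input and output is easy to test before the transform is called. The following C sketch checks only the inequality stated above; srcft2_can_share is a hypothetical helper, not an ESSL routine, and it does not verify any other requirement of SRCFT2.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of ESSL): true when the input leading
   dimension inc2x satisfies inc2x >= 2(inc2y), the stride condition
   given for using the same array for input and output. */
bool srcft2_can_share(int inc2x, int inc2y)
{
    return inc2x >= 2 * inc2y;
}
```

For example, with inc2y = 257 the smallest acceptable inc2x is 514.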
This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SCRFT2 subroutine.
For this subroutine, the output array is declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1+2. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1 + 2  , INC2Y   , 'S'     , 0 )
Here, the arguments refer to the SCRFT2 subroutine. The following table gives values of inc2y (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 on POWER3 with a 64KB level 1 cache:
N1     N2     INC2Y
240    240    242
420    420    422
512    512    514
840    840    842
1024   1024   1026
2048   2048   2064
For this subroutine, the leading dimension of the input array (inc2x) is also important. In general, inc2x = inc2y/2 is a good choice. This is also the requirement if you want to use the same array for input and output.
This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SCFT3 subroutine.
For this subroutine, the strides for the input array are not important. They are important for the output array. The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1.
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1      , INC2Y   , 'C'     , 0 )
Here, the arguments refer to the SCFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N3      , N2*INC2Y, INC3Y   , 'C'     , 0 )
If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored at Y(k1+k2(inc2y),k3) in the two-dimensional array and at Y(k1+k2(inc2y)+k3(inc3y)) in the one-dimensional array.
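The choice of declaration and the resulting element position follow directly from the two strides. With unit stride in the first dimension, the zero-based linear offset of y(k1,k2,k3) is k1 + k2(inc2y) + k3(inc3y). The following C sketch expresses this; the helper names are hypothetical, are not part of ESSL, and assume positive strides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers (not part of ESSL); strides are assumed positive. */

/* True when Y can be declared as Y(INC2Y,INC3Y/INC2Y,N3). */
bool y3_fits_3d(int inc2y, int inc3y)
{
    return inc3y % inc2y == 0;
}

/* Zero-based offset of y(k1,k2,k3) in the one-dimensional array
   Y(0:INC3Y*N3-1), assuming unit stride in the first dimension. */
size_t y3_offset(int k1, int k2, int k3, int inc2y, int inc3y)
{
    return (size_t)k1 + (size_t)k2 * (size_t)inc2y
                      + (size_t)k3 * (size_t)inc3y;
}
```

For inc2y = 32 and inc3y = 1032, y3_fits_3d returns false because 1032 is not a multiple of 32, so one of the lower-dimensional declarations is needed.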
The following table gives values of inc2y and inc3y (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 on POWER3 with a 64KB level 1 cache:
N1,N2,N3  INC2Y  INC3Y
30        30     900
32        32     1032
64        64     4112
120       120    14408
128       136    17416
240       240    57608
256       264    67592
420       420    176400
As mentioned before, the strides of the input array are not important. The array can be declared as a three-dimensional array. If you want to use the same array for input and output, the requirements are inc2x >= inc2y and inc3x >= inc3y. A simple thing to do is to use inc2x = inc2y and make inc3x a multiple of inc2x not smaller than inc3y. Then X can be declared as a three-dimensional array X(INC2X,INC3X/INC2X,N3).
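The "multiple of inc2x not smaller than inc3y" rule is an integer round-up. The following C sketch computes it; scft3_inplace_inc3x is a hypothetical helper, not an ESSL routine, and it assumes positive strides.

```c
/* Hypothetical helper (not part of ESSL): smallest multiple of inc2x
   that is not smaller than inc3y. Both arguments are assumed positive. */
int scft3_inplace_inc3x(int inc2x, int inc3y)
{
    return ((inc3y + inc2x - 1) / inc2x) * inc2x;
}
```

For inc2x = inc2y = 32 and inc3y = 1032, this gives inc3x = 1056, so X can be declared as X(32,33,N3).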
This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SRCFT3 subroutine.
For this subroutine, the strides for the input array are not important. They are important for the output array. The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1/2+1.
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1/2 + 1, INC2Y   , 'C'     , 0 )
Here, the arguments refer to the SRCFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N3      , N2*INC2Y, INC3Y   , 'C'     , 0 )
If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored at Y(k1+k2(inc2y),k3) in the two-dimensional array and at Y(k1+k2(inc2y)+k3(inc3y)) in the one-dimensional array.
The following table gives values of inc2y and inc3y (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 on POWER3 with a 64KB level 1 cache:
N1,N2,N3  INC2Y  INC3Y
30        16     488
32        17     552
64        33     2128
120       61     7320
128       65     8328
240       121    29064
256       129    33032
420       211    88620
As mentioned before, the strides of the input array are not important. The array can be declared as a three-dimensional array. If you want to use the same array for input and output, the requirements are inc2x >= 2(inc2y) and inc3x >= 2(inc3y). A simple thing to do is to use inc2x = 2(inc2y) and make inc3x a multiple of inc2x not smaller than 2(inc3y). Then X can be declared as a three-dimensional array X(INC2X,INC3X/INC2X,N3).
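The in-place choice described above, with its factors of two, can be written as a short calculation. The following C sketch is illustrative only: srcft3_inplace_strides is a hypothetical helper, not an ESSL routine, and it assumes positive strides.

```c
/* Hypothetical helper (not part of ESSL). Given the output strides,
   computes input strides with inc2x = 2(inc2y) and inc3x equal to the
   smallest multiple of inc2x that is not smaller than 2(inc3y). */
void srcft3_inplace_strides(int inc2y, int inc3y, int *inc2x, int *inc3x)
{
    int need = 2 * inc3y;                               /* lower bound for inc3x */
    *inc2x = 2 * inc2y;
    *inc3x = ((need + *inc2x - 1) / *inc2x) * *inc2x;   /* round up to a multiple */
}
```

For inc2y = 17 and inc3y = 552, this gives inc2x = 34 and inc3x = 1122, so X can be declared as X(34,33,N3).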
This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SCRFT3 subroutine.
The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1+2.
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N2      , N1 + 2  , INC2Y   , 'S'     , 0 )
Here, the arguments refer to the SCRFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).
             N         INCD      INCR      DT        IOPT
             |         |         |         |         |
CALL STRIDE( N3      , N2*INC2Y, INC3Y   , 'S'     , 0 )
If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored at Y(k1+k2(inc2y),k3) in the two-dimensional array and at Y(k1+k2(inc2y)+k3(inc3y)) in the one-dimensional array.
The following table gives values of inc2y and inc3y (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 on POWER3 with a 64KB level 1 cache:
N1,N2,N3  INC2Y  INC3Y
30        32     976
32        34     1104
64        66     4256
120       122    14640
128       130    16656
240       242    58128
256       258    66064
420       422    177240
For this subroutine, the strides (inc2x and inc3x) of the input array are also important. In general, inc2x = inc2y/2 and inc3x = inc3y/2 are good choices. These are also the requirement if you want to use the same array for input and output.