Engineering and Scientific Subroutine Library for AIX Version 3 Release 3: Guide and Reference

STRIDE--Determine the Stride Value for Optimal Performance in Specified Fourier Transform Subroutines

This subroutine determines an optimal stride value for you to use for your input or output data when you are computing large row Fourier transforms in any of the Fourier transform subroutines, except _RCFT and _CRFT. The strides determined by this subroutine allow your arrays to fit comfortably in various levels of storage hierarchy on your particular processor, thus allowing you to improve your run-time performance.

Note: This subroutine returns a single stride value. Where you need multiple strides, you must invoke this subroutine multiple times; for example, in the multidimensional Fourier transforms and, also, when input and output data types differ. For more details, see Function.

Syntax

Fortran	CALL STRIDE (`n`, `incd`, `incr`, `dt`, `iopt`)
C and C++	stride (`n`, `incd`, `incr`, `dt`, `iopt`);
PL/I	CALL STRIDE (`n`, `incd`, `incr`, `dt`, `iopt`);

On Entry

n

is the length n of the Fourier transform for which the optimal stride is being determined. The transform corresponding to n is usually a row transform; that is, the data elements are stored using a stride value.

Specified as: a fullword integer; n > 0.

incd

is the minimum allowable stride for the Fourier transform for which the optimal stride is being determined. For each situation in each subroutine, there is a specific way to compute this minimum value. This is explained in the examples, starting with Example 1--SCFT.

Specified as: a fullword integer; incd > 0 or incd < 0.

incr

See On Return.

dt

is the data type of the numbers for the Fourier transform for which the optimal stride is being determined, where:

If dt = 'S', the numbers are short-precision real.

If dt = 'D', the numbers are long-precision real.

If dt = 'C', the numbers are short-precision complex.

If dt = 'Z', the numbers are long-precision complex.

Specified as: a single character; dt = 'S', 'D', 'C', or 'Z'.

iopt

is provided only for migration purposes from ESSL Version 1 and is no longer used; however, you must still specify it as a dummy argument.

Specified as: a fullword integer; iopt = 0, 1, or 2.

On Return

incr

is the stride that allows you to improve your run-time performance in your Fourier transform computation on your particular processor. In general, this value differs for each processor you are running on.

Returned as: a fullword integer; incr > 0 or incr < 0 and |incr| >= |incd|, where incr has the same sign (+ or -) as incd.

Notes

In your C program, argument incr must be passed by reference.
All subroutines accept lowercase letters for the dt argument.
For each situation in each of the Fourier transform subroutines, there is a specific way to compute the value you should specify for the incd argument. Details on how to compute each of these values are given in the examples, starting with Example 1--SCFT. See the example corresponding to the Fourier transform subroutine you are using.
Where different data types are specified for the input and output data in your Fourier transform subroutine, you should be careful to indicate the correct data type in the dt argument in this subroutine.
For additional information on how to set up your data, see Setting Up Your Data.

Function

This subroutine determines an optimal stride, incr, for you to use for your input or output data when computing large row Fourier transforms. The stride value returned by this subroutine is based on the size and structure of your transform data, using:

The size of each data item (dt)
The minimum allowable stride for this transform (incd)
The length of the transform (n)

This information is used in determining the optimal stride for the processor you are currently running on. The stride determined by this subroutine allows your arrays to fit comfortably in various levels of storage hierarchy for that processor, thus giving you the ability to improve your run-time performance.

You get only one stride value returned by this subroutine on each invocation. Therefore, in many instances, you may need to invoke this subroutine multiple times to obtain several stride values to use in your Fourier transform computation:

For multidimensional Fourier transforms using several strides, this subroutine must be called once for each optimal stride you want to obtain. Successive invocations should go from the lower (earlier) dimensions to the higher (later) dimensions, because the results from the lower dimensions are used to calculate the incd values for the higher dimensions.
Where input and output data have different data types and you want to obtain optimal strides for each, this subroutine must be called once for each data type.

Where multiple invocations are necessary, they are explained in the examples, starting with Example 1--SCFT. The examples also explain how to calculate the incd values for each invocation. There are nine examples to cover the Fourier transform subroutines that can use the STRIDE subroutine.
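The ordering requirement for multidimensional transforms can be sketched in C as follows. This is only an illustration under stated assumptions: stride_fn and min_stride are hypothetical stand-ins that mirror the STRIDE argument order, not the ESSL routine itself, and min_stride simply returns the minimum stride (which satisfies |incr| >= |incd| with the same sign). The point is the data flow: the result of the first call feeds the incd argument of the second.

```c
#include <assert.h>

/* Hypothetical stand-in type with the same argument order as STRIDE.
   It is NOT the ESSL interface; it exists only so the sketch compiles alone. */
typedef void (*stride_fn)(int n, int incd, int *incr, char dt, int iopt);

/* Placeholder: returns the minimum allowable stride unchanged. */
void min_stride(int n, int incd, int *incr, char dt, int iopt)
{
    (void)n; (void)dt; (void)iopt;
    *incr = incd;
}

/* Obtain inc2y and inc3y for a three-dimensional complex transform,
   going from the lower dimension to the higher dimension. */
void strides_3d(stride_fn f, int n1, int n2, int n3, int *inc2y, int *inc3y)
{
    assert(n1 > 0 && n2 > 0 && n3 > 0);
    f(n2, n1, inc2y, 'C', 0);              /* first call: incd = n1 */
    f(n3, n2 * (*inc2y), inc3y, 'C', 0);   /* second call: incd uses inc2y */
}
```

In a real program the function pointer would be replaced by a call to the library routine, which may return values larger than the minimum.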

After calling this subroutine and obtaining the optimal stride value, you then set up your input or output array accordingly. This may involve movement of data for input arrays or increasing the sizes of input or output arrays. To accomplish this, you may want to set up a separate subroutine with the stride values passed into it as arguments. You can then dimension your arrays in that subroutine, depending on the values calculated by STRIDE. For additional information on how to set up your data, see Setting Up Your Data.
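As a rough illustration of the data movement mentioned above, the following C sketch copies a column-major array from one leading dimension to a larger one, such as a stride value returned by STRIDE. The function name repack_columns and the use of float data are assumptions made for the example; the source and destination buffers are assumed not to overlap.

```c
#include <assert.h>
#include <string.h>

/* Copy a column-major array of `rows` by `cols` elements from leading
   dimension ld_src to leading dimension ld_dst. Padding elements in dst
   (rows..ld_dst-1 of each column) are left untouched. */
void repack_columns(const float *src, int ld_src,
                    float *dst, int ld_dst, int rows, int cols)
{
    assert(rows > 0 && cols >= 0 && ld_src >= rows && ld_dst >= rows);
    for (int j = 0; j < cols; ++j)
        memcpy(dst + (size_t)j * ld_dst,
               src + (size_t)j * ld_src,
               (size_t)rows * sizeof *src);
}
```

For instance, if the minimum stride is 64 and the returned stride is 65, each column of the destination holds 64 data elements followed by one unused padding element.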

Error Conditions

Computational Errors

None

Input-Argument Errors

n <= 0
incd = 0
iopt <> 0, 1, or 2
dt <> S, D, C, or Z

Example 1--SCFT

This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the SCFT subroutine.

If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:

             N  INCD  INCR    DT   IOPT
             |   |     |       |    |
CALL STRIDE( N , M  , INC1X , 'C' , 0  )

Here, the arguments refer to the SCFT subroutine. In the following table, values of inc1x are given (as obtained from the STRIDE subroutine) for some combinations of n and m and for POWER3 with 64KB level 1 cache:

      N      M    INC1X

     128    64       64
     240    32       32
     240    64       65
     256   256      264
     512    60       60
    1024    64       65

The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.

In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n can be used.
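The transposed-form layout X(INC1X,N) described in this example can be made concrete with a small C helper. The function name row_element_offset is an assumption for illustration; it uses zero-based indices, where i selects the sequence (0 to m-1) and j selects the element within that sequence (0 to n-1), and it relies on Fortran column-major storage.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based linear offset of element j of sequence i when the sequences
   are stored as rows of a column-major array with leading dimension inc1x.
   Consecutive elements of one sequence are inc1x apart; consecutive
   sequences are 1 apart (inc2x = 1). */
size_t row_element_offset(int i, int j, int inc1x)
{
    assert(i >= 0 && j >= 0 && inc1x > i);
    return (size_t)i + (size_t)j * (size_t)inc1x;
}
```

With m = 64 and a returned stride of 65, as in the table row for n = 240, the last element of the last sequence sits at offset 63 + 239*65.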

Example 2--DCOSF

This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the DCOSF subroutine.

If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N/2+1). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:

              N     INCD  INCR    DT  IOPT
              |      |      |      |    |
CALL STRIDE( N/2+1 , M  , INC1X , 'D' , 0  )

Here, the arguments refer to the DCOSF subroutine. In the following table, values of inc1x are given (as obtained from the STRIDE subroutine) for some combinations of n and m and for POWER3 with 64KB level 1 cache:

      N      M    INC1X

     128    64       64
     240    32       32
     240    64       64
     256   256      264
     512    60       60
    1024    64       65

The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.

In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n/2+1 can be used.

Example 3--DSINF

This example shows the use of the STRIDE subroutine in computing one-dimensional row transforms using the DSINF subroutine.

If inc2x = 1, the input sequences are stored in the transposed form as rows of a two-dimensional array X(INC1X,N/2). In this case, the STRIDE subroutine helps in determining a good value of inc1x for this array. The required minimum value of inc1x is m, the number of Fourier transforms being computed. To find a good value of inc1x, use STRIDE as follows:

              N   INCD  INCR    DT   IOPT
              |    |     |       |    |
CALL STRIDE( N/2 , M  , INC1X , 'D' , 0  )

Here, the arguments refer to the DSINF subroutine. In the following table, values of inc1x are given (as obtained from the STRIDE subroutine) for some combinations of n and m and for POWER3 with 64KB level 1 cache:

      N      M    INC1X

     128    64       64
     240    32       32
     240    64       64
     256   256      264
     512    60       60
    1024    64       65

The above example also applies when the output sequences are stored in the transposed form (inc2y = 1). In that case, in the above example, inc1x is replaced by inc1y.

In computing column transforms (inc1x = inc1y = 1), the values of inc2x and inc2y are not very important. For these, any value over the required minimum of n/2 can be used.

Example 4--SCFT2

This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SCFT2 subroutine.

If inc1y = 1, the two-dimensional output array is stored in the normal form. In this case, the output array can be declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:

             N   INCD  INCR    DT   IOPT
             |    |     |       |    |
CALL STRIDE( N2 , N1 , INC2Y , 'C' , 0  )

Here, the arguments refer to the SCFT2 subroutine. In the following table, values of inc2y are given (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 and for POWER3 with 64KB level 1 cache:

      N1      N2   INC2Y

      64      64      64
     128     128     136
     240     240     240
     512     512     520
     840     840     848

If the input array is stored in the normal form (inc1x = 1), the value of inc2x is not important. However, if you want to use the same array for input and output, you should use inc2x = inc2y.

If inc2y = 1, the two-dimensional output array is stored in the transposed form. In this case, the output array can be declared as Y(INC1Y,N1), where the required minimum value of inc1y is n2. The STRIDE subroutine helps in picking a good value of inc1y. To find a good value of inc1y, use STRIDE as follows:

             N   INCD  INCR    DT   IOPT
             |    |     |       |    |
CALL STRIDE( N1 , N2 , INC1Y , 'C' , 0  )

Here, the arguments refer to the SCFT2 subroutine. In the following table, values of inc1y are given (as obtained from the STRIDE subroutine) for some combinations of n1 and n2 and for POWER3 with 64KB level 1 cache:

      N1     N2   INC1Y

      60     64      64
     120    128     136
     256    240     240
     512    512     520
     840    840     848

If the input array is stored in the transposed form (inc2x = 1), the value of inc1x is also important. The above example can be used to find a good value of inc1x, by replacing inc1y with inc1x. If both arrays are stored in the transposed form, a good value for inc1y is also a good value for inc1x. In that situation, the two arrays can also be made equivalent.

Example 5--SRCFT2

This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SRCFT2 subroutine.

For this subroutine, the output array is declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1/2+1. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:

             N      INCD     INCR    DT   IOPT
             |        |        |      |    |
CALL STRIDE( N2 , N1/2 + 1 , INC2Y , 'C' , 0  )

Here, the arguments refer to the SRCFT2 subroutine. In the following table, values of inc2y are given (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 and for POWER3 with 64KB level 1 cache:

      N1      N2    INC2Y

     240     240      121
     420     420      211
     512     512      257
     840     840      421
    1024    1024      513
    2048    2048     1032

For this subroutine, the leading dimension of the input array (inc2x) is not important. If you want to use the same array for input and output, you should use inc2x >= 2(inc2y).
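The two quantities used in this example, the minimum inc2y passed as incd and the smallest inc2x that permits a shared input and output array, can be computed as in the following C sketch. The function names are assumptions made for illustration; the arithmetic follows the text, with n1/2 taken as integer division as in the Fortran call.

```c
#include <assert.h>

/* Minimum allowable inc2y for the SRCFT2 output array: n1/2 + 1. */
int srcft2_min_inc2y(int n1)
{
    assert(n1 > 0);
    return n1 / 2 + 1;
}

/* Smallest inc2x satisfying inc2x >= 2(inc2y), the condition for using
   the same array for input and output. */
int srcft2_inplace_inc2x(int inc2y)
{
    assert(inc2y > 0);
    return 2 * inc2y;
}
```

For n1 = 240 the minimum is 121, which matches the table entry, so in that case STRIDE returned the minimum unchanged. For n1 = 2048 the minimum is 1025, and the table shows a larger returned value of 1032.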

Example 6--SCRFT2

This example shows the use of the STRIDE subroutine in computing two-dimensional transforms using the SCRFT2 subroutine.

For this subroutine, the output array is declared as Y(INC2Y,N2), where the required minimum value of inc2y is n1+2. The STRIDE subroutine helps in picking a good value of inc2y. To find a good value of inc2y, use STRIDE as follows:

             N     INCD    INCR    DT   IOPT
             |       |       |      |    |
CALL STRIDE( N2 , N1 + 2 , INC2Y , 'S' , 0  )

Here, the arguments refer to the SCRFT2 subroutine. In the following table, values of inc2y are given (as obtained from the STRIDE subroutine) for some two-dimensional arrays with n1 = n2 and for POWER3 with 64KB level 1 cache:

      N1      N2    INC2Y

     240     240      242
     420     420      422
     512     512      514
     840     840      842
    1024    1024     1026
    2048    2048     2064

For this subroutine, the leading dimension of the input array (inc2x) is also important. In general, inc2x = inc2y/2 is a good choice. This is also the requirement if you want to use the same array for input and output.

Example 7--SCFT3

This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SCFT3 subroutine.

For this subroutine, the strides for the input array are not important. They are important for the output array. The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1.

             N   INCD  INCR    DT   IOPT
             |    |      |      |    |
CALL STRIDE( N2 , N1 , INC2Y , 'C' , 0  )

Here, the arguments refer to the SCFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).

             N      INCD    INCR    DT   IOPT
             |       |        |      |    |
CALL STRIDE( N3 , N2*INC2Y, INC3Y , 'C' , 0  )

If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored in the following location in these arrays:

For the two-dimensional array, location (k1+k2*inc2y,k3)
For the one-dimensional array, location (k1+k2*inc2y+k3*inc3y)
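The location formula for the one-dimensional layout, and the test for whether a three-dimensional declaration is possible, can be written out as a short C sketch. The function names offset_1d and fits_3d_declaration are assumptions for illustration only; the arithmetic is taken directly from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based location of y(k1,k2,k3) in the one-dimensional array:
   k1 + k2*inc2y + k3*inc3y. */
size_t offset_1d(int k1, int k2, int k3, int inc2y, int inc3y)
{
    assert(k1 >= 0 && k2 >= 0 && k3 >= 0 && inc2y > 0 && inc3y > 0);
    return (size_t)k1 + (size_t)k2 * (size_t)inc2y
                      + (size_t)k3 * (size_t)inc3y;
}

/* Nonzero if inc3y is a multiple of inc2y, so that Y can be declared
   as the three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). */
int fits_3d_declaration(int inc2y, int inc3y)
{
    assert(inc2y > 0 && inc3y > 0);
    return inc3y % inc2y == 0;
}
```

Using values from the table in this example: inc2y = 30 with inc3y = 900 allows the three-dimensional declaration, while inc2y = 32 with inc3y = 1032 does not, because 1032 is not a multiple of 32.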

In the following table, values of inc2y and inc3y are given (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 and for POWER3 with 64KB level 1 cache:

   N1,N2,N3    INC2Y    INC3Y

         30       30      900
         32       32     1032
         64       64     4112
        120      120    14408
        128      136    17416
        240      240    57608
        256      264    67592
        420      420   176400

As mentioned before, the strides of the input array are not important. The array can be declared as a three-dimensional array. If you want to use the same array for input and output, the requirements are inc2x >= inc2y and inc3x >= inc3y. A simple thing to do is to use inc2x = inc2y and make inc3x a multiple of inc2x not smaller than inc3y. Then X can be declared as a three-dimensional array X(INC2X,INC3X/INC2X,N3).
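The suggestion to make inc3x a multiple of inc2x not smaller than inc3y amounts to rounding up, as in this C sketch. The function name round_up_inc3x is an assumption for illustration.

```c
#include <assert.h>

/* Smallest multiple of inc2x that is not smaller than inc3y. */
int round_up_inc3x(int inc2x, int inc3y)
{
    assert(inc2x > 0 && inc3y > 0);
    return ((inc3y + inc2x - 1) / inc2x) * inc2x;
}
```

For example, with inc2x = inc2y = 32 and inc3y = 1032 from the table, the result is 1056, so X could be declared as X(32,33,N3).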

Example 8--SRCFT3

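This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SRCFT3 subroutine.

For this subroutine, the strides for the input array are not important. They are important for the output array. The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1/2+1.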

             N       INCD    INCR    DT   IOPT
             |        |        |      |    |
CALL STRIDE( N2 , N1/2 + 1 , INC2Y , 'C' , 0  )

Here, the arguments refer to the SRCFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).

             N      INCD     INCR    DT   IOPT
             |       |         |      |    |
CALL STRIDE( N3 , N2*INC2Y , INC3Y , 'C' , 0  )

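If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored in the following location in these arrays:

For the two-dimensional array, location (k1+k2*inc2y,k3)
For the one-dimensional array, location (k1+k2*inc2y+k3*inc3y)

In the following table, values of inc2y and inc3y are given (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 and for POWER3 with 64KB level 1 cache:

   N1,N2,N3   INC2Y      INC3Y

         30      16        488
         32      17        552
         64      33       2128
        120      61       7320
        128      65       8328
        240     121      29064
        256     129      33032
        420     211      88620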

As mentioned before, the strides of the input array are not important. The array can be declared as a three-dimensional array. If you want to use the same array for input and output, the requirements are inc2x >= 2(inc2y) and inc3x >= 2(inc3y). A simple thing to do is to use inc2x = 2(inc2y) and make inc3x a multiple of inc2x not smaller than 2(inc3y). Then X can be declared as a three-dimensional array X(INC2X,INC3X/INC2X,N3).

Example 9--SCRFT3

This example shows the use of the STRIDE subroutine in computing three-dimensional transforms using the SCRFT3 subroutine.

The STRIDE subroutine helps in picking good values of inc2y and inc3y. This requires two calls to the STRIDE subroutine as shown below. First, you should find a good value for inc2y. The minimum acceptable value for inc2y is n1+2.

             N     INCD    INCR    DT   IOPT
             |       |       |      |    |
CALL STRIDE( N2 , N1 + 2 , INC2Y , 'S' , 0  )

Here, the arguments refer to the SCRFT3 subroutine. Next, you should find a good value for inc3y. The minimum acceptable value for inc3y is (n2)(inc2y).

             N      INCD     INCR    DT   IOPT
             |        |        |      |    |
CALL STRIDE( N3 , N2*INC2Y , INC3Y , 'S' , 0  )

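If inc3y turns out to be a multiple of inc2y, then Y can be declared as a three-dimensional array Y(INC2Y,INC3Y/INC2Y,N3). For large problems, this may not happen. In that case, you can declare the Y array as a two-dimensional array Y(0:INC3Y-1,0:N3-1) or a one-dimensional array Y(0:INC3Y*N3-1). Using zero-based indexing, the element y(k1,k2,k3) is stored in the following location in these arrays:

For the two-dimensional array, location (k1+k2*inc2y,k3)
For the one-dimensional array, location (k1+k2*inc2y+k3*inc3y)

In the following table, values of inc2y and inc3y are given (as obtained from the STRIDE subroutine) for some three-dimensional arrays with n1 = n2 = n3 and for POWER3 with 64KB level 1 cache:

   N1,N2,N3    INC2Y     INC3Y

         30       32       976
         32       34      1104
         64       66      4256
        120      122     14640
        128      130     16656
        240      242     58128
        256      258     66064
        420      422    177240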

For this subroutine, the strides (inc2x and inc3x) of the input array are also important. In general, inc2x = inc2y/2 and inc3x = inc3y/2 are good choices. These are also the requirement if you want to use the same array for input and output.
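The halving rule for the input strides can be sketched in C as follows. The function name scrft3_input_strides is an assumption for illustration, and so is its handling of odd output strides: the text does not say what to do in that case, so the sketch simply reports failure rather than rounding.

```c
#include <assert.h>

/* Compute inc2x = inc2y/2 and inc3x = inc3y/2 as suggested in the text.
   Returns 1 on success. Returns 0, leaving the outputs unchanged, if
   either output stride is odd (an assumption of this sketch, since the
   halving would then not be exact). */
int scrft3_input_strides(int inc2y, int inc3y, int *inc2x, int *inc3x)
{
    assert(inc2y > 0 && inc3y > 0 && inc2x != 0 && inc3x != 0);
    if (inc2y % 2 != 0 || inc3y % 2 != 0)
        return 0;
    *inc2x = inc2y / 2;
    *inc3x = inc3y / 2;
    return 1;
}
```

For the first table row (inc2y = 32, inc3y = 976), this gives inc2x = 16 and inc3x = 488.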
