Parallel Engineering and Scientific Subroutine Library for AIX Version 2 Release 3: Guide and Reference

Sample Thermal Diffusion Program

A Fortran 90 sample program is presented in this section, along with the command for processing it in a parallel processing environment. The sample program, solving a thermal diffusion problem, uses vectors and matrices distributed across a one-dimensional process grid and call Level 2 PBLAS, Eigensystem analysis, and Fourier transform subroutines.

A copy of this sample program, plus an equivalent C program, is provided with the Parallel ESSL product. For file names and installation procedures, see the Parallel ESSL Installation Memo.

Following is a table of contents for this section, along with a description of each section for the Fortran 90 sample program.

Table 119. Table of Contents for the Sample Thermal Diffusion Programs

Contents	Page
Thermal Diffusion Discussion Paper: A technical description of the problem to be solved.	"Thermal Diffusion Discussion Paper"
Program main: Finds the cooling rate for a specified set of points in an anisotropic rectangular beam, immersed in a constant heat bath with a temperature of zero.	"Program Main"
Module parameters: Defines system wide parameters and the index structure used to help map global indices to local indices.	"Module Parameters"
Module diffusion: Assigns problem parameters and initial data. Subroutine init_diffusion: Initializes the problem size, the number of output points, the functional form of the diffusion constant, and the initial temperature distribution. Subroutine init_temp: Returns the initial temperature of the bar at a particular point. Subroutine diff_coef: Returns the value of the thermal diffusion coefficient at an arbitrary point.	"Module Diffusion"
Module fourier: Represents both the diffusion operator and the temperature profile in a sine function basis. Subroutine get_diffusion_matrix: Obtains the matrix representation of the diffusion operator in a sine function basis. Subroutine expand_temp_profile: Obtains the expansion coefficients of the initial temperature profile in a sine function expansion. Subroutine factor_nodes: Obtains the powers of prime factorization of the number of nodes, failing if the factorization is not compatible with FFT supported transform lengths. Subroutine min_power2: Obtains the smallest number which is a power of 2 and greater than or equal to the input argument.	"Module Fourier"
Module scale: Initializes the communications and provides a few communication utility routines. Subroutine initialize_scale: Initializes BLACS and calculates a block size. Subroutine initialize_rarray: Allocates space for a real array and creates the associated descriptor array and index array. Subroutine initialize_carray: Allocates space for a complex array and creates the associated descriptor array and index array. Subroutine clocal_to_rglobal: Gathers the real parts of the local portions of the block-cyclically-distributed complex array to generate the corresponding global matrix. Subroutine rlocal_to_rglobal: Gathers the local portions of the block-cyclically-distributed real array to generate the corresponding global matrix.	"Module Scale"
Input Data: Sample input data in namelist format, used by subroutine init_diffusion in module diffusion.	"Input Data"
Output Data: Sample output data, based on the sample input data, issued at the end of program main.	"Output Data"
Makefile: The makefile used to build the thermal diffusion program.	"Makefile"
Run Script: The script file used to execute the thermal diffusion program.	"Run Script"

Thermal Diffusion Discussion Paper

The objective of the diffusion program is to solve for the temperature of a beam at any point and at any time, given an initial temperature distribution. The following assumptions are made concerning the beam and its properties:

The beam has the dimensions l_x, l_y, and l_z, where l_z >> l_x and l_z >> l_y.
The thermal diffusion coefficient is a function of only:

That is, the physical properties of the beam do not depend on the direction:

Note:
The bar over a variable indicates it has dimension; whereas, the unbarred form is dimensionless.
The beam is immersed in a zero degree heat bath.

The general diffusion equation is given by:

where:

Math Graphic

is the position-dependent diffusion coefficient. Equation 1 may be rewritten, ignoring the z dimension, using the following dimensionless variables:

as follows:

where D_r is a reference diffusion constant, and l_r is the beam dimension ratio l_x/l_y. This equation is subject to the initial and boundary conditions given by:

T(x, y, 0) = T₀(x, y)

T(0, y, t) = T(x, 0, t) = T(pi, y, t) = T(x, pi, t) = 0

In the program, the initial condition T₀(x, y), the diffusion coefficient D(x, y), and the ratio l_r are determined in the initialization subroutine init_diffusion. n_x and n_y, defined later, are also initialized here. Equation 2 is solved by representing the operators in a sine function basis and solving the resulting matrix equations. We begin by expanding T in sine functions, as follows:

The initial set of expansion coefficients a_kj(0) are determined from the initial temperature profile by:

where the orthogonality of the sine functions has been used. If we extend the range of T from pi to 2pi, as an odd function, equation 4 can be written in terms of a discrete Fourier transform:

a_kj(0) is calculated in the program in the subroutine expand_temp_profile, with the actual temperatures calculated in the function init_temp. The subroutine call to DCFT2 performs the Fourier transform, and the results are stored in the array A. Substitute equation 3 into equation 2, multiply by (2/pi)sin(kx)sin(jy) and integrate over x and y to obtain:

where:

and:

Note that sin(kx)sin(k'x) and sin(jy)sin(j'y) are even functions about pi. Therefore, if we define D(2pi-x, y) = D(x, y) and D(x, 2pi-y) = D(x, y), where 0 <= x <= pi and 0 <= y <= pi, the limits on the integral in equation 7 may be extended to 2pi:

Equation 9 may be further reduced to sums of Fourier transforms using the identities:

cos(kx)cos(k'x) = (1/2)(cos((k-k')x)+cos((k+k')x))

sin(kx)sin(k'x) = (1/2)(cos((k-k')x)-cos((k+k')x))

sin(kx) = (i/2)(e^ikx-e^-ikx)

cos(kx) = (1/2)(e^ikx+e^-ikx)

Substituting these identities into equation 9, we have:

where:

k⁺ = k+k'

k^- = |k-k'|

j⁺ = j+j'

j^- = |j-j'|

and where:

Equation 10 is the simplest form for the matrix elements of the diffusion operator, and these elements are calculated in the subroutine get_diffusion_matrix. The diffusion coefficient is evaluated with the function diff_coef. The Parallel ESSL Fourier transform subroutine PDCFT2, called in subroutine get_diffusion_matrix, is used to determine the following elements:

Math Graphic

which are stored in the array DF. Because DF is an array which is block cyclically distributed among the nodes and each node requires elements of DF not locally available, this array is collected to the global array DFG on each node. The array DFG is subsequently used to calculate the matrix F. Now that we have determined the matrix elements of equation 6, we must truncate it in order to solve it. This may be done by truncating the k summation at n_x and the j summation at n_y. The dual indices may be collapsed into a single index 1 by:

l = (j-1)n_x+k

The j and k indices can similarly be recovered from the 1 index by:

j = ((l-1)/n_x)+1

k = mod(l-1, n_x)+1

Rewriting the truncated equation 6, we have:

Equation 11 has the general solution:

where lambda is the eigenvalue vector and B is the matrix of eigenvectors of the matrix F. The eigenvectors B and eigenvalues lambda correspond to the arrays B and LAMBDA in the program. These eigenvalues and eigenvectors are determined with the Parallel ESSL subroutine PDSYEVX. The inner sum:

corresponds to the array AB, with ABT containing the extra factor of:

Also, a_l(t) is represented by the array X, which is reused for each solution time.

Each of these matrix multiplications are done using the Parallel ESSL subroutine PDGEMV. The final solution is obtained by summing over the expansion coefficients:

The final temperature array is TEMP in the program, with the indices into the array corresponding to specific values of x, y, and t.

Program Main

        program main
!
!   Purpose, to  find the cooling rate for a specified set of
!   points in an anisotropic rectangular beam, immersed in a constant
!   heat bath with a temperature of 0.
!
!   Routines called:
!     expand_temp_profile
!     get_diffusion_matrix
!     igamx2d
!     init_diffusion
!     initialize_rarray
!     initialize_scale
!     pdgemv
!     pdsyevx
!     rlocal_to_rglobal
!
        use parameters
        use scale
        use diffusion
        use fourier
        implicit none
 
        integer :: n, ix, jx, iy, jy, k, i, j, stat, it, ib, ig
        integer :: num_errors, lwork, ilwork
        integer, allocatable :: iwork(:)
        real(long), allocatable :: work(:)
!
!       a contains the sine expansion coefficients of the initial
!                temperature profile.
!       b contains the eigenvectors of the diffusion operator in the
!                sine function basis.
!       ab contains the initial temperature profile expanded in the
!                eigenvectors of the diffusion operator.
!       f contains the matrix elements of the diffusion operation in
!                sine function basis.
!       lambda contains the eigenvalues of the diffusion operator.
!
!       df contains the Fourier transform of the diffusion coefficient function.
!
        real(long), allocatable ::  lambda(:), xg(:,:)
        real(long), allocatable ::  gap(:)
        real(long) :: dum
 
        real(long), pointer :: f(:,:), b(:,:), a(:,:)
        type (g_index), pointer :: f_i, b_i, a_i
        integer :: f_d(DESC_DIM), b_d(DESC_DIM), a_d(DESC_DIM)
 
        real(long), pointer :: x(:,:), ab(:,:), abt(:,:)
        type (g_index), pointer :: x_i, ab_i, abt_i
        integer :: x_d(DESC_DIM), ab_d(DESC_DIM), abt_d(DESC_DIM)
 
        real(long), allocatable :: xsines(:,:), ysines(:,:), temp(:,:)
        integer, allocatable :: ifail(:), iclustr(:)
        integer :: biga_index, num_eigvalues, num_vectors, info
 
 
!
!   Read in the problem size, initialize the problem dimensions,
!   choose functional form for the spatially dependent heat diffusion
!   coefficients, choose functional form of initial temperature distribution
!   and choose the number of points, in both space and time, of the solution
!   to print out.
!
        call init_diffusion
        num_errors = 0
!
!       Read in how many sine functions to include in both the
!            x and y directions.
!
!       Because we need to get the Fourier coefficients of the sums
!       and differences of the indices, we need to include twice as
!       many Fourier coefficients as the number of sine expansion coefficients.
!       Also, because the top and bottom halves of the Fourier
!       transform are identical,
!       an artifact of artificially extending the diffusion coefficient
!       function and the initial temperature distribution, we need to
!       double the number of Fourier coefficients again. Finally, the
!       extra sum of one comes from the fact that the sine function
!       expansion starts a 1 and the Fourier expansion starts at 0.
!
 
!  n is the order of the diffusion operator equation.
        n = dif_nx * dif_ny
!
!  Initialize BLACS and calculate default block sizes.
!
        call initialize_scale(n, biga_index)
 
 
!
!
!   Allocate room for the eigenvalues of diffusion operator.
!
        allocate( lambda(n), stat=stat)
        if( stat .ne. 0) num_errors = num_errors + 1
 
!
!   Allocate room for sines of x and y coordinates.
!
        allocate( xsines(dif_npts, dif_nx) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
 
        allocate( ysines(dif_npts, dif_ny) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
 
!
!   Allocate room for temperature history at selected points.
!
        allocate( temp(dif_npts, dif_ntemps) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
 
!
!   Allocate room for global temperature profile expansion vector at time t.
!
        allocate( xg(1, n) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
 
        call igamx2d(sc_icontext,'A',' ',1,1,num_errors,1,-1,-1,-1,            &
     &               -1,-1)
        if( num_errors .gt. 0 ) then
           if( sc_iam .eq. 0 ) then
               write(*,*) 'Error in allocating arrays in main'
               call blacs_abort(sc_icontext, 1)
           endif
        endif
 
 
!
!      A call to expand_temp_profile returns the sine expansion
!      coefficients of the initial temperature profile.
!
!
!  Get matrices.
!
!  Diffusion operator matrix.
        call initialize_rarray(f, f_d, f_i, n, n, biga_index)
 
!  Eigenvectors of diffusion operator matrix.
        call initialize_rarray(b, b_d, b_i, n, n, biga_index)
 
!  Initial temperature profile, in row vector.
        call initialize_rarray(a, a_d, a_i, 1, n, biga_index)
 
!  Initial temperature profile, in eigenfunction basis, in row vector.
        call initialize_rarray(ab, ab_d, ab_i, 1, n, biga_index)
 
!  Temperature profile, at time t, in eigenfunction basis, in row vector.
        call initialize_rarray(abt, abt_d, abt_i, 1, n, biga_index)
 
!  Temperarure profile in at time t in sine expansion basis, in row vector.
        call initialize_rarray(x, x_d, x_i, 1, n, biga_index)
 
!
!  Represent initial temperature in sine function expansion.
!
        call expand_temp_profile(a,a_i,a_d)
!
!  Here, we are calculating the initial set of coefficients
!  in the sine function expansion as given in equations 3 and 4 of
!  the discussion paper
!
 
!
!       The call to get_diffusion_matrix returns the diffusion
!        operator in the sine function basis.
!
        call get_diffusion_matrix(f,f_i,f_d)
!
!   This last call determines the matrix elements defined by equation 10
!   in the discussion paper.
!
 
 
!
!    Here we precalculate the sine functions, sin(kx) and sin(jy) used
!    in equation 13 of the discussion paper.
!    These sine functions are only evaluated at the points x and y for
!    which the solution is evaluated.
!
 
        do i = 1, dif_nx
          do j = 1, dif_npts
            xsines(j,i) = sqrt(2.d0/pi) * sin( i * dif_x(j))
          enddo
        enddo
 
        do i = 1, dif_ny
          do j = 1, dif_npts
            ysines(j,i) = sqrt(2.d0/pi) * sin( i * dif_y(j))
          enddo
        enddo
 
 
!
!  Allocate arrays for eigenvalue decomposition.
!
        allocate( ifail(n), stat=stat)
        if( stat .ne. 0) num_errors = num_errors + 1
        allocate( iclustr(n), stat=stat)
        if( stat .ne. 0) num_errors = num_errors + 1
        allocate( gap(sc_nnodes), stat=stat)
        if( stat .ne. 0) num_errors = num_errors + 1
 
!
!  Allocate scratch space for the symetric eigenvector solver.
!
        lwork = 20*n  + max( 5*n, n*(n+ sc_nnodes-1)/sc_nnodes )               &
     &         + n*( (n+ sc_nnodes-1)/sc_nnodes) + 2*f_d(MB_) * f_d(MB_)
 
        ilwork = 2*n + max((3*n+1+sc_nnodes),max(4*n,14))
 
        allocate( work(lwork) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
        allocate( iwork(ilwork) , stat=stat )
        if( stat .ne. 0) num_errors = num_errors + 1
 
!
!   Test to see if we had any allocation errors.
!
 
        call igamx2d(sc_icontext,'A',' ',1,1,num_errors,1,-1,-1,-1,            &
     &               -1,-1)
        if( num_errors .gt. 0 ) then
           if( sc_iam .eq. 0 ) then
               write(*,*) 'Error in allocating arrays for pdsyevx in ',        &
     &                     'main'
               call blacs_abort(sc_icontext, 1)
           endif
        endif
 
        do i = 1, sc_nnodes
           gap(i) = 0.d0
        enddo
        do i = 1, n
           ifail(i) = 0
           iclustr(i) = 0
        enddo
!
! The call to pdsyevx will find both the eigenvalues and eigenvectors
! of the diffusion matrix operator f. The eigenvalues will be stored in
! the vector lambda and the corresponding eigenvectors will be stored in
! the matrix b.  The f and b matrices in the program correspond to the
! F and B matrices in equations 11 and 12 in the
! discussion paper.
!
!
        call pdsyevx('V','A','U',n,f,1,1,f_d,-1.d30,1.d30,0,n,                 &
     &        0.d0,num_eigvalues,num_vectors,lambda,1.d-5,b,1,1,b_d,           &
     &        work, lwork, iwork, ilwork, ifail, iclustr, gap, info)
 
        if( sc_iam .eq. 0) then
          if( info .ne. 0 ) then
             write(*,*) ' info is ', info
             call blacs_abort(sc_icontext, 1)
          endif
        endif
!
!       Multiply the transpose of the eigenvector matrix, b, with the sine
!       expansion of the initial temperature profile, a, to obtain the
!       initial temperature profile in terms of the eigenfuctions of the
!       diffusion operator.
!
        call pdgemv('T', n, n, 1.d0, b, 1, 1, b_d, a, 1, 1, a_d, 1,            &
     &              0.d0, ab, 1, 1, ab_d, 1)
!
!  This first matrix multiplication, yielding the matrix ab,
!  corresponds to the inner summation in equation 10
!  of the discussion paper.
!
 
!
!    Calculate temperature profile for each time step.
!
        do it = 1, dif_ntemps
          i = 0
          do ib = 1, ab_i%num_col_blks
            do ig = ab_i%scb(ib), ab_i%ecb(ib)
              i = i + 1
              abt(1,i) = exp( -lambda(ig) * it * dif_delta_t) * ab(1,i)
            enddo
          enddo
!
!   abt now has the expansion of the temperature profile in terms of the
!   eigenvectors of the diffusion operator.
!
 
!
!   Multiply the eigenvector matrix with abt to give the sine function
!   expansion of the temperature profile at time t, x.
!
          call pdgemv('N', n, n, 1.d0, b, 1, 1, b_d, abt, 1, 1, abt_d,         &
     &              1, 0.d0, x, 1, 1, x_d, 1)
!  This last sum corresponds to the outer sum of equation 12, where the
!  time dependent expansion coefficients a{sub l}(t) are stored in the
!  temporary array x in the program.
!
 
!
!   Gather all of the local pieces of the array x to the array xg.
!
          call rlocal_to_rglobal(x, x_d, xg )
 
          do k = 1, dif_npts
            temp(k, it) = 0.d0
          enddo
 
          do iy = 1, dif_ny
            do ix = 1, dif_nx
              i = (iy -1) * dif_nx + ix
              do k = 1, dif_npts
                temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy)          &
     &                       * xg(1,i)
              enddo
            enddo
          enddo
!
!  This last do loop corresponds to the double summation in equation
!  13 of the discussion paper.
!
 
        enddo  ! end of time loop
 
 
!
!   Here, we are just writing out the temperatures at the selected times
!   and points.
!
        if( sc_iam .eq. 0 ) then    ! if I am node 0
           write(*,*) '   point #      X         Y'
           do i = 1, dif_npts
             write(*,'(2x, i6, 2x, 2f11.4)') i, dif_x(i), dif_y(i)
           enddo
           write(*,*)
           do k = 1, dif_npts, 6
             write(*,*)
             write(*,'(30X,'Points'')')
             write(*,'(''   TIME   '',6(5x,''#'', i4))') (i, i=k, k+5)
             do i = 1, dif_ntemps
                write(*,'(7f10.5)') i*dif_delta_t,                             &
     &                          (temp(j,i),j=k,min(k+5,dif_npts))
             enddo
          enddo
        endif
 
        deallocate(xg)
        deallocate(xsines)
        deallocate(ysines)
        deallocate(lambda)
        deallocate(temp)
        deallocate( ifail)
        deallocate( iclustr)
        deallocate( gap)
        stop
        end

Module Parameters

      module parameters
!
!   Purpose: Define system wide parameters and index structure
!         used to help map global indices to local indices.
!
        implicit none
        public
        integer, parameter :: long=8, short=4
        real(long), parameter :: pi = 3.141592653589793d0
        real(long), parameter :: twopi = 2.d0*pi
 
        type g_index
         integer :: num_row_blks, num_col_blks
         integer, pointer :: srb(:), scb(:), erb(:), ecb(:)
        end type g_index
!
!  The g_index type was created for convenience
!     components:
!        num_row_blks: number of block repetitions over matrix rows.
!        num_col_blks: number of block repetitions over matrix columns.
!        srb: global row index at start of a block
!             corresponding local index is ( block # -1) * mb
!             where mb is the number of rows in the block.
!
!        scb: global column index at start of a block
!             corresponding local index is ( block # -1) * nb
!             where nb is the number of columns in the block.
!
!        erb: last global row index in the block.
!        ecb: last global column index in the block.
        public g_index
 
      end module parameters

Module Diffusion

      module diffusion
!
!   Purpose: Assign problem parameters and initial data.
!
!   Routines called:
!       none
!
      use parameters
      use scale
      implicit none
      private
!
! Make all entities private by default.
! Have all public entities have the prefix dif_.
!
!  The following are the publicly available routines.
!
      public init_diffusion, init_temp, diff_coef
 
!
!     The following are publicly available variables.
!
 
      real, public :: dif_ly_ratio
      integer, public :: dif_nx, dif_ny, dif_npts, dif_ntemps
      real(long), public :: dif_delta_t
      real(long), public, allocatable :: dif_x(:), dif_y(:)
 
!
!     dif_ly_ratio  is the ratio of the x and y lengths of the beam.
!     dif_nx        is the number of sine expansion coefficients to use
!                   in the x direction.
!     dif_ny        is the number of sine expansion coefficients to use
!                   in the y direction.
!     dif_delta_t   is the size of the time step to be display on output.
!     dif_ntemps    is the total number of temperatures to display per point.
!     dif_npts      is the total number of points to output.
!     dif_x         is the x coordinates of the points.
!     dif_y         is the y coordinates of the points.
!
!
 
!
!     Private variables
!
      integer :: init_f=1, diff_f=1
!   init_f chooses the functional form of initial distribution of temperature.
!   diff_f chooses the functional form for spatially dependent head diffusion
!          coefficient.
 
      contains
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine init_diffusion                                          *!
!*                                                                          *!
!*   Purpose: Initialize problem size, number of output point and           *!
!*            functional form of diffusion constant and initial temperature *!
!*            distribution                                                  *!
!*                                                                          *!
!****************************************************************************!
        subroutine init_diffusion
        namelist /input/ ly_ratio, delta_t, numx, numy, nx, ny, numt,         &
     &                    init_f, diff_f
        integer :: numx=5, numy=5, nx=7, ny=7, numt=20
        real(long) :: ly_ratio=1.d0, delta_t=0.1
        real(long) :: delx, dely
        integer :: i, j, ij
        logical :: ex
!==============================================================================!
!          Start of executable code                                            !
 
        inquire ( file='diffus.naml', exist=ex)
        if( ex ) then
          open( 10, file='diffus.naml', action='read')
          read( 10, input)
          close(10)
        endif
 
        dif_ly_ratio = ly_ratio
        dif_npts = numx*numy
        dif_delta_t = delta_t
        dif_ntemps  = numt
        dif_nx = nx
        dif_ny = ny
        allocate( dif_x(numx*numy) )
        allocate( dif_y(numy*numx) )
!
! Assign a simple linear array of points.
!
        delx = PI/ ( numx + 1.d0)
        dely = PI/ ( numy + 1.d0)
        do i = 1, numx
          do j = 1, numy
            ij = numx*(j-1)  + i
            dif_x(ij) = delx* i
            dif_y(ij) = dely * j
          enddo
        enddo
        return
        end subroutine init_diffusion
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine init_temp                                               *!
!*                                                                          *!
!*   Purpose: Return the initial temperature of the bar at a particular     *!
!*            point                                                         *!
!*                                                                          *!
!****************************************************************************!
        function init_temp(x, y)
!
!     Arguments:
!       x: real*8 (in), x coordinate
!       y: real*8 (in), y coordinate
!     Function return:
!     init_temp: real*8 (out), initial temperature at (x,y)
!
        real(long), intent(in) :: x, y
        real(long) :: init_temp
 
!
!   The problem has been scaled to go from 0 to pi in both the x
!   and y directions. To calculate the expansion coefficients, we
!   define the function to be odd about pi and use the range 0 < x < 2*pi
!
!  Local variables.
        integer :: isign
        real(long) :: x1, y1
!
        isign = 1
        x1 = x
        if( x .gt. pi ) then
          isign = -isign
          x1 = twopi - x
        endif
        y1 = y
        if( y .gt. pi ) then
          isign = -isign
          y1 = twopi - y
        endif
 
!
! Choose very simple temperature profile cases.
!
        select case (init_f)
          case (1)
             init_temp = isign*(x1*(pi-x1))*y1*(pi-y1)
          case (2)
             init_temp = isign*(x1*(pi-x1))*y1*(pi-y1)*y1
          case (3)
             init_temp = isign*(x1*(pi-x1))*y1*(pi-y1)*x1
          case (4)
             init_temp = isign*(x1*(pi-x1))*y1*(pi-y1)*x1*y1
          case (5)
             init_temp = isign*(x1*(pi-x1))*y1*(pi-y1)
          case (6)
             init_temp = isign*(x1*(pi-x1))**2 *y1*(pi-y1)
          case (7)
             init_temp = isign*(x1*(pi-x1))*(y1*(pi-y1))**2
          case default
             init_temp = isign*sin(x1)*sin(y1)
        end select
        return
        end function init_temp
 
 
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine diff_coef                                               *!
!*                                                                          *!
!*   Purpose: Return the value of the thermal diffusion coefficient at      *!
!*            an arbitrary point                                            *!
!*                                                                          *!
!****************************************************************************!
        function diff_coef(x, y)
!     Arguments:
!       x: real*8 (in), x coordinate
!       y: real*8 (in), y coordinate
!     Function return:
!       diff_coef: real*8 (out), diffusion coefficient at (x,y)
!
         real(long), intent(in) :: x, y
         real(long) :: diff_coef
 
!
!   The problem has been scaled to go from 0 to pi in both the x
!   and y directions. To simplify the matrix element calculations,
!   we define the function to be even about pi.
!
 
!  Local variables.
        real(long) :: x1, y1
 
!==============================================================================!
!          Start of executable code.                                           !
        x1 = x
        if( x .gt. pi )  x1 = twopi - x
        y1 = y
        if( y .gt. pi ) y1 = twopi - y
 
!
! Choose very simple diffusion coefficient cases.
!
        select case (diff_f)
           case (1)
             diff_coef = .5d0 + (x1 + y1) / (2 * twopi)
           case (2)
             diff_coef = ((1.d0 + x1)*(pi - x1 + 1.d0)*(y1 + pi))/ 3*pi
           case (3)
             diff_coef = (y1 + pi) * pi/((pi + x1) * (2* pi - x1))
           case default
             diff_coef = 1.d0
           end select
        return
        end function diff_coef
 
 
      end module diffusion

Module Fourier

      module fourier
!
!   Purpose: To represent both the diffusion operator and
!            the temperature profile in a sine function basis.
!
!   Routines called:
!     blacs_abort
!     clocal_to_rglobal
!     dcft2
!     igamx2d
!     initialize_carray
!     initialize_scale
!     pdcft2
!
      use parameters
      use scale
      use diffusion
      implicit none
      private
!
!     all entities private by default
!
      external pdcft2
!
!   publicly available routines
!
      public expand_temp_profile, get_diffusion_matrix
      public g_index
!
!   publicly available variables
!
 
!
!   private variables
!
      integer :: pn_fac(5) = 5*0  ! prime factors of number of nodes
!     nnodes = 2**pn_fac(1) * 3**pn_fac(2) * 5**pn_fac(3) *
!              7**pn_fac(4) * 11**pn_fac(5)
 
!
!   private routines
!
      private minpower2, factor_nodes
      contains
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine get_diffusion_matrix                                    *!
!*                                                                          *!
!*   Purpose: To obtain the matrix representation of the diffusion          *!
!*            operator in a sine function basis                             *!
!*                                                                          *!
!****************************************************************************!
        subroutine get_diffusion_matrix(f,f_i, f_d)
!
!   Arguments:
!     f: real*8,dimension(:,:),(out), local part of the global matrix
!               containing the diffusion operator in sine function basis.
!     f_d: integer*4, dimension(:),(in), array descriptor for f.
!     f_i: g_type, (in), g_type structure for f, see parameter.f.
!
        real(long), intent(out) :: f(:,:)
        integer, intent(in) :: f_d(DESC_DIM)
        type (g_index), intent(in) :: f_i
 
! Local variables
!
!  df contains the diffusion coefficient before the call to pdcft2.
!  df contains the Fourier transpose of diffusion coefficients after the call.
!  dfg contains the entire Fourier transpose of df on each node.
!
        complex(long), pointer :: df(:,:)
        type (g_index), pointer :: df_i
        integer :: df_d(DESC_DIM)
 
        real(long), allocatable :: dfg(:,:)
!
!  ixi and iyi are arrays which, given a global index,
!  return the x and y offsets.  Recall that the large arrays
!  are 4 dimensional arrays collapsed into 2 dimensions,
!  where i = (ix-1)*dif_ny + iy.
!
        integer, allocatable :: ixi(:), iyi(:)
        real(long) :: scale_f
        integer :: nx, ny, ix, iy, ixp, iyp, istat, nerrs
        integer :: ixdiff, iydiff, num_errors
        integer :: naux1, naux2, i, j, factor1, factor2, idum
        integer :: ib, jb, il, jl
!
!  ip is a support array for pdcft2.
!
        integer :: ip(40)
        integer :: blk_index
!
!   Fourier transform of diffusion coefficient function
        nerrs=0
 
        call factor_nodes()
        factor1 = 3**pn_fac(2) * 5**pn_fac(3) * 7**pn_fac(4) *                 &
     &                           11**pn_fac(5)
!
!  Here we are trying to find the smallest number which is evenly
!  divisible by the number of processes and is larger than 4*(n+1).
!
        factor2 = (4*(dif_nx+1) + factor1 -1)/factor1
        nx = minpower2( factor2,idum) * factor1
 
        factor2 = (4*(dif_ny+1) + factor1 -1)/factor1
        ny = minpower2( factor2,idum) * factor1
 
        scale_f = 1.d0/ real(nx*ny, long)
 
!
!  Get storage for diffusion array.
!
        call  initialize_scale(ny, blk_index)
        call  initialize_carray(df, df_d, df_i, nx, ny,                        &
     &                          blk_index)
!
!    Here, we initialize the local part of the global array df, which
!    contains the value of the diffusion coefficient function, evenly
!    evaluated between (0, 2*pi).  We do a two dimensional Fourier
!    transform on the data.  Because the size of this array is so small,
!    nx by ny, and ultimately we have to transfer the whole array to
!    each node, it would probably be more efficient to do the calculation
!    locally on each node.
!
 
!    Get the value of the diffusion coefficient function at
!    the necessary points.
!
 
!
!   This loop can be simplified considerably. Because blocks of the
!   array are column-distributed with the block size equal to the number
!   of columns divided by the number of processes, there is only a single
!   column block.  Also, because the processes are distributed in a 1 x np
!   arrangement, the local row index will equal the global row index.
!   However, the loop is perfectly general for other process arrangements
!   and is correct for this particular case.
!
        jl = 0
        do jb = 1, df_i%num_col_blks   ! loop over the number of column blocks
          do j = df_i%scb(jb), df_i%ecb(jb)
                           ! loop over columns in block
                           ! j is a global index
            jl = jl + 1    ! jl is local array column index
            il = 0
            do ib = 1, df_i%num_row_blks   ! loop over the number of row blocks
              do i = df_i%srb(ib),  df_i%erb(ib)
                           ! loop over rows in block
                           ! i is a global index
                il = il + 1    ! il is local array row index
                df(il,jl) = diff_coef((twopi*(i-1))/nx,                        &
     &                                   (twopi*(j-1))/ny)
              enddo
            enddo
          enddo
        enddo
!
!  This last loop just determined the diffusion coefficient at evenly
!  spaced points along the x and y coordinates.
!
 
!
!
!  Do the Fourier transform.
!
        do i= 1, 40
          ip(i) = 0
        enddo
!  Store the array in normal mode overwriting the original array.
        ip(1) = 1
        ip(2) = 1
 
!
! Because the size of the 2d Fourier transform is nx by ny, which is much
! smaller than the size of the eigenvalue problem, this could probably
! be done serially on each node more quickly.
!
        call pdcft2(df, df, nx, ny, 1,scale_f, sc_icontext, ip)
!
!
!   df now has the Fourier coefficients for the diffusion coefficient
!      function, which correspond to the D(tilde)(sub ij) given in the
!      discussion paper.
!
!   Because each process will need most of the Fourier transformed diffusion
!   coefficients, it is useful to broadcast all parts of this matrix
!   to each process.
!
!  First allocate the index arrays.
!
        num_errors=0
        allocate(ixi(dif_nx*dif_ny), stat=istat)
        if( istat .ne. 0 ) num_errors = num_errors + 1
        allocate(iyi(dif_nx*dif_ny), stat=istat)
        if( istat .ne. 0 ) num_errors = num_errors + 1
!  Allocate array for holding global Fourier transform.
        allocate(dfg(nx,ny), stat = istat)
        if( istat .ne. 0 ) num_errors = num_errors + 1
 
        call igamx2d(sc_icontext,'A',' ',1,1,num_errors,1,-1,-1,-1,            &
     &               -1,-1)
        if( num_errors .gt. 0 ) then
           if( sc_iam .eq. 0 ) then
               write(*,*) 'Error in allocating scratch arrays in ',            &
     &                     'get_diffusion_matrix'
               call blacs_abort(sc_icontext, 1)
           endif
        endif
 
 
        call clocal_to_rglobal( df, df_d, dfg )
 
!   Here df contains only local portions of the global array, while
!   dfg contains the entire global array.
!
 
!
!  Now load up the diffusion operator
!    f(ix,iy;ix',iy').
!
!    Here we transform the 4d matrix into the 2d matrix where
!       i = (iy-1)* dif_nx + ix + 1
!    and
!       j = (iy'-1)* dif_nx + ix' + 1.
!
!  First calculate the index arrays.
!
        do ix = 1, dif_nx
          do iy = 1, dif_ny
            i = (iy-1)* dif_nx + ix
            ixi(i) = ix
            iyi(i) = iy
          enddo
        enddo
 
!
! This final loop loads the matrix elements up for F as given in
! equation 10.
!
        jl = 0
        do jb = 1, f_i%num_col_blks   ! loop over the number of column blocks
          do j = f_i%scb(jb), f_i%ecb(jb)
                           ! loop over columns in block
                           ! j is a global index
            jl = jl + 1    ! jl is local array column index
            iyp = iyi(j)
            ixp = ixi(j)
            il = 0
            do ib = 1, f_i%num_row_blks   ! loop over the number of row blocks
              do i = f_i%srb(ib), f_i%erb(ib)
                           ! loop over rows in block
                           ! i is a global index
                il = il + 1    ! il is local array row index
                iy = iyi(i)
                ix = ixi(i)
                ixdiff = iabs(ix-ixp) + 1
                iydiff = iabs(iy-iyp) + 1
                f(il,jl) = ( ( ix*ixp + iy*iyp*dif_ly_ratio ) *                &
     &              (dfg(ixdiff, iydiff)    - dfg(ix+ixp+1,iy+iyp+1))          &
     &              +     ( ix*ixp - iy*iyp*dif_ly_ratio ) *                   &
     &              (dfg(ix+ixp+1,iydiff )  - dfg(ixdiff,iy+iyp+1)))
              enddo
            enddo
          enddo
        enddo
        deallocate(dfg)
        deallocate(ixi)
        deallocate(iyi)
!
!   We should add routines to free df.
!
        return
        end subroutine get_diffusion_matrix
 
 
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine expand_temp_profile                                     *!
!*                                                                          *!
!*   Purpose: To obtain the expansion coefficients of the initial           *!
!*            temperature profile in a sine function expansion              *!
!*                                                                          *!
!****************************************************************************!
        subroutine expand_temp_profile(a,a_i,a_d)
!
!   Arguments:
!     a: real*8,dimension(:,:),(out), local part of the global matrix,
!                  containing the sine coefficients for initial
!                  temperature distribution.
!     a_d: integer*4, dimension(:),(in), array descriptor for a.
!     a_i: g_type, (in), g_type structure for a, see parameter.f.
!
        real(long), intent(out) :: a(:,:)
        integer, intent(in) :: a_d(DESC_DIM)
        type (g_index), intent(in) :: a_i
 
! Local variables
        complex(long), allocatable :: atmp(:,:)
        real(long), allocatable :: aux1(:), aux2(:)
        integer :: i,j, nx, ny, istat, naux1, naux2, nerrs, jl
        integer :: idum, jb, jx, jy
        real(long) :: x, y, scale_f
 
!
!      Calculate the minimum power of 2 to hold twice the number of
!      Fourier coefficients as sine coefficients. The top half of the
!      Fourier coefficients will equal minus the bottom half because
!      we are forcing the temperature profile to be odd.
!
        nx = minpower2( 2*(dif_nx+1), idum)
        ny = minpower2( 2*(dif_ny+1), idum)
        scale_f = -twopi / real( nx*ny,long)
 
        nerrs = 0
!
!       Temperature profile allocation.
        allocate(atmp(nx,ny), stat=istat )
        if( istat .ne. 0 ) nerrs = nerrs + 1
!
!
        naux1 = 40000 + 2.28*( nx + ny)
        naux2 = 20000 + 66*( 256 + 2*max(nx , ny))
!
!  Allocate work storage.
        allocate(aux1(naux1), stat=istat)
        if( istat .ne. 0 ) nerrs = nerrs + 1
        allocate(aux2(naux2), stat=istat)
        if( istat .ne. 0 ) nerrs = nerrs + 1
 
!
!   Check for allocation errors.
!
        call igamx2d(sc_icontext,'A',' ',1,1,nerrs,1,-1,-1,-1,-1,-1)
        if( nerrs .gt. 0 ) then
           if( sc_iam .eq. 0 ) then
               write(*,*) 'Error in allocating scratch arrays in ',            &
     &                     'expand_temp_profile'
               call blacs_abort(sc_icontext, 1)
           endif
        endif
 
!
!
!       Fill atmp with the initial temperatures.
!
!  atmp contains the initial temperature profile T(sub 0)(x,y) as used
!  in equation 5 in the discussion paper.
!
        do i = 1, nx
           do j = 1, ny
             atmp(i,j) = init_temp((twopi*(i-1))/nx, (twopi*(j-1))/ny)
           enddo
        enddo
 
!
!  Do the 2d Fourier transform of atmp.
!
!  First initialize.
!
!  The 2d Fourier transform can be done in parallel, however it
!  is such a small part of the problem, it is probably faster to do
!  it serially on each node.
!
!  Note that we could have used DSINF to obtain these expansion coefficients
!  as well.
!
        call dcft2(1,atmp,1,nx,atmp,1,nx,nx,ny,1,scale_f ,aux1,naux1,          &
     &               aux2,naux2 )
        call dcft2(0,atmp,1,nx,atmp,1,nx,nx,ny,1,scale_f ,aux1,naux1,          &
     &               aux2,naux2 )
!
! The calls to dcft2 calculated the dual Fourier transform as
! defined by equation 5 in the discussion paper.
!
 
!
!  This final loop is to load only those portions of the global array
!  corresponding to the local portion of that array for this process.
!
        jl = 0
        do jb =  1, a_i%num_col_blks    ! loop over all column blocks
          do j = a_i%scb(jb),  a_i%ecb(jb)  ! j is global index
            jx = mod(j-1, dif_nx) + 2
            jy = (j-1) / dif_nx + 2
            jl = jl + 1
            a(1,jl) = real(atmp(jx,jy),long)
          enddo
        enddo
 
        deallocate(atmp)
        deallocate(aux1)
        deallocate(aux2)
        return
        end subroutine expand_temp_profile
 
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine factor_nodes                                            *!
!*                                                                          *!
!*   Purpose: To obtain the powers of prime factorization of the number     *!
!*            nodes, failing if the factorization is not compatible with    *!
!*            FFT supported transform lengths                               *!
!*                                                                          *!
!****************************************************************************!
        subroutine factor_nodes()
!  Arguments: None
!
!  Local variables
        integer n1, n2, l2
!
!   Determine the prime factorization of nnodes, which must
!       be of the form 2**n * 3**m * 5**i * 7**j * 11**k
!       where m cannot be greater than 2 and i, j, and k cannot
!       be greater than 1
!
        n2 = sc_nnodes
        n1 = n2/11
        if( n1*11 .eq. n2) then
            pn_fac(5) = 1
            n2 = n1
        endif
 
        n1 = n2/7
        if( n1*7 .eq. n2) then
            pn_fac(4) = 1
            n2 = n1
        endif
 
        n1 = n2/5
        if( n1*5 .eq. n2) then
            pn_fac(3) = 1
            n2 = n1
        endif
 
        n1 = n2/3
        if( n1*3 .eq. n2) then
          if ( (n1/3)*3 .eq. n1 ) then
            pn_fac(2) = 2
            n2 = n1/3
          else
            pn_fac(2) = 1
            n2 = n1
          endif
        endif
 
        n1 = minpower2(n2,l2)
        pn_fac(1) = l2
 
        if( n1 .ne. n2) then
          if( sc_iam .eq. 0) then
            write(*,*) 'Invalid number of nodes, it must have the form:'
            write(*,*) '2**n * 3**m * 5**i * 7**j * 11**k, where '
            write(*,*) ' n >= 0, 0<=m<=2 and 0<= i,j,k <=1 '
            write(*,*) ' choose a power of 2 for best performance'
            call blacs_abort(sc_icontext,1)
          endif
        endif
        end subroutine factor_nodes
 
 
!****************************************************************************!
!*                                                                          *!
!*   Module function minpower2                                              *!
!*                                                                          *!
!*   Purpose: To obtain the smallest number which is a power of 2 and       *!
!*            greater than or equal to the input argument                   *!
!*                                                                          *!
!****************************************************************************!
        function minpower2( n, log2n )
!
!   Arguments:
!      n: integer*4, (in), input number
!      log2n: integer*4, (out), log base 2 of the function result
!   Function return:
!      minpower2: integer*4 (out), smallest number, which is a power of 2
!                 and greater than or equal to n.
!
        integer n, minpower2, log2n
!
!   Local variables.
        integer m, i
        m=n
 
        if( n < 0 ) write(*,*) 'n cannot be negative'
        powerloop: do i= 1, bit_size(n)
          m = m / 2
          if( m .eq. 0 ) exit powerloop
        enddo powerloop
        if ( 2**(i-1) .ne. n ) then
           if( 2**i < 0) write(*,*) 'n too large'
           log2n = i
           minpower2 = 2**i
        else
           log2n = i-1
           minpower2 = n
        endif
        return
        end function minpower2
 
      end module fourier

Module Scale

      module scale
!
!  Purpose: To initialize the communications and provide a few
!           communication utility routines.
!
!  Routines called:
!    blacs_abort
!    blacs_get
!    blacs_gridinfo
!    blacs_gridinit
!    blacs_pinfo
!    dgebr2d
!    dgebs2d
!    numroc
!
        use parameters
        implicit none
        external numroc
!
!  All variables private by default.
!
        private
!
!   Publicly accessible routines follow.
!
        public initialize_scale
        public initialize_rarray, initialize_carray
        public clocal_to_rglobal, rlocal_to_rglobal
        public g_index
 
!
!   Public variables follow.
!
        integer, public :: sc_icontext, sc_iam, sc_nnodes
 
!
!   Private variables follow.
!
!   MAXBLOCK is the maximum block size. Here it is set to a very
!            large number to force an equal noncyclic block distribution
!            for the FFTs.
!
!
        integer, public, parameter :: MAXBLOCK=50000
        integer, public, parameter :: MAX_SC_INDEX=10
        integer, public, parameter :: DESC_DIM=9
        integer, public, parameter :: DTYPE_=1
        integer, public, parameter :: CTXT_=2
        integer, public, parameter :: M_=3
        integer, public, parameter :: N_=4
        integer, public, parameter :: MB_=5
        integer, public, parameter :: NB_=6
        integer, public, parameter :: RSRC_=7
        integer, public, parameter :: CSRC_=8
        integer, public, parameter :: LLD_=9
        integer :: numroc
        integer :: nprow, npcol, myrow, mycol
!
!  The block sizes for a given array size are indexed in the
!  nblock, mblock and nsize arrays so that, for a given array size,
!  the block size calculation is done only once and exactly the same
!  block sizes are returned for any particular array size.
!
        integer :: nblock(MAX_SC_INDEX), mblock(MAX_SC_INDEX)
        integer :: nsize(MAX_SC_INDEX)
!
!  The number of different array sizes is limited by the fixed
!  dimension of the arrays nblock, mblock and nsize.
!
        integer :: icontext, iam, nnodes
        integer, save :: sc_indx=0
        integer, parameter :: rsrc=0, csrc=0
        logical, save :: initialized = .false.
!
!  All of the manipulation of the PESSL and SP variables will be
!  done in this module and held privately.
!
      contains
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine initialize_scale                                        *!
!*                                                                          *!
!*   Purpose: Initialize blacs and calculate a block size                   *!
!*                                                                          *!
!****************************************************************************!
        subroutine initialize_scale (n, index)
!
!     Arguments:
!       n: integer*4 (in), matrix or vector size.
!       index: integer*4 (out), index into an array of block sizes.
!              Provides a mechanism for creating similar descriptor arrays.
!
 
!
!  This routine assigns the block size based on a given
!  n and returns an index, so that multiple arrays can be created with
!  compatible block sizes.
!
        integer n, index
        integer  npc, npr, i
 
        if ( .not. initialized ) then
          call blacs_pinfo( iam, nnodes )
          sc_iam = iam
          sc_nnodes = nnodes
!
!      Get the number of nodes.
!
 
!
!   The Fourier transform routines require that the processors
!   be laid out in a 1 by nnodes arrangement.
!
          nprow = 1
          npcol = nnodes
 
          call blacs_get(0,0,icontext)
          sc_icontext = icontext
          call blacs_gridinit( icontext, 'R', nprow, npcol)
          sc_icontext = icontext
          call blacs_gridinfo( icontext, npr, npc, myrow, mycol)
!
!      Check that the system is gridded as expected.
!
          if( npr .ne. nprow .or. npc .ne. npcol) then
            if( iam .eq. 0) then
              write(*,*) 'number of processor rows and columns'
              write(*,*) 'incorrect ', nprow, npr, npcol, npc
              call blacs_abort(icontext,1)
            endif
          endif
          initialized = .true.
        endif
 
!
!  If we have already calculated a block size based on estimated
!  array size, return the index for that block size.
!
        do i = 1, sc_indx
           if( n .eq. nsize(i)) then
              index = i
              return
           endif
        enddo
 
!
!     Compute a block size.
!
        sc_indx = sc_indx + 1
        if ( sc_indx .GT. MAX_SC_INDEX ) then
          if( iam .eq. 0 ) then
            write(*,*) 'Used more than the maximum number of '
            write(*,*) 'indices in initialize_scale, Maximum is ',             &
     &                  MAX_SC_INDEX
            call blacs_abort(icontext,1)
          endif
        endif
 
        index = sc_indx
        nsize(index) = n
 
!
!  Always choose a square block with a maximum dimension of MAXBLOCK.
!
        if( ( n ) / nnodes .gt. MAXBLOCK ) then
            mblock(index) = MAXBLOCK
        else
            mblock(index) = (n ) / nnodes
        endif
        if( mblock(index) .lt. 1 ) then
            if( iam .eq. 0 ) then
              write(*,*) 'problem size too small for number of nodes'
              write(*,*) 'try increasing the nx and ny'
              write(*,*) n, nnodes
              call blacs_abort(sc_icontext, 1)
            endif
        endif
        nblock(index) = mblock(index)
 
        return
        end subroutine initialize_scale
 
 
!
!   This routine is provided to allocate dynamically the space
!   needed for the local part of a global array and initialize the
!   associated array descriptor.  It also returns array-useful
!   indices to do local to global index conversion.
!
!  initialize_rarray => real array initialization.
!  initialize_carray => complex array initialization.
!
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine initialize_rarray                                       *!
!*                                                                          *!
!*   Purpose: Allocate space for a real array and create associated         *!
!*            descriptor array and index array                              *!
!*                                                                          *!
!****************************************************************************!
        subroutine initialize_rarray( array, desc_array,                       &
     &                               index_array, m, n, blk_index)
!
!    Arguments:
!       array: pointer to real*8 (out), pointer to real array, initialized.
!       desc_array: integer*4 (out), empty descriptor array, initialized.
!       index_array: g_index (out), pointer to g_index structure,
!                   see parameter.f, initialized.
!       m: integer*4 (out), scalar number of rows in global array.
!       n: integer*4 (out), scalar number of columns in global array.
!       blk_index: integer*4 (in), index into array of block sizes to use
!                   for initializing the descriptor array.
!
        integer, intent(out) :: desc_array(DESC_DIM)
        integer, intent(in) :: m, n, blk_index
        type (g_index), pointer :: index_array
        real(long), pointer :: array(:,:)
 
! Local variables
        integer :: irows, icols, istat, i, j
        integer :: start_row_block, start_col_block
        integer :: mb, nb, num_mb, num_nb
 
!
!   Check to see if the block sizes were already calculated.
!
        if ( blk_index .lt.1 .or. blk_index .gt. sc_indx ) then
           if( iam .eq. 0 ) then
              write(*,*) 'No initialization done for index ', blk_index
              call blacs_abort(icontext, 1)
           endif
        endif
 
        mb = mblock(blk_index)
        nb = nblock(blk_index)
 
        irows = numroc( m, mb, myrow, rsrc, nprow )
        icols = numroc( n, nb, mycol, csrc, npcol )
 
        allocate(array(max(irows,1),max(icols,1)), stat=istat)
        if ( istat .ne. 0) then
           write(*,*) 'allocate failed in initialize_array'
           call blacs_abort(icontext,1)
        endif
 
!
!   Calculate the number of row and column blocks.
!
        num_mb = ( (irows + mb -1)/ mb )
        num_nb = ( (icols + nb -1)/ nb )
 
        allocate (index_array)
        index_array%num_row_blks = num_mb
        index_array%num_col_blks = num_nb
 
        allocate(index_array%srb(num_mb))
        allocate(index_array%erb(num_mb))
        allocate(index_array%scb(num_nb))
        allocate(index_array%ecb(num_nb))
 
        desc_array(DTYPE_) = 1
        desc_array(M_) = m
        desc_array(N_) = n
        desc_array(MB_) = mb
        desc_array(NB_) = nb
        desc_array(RSRC_) = rsrc
        desc_array(CSRC_) = csrc
        desc_array(CTXT_) = icontext
        desc_array(LLD_) = max(irows,1)
 
!
        start_row_block = mod( nprow + myrow - rsrc , nprow )
        start_col_block = mod( npcol + mycol - csrc , npcol )
 
        do i = 1, index_array%num_row_blks
          index_array%srb(i) = (start_row_block + (i - 1)*nprow) *             &
     &                       mb+1
          index_array%erb(i) = index_array%srb(i) + mb - 1
        enddo
        index_array%erb(num_mb) = mod(irows-1,mb) +                            &
     &                              index_array%srb(num_mb)
 
        do i = 1, index_array%num_col_blks
          index_array%scb(i) = (start_col_block + (i - 1)*npcol) *             &
     &                          nb + 1
          index_array%ecb(i) = index_array%scb(i) + nb - 1
        enddo
        index_array%ecb(num_nb) = mod(icols-1,nb) +                            &
     &                              index_array%scb(num_nb)
 
 
        end subroutine initialize_rarray
 
! Complex array initialization.
!
!****************************************************************************!
!*                                                                          *!
!*   Module routine initialize_carray                                       *!
!*                                                                          *!
!*   Purpose: Allocate space for a complex array and create associated      *!
!*            descriptor array and index array                              *!
!*                                                                          *!
!****************************************************************************!
        subroutine initialize_carray( array, desc_array,                       &
     &                               index_array, m, n, blk_index)
!
!    Arguments:
!       array: pointer to complex (out), pointer to real array, initialized.
!       desc_array: integer*4 (out), empty descriptor array, initialized.
!       index_array: g_index (out), pointer to g_index structure,
!                   see parameter.f, initialized.
!       m: integer*4 (out), scalar number of rows in global array.
!       n: integer*4 (out), scalar number of columns in global array.
!       blk_index: integer*4 (in), index into array of block sizes to use
!                   for initializing the descriptor array.
!
        integer desc_array(DESC_DIM), m, n, blk_index
        type (g_index),pointer :: index_array
        complex(long), pointer :: array(:,:)
 
 
! Local variables
        integer :: irows, icols, istat, i, j
        integer :: start_row_block, start_col_block
        integer :: mb, nb, num_mb, num_nb
 
!
!   Check to see if the block sizes were already calculated.
!
        if ( blk_index .lt.1 .or. blk_index .gt. sc_indx ) then
           if( iam .eq. 0 ) then
              write(*,*) 'No initialization done for index ', blk_index
              call blacs_abort(icontext, 1)
           endif
        endif
 
        mb = mblock(blk_index)
        nb = nblock(blk_index)
 
        irows = numroc( m, mb, myrow, rsrc, nprow )
        icols = numroc( n, nb, mycol, csrc, npcol )
 
        allocate(array(max(irows,1),max(icols,1)), stat=istat)
        if ( istat .ne. 0) then
           write(*,*) 'allocate failed in initialize_array'
           call blacs_abort(icontext,1)
        endif
 
 
!
!   Calculate the number of row and column blocks.
!
        num_mb = ( (irows + mb -1)/ mb )
        num_nb = ( (icols + nb -1)/ nb )
 
        allocate(index_array, stat=istat)
        if ( istat .ne. 0) then
           write(*,*) 'allocate failed in initialize_array'
           call blacs_abort(icontext,1)
        endif
 
        index_array%num_row_blks = num_mb
        index_array%num_col_blks = num_nb
 
        allocate(index_array%srb(num_mb))
        allocate(index_array%erb(num_mb))
        allocate(index_array%scb(num_nb))
        allocate(index_array%ecb(num_nb))
 
 
 
        desc_array(DTYPE_) = 1
        desc_array(M_) = m
        desc_array(N_) = n
        desc_array(MB_) = mb
        desc_array(NB_) = nb
        desc_array(RSRC_) = rsrc
        desc_array(CSRC_) = csrc
        desc_array(CTXT_) = icontext
        desc_array(LLD_) = max(irows,1)
 
 
!
        start_row_block = mod( nprow + myrow - rsrc , nprow )
        start_col_block = mod( npcol + mycol - csrc , npcol )
 
 
        do i = 1, index_array%num_row_blks
          index_array%srb(i) = (start_row_block + (i - 1)*nprow) *             &
     &                       mb+1
          index_array%erb(i) = index_array%srb(i) + mb - 1
        enddo
        index_array%erb(num_mb) = mod(irows-1,mb) +                            &
     &                              index_array%srb(num_mb)
 
        do i = 1, index_array%num_col_blks
          index_array%scb(i) = (start_col_block + (i - 1)*npcol) *             &
     &                          nb + 1
          index_array%ecb(i) = index_array%scb(i) + nb - 1
        enddo
        index_array%ecb(num_nb) = mod(icols-1,nb) +                            &
     &                              index_array%scb(num_nb)
 
        end subroutine initialize_carray
 
 
!
!****************************************************************************!
!*                                                                          *!
!*   Module routine clocal_to_rglobal                                       *!
!*                                                                          *!
!*   Purpose: Take the real parts of the local portions of a complex matrix *!
!*            and distribute them globally to each node                     *!
!*                                                                          *!
!****************************************************************************!
        subroutine clocal_to_rglobal(a,a_d,a_glob)
!
!    Arguments:
!     a: complex*16, dimension(:,:), is the local part of a
!                                    global complex array.
!     a_d: integer*4, array descriptor for a.
!     a_glob: real*8, dimension(:,:), entire matrix A on each node.
!
 
        complex(long), intent(in) :: a(:,:)
        integer, intent(in) :: a_d(DESC_DIM)
        real(long), intent(out) :: a_glob(:,:)
!
!    Local variables.
!
        integer :: nrow_blks, ncol_blks, ib, jb, ibl, jbl, i, j
        integer :: m, n, nb, mb, ig, jg, lda, il, jl, prow, pcol
        integer :: iarow, iacol, ni, nj
!
!    m is number of rows in global matrix.
!    n is number of columns in global matrix.
!    mb and nb are rows and columns in each block.
!    prow and  pcol are the processor row and column containing a block.
!    ib, jb, ibl, jbl are global and local  block indices.
!    il, jl, ig, jg are local and global matrix indices.
!
 
!
!
!     Start of executable code.
!
!====================================================================
!
 
! Determine the total number of row and column blocks.
        m = a_d(M_)
        n = a_d(N_)
        mb = a_d(MB_)
        nb = a_d(NB_)
        iarow = a_d(RSRC_)
        iacol = a_d(CSRC_)
        nrow_blks =  ( m + mb -1 )/ mb
        ncol_blks = ( n + nb - 1 )/ nb
!
! Determine leading dimension of global array.
        lda = size(a_glob, dim=1)
 
!
!  Loop over all of the blocks.
!
        do jb = 0, ncol_blks - 1
!
!  Loop over column blocks, determining both local and global j indices.
          jg = jb * nb + 1
          nj = nb
          if (jb .eq. ncol_blks - 1) nj = mod( n - 1, nb) + 1
          jbl = ( jb + iacol ) / npcol  - (mycol + iacol) / npcol
          jl = jbl * nb + 1
          pcol = mod((jb + iacol), npcol )
 
          do ib = 0, nrow_blks - 1
!
!  Loop over row blocks, determining both local and global i indices.
            ig = ib * mb + 1
            ni = mb
            if (ib .eq. nrow_blks - 1) ni = mod( m - 1, mb) + 1
            ibl = ( ib + iarow ) / nprow - (myrow + iarow) /nprow
            il = ibl * mb  + 1
            prow = mod((ib + iarow), nprow )
!
!   Determine if this block is on my processor.
!
            if( prow .eq. myrow .and. pcol .eq. mycol) then
!
!   Block is on my processor.
!
!   using Fortran 90 array language this is
!       a_glob(ig:ig+ni-1,jg:jg+nj-1) = a(il:il+ni-1:jl+nj-1)
!
              do j = 1, nj
                 do i  = 1, ni
                    a_glob( ig+i-1, jg+j-1 ) = a( il+i-1, jl+j-1)
                 enddo
              enddo
              call dgebs2d(icontext,'ALL',' ',ni,nj,a_glob(ig,jg),lda)
            else
!
!   Block is on somebody elses processor.
!
              call dgebr2d(icontext,'ALL',' ',ni,nj,a_glob(ig,jg),             &
     &                     lda, prow, pcol)
            endif
 
          enddo
        enddo
 
        return
        end subroutine clocal_to_rglobal
 
 
!****************************************************************************!
!*                                                                          *!
!*   Module routine rlocal_to_rglobal                                       *!
!*                                                                          *!
!*   Purpose: Take the local portions of a real matrix                      *!
!*            and distribute them globally to each node                     *!
!*                                                                          *!
!****************************************************************************!
        subroutine rlocal_to_rglobal(a,a_d,a_glob)
!
!    Arguments:
!     a: real*8, dimension(:,:), is the local part of a global real array.
!     a_d: integer*4, array descriptor for a.
!     a_glob: real*8, dimension(:,:), entire matrix A on each node.
!
!   Input arguments.
!
        real(long), intent(in) :: a(:,:)
        integer, intent(in) :: a_d(DESC_DIM)
        real(long), intent(out) :: a_glob(:,:)
 
!
!    Local variables.
!
        integer :: nrow_blks, ncol_blks, ib, jb, ibl, jbl, i, j
        integer :: m, n, nb, mb, ig, jg, lda, il, jl, prow, pcol
        integer :: iarow, iacol, ni, nj
!
!    m is number of rows in global matrix.
!    n is number of columns in global matrix.
!    mb and nb are rows and columns in each block.
!    prow and  pcol are the processor row and column containing a block.
!    ib, jb, ibl, jbl are global and local  block indices.
!    il, jl, ig, jg are local and global matrix indices.
!
 
!
!
!     Start of executable code.
!
!====================================================================
!
 
! Determine the total number of row and column blocks.
        m = a_d(M_)
        n = a_d(N_)
        mb = a_d(MB_)
        nb = a_d(NB_)
        iarow = a_d(RSRC_)
        iacol = a_d(CSRC_)
        nrow_blks =  ( m + mb -1 )/ mb
        ncol_blks = ( n + nb - 1 )/ nb
!
! Determine leading dimension of global array.
        lda = size(a_glob, dim=1)
 
!
!  Loop over all of the blocks.
!
        do jb = 0, ncol_blks - 1
!
!  Loop over column blocks, determining both local and global j indices
          jg = jb * nb + 1
          nj = nb
          if (jb .eq. ncol_blks - 1) nj = mod( n - 1, nb) + 1
          jbl = ( jb + iacol ) / npcol  - (mycol + iacol) / npcol
          jl = jbl * nb + 1
          pcol = mod((jb + iacol), npcol )
 
          do ib = 0, nrow_blks - 1
!
!  Loop over row blocks, determining both local and global i indices.
            ig = ib * mb + 1
            ni = mb
            if (ib .eq. nrow_blks - 1) ni = mod( m - 1, mb) + 1
            ibl = ( ib + iarow ) / nprow - (myrow + iarow) /nprow
            il = ibl * mb  + 1
            prow = mod((ib + iarow), nprow )
!
!   Determine if this block is on my processor.
!
            if( prow .eq. myrow .and. pcol .eq. mycol) then
!
!   Block is on my processor.
!
              do j = 1, nj
                 do i  = 1, ni
                    a_glob( ig+i-1, jg+j-1 ) = a( il+i-1, jl+j-1)
                 enddo
              enddo
              call dgebs2d(icontext,'ALL',' ',ni,nj,a_glob(ig,jg),lda)
            else
!
!   Block is on somebody elses processor.
!
              call dgebr2d(icontext,'ALL',' ',ni,nj,a_glob(ig,jg),             &
     &                     lda, prow, pcol)
            endif
 
          enddo
        enddo
 
        return
        end subroutine rlocal_to_rglobal
 
      end module scale

Input Data

 &input
 ly_ratio = 1.0, delta_t = .1, nx =15, ny=15, numx = 5, numy =5,
 numt=20, init_f=1, diff_f =3
 /

Output Data

  0:    point #      X         Y
  0:       1       0.5236     0.5236
  0:       2       1.0472     0.5236
  0:       3       1.5708     0.5236
  0:       4       2.0944     0.5236
  0:       5       2.6180     0.5236
  0:       6       0.5236     1.0472
  0:       7       1.0472     1.0472
  0:       8       1.5708     1.0472
  0:       9       2.0944     1.0472
  0:      10       2.6180     1.0472
  0:      11       0.5236     1.5708
  0:      12       1.0472     1.5708
  0:      13       1.5708     1.5708
  0:      14       2.0944     1.5708
  0:      15       2.6180     1.5708
  0:      16       0.5236     2.0944
  0:      17       1.0472     2.0944
  0:      18       1.5708     2.0944
  0:      19       2.0944     2.0944
  0:      20       2.6180     2.0944
  0:      21       0.5236     2.6180
  0:      22       1.0472     2.6180
  0:      23       1.5708     2.6180
  0:      24       2.0944     2.6180
  0:      25       2.6180     2.6180
  0:
  0:
  0:                              Points
  0:   TIME        #   1     #   2     #   3     #   4     #   5     #   6
  0:   0.10000   1.62046   2.69937   3.06225   2.69937   1.62046   2.58264
  0:   0.20000   1.41067   2.41366   2.76175   2.41366   1.41067   2.24055
  0:   0.30000   1.23798   2.15271   2.48036   2.15271   1.23798   1.95718
  0:   0.40000   1.09075   1.91572   2.21798   1.91572   1.09075   1.71506
  0:   0.50000   0.96250   1.70103   1.97560   1.70103   0.96250   1.50484
  0:   0.60000   0.84945   1.50710   1.75384   1.50710   0.84945   1.32086
  0:   0.70000   0.74925   1.33257   1.55266   1.33257   0.74925   1.15925
  0:   0.80000   0.66027   1.17611   1.37141   1.17611   0.66027   1.01707
  0:   0.90000   0.58127   1.03638   1.20907   1.03638   0.58127   0.89197
  0:   1.00000   0.51121   0.91203   1.06432   0.91203   0.51121   0.78192
  0:   1.10000   0.44919   0.80169   0.93574   0.80169   0.44919   0.68518
  0:   1.20000   0.39438   0.70404   0.82187   0.70404   0.39438   0.60020
  0:   1.30000   0.34601   0.61781   0.72126   0.61781   0.34601   0.52560
  0:   1.40000   0.30340   0.54179   0.63255   0.54179   0.30340   0.46016
  0:   1.50000   0.26591   0.47487   0.55444   0.47487   0.26591   0.40277
  0:   1.60000   0.23295   0.41604   0.48576   0.41604   0.23295   0.35247
  0:   1.70000   0.20401   0.36436   0.42543   0.36436   0.20401   0.30841
  0:   1.80000   0.17861   0.31901   0.37249   0.31901   0.17861   0.26982
  0:   1.90000   0.15634   0.27924   0.32605   0.27924   0.15634   0.23604
  0:   2.00000   0.13682   0.24438   0.28535   0.24438   0.13682   0.20647
  0:
  0:                              Points
  0:   TIME        #   7     #   8     #   9     #  10     #  11     #  12
  0:   0.10000   4.31793   4.90301   4.31793   2.58264   2.85249   4.78806
  0:   0.20000   3.84769   4.40918   3.84769   2.24055   2.43863   4.20461
  0:   0.30000   3.41303   3.93800   3.41303   1.95718   2.10266   3.67816
  0:   0.40000   3.01815   3.49799   3.01815   1.71506   1.82146   3.21228
  0:   0.50000   2.66276   3.09466   2.66276   1.50484   1.58234   2.80362
  0:   0.60000   2.34497   2.72994   2.34497   1.32086   1.37711   2.44657
  0:   0.70000   2.06219   2.40319   2.06219   1.15925   1.19994   2.13515
  0:   0.80000   1.81149   2.11234   1.81149   1.01707   1.04644   1.86371
  0:   0.90000   1.58985   1.85460   1.58985   0.89197   0.91312   1.62712
  0:   1.00000   1.39435   1.62693   1.39435   0.78192   0.79712   1.42087
  0:   1.10000   1.22220   1.42626   1.22220   0.68518   0.69609   1.24100
  0:   1.20000   1.07081   1.24971   1.07081   0.60020   0.60802   1.08410
  0:   1.30000   0.93783   1.09457   0.93783   0.52560   0.53119   0.94718
  0:   1.40000   0.82111   0.95838   0.82111   0.46016   0.46415   0.82766
  0:   1.50000   0.71874   0.83892   0.71874   0.40277   0.40561   0.72330
  0:   1.60000   0.62901   0.73420   0.62901   0.35247   0.35450   0.63215
  0:   1.70000   0.55039   0.64244   0.55039   0.30841   0.30985   0.55254
  0:   1.80000   0.48154   0.56208   0.48154   0.26982   0.27084   0.48298
  0:   1.90000   0.42125   0.49171   0.42125   0.23604   0.23676   0.42220
  0:   2.00000   0.36848   0.43011   0.36848   0.20647   0.20697   0.36908
  0:
  0:                              Points
  0:   TIME        #  13     #  14     #  15     #  16     #  17     #  18
  0:   0.10000   5.44297   4.78806   2.85249   2.45333   4.13480   4.70628
  0:   0.20000   4.82635   4.20461   2.43863   2.04155   3.53366   4.06306
  0:   0.30000   4.25037   3.67816   2.10266   1.72459   3.02543   3.50098
  0:   0.40000   3.72711   3.21228   1.82146   1.47101   2.59910   3.01855
  0:   0.50000   3.26069   2.80362   1.58234   1.26277   2.23988   2.60657
  0:   0.60000   2.84936   2.44657   1.37711   1.08877   1.93538   2.25472
  0:   0.70000   2.48867   2.13515   1.19994   0.94166   1.67587   1.95359
  0:   0.80000   2.17330   1.86371   1.04644   0.81630   1.45370   1.69519
  0:   0.90000   1.89793   1.62712   0.91312   0.70885   1.26279   1.47284
  0:   1.00000   1.65761   1.42087   0.79712   0.61636   1.09824   1.28104
  0:   1.10000   1.44792   1.24100   0.69609   0.53650   0.95604   1.11524
  0:   1.20000   1.26492   1.08410   0.60802   0.46739   0.83291   0.97163
  0:   1.30000   1.10520   0.94718   0.53119   0.40745   0.72611   0.84705
  0:   1.40000   0.96575   0.82766   0.46415   0.35539   0.63334   0.73883
  0:   1.50000   0.84399   0.72330   0.40561   0.31012   0.55266   0.64471
  0:   1.60000   0.73764   0.63215   0.35450   0.27071   0.48243   0.56279
  0:   1.70000   0.64474   0.55254   0.30985   0.23638   0.42125   0.49142
  0:   1.80000   0.56358   0.48298   0.27084   0.20646   0.36792   0.42920
  0:   1.90000   0.49265   0.42220   0.23676   0.18036   0.32140   0.37493
  0:   2.00000   0.43067   0.36908   0.20697   0.15758   0.28081   0.32758
  0:
  0:                              Points
  0:   TIME        #  19     #  20     #  21     #  22     #  23     #  24
  0:   0.10000   4.13480   2.45333   1.42982   2.41948   2.75758   2.41948
  0:   0.20000   3.53366   2.04155   1.14813   1.99356   2.29549   1.99356
  0:   0.30000   3.02543   1.72459   0.95015   1.67033   1.93488   1.67033
  0:   0.40000   2.59910   1.47101   0.79960   1.41458   1.64393   1.41458
  0:   0.50000   2.23988   1.26277   0.67989   1.20679   1.40486   1.20679
  0:   0.60000   1.93538   1.08877   0.58206   1.03495   1.20593   1.03495
  0:   0.70000   1.67587   0.94166   0.50068   0.89108   1.03879   0.89108
  0:   0.80000   1.45370   0.81630   0.43217   0.76952   0.89732   0.76952
  0:   0.90000   1.26279   0.70885   0.37401   0.66612   0.77685   0.66612
  0:   1.00000   1.09824   0.61636   0.32432   0.57769   0.67376   0.57769
  0:   1.10000   0.95604   0.53650   0.28168   0.50176   0.58522   0.50176
  0:   1.20000   0.83291   0.46739   0.24495   0.43633   0.50892   0.43633
  0:   1.30000   0.72611   0.40745   0.21322   0.37981   0.44300   0.37981
  0:   1.40000   0.63334   0.35539   0.18576   0.33088   0.38592   0.33088
  0:   1.50000   0.55266   0.31012   0.16193   0.28844   0.33642   0.28844
  0:   1.60000   0.48243   0.27071   0.14124   0.25158   0.29343   0.25158
  0:   1.70000   0.42125   0.23638   0.12325   0.21953   0.25604   0.21953
  0:   1.80000   0.36792   0.20646   0.10759   0.19163   0.22350   0.19163
  0:   1.90000   0.32140   0.18036   0.09395   0.16733   0.19516   0.16733
  0:   2.00000   0.28081   0.15758   0.08205   0.14614   0.17045   0.14614
  0:
  0:                              Points
  0:   TIME        #  25     #  26     #  27     #  28     #  29     #  30
  0:   0.10000   1.42982
  0:   0.20000   1.14813
  0:   0.30000   0.95015
  0:   0.40000   0.79960
  0:   0.50000   0.67989
  0:   0.60000   0.58206
  0:   0.70000   0.50068
  0:   0.80000   0.43217
  0:   0.90000   0.37401
  0:   1.00000   0.32432
  0:   1.10000   0.28168
  0:   1.20000   0.24495
  0:   1.30000   0.21322
  0:   1.40000   0.18576
  0:   1.50000   0.16193
  0:   1.60000   0.14124
  0:   1.70000   0.12325
  0:   1.80000   0.10759
  0:   1.90000   0.09395
  0:   2.00000   0.08205

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]