prentice@triton.unm.edu (John Prentice) (03/29/91)
Consider the following loop:

      do 30 k=1,kmax
        do 20 j=1,jmax
          do 10 i=1,imax
            a(i,j,k)=...
   10     continue
   20   continue
   30 continue

On the Cray, only the inner most loop will vectorize. Does anyone
have a suggestion for how to collapse this loop while still using
the three dimensional array? In other words, I need a vectorized
equivalent to:

      i=0
      j=1
      k=1
      do 10 n=1,imax*jmax*kmax
        i=i+1
        if (i.gt.imax) then
          i=1
          j=j+1
          if (j.gt.jmax) then
            j=1
            k=k+1
          endif
        endif
        a(i,j,k)=...
   10 continue

Any suggestions would be welcomed. Thanks.

John
--
John K. Prentice    john@unmfys.unm.edu (Internet)
Dept. of Physics and Astronomy, University of New Mexico, Albuquerque, NM, USA
Computational Physics Group, Amparo Corporation, Albuquerque, NM, USA
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/30/91)
>>>>> On 29 Mar 91 14:13:13 GMT, prentice@triton.unm.edu (John Prentice) said:
John> Consider the following loop:
John> do 30 k=1,kmax
John> do 20 j=1,jmax
John> do 10 i=1,imax
John> a(i,j,k)=...
John> 10 continue
John> 20 continue
John> 30 continue
John> On the Cray, only the inner most loop will vectorize.
That is not strictly true. CFT77 will automatically collapse all
three loops if the arrays are all dimensioned (imax,jmax,*).
Furthermore, the parallelizer will strip-mine the collapsed version of
the loop in this case, which will make for the lowest possible
overhead....
John> Does anyone have a suggestion for how to collapse this loop
John> while still using the three dimensional array? [....]
No matter what you do you will need to know the leading dimensions of
the arrays. If (imax,jmax) are not the leading dimensions, then you
can vectorize over the whole array anyway and use a mask. Whether or
not this will run faster than the inner-loop-vectorized code depends
on too many factors to talk about in general, but you need to take
into account the relative sizes of imax and IDIM (etc), the complexity
of the RHS of the assignment statement, the absolute sizes of imax,
jmax, IDIM, JDIM, and perhaps a few more things.....
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
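
A minimal sketch of the "vectorize over the whole array and use a mask"
idea from the post above, in free-form Fortran 90. The declared sizes
idim, jdim, kdim, the mask array, and the helper fill_masked are all
assumptions made for illustration; none of them come from the thread.
The array is handed to a routine that sees it as one long vector, so a
single loop of length idim*jdim*kdim replaces the nest, and the mask
keeps the padding elements untouched.

```fortran
program masked_collapse
  implicit none
  integer, parameter :: idim = 5, jdim = 4, kdim = 3   ! declared (physical) sizes, assumed
  integer, parameter :: imax = 3, jmax = 2, kmax = 3   ! computational sizes, assumed
  real    :: a(idim, jdim, kdim), ref(idim, jdim, kdim)
  logical :: mask(idim, jdim, kdim)
  integer :: i, j, k

  a   = 0.0
  ref = 0.0

  ! Reference: the original loop nest; the inner vector length is only imax.
  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        ref(i, j, k) = 2.0
      end do
    end do
  end do

  ! Build the mask once: true only inside the computational sub-block.
  mask = .false.
  mask(1:imax, 1:jmax, 1:kmax) = .true.

  ! One masked loop over every stored element, padding included.
  call fill_masked(a, mask, idim*jdim*kdim)

  if (any(a /= ref)) stop 1
  print *, 'masked loop matches loop nest'

contains

  ! Hypothetical helper: views the 3-D arrays as 1-D vectors of length n.
  subroutine fill_masked(x, m, n)
    integer, intent(in)    :: n
    real,    intent(inout) :: x(n)
    logical, intent(in)    :: m(n)
    integer :: p
    do p = 1, n
      if (m(p)) x(p) = 2.0
    end do
  end subroutine fill_masked

end program masked_collapse
```

The masked loop does idim*jdim*kdim iterations instead of
imax*jmax*kmax, so whether it wins depends on how much padding there is
and how expensive the right-hand side is, as the post says.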
paco@rice.edu (Paul Havlak) (03/30/91)
In article <1991Mar29.141313.7418@ariel.unm.edu>, prentice@triton.unm.edu (John Prentice) writes:
|> Consider the following loop:
|>
|>       do 30 k=1,kmax
|>         do 20 j=1,jmax
|>           do 10 i=1,imax
|>             a(i,j,k)=...
|>    10     continue
|>    20   continue
|>    30 continue
|>
|> On the Cray, only the inner most loop will vectorize.
|> ...

If the Cray compiler really doesn't catch that case, you should complain
loudly to Cray. Multi-dimensional vectorization is not much harder than
single-dimensional (in this case, both are trivial). The PFC system at
Rice does multi-dimensional vectorization, as does the commercially
available KAP system from Kuck and Assoc.

The failure of compilers to catch such simple cases drives programmers to
try and trick the compiler (see the Perfect benchmark source for
examples). Unfortunately, what tricks one compiler confuses others (even
those that could have properly optimized the original code).

So please, before uglifying your code for a compiler, complain! You may
still have to rewrite the code, but they might eventually get the message.

Good news about Fortran 90: You can write the above loop in triplet
notation:

      a(1:imax,1:jmax,1:kmax) = ...

Bad news about Fortran 90: If "a" is a formal parameter array (dummy arg),
it might not be contiguous (assuming the implementation of array sections
as parameters is not copy-in/copy-out). That inner loop will be hard to
optimize if the stride between array elements is unknown. Interprocedural
analysis will help, but will it be enough?

Paul Havlak
"I'd rather optimize Fortran than write it."
bernhold@red8 (David E. Bernholdt) (03/30/91)
In article <1991Mar29.141313.7418@ariel.unm.edu> prentice@triton.unm.edu (John Prentice) writes:
>Consider the following loop:
>
>      do 30 k=1,kmax
>        do 20 j=1,jmax
>          do 10 i=1,imax
>            a(i,j,k)=...
>   10     continue
>   20   continue
>   30 continue

Pretty obvious, and possibly not useful in this case, but...

How about passing a in as a 1-d array, dimension kmax*jmax*imax?

Problems:
1) if the physical dimension of the matrix is larger than the
   computational dimension.
2) if the rhs has explicit dependence on i, j, k rather than the
   meta-index.

If you get any better responses, please cc to me or post 'em.
--
David Bernholdt                 bernhold@qtp.ufl.edu
Quantum Theory Project          bernhold@ufpine.bitnet
University of Florida
Gainesville, FL  32611          904/392 6365
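
A minimal sketch of the 1-d array suggestion above, in free-form
Fortran 90. It assumes the declared dimensions equal imax, jmax, kmax
(so problem 1 does not arise) and shows one possible way around
problem 2: recovering i, j, k from the meta-index n with integer
arithmetic. The routine name fill_flat and the right-hand side
i + 10*j + 100*k are made up for illustration.

```fortran
program flat_index
  implicit none
  integer, parameter :: imax = 4, jmax = 3, kmax = 2   ! sizes assumed for the demo
  real    :: a(imax, jmax, kmax), ref(imax, jmax, kmax)
  integer :: i, j, k

  ! Reference: the original loop nest, with a right-hand side that uses i, j, k.
  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        ref(i, j, k) = real(i + 10*j + 100*k)
      end do
    end do
  end do

  ! Same work done by a single loop over the array viewed as a vector.
  call fill_flat(a, imax, jmax, kmax)

  if (any(a /= ref)) stop 1
  print *, 'flat loop matches loop nest'

contains

  ! Hypothetical helper: x is the 3-D array seen as ni*nj*nk contiguous elements.
  subroutine fill_flat(x, ni, nj, nk)
    integer, intent(in)  :: ni, nj, nk
    real,    intent(out) :: x(ni*nj*nk)
    integer :: n, i, j, k
    do n = 1, ni*nj*nk
      ! Column-major storage: i varies fastest, then j, then k.
      i = mod(n - 1, ni) + 1
      j = mod((n - 1)/ni, nj) + 1
      k = (n - 1)/(ni*nj) + 1
      x(n) = real(i + 10*j + 100*k)
    end do
  end subroutine fill_flat

end program flat_index
```

The integer divides and mods are not free, so this only pays off when
the longer vector length outweighs the extra index arithmetic.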
morreale@bierstadt.scd.ucar.edu (Peter Morreale) (03/30/91)
In article <1991Mar29.141313.7418@ariel.unm.edu>, prentice@triton.unm.edu (John Prentice) writes:
> Consider the following loop:
>
>       do 30 k=1,kmax
>         do 20 j=1,jmax
>           do 10 i=1,imax
>             a(i,j,k)=...
>    10     continue
>    20   continue
>    30 continue
>
> On the Cray, only the inner most loop will vectorize. Does anyone
> have a suggestion for how to collapse this loop while still using
> the three dimensional array? In other words, I need a vectorized
> equivalent to:
>

Whether or not you can collapse the loop structure depends
entirely on whether i, j, or k is used on the right hand side
of the assignment. If the loop indices are not used, you could
collapse the loop as follows:

      DO 30 K = 1, IMAX*JMAX*KMAX
        A(K,1,1) = <whatever>
   30 CONTINUE

In any event, the Cray Fortran preprocessor (fpp) can help
significantly in determining whether or not the loop can
collapse (since it *will* collapse the loop if it can).
You can still trick fpp, but it's a terrific start.

I'd suggest you run the routine through fpp and see what
it produces:

      % fpp sub.f > sub.fpp

-PWM
------------------------------------------------------------------
Peter W. Morreale                  email:  morreale@ncar.ucar.edu
Nat'l Center for Atmos Research    voice:  (303) 497-1293
Scientific Computing Division
Consulting Office
------------------------------------------------------------------
bleikamp@convex.com (Richard Bleikamp) (03/30/91)
In article <1991Mar29.165126.11431@rice.edu> paco@rice.edu (Paul Havlak) writes:
>
>In article <1991Mar29.141313.7418@ariel.unm.edu>, prentice@triton.unm.edu (John Prentice) writes:
>|> Consider the following loop:
>|>
>|>       do 30 k=1,kmax
>|>         do 20 j=1,jmax
>|>           do 10 i=1,imax
>|>             a(i,j,k)=...
>|>    10     continue
>|>    20   continue
>|>    30 continue
>|>
>|> On the Cray, only the inner most loop will vectorize.
>|> ...
>
>   useful discussion deleted.
>
>Good news about Fortran 90: You can write the above loop in triplet notation:
>
>       a(1:imax,1:jmax,1:kmax) = ...
>
>   more discussion deleted.
>
>Paul Havlak

Note that substituting Fortran 90 array notation is ONLY valid when the
assignment statement "a(i,j,k)= ..." does NOT contain a loop carried
dependency. It can be non-trivial (in complicated loops) to decide if any
dependencies exist. When they do, you can sometimes rewrite the loop nest
as two or more separate Fortran 90 assignment statements, but sometimes
there is no semantically equivalent array notation.
--
------------------------------------------------------------------------------
Rich Bleikamp                   bleikamp@convex.com
Convex Computer Corporation
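
A small illustration of the point above, as a sketch in free-form
Fortran 90. The one-dimensional recurrence a(i) = a(i-1) + 1.0 is an
assumed example, not the poster's code. The loop carries a dependency
from one iteration to the next; the array assignment evaluates its whole
right-hand side before storing anything, so the two give different
answers.

```fortran
program carried_dependence
  implicit none
  integer, parameter :: n = 5
  real    :: a(n), b(n)
  integer :: i

  a = 1.0
  b = 1.0

  ! Loop form: each iteration reads the value written by the previous one.
  do i = 2, n
    a(i) = a(i-1) + 1.0
  end do

  ! Array form: the right-hand side uses the OLD values of b throughout.
  b(2:n) = b(1:n-1) + 1.0

  print *, 'loop :', a
  print *, 'array:', b

  ! The two results must differ; identical results would mean no dependency.
  if (all(a == b)) stop 1
end program carried_dependence
```

Here the loop leaves a = 1, 2, 3, 4, 5 while the array statement leaves
b = 1, 2, 2, 2, 2, so the array notation is not a drop-in replacement
for this loop.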
ftower@ncar.ucar.EDU (Francis Tower) (03/30/91)
Depending on what you're up to you could use:

      A(:,:,:) = ...
or
      A = ...

for example:

      A = 0.0

will zero the entire array (FORTRAN90 syntax)