hemant@acsu.buffalo.edu (Hemant Dandekar) (04/20/91)
Hi,

I am having trouble 'parallelizing' the outermost do 20 loop when I use
the fpp preprocessor on a CRAY-2s running UNICOS 6.1 and cf77 4.0.3.
This loop does parallelize when I use Parallel Fortran under IBM FORTVS
2.4 and 2.5. I don't see any dependence in this loop at the k level.

Here is the routine. The subroutine calculates the convective term in a
3D CFD code. I have removed the comment statements to save bandwidth.

      subroutine convc1 (nlo,nhi,mlo,mhi,klo,khi,
     $                   f,indgeo,isimpl,confac)
      parameter (nmax=65,mmax=45,kmax=45)
      dimension f(0:nmax,0:mmax,0:kmax),
     1          co1(0:nmax,0:mmax,0:kmax),
     1          co2(0:nmax,0:mmax,0:kmax),
     1          co3(0:nmax,0:mmax,0:kmax),
     1          co4(0:nmax,0:mmax,0:kmax)
c
      common / admsh1 / x(0:nmax,0:mmax),y(0:nmax,0:mmax),
     1                  xne(0:nmax,0:mmax),yne(0:nmax,0:mmax)
      common / amsh1d / we(0:kmax),z(0:kmax),zne(0:kmax)
      common / conflu / cof(0:nmax,0:mmax,0:kmax)
      common / dotpro / dpl(0:nmax,0:mmax,0:kmax),
     1                  dpr(0:nmax,0:mmax,0:kmax),
     1                  dpb(0:nmax,0:mmax,0:kmax),
     1                  dpt(0:nmax,0:mmax,0:kmax)
      common / datpro / dw1(0:nmax,0:mmax,0:kmax),
     1                  dw2(0:nmax,0:mmax,0:kmax),
     1                  dw3(0:nmax,0:mmax,0:kmax),
     1                  dw4(0:nmax,0:mmax,0:kmax),
     1                  dpw1(0:nmax,0:mmax,0:kmax),
     1                  dpw2(0:nmax,0:mmax,0:kmax),
     1                  dpw3(0:nmax,0:mmax,0:kmax),
     1                  dpw4(0:nmax,0:mmax,0:kmax)
      do 10 k=klo,khi
      do 10 j=mlo,mhi
      do 10 i=nlo,nhi
         cof(i,j,k) = 0.0
 10   continue
      if(indgeo.eq.0) then
         confac = 2.0*2.0*8.0
         confc1 = confac * 0.5
      else
         confac = 2.0*4.0*4.0*4.0*6.0*6.0
         confc1 = confac * 0.5
      end if
      do 20 k=klo,khi
      do 20 j=mlo,mhi-1
      do 20 i=nlo,nhi-1
         f1 = f(i,j,k)
         f2 = f(i+1,j+1,k)
         f3 = f(i,j+1,k)
         f4 = f(i+1,j,k)
         f5 = f(i,j,k+1)
         f6 = f(i+1,j+1,k+1)
         f7 = f(i,j+1,k+1)
         f8 = f(i+1,j,k+1)
c
         zl = ( (z(k+1) - z(k-1)) + (zne(k+1) - zne(k-1))) / 4.0
c
         vnl = dpl(i,j,k)
         vnr = dpr(i,j,k)
         vnb = dpb(i,j,k)
         vnt = dpt(i,j,k)
c
         dwp1 = dpw1(i,j,k)
         dwp2 = dpw2(i,j,k)
         dwp3 = dpw3(i,j,k)
         dwp4 = dpw4(i,j,k)
c
         flb = f1
         flt = f3
         frb = f4
         frt = f2
         fbl = f1
         fbr = f4
         ftl = f3
         ftr = f2
c
         f1b = f1
         f1t = f5
         f2b = f2
         f2t = f6
         f3b = f3
         f3t = f7
         f4b = f4
         f4t = f8
c
         col = (vnl + abs(vnl))*flb + (vnl - abs(vnl))*flt
         cor = (vnr + abs(vnr))*frb + (vnr - abs(vnr))*frt
         cob = (vnb + abs(vnb))*fbl + (vnb - abs(vnb))*fbr
         cot = (vnt + abs(vnt))*ftl + (vnt - abs(vnt))*ftr
c
         co1(i,j,k) = (dwp1 + abs(dwp1))*f1b + (dwp1 - abs(dwp1))*f1t
         co2(i,j,k) = (dwp2 + abs(dwp2))*f2b + (dwp2 - abs(dwp2))*f2t
         co3(i,j,k) = (dwp3 + abs(dwp3))*f3b + (dwp3 - abs(dwp3))*f3t
         co4(i,j,k) = (dwp4 + abs(dwp4))*f4b + (dwp4 - abs(dwp4))*f4t
c
         cof(i,j,k)     = cof(i,j,k)     - (col + cob)*zl -
     $                    co1(i,j,k)*confc1
         cof(i+1,j+1,k) = cof(i+1,j+1,k) + (cor + cot)*zl -
     $                    co2(i,j,k)*confc1
         cof(i,j+1,k)   = cof(i,j+1,k)   + (col - cot)*zl -
     $                    co3(i,j,k)*confc1
         cof(i+1,j,k)   = cof(i+1,j,k)   - (cor - cob)*zl -
     $                    co4(i,j,k)*confc1
 20   continue
      return
      end

Another section of the code, which is exactly the same except for the
following lines for the terms 'flb' to 'ftr', does parallelize.

         flb = f4 + 3.0*f1
         flt = f2 + 3.0*f3
         frb = f1 + 3.0*f4
         frt = f3 + 3.0*f2
         fbl = f3 + 3.0*f1
         fbr = f2 + 3.0*f4
         ftl = f1 + 3.0*f3
         ftr = f4 + 3.0*f2

Is it doing that because it finds insufficient work to do in the first
case and more work to do in the second case (where there are 8
additional multiplications inside the loop)?

Any comments/suggestions would be appreciated.

thanks,
hemant
--
----------------------------------------------------------------------------
Bitnet: v092qghg@ubvms.bitnet            | Hemant W. Dandekar
Internet: hemant@asterix.eng.buffalo.edu | 303, Furnas Hall, Chem. Eng.
(716)636-2631                            | SUNY at Buffalo, Buffalo NY 14260