xxrich@alliant1 (Rich Rinehart) (03/27/90)
I would like to force some MIMD like operation on the following
loop but am having troubles figuring out the synchronization.
Can anyone be of help? The problem is something like this....

cvd$ concur
      do 100 I=1,100
         call sub1
         call sub2
c        do sub3 only after sub1 and 2 have completed
         call sub3
100   continue

Do I need to use the lock system call or is there another way? Anyone
have any examples? It's been a while since the optimization class...

--
-----------------------------------------------------------------------------
Rich Rinehart               | phone: 216-433-5211
NASA Lewis Research Center  | email: xxrich@alliant1.lerc.nasa.gov
-----------------------------------------------------------------------------
muller@Alliant.COM (Jim Muller) (03/28/90)
In article <1990Mar27.132340.4243@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>I would like to force some MIMD like operation on the following
>loop but am having troubles figuring out the synchronization.
>Can anyone be of help? The problem is something like this....
>
> cvd$ concur
>       do 100 I=1,100
>          call sub1
>          call sub2
> c        do sub3 only after sub1 and 2 have completed
>          call sub3
> 100   continue
>
>Do I need to use the lock system call or is there another way? Anyone
>have any examples? It's been a while since the optimization class...

(I'll try to answer this, but I'm not sure you meant it exactly as written.)

As written, every iteration in the concurrent 'do 100' loop will be assigned
its own processor, and each will execute its own copy of sub1 then sub2 then
sub3, in order. So as it stands, it will do exactly what you asked for, on a
per-processor basis until 100 copies have been run, without using any lock
or unlock calls. The only requirement is that all subroutines called
concurrently (and all that they also call, etc.) must be compiled
'recursive', via either the '-recursive' switch from the fortran command or
with the subroutine header 'recursive subroutine sub1'.

By the way, your directive 'cvd$ concur' should be 'cvd$ cncall' to tell the
compiler that concurrent calls are okay in this loop. Concurrency must be
invoked from the fortran command; by itself, the directive 'cvd$ concur'
simply turns concurrency on if it was invoked from the fortran command but
turned off earlier in the file with a similar directive. If the fortran
command specifies concurrency optimization, the default is for all loops to
be analyzed for concurrency unless a directive turns it off.

[Note: Be careful about what the subroutines do in common blocks, since
common blocks are not replicated by 'recursive'.
You can use the -interface mechanism to let the compiler determine if those
subroutines are "cncall-able" but it will not examine common blocks either.
Anyway, you might get better performance by inlining those subroutines from
the fortran command with -inline sub1 -inline sub2 -inline sub3.]

----------

However, I suspect perhaps you meant something different. One possibility is
to force *simultaneous execution* of sub1 and sub2, *then* do sub3:

      call sub1      ! do sub1 on one processor
      call sub2      ! do sub2 on a different processor

Sub1 and sub2 will be running simultaneously on different processors. When
both sub1 and sub2 are done, then do:

      call sub3      ! do sub3 only after sub1 and 2 have completed

In this case, one way to do it would be:

cvd$ cncall
      do 100 I=1,2
         if (I.eq.1) call sub1
         if (I.eq.2) call sub2
100   continue
      call sub3

The 'do 100' loop is a trick to create a concurrent loop with each iteration
doing something different. Again, no locks are required, but both sub1 and
sub2 (and any child-subroutines) must be compiled 'recursive'. (The test
for iteration number 'if (I.eq.n)' can be kept in this subroutine or moved
into the called subroutines by passing the iteration number I as a
parameter. If it is moved into the called subroutines, sub1 and sub2 could
be merged.)

----------

Another possibility is that multiple copies of sub1 and sub2 are needed, and
after all of them are done, call sub3. This is like your original code, but
with 'call sub3' outside of the loop:

cvd$ cncall
      do 100 I=1,100
         call sub1
         call sub2
100   continue
      call sub3

Again, no locks are needed, but 'recursive' is needed for sub1 and sub2.

----------

The only reason you might ever need lock/unlock is if you have a concurrent
loop in which a portion of the code (which may contain a subroutine call)
must be *forced* into one-copy-at-a-time execution.
A typical situation would be if different iterations write to the same
memory location or common block, or if a cncall'ed subroutine might possibly
do concurrent writes (which causes a run-time error if it happens). In
essence, you want each processor, as it enters that critical code, to lock
out all the others if it "gets there first" or stall until the code is free
if another processor has already locked it.

The usual syntax to do this is to declare a variable to hold the lock state
(typically as an integer though it will be used as a Boolean) and initialize
it to 0. Then the statement 'call lock(l)' (where l is that variable) will
do exactly what is wanted, locking out other processors until that lock is
opened with 'call unlock(l)', or stalling this processor until some other
processor opens it.

When a lock is set, other processors are not affected until each encounters
that lock too, at which point it will stall. If several processors are
waiting for the same lock, only one will be allowed access when it opens,
and of course, its first action will be to lock it up again. In this way
the processors are allowed one-at-a-time entry to that code. The order of
access between several stalled processors is not predictable (at least by a
normal user).

The state of the lock can be read, e.g. if(l.ne.0), but it must *not* be
changed except with the lock/unlock calls, i.e. you must not do l = 0 to
unlock the lock l. It is not necessary that the same processor that sets a
lock be the one to unlock it, nor that it be locked when an unlock call is
executed. However, if a lock is set but never unlocked and the other
processors encounter it, they will stall until the job is finished.
Likewise, if a lock is set and no other processors unlock it, and the
processor that set it encounters it *again* before unlocking it, it too will
stall, so the job may never finish.
Here is an example that uses lock to prevent concurrent writes from within a
function that would otherwise run concurrently:

      program lockdemo
      real x(100)
      integer i, l
c     Initialize the lock l.
      l = 0
cvd$ cncall sub1
      do i = 1,100
         x(i) = sub1(i,l)
      enddo
      write(6,*) 'Done. x(100) =', x(100)
      stop
      end

      recursive function sub1(i,l)
      integer i, l
      real sub1
      sub1 = .002 * float(i) - .003 * float(i*i)
c     Do more work here.
c     Set lock. Other processors cannot pass here until lock is opened.
      call lock(l)
      write(20,*) 'In sub1, completed:',i
      call unlock(l)
      return
      end

Jim Muller
--
- Jim Muller
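
The parenthetical remark in the reply above, about moving the 'if (I.eq.n)' test into the callee and merging sub1 and sub2, might look roughly like the sketch below. It is only an illustration under stated assumptions: the names sub12 and r and the stand-in arithmetic are hypothetical and do not come from the thread, and sub3 is given an argument here purely so the example is self-contained. Each iteration writes to a different array element, so no lock is used. A compiler that does not recognize Alliant directives should treat the 'cvd$' line as an ordinary comment and run the loop serially.

```fortran
      program merged
      integer i
      real r(2)
c     Hypothetical sketch, not code from the thread.
c     Two iterations, each doing different work in its own copy
c     of the merged routine sub12.
cvd$ cncall
      do 100 i = 1,2
         call sub12(i, r(i))
100   continue
c     Reached only after both iterations of the loop are done.
      call sub3(r)
      stop
      end

      recursive subroutine sub12(i, res)
      integer i
      real res
      if (i.eq.1) then
c        Stand-in for the work formerly done by sub1.
         res = 1.0
      else
c        Stand-in for the work formerly done by sub2.
         res = 2.0
      endif
      return
      end

      subroutine sub3(r)
      real r(2)
c     Runs serially, so ordinary I/O is safe here.
      write(6,*) 'sub3 sees', r(1), r(2)
      return
      end
```

As described in the reply, the merged routine would still need to be compiled 'recursive', either with the keyword as shown or with the '-recursive' switch.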
xxrich@alliant1 (Rich Rinehart) (03/29/90)
>>I would like to force some MIMD like operation on the following
>>loop but am having troubles figuring out the synchronization.
>>Can anyone be of help? The problem is something like this....
>
>> cvd$ concur
>>       do 100 I=1,100
>>          call sub1
>>          call sub2
>> c        do sub3 only after sub1 and 2 have completed
>>          call sub3
>> 100   continue
>
>As written, every iteration in the concurrent 'do 100' loop will be assigned
>its own processor, and each will execute its own copy of sub1 then sub2 then
>sub3, in order. So as it stands, it will do exactly what you asked for, on a

This isn't what I wanted, but I didn't realize this. It would complete
100 iterations of sub1, then go on to sub2?

>However, I suspect perhaps you meant something different. One possibility is
>to force *simultaneous execution* of sub1 and sub2, *then* do sub3:

Yes, this is what I wanted.

>cvd$ cncall
>      do 100 I=1,2
>         if (I.eq.1) call sub1
>         if (I.eq.2) call sub2
>100   continue
>      call sub3
>
>The 'do 100' loop is a trick to create a concurrent loop with each iteration
>doing something different. Again, no locks are required, but both sub1 and
>sub2 (and any child-subroutines) must be compiled 'recursive'. (The test

Neat trick. How can I guarantee that sub1 and sub2 will act in a MIMD
fashion? What if sub1 calls other routines in which the compiler finds
what it thinks is a 'better' loop to parallelize? Why must it be
compiled recursively for this? (Or is this the answer?)

Thank you for your locking explanation, as I plan to use it for another
analysis.

Thanks again for the help!

-rich

--
-----------------------------------------------------------------------------
Rich Rinehart               | phone: 216-433-5211
NASA Lewis Research Center  | email: xxrich@alliant1.lerc.nasa.gov
-----------------------------------------------------------------------------