marcoz@enquirer.scandal.cs.cmu.edu (Marco Zagha) (02/23/91)
I have a few questions about how multiprocessing works in CAL. My main concern is figuring out the semantics of the $MDO/$ENDMDO microtasked loops. (I am using an 8-processor Y-MP.)

$MDO is a construct for running multiple iterations of a loop in parallel. The construct

        $MDO S1=0,S6,TRIPCNT=S7
          [loop body]
        $ENDMDO

will execute a loop for S1 from 0 to S6, where iterations of the loop may be run in parallel. I have a few questions about how this works:

1) Which registers are "cloned" from the single-threaded code (before the $MDO) to the multi-threaded code? From the examples in the macros manual (Macros and Opdefs Reference Manual SR-0012D), it appears that at least the S and A registers get cloned. Are the V, T, and B registers also cloned?

2) What happens if you side-effect a register on one iteration of a loop and use that register on a later iteration? Are you always guaranteed to get the value from the single-threaded code, or do you get whatever was left behind by the most recent iteration executed by that processor? For example:

        S3 = 0
        $MDO S1=0,100
        S3 = S3 + 1
        $ENDMDO

On some iteration, say S1=50, do you get

   a) S3 = 0
   b) S3 = some number between 0 and 50
   c) S3 = some number between 0 and 100
   d) S3 = garbage?

From my experiments, it seems that (b) is correct and that the registers are not re-cloned from the single-threaded code: side effects can be seen in later iterations that run on the same processor. Is the answer the same for all types of registers?

Unfortunately, the Cray Y-MP and Cray X-MP Multitasking Programmer's Manual (SR-0222) mostly describes Fortran and doesn't address my questions about registers. Does anyone know of any documentation or sample code that I might find helpful? In case you want to see a full example of $MDO, I've included one from the macros manual at the end of my message.
(In the manual example, it is clear that S5 from the single-threaded code is available in all the parallel loop iterations, but I can't get any more information out of it than that.)

I also have a question about allocating processors from C. In Fortran, the line "CMIC$ GETCPUS n" will ask for n processors. How can the equivalent be done in C? (I've been calling my C from Fortran to get around this problem.)

Thanks,
==
Marco Zagha
School of Computer Science
Carnegie Mellon University
Internet: marcoz@cs.cmu.edu
Uucp: ...!seismo!cs.cmu.edu!marcoz
Bitnet: marcoz%cs.cmu.edu@cmuccvma
CSnet: marcoz%cs.cmu.edu@relay.cs.net

The following example adds two 2-dimensional arrays, element by element, and places the output in a third array. The addition is vectorized on the inner loop and microtasked on the outer loop. This example also shows the nesting of a $VDO/$ENDVDO macro pair inside a scalar multitasked macro.

Location Result    Operand        Comment
1        10        20             35
-------- --------- -------------- -------------------------------
         S6        D'20           Set ending index for outer loop
         S5        D'300          Set ending index for inner loop
         $MDO      S1=0,S6,TRIPCNT=S7
         A2        S1             Move index to A register
         A3        D'500          Get first dimension of arrays
         A2        A2*A3          Compute offset into arrays
         A3        X              Get base address of X array
         A3        A3+A2          Compute starting offset into X
         A4        Y              Get base address of Y array
         A4        A4+A2          Compute starting offset into Y
         A5        Z              Get base address of Z array
         A5        A5+A2          Compute starting offset into Z
         $VDO      S2=0,S5,TRIPCNT=S3,SEGLEN=A1
         A0        A3
         V0        ,A0,1          Load segment of X array
         A0        A4
         V1        ,A0,1          Load segment of Y array
         V2        V0+FV1         Add segments of X and Y arrays
         A0        A5
         ,A0,1     V2             Store sum in Z array
         A3        A3+A1          Increment pointer into X array
         A4        A4+A1          Increment pointer into Y array
         A5        A5+A1          Increment pointer into Z array
         $ENDVDO
         $ENDMDO