[comp.unix.cray] Microtasking in CAL

marcoz@enquirer.scandal.cs.cmu.edu (Marco Zagha) (02/23/91)

I have a few questions about how multiprocessing works in CAL.  My
main concern is figuring out the semantics of the $MDO/$ENDMDO
microtasked loops.  (I am using an 8-processor Y-MP.)

$MDO is a construct that runs multiple iterations of a loop
in parallel.   The construct

   $MDO    S1=0,S6,TRIPCNT=S7
      [loop body]
   $ENDMDO

will execute a loop for S1 from 0 to S6 where iterations of the
loop may be run in parallel.

I have a few questions about how this works:

1) Which registers are "cloned" from the single-threaded code (before
the $MDO) to the multi-threaded code?  From the examples in the macros
manual (Macros and Opdefs Reference Manual SR-0012D), it appears that
at least the S and A registers get cloned.  Are the V, T, and B
registers also cloned?

2) What happens if you side-effect a register on an iteration of a
loop and use that register on a later iteration?  Are you always
guaranteed to get the values from the single-threaded code, or do you
get whatever was left behind from the most recent iteration executed
by that processor?  For example:

        S3 = 0
        $MDO S1=0,100
           S3 = S3 + 1
        $ENDMDO

On some iteration, say S1=50, do you get 

        a) S3 = 0
        b) S3 = some number between 0 and 50
        c) S3 = some number between 0 and 100
        d) S3 = garbage

From my experiments, it seems that (b) is correct and that the
registers are not re-cloned from the single-threaded code ---
side-effects can be seen in later iterations that use the same
processor.  Is the answer the same for all types of registers?
Unfortunately, the Cray Y-MP and Cray X-MP Multitasking Programmer's
Manual SR-0222 mostly describes Fortran and doesn't address my
questions about registers.  Does anyone know of any documentation or
sample code that I might find helpful?
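
To make hypothesis (b) concrete, here is a small C model of it.  This is
only a sketch of the behavior my experiments suggest, not a description
of what the $MDO macro actually does.  It assumes each processor gets a
private copy of S3 that is cloned once at loop entry and never re-cloned.
The function name, the owner[] schedule, and MAX_CPUS are hypothetical;
the real assignment of iterations to processors is dynamic and is not
known to me.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8   /* assumption: matches the 8-processor Y-MP above */

/*
 * Model of hypothesis (b).  owner[i] names the processor that runs
 * iteration i (a made-up, fixed schedule).  seen[i] receives the value
 * of S3 that iteration i observes before it executes S3 = S3 + 1.
 */
void simulate_private_s3(long s3_initial, const int *owner, size_t n_iters,
                         int n_cpus, long *seen)
{
    long s3[MAX_CPUS];

    assert(n_cpus > 0 && n_cpus <= MAX_CPUS);

    /* Clone the single-threaded value once, at loop entry. */
    for (int p = 0; p < n_cpus; p++)
        s3[p] = s3_initial;

    for (size_t i = 0; i < n_iters; i++) {
        int p = owner[i];
        assert(p >= 0 && p < n_cpus);
        seen[i] = s3[p];      /* whatever p's previous iteration left behind */
        s3[p] = s3[p] + 1;    /* loop body: S3 = S3 + 1 */
    }
}
```

Under this model, seen[i] is the number of earlier iterations that ran
on the same processor, so it always lies between 0 and i, which is
answer (b).  With a single processor the model degenerates to
seen[i] == i.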

In case you want to see a full example of $MDO, I've included one from
the macros manual at the end of my message.  (In this example it is clear
that S5 from the single-threaded code is available in all the parallel
loop iterations, but I can't get any more information out of it than
that.)

I also have a question about allocating processors from C.  In
Fortran, the line "CMIC$ GETCPUS n" will ask for n processors.  How
can the equivalent be done in C?  (I've been calling my C from Fortran
to get around this problem.)

Thanks,

== Marco Zagha
School of Computer Science
Carnegie Mellon University
Internet: marcoz@cs.cmu.edu            Uucp:   ...!seismo!cs.cmu.edu!marcoz  
Bitnet:   marcoz%cs.cmu.edu@cmuccvma   CSnet:  marcoz%cs.cmu.edu@relay.cs.net


  The following example adds two 2-dimensional arrays, element by element,
  and places the output in a third array.  The addition is vectorized on
  the inner loop and microtasked on the outer loop.  This example also
  shows the nesting of a $VDO/$ENDVDO macro pair inside a scalar
  multitasked macro.

     ____________________________________________________
    |Location|Result_____|Operand________|Comment________
    |1_______|10_________|20_____________|35_____________
    |        |           |               |
    |        |S6         |  D'20         |Set ending index for outer loop
    |        |S5         |  D'300        |Set ending index for inner loop
    |        |$MDO       |  S1=0,S6,TRIPCNT=S7
    |        |  A2       |  S1           |Move index to A register
    |        |  A3       |  D'500        |Get first dimension of arrays
    |        |  A2       |  A2*A3        |Compute offset into arrays
    |        |  A3       |  X            |Get base address of X array
    |        |  A3       |  A3+A2        |Compute starting offset into X
    |        |  A4       |  Y            |Get base address of Y array
    |        |  A4       |  A4+A2        |Compute starting offset into Y
    |        |  A5       |  Z            |Get base address of Z array
    |        |  A5       |  A5+A2        |Compute starting offset into Z
    |        |  $VDO     |  S2=0,S5,TRIPCNT=S3,SEGLEN=A1
    |        |    A0     |    A3         |
    |        |    V0     |    ,A0,1      |Load segment of X array
    |        |    A0     |    A4         |
    |        |    V1     |    ,A0,1      |Load segment of Y array
    |        |    V2     |    V0+FV1     |Add segments of X and Y arrays
    |        |    A0     |    A5         |
    |        |    ,A0,1  |    V2         |Store sum in Z array
    |        |    A3     |    A3+A1      |Increment pointer into X array
    |        |    A4     |    A4+A1      |Increment pointer into Y array
    |        |    A5     |    A5+A1      |Increment pointer into Z array
    |        |  $ENDVDO  |               |
    |        |$ENDMDO    |               |
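
For readers who do not know CAL, the computation in the listing above
can be sketched in C roughly as follows.  This is my reading of the
example, not something taken from the manual.  It assumes column-major
arrays with a leading dimension (500 in the listing), an outer loop over
columns whose iterations are independent, and an inner loop strip-mined
into segments of at most 64 elements, the vector register length.  The
function name and parameters are hypothetical, and the loop counts are
passed in because I am not sure whether the $MDO/$VDO end index is
inclusive.  How $VDO actually picks segment lengths is also an
assumption here; the sketch simply puts the short segment last.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 64   /* elements per vector register */

/* z = x + y over n_cols columns of n_rows elements each. */
void add_columns(const double *x, const double *y, double *z,
                 size_t lead_dim, size_t n_rows, size_t n_cols)
{
    assert(n_rows <= lead_dim);

    /* Outer loop: the part $MDO would spread across processors. */
    for (size_t col = 0; col < n_cols; col++) {
        size_t off = col * lead_dim;           /* A2 = S1 * 500 */

        /* Inner loop: the part $VDO would strip-mine into segments. */
        for (size_t row = 0; row < n_rows; ) {
            size_t left = n_rows - row;
            size_t seg = left < VLEN ? left : VLEN;   /* plays the role of SEGLEN */

            /* One vector load/load/add/store on the Cray. */
            for (size_t k = 0; k < seg; k++)
                z[off + row + k] = x[off + row + k] + y[off + row + k];

            row += seg;                        /* advance the X, Y, Z pointers */
        }
    }
}
```

Each outer iteration touches only its own column, so no value has to be
carried from one iteration to the next.  That is why this example does
not help with question 2.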