[comp.arch] Myrias Programming Model

dcw@myrias.com (Dan Wilson) (12/19/89)

Herein lies a short description of the programming model for Myrias
computers. 

The parallel extension which we add to FORTRAN and C is a PARDO.  In FORTRAN,
the syntax of PARDO is identical to that of a DO loop.  a PARDO may occur
anywhere a DO may occur.  C is a little trickier since there is no real do
loop in C.  Consequently, we add a PARDO looping construct.  There are no
other substantive additions to either language (and no operating system calls
either).

Conceptually, each 'iteration' of a PARDO is a separate task complete with
its own memory image.  These tasks may be executed in parallel, or in any
order.  When a 'parent' task executes a PARDO instruction, a collection of
'child' tasks are created and the parent task is suspended until all of the
child tasks complete.  Child tasks inherit identical copies of the parent
task's memory with only the iterator differing.  Each task then executes
independently.  Since each child task executes within its own memory image,
sibling tasks cannot affect each other's memory.  When all of the child tasks
complete, their memory images are merged together to form a new memory image
for the parent.  The merging rules are such that if exactly one child task
changes a location, then the new value appears in the parent's memory image.
When merging completes, the parent task resumes execution.  The actual
implementation has more overlap and less copying than might be thought.

The assignment of tasks to processors is hidden from the user.  The operating
system automatically and dynamically assigns or reassigns tasks to processors
to keep all the processors busy and the load evenly balanced.  The operating
system also manages all of the copying, storage, and merging of memory
images.

The concept of PARDO is pretty much orthogonal to the other constructs in
the programming languages.  So the following code structure is legal in
FORTRAN:
	DIMENSION A(1000,100,100)
	DO 10 I = 2,100
	  PARDO 10 J = 2, I
	    DO 10 K = 2, 100
	      LIMIT = F(I, J, K)
	      PARDO 10 L = 2, LIMIT
		A(L,K,J) = FUNC(A(L,K,J),A(L-1,K-1,J-1))
    10 CONTINUE
A few observations about this example:
1) the number of tasks to execute is not statically determinable, only
   the runtime evaluation of F will determine the number of tasks.  The
   load is dynamically distributed (and redistributed) on however many
   processor are being used to solve the problem.

2) PARDOs can nest.  Arbitrary functions may be called from inside of
   PARDOs; either of the functions F or FUNC may also invoke PARDOs.  We
   also allow recursion in FORTRAN so functions may invoke themselves
   from within a PARDO.

3) Tasks may freely access and update any portion of their memory
   image.  A task may safely read a location that is "modified" by one of
   its siblings.  It will not read the sibling's modifications, but rather
   read what was in that location in the parent memory image.

4) No, every task does not get a copy of the 40 Mbyte array A.  We currently
   do demand paging (amongst processors) with copy-on-write.

Perhaps I glossed over the merging rules too quickly above.
    When all children of a task have completed, its memory can receive
    new values, based on changes the children have made in their memories.
    The value of a particular location in the parent is determined by the
    merging rule:
    - if no child modified the location's value, it remains unchanged
    - if one child modified the location's value, the value the child
      put there replaces the old parent value
    - if several children modified the location's value, but all of them
      placed the same value there, that value replaces the old parent
      value
    - if several children modified the location's value, and two or more
      of them placed different values there, the new parent value is
      unpredictable

People often get hung-up on the 'unpredictable' result.  Two observations:
1) creation of an unpredictable result is not inherently bad; only the use
   of an unpredictable result is bad,
2) our user level debugger is capable of trapping the use of any
   'undefined' storage, which includes both uninitialised memory and memory
   that has become unpredictable due to multiple merging.
-- 
Dan Wilson                                                     dcw@myrias.com
Myrias Research Corporation                        {uunet,alberta}!myrias!dcw
Edmonton, Alberta, Canada                                     +1 403 428 1616