[comp.parallel] Dynamic parallelism

chkim%pollux.usc.edu@usc.edu (Chinhyun Kim) (11/13/89)

In numerical analysis, loops in which arrays are indexed by other arrays,
as shown below, are said to be very common in programs involving sparse
matrices, fluid flow simulations, etc.  To parallelize such a loop at
compile time, one would have to know the values of B(i) and C(i).  If they
are not known at compile time, the operations in the loop cannot be
parallelized and the loop must be executed sequentially.

There are, however, ways to parallelize the operations in the loop by
checking data dependences at run time and letting operations that are free
of data dependences execute in parallel.  This kind of parallelism, which
can be determined only at run time, is called "dynamic parallelism".

A(i) = .....		; A(i)'s are initialized.
B(i) = ....		; B(i)'s are determined.
C(i) = ....		; C(i)'s are determined.

DO i = 1, N
    A(B(i)) = .....	        ; Array A is updated.
    T(i) = A(C(i)) + ..	        ; Array A is consumed for some operation.
ENDDO
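
For concreteness, here is one possible run-time scheme for such a loop,
sketched in C.  (This is a generic inspector-style illustration, not the
token-relabeling mechanism I describe below; N, M, and the subscript
values are made up for the example.)  A sequential "inspector" pass, run
once B and C are known, assigns every iteration a wavefront number such
that iterations sharing a wavefront are mutually independent; an
"executor" can then run the wavefronts one after another, with all
iterations inside a wavefront executing in parallel.

    #include <stdio.h>

    #define N 8            /* number of loop iterations  */
    #define M 5            /* size of the shared array A */

    int main(void)
    {
        /* example subscript arrays, known only at run time */
        int B[N] = {0, 1, 2, 0, 3, 1, 4, 2};
        int C[N] = {1, 0, 1, 2, 0, 3, 2, 4};

        int last_write[M]; /* latest wavefront writing A(k) */
        int last_read[M];  /* latest wavefront reading A(k) */
        int wave[N];

        for (int k = 0; k < M; k++)
            last_write[k] = last_read[k] = 0;

        /* inspector: iteration i writes A(B(i)) and reads A(C(i)) */
        for (int i = 0; i < N; i++) {
            int w = last_write[C[i]];                       /* flow   */
            if (last_read[B[i]]  > w) w = last_read[B[i]];  /* anti   */
            if (last_write[B[i]] > w) w = last_write[B[i]]; /* output */
            wave[i] = w + 1;
            if (wave[i] > last_write[B[i]]) last_write[B[i]] = wave[i];
            if (wave[i] > last_read[C[i]])  last_read[C[i]]  = wave[i];
        }

        for (int i = 0; i < N; i++)
            printf("iteration %d -> wavefront %d\n", i, wave[i]);
        return 0;
    }

Only the inspector itself is sequential, and its cost can be amortized
when the same B and C are reused over many time steps.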

I am currently working on ways to exploit dynamic parallelism, as opposed
to static parallelism, where the compiler assesses parallelism during
compilation.  The model of computation is data-flow, and the specific
machine model is a tagged-token data-flow computer.  I am using a
mechanism known as "token relabeling" to do this.

The results of my simulations so far are good.  However, I have some
fundamental questions to ask.

Q1. In exactly what types of numerical simulations do such loops appear,
    and why are they necessary?

Q2. Are there ways to restructure such computations so that the subscripts
    are known at compile time?  That is, can exploiting dynamic parallelism
    be avoided while still achieving parallel execution?

Q3. Would it be possible for me to get a fragment of a simulation program
    that contains such a loop (assuming the answer to Q1 names such a
    simulation)?

Answers to these questions would be very much appreciated.

Chinhyun Kim
chkim@pollux.usc.edu

bjornl@tds.kth.se (Björn Lisper) (11/15/89)

Factorizing sparse matrices gives a prime example of what you call dynamic
parallelism.  When you don't know the structure of the matrix (i.e., where
the non-zeroes are), you can't assume independence between any rows, and
the triangularization must then proceed sequentially, the only possible
parallelism being the pipelining of Gaussian elimination steps on
successive rows.  When the structure is known, however, independence
between rows can be detected, and independent rows can be operated on in
parallel.
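
As a minimal sketch of how known structure turns into a parallel schedule,
consider the simpler case of a sparse lower-triangular solve Lx = b.  Each
row can be assigned a "level" from the sparsity pattern alone, so that all
rows within a level are mutually independent; the same idea carries over
to the elimination steps of a factorization.  (The C code and the
CSR-style pattern below are illustrative, not taken from any particular
package.)

    #include <stdio.h>

    #define N 6

    int main(void)
    {
        /* strictly-lower-triangular pattern of L in CSR style:
           entries row_ptr[i] .. row_ptr[i+1]-1 of col_idx hold the
           columns j < i with L(i,j) != 0, i.e. the rows that row i
           must wait for */
        int row_ptr[N + 1] = {0, 0, 1, 2, 2, 4, 5};
        int col_idx[]      = {0, 1, 0, 3, 2};

        int level[N];

        for (int i = 0; i < N; i++) {
            int lv = 0;
            for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
                if (level[col_idx[p]] > lv)
                    lv = level[col_idx[p]];
            level[i] = lv + 1;  /* one step after the last prerequisite */
        }

        /* rows with the same level are mutually independent and can
           be solved simultaneously, one level after another */
        for (int i = 0; i < N; i++)
            printf("row %d -> level %d\n", i, level[i]);
        return 0;
    }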

Note that there are counterparts to "dynamic parallelism" in sequential
processing as well. When factorizing sparse matrices sequentially, for
instance, it turns out to be advantageous to perform a so-called "symbolic
factorization" as soon as the structure is known. This amounts to predicting
the fill-in, so that vector entries can be reserved beforehand. Thus, the
need for handling dynamic memory structures during the factorization is
eliminated.
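
As a toy illustration (with dense boolean storage for clarity; a real code
would of course keep the pattern in sparse form), symbolic factorization
amounts to running Gaussian elimination on the sparsity pattern alone,
without any numerical values, to predict where fill-in will appear:

    #include <stdio.h>

    #define N 5

    int main(void)
    {
        int nz[N][N] = {        /* 1 wherever A(i,j) != 0 */
            {1, 1, 0, 0, 1},
            {1, 1, 1, 0, 0},
            {0, 1, 1, 0, 0},
            {0, 0, 0, 1, 1},
            {1, 0, 0, 1, 1},
        };

        /* eliminating column k with pivot row k creates a nonzero at
           (i,j) whenever row i has a nonzero in column k and row k has
           one in column j; accidental numerical cancellation is
           ignored, as in any symbolic analysis (no pivoting assumed) */
        for (int k = 0; k < N; k++)
            for (int i = k + 1; i < N; i++)
                if (nz[i][k])
                    for (int j = k + 1; j < N; j++)
                        if (nz[k][j] && !nz[i][j]) {
                            nz[i][j] = 1;
                            printf("fill-in predicted at (%d,%d)\n", i, j);
                        }
        return 0;
    }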

This is clearly related to partial evaluation.  Partial evaluation means
using increased knowledge about the input data to simplify functions or
program code.  "Ordinary" parallelism detection is then partial evaluation
at compile time, and dynamic parallelism is partial evaluation at run
time.
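
The textbook example: a general power function receives both x and n at
run time, and partially evaluating it with respect to a known n yields a
simplified residual program with the loop unfolded away, much as
dependence tests can be evaluated away once the subscripts are known.
(A C sketch; the function names are invented for the illustration.)

    #include <stdio.h>

    /* general program: both arguments arrive at run time */
    double power(double x, int n)
    {
        double r = 1.0;
        while (n-- > 0)
            r *= x;
        return r;
    }

    /* residual program after partially evaluating power() with
       respect to the static input n = 3: the loop is gone */
    double power3(double x)
    {
        return x * x * x;
    }

    int main(void)
    {
        printf("%g %g\n", power(2.0, 3), power3(2.0));   /* 8 8 */
        return 0;
    }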

Bjorn Lisper