chkim%pollux.usc.edu@usc.edu (Chinhyun Kim) (11/13/89)
In numerical analysis, arrays being indexed by other arrays as shown below are supposed to be very common in programs involving sparse matrices, fluid flow simulations, etc. If one wanted to parallelize this loop during compilation, he/she would have to know the values of B(i) and C(i). If they are not known at compile time, it will be impossible to parallelize the operations in the loop and thus must be executed in sequential manner. There are, however, ways to parallelize the operations in the loop by checking data dependency during run time and letting operations which are free of data dependency to execute in parallel. This kind of parallelism which can be determined only during run time is called, "dynamic parallelism". A(i) = ..... ; A(i)'s are initialized. B(i) = .... ; B(i)'s are determined. C(i) = .... ; C(i)'s are determined. DO i = 1..N A(B(i)) = ..... ; Array A is updated. T(i) = A(C(i)) + .. ; Array A is consumed for some operation. ENDDO I am currently working on ways to exploit dynamic parallelism as opposed to static parallelism which the compiler assesses parallelism during compilation. The model of computation is data-flow and the specific type of data-flow computer model is tagged token data-flow computer. I am using a mechanism known as "token relabeling" to do this. The results of my simulation so far is good. However, I have some fundamental questions to ask. Q1. In exactly what type of numerical simulations do such loops appear and why are they necessary? Q2. Are there other ways to do it without having the subscripts unknown at compile time, i.e. is exploitation of dynamic parallelism avoidable while still achieving parallel execution? Q3. Is it possible for me to get the fragment of the simulation program which has such loop? i.e. assuming answer to the first part of question 1 is true. Answers to these questions will be appreciated very much. Chinhyun Kim chkim@pollux.usc.edu
bjornl@tds.kth.se (Björn Lisper) (11/15/89)
Factorizing sparse matrices gives a prime example of what you call dynamic parallelism. When you don't know the structure of the matrix (i.e., where the non-zeroes are), you cannot assume independence between any rows, and thus the triangularization must proceed sequentially, the only possible parallelism being pipelining between the Gaussian eliminations on subsequent rows. When the structure is known, however, it is possible to detect independence between rows: then they can be operated on in parallel.

Note that there are counterparts to "dynamic parallelism" in sequential processing as well. When factorizing sparse matrices sequentially, for instance, it turns out to be advantageous to perform a so-called "symbolic factorization" as soon as the structure is known. This amounts to predicting the fill-in, so that vector entries can be reserved beforehand. Thus the need for handling dynamic memory structures during the factorization is eliminated.

This is clearly related to partial evaluation. Partial evaluation means using increased knowledge about the input data to simplify functions, or program code. "Ordinary" parallelism detection is then partial evaluation at compile time, and dynamic parallelism is partial evaluation at run time.

Bjorn Lisper
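[Editor's note: the structural independence test Lisper describes can be sketched as follows. This is an illustrative reading, not his code: once symbolic factorization has predicted the fill-in, row j of the elimination depends on row i (i < j) exactly when entry (j, i) of the factor is nonzero; rows assigned the same level below have no such dependence and can be eliminated in parallel.]

```python
def elimination_levels(struct):
    """struct[j] = set of column indices i < j where row j of the
    (fill-in-included) lower factor has a nonzero.  Returns a level
    number per row; rows at the same level are mutually independent."""
    level = []
    for j, cols in enumerate(struct):
        deps = [level[i] for i in cols if i < j]
        # A row must be eliminated after every row it depends on.
        level.append(1 + max(deps) if deps else 0)
    return level

# Hypothetical 4x4 lower-factor structure: row 2 depends on row 0,
# row 3 depends on rows 1 and 2.  Rows 0 and 1 share level 0 and can
# be eliminated in parallel.
struct = [set(), set(), {0}, {1, 2}]
levels = elimination_levels(struct)  # [0, 0, 1, 2]
```

With an unknown structure every entry must conservatively be assumed nonzero, all levels become distinct, and the elimination degenerates to the sequential order, which is exactly the contrast drawn above.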