rjfrey@uunet.UU.NET (Robert J Frey) (12/07/88)
Does anyone know of anyone who is using APL to implement or describe
parallel algorithms?  It would seem that APL, which tends to produce
"loopless" code, would be a natural medium for many applications.

Anyone care to comment about the utility of a "parallel APL" implementation?

==============================================================================
|Dr. Robert J. Frey               | {acsm, sbcs, polyof}!kepler1!rjfrey      |
|Kepler Financial Management, Ltd.|------------------------------------------|
|100 North Country Rd., Bldg. B   | The views expressed are wholly my own    |
|Setauket, NY 11766               | and do not reflect those of the Indepen- |
|(516) 689-6300 x.16              | dent Republic of Latvia.                 |
==============================================================================
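[As an aside on the "loopless" style mentioned above: an APL expression
like +/V×W multiplies two vectors elementwise and then sum-reduces, with
no explicit loop.  A rough Python sketch of that style (the function name
and code are mine, purely for illustration, not any actual APL system):]

```python
from functools import reduce
import operator

def dot(v, w):
    # APL:  +/ V x W  -- elementwise product (x), then plus-reduction (+/).
    # No explicit index variable or loop bound appears in the source;
    # the "loop" is implied by the array operations themselves.
    return reduce(operator.add, (a * b for a, b in zip(v, w)), 0)

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

[Because the elementwise products are independent of one another, an
implementation is free to evaluate them in parallel before the reduction.]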
eugene@eos.arc.nasa.gov (Eugene Miya) (12/08/88)
I asked several people involved in the use of APL on the STAR-100 at LLNL
in the 1970s why it didn't take off (the "Geez, it's logically parallel,
vector, [what have you]..." argument).  The inconsistent reply had to do
with the order of evaluation.

Steve Wallach at Convex has also mentioned APL, and I sent him Tim Budd's
APL compiler (it would not quite come up on the C-1; yacc problems).

It will be interesting to watch the development of parallel languages in
the coming years: the semantics of parallelism have changed.  How much will
look like APL?  How much will differ?  Portability (of compilers and
interpreters as well as programs) will be an increasing concern.  If APL
lags, you won't see it on parallel machines (APL2 on IBMs may be an
exception due to inertia).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
fpst@hubcap.UUCP (Steve Stevenson) (12/08/88)
>Steve Wallach at Convex has also mentioned APL, and I sent him Tim Budd's
>APL compiler (it would not quite come up on the C-1; yacc problems).

I would be very grateful if somebody could give me a copy of the
above-mentioned APL compiler.  I am trying to write an APL
compiler/interpreter for the Ametek 2010 multicomputer.

On a related subject: does anybody know if there are libraries of common
subroutines (FFT, sorting, etc.) for hypercube or message-passing
multicomputers?

Thanks
--------------------------------------------------------------------
Sehyo Chang
sec@coby.ics.hawaii.edu
University of Hawaii
-- 
Steve Stevenson                            fpst@hubcap.clemson.edu
(aka D. E. Stevenson),                     fpst@prism.clemson.csnet
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell
evan@plx.UUCP (Evan Bigall) (12/09/88)
In article <3779@hubcap.UUCP> kepler1!rjfrey@uunet.UU.NET (Robert J Frey) writes:
>Does anyone know of anyone who is using APL to implement or describe
>parallel algorithms? It would seem that APL, which tends to produce
>"loopless" code, would be a natural medium for many applications.
>
>Anyone care to comment about the utility of a "parallel APL" implementation?

I was involved in the implementation of a parallel/vector APL compiler at
IBM's Watson Labs.  One of the largest problems with parallel APL is that
to really get a speedup, you need a compiler, and we all know how much fun
compiling APL is.

I left the project with the opinion that while there are a lot of ideas
from APL that apply to parallel programming (explicit vector operations,
large basic blocks, many "heavy" nodes), the current semantics of the
language make it very difficult to get at those concepts.

I always find it interesting how many papers on parallel languages
footnote APL without really taking anything from APL (Steele's C* comes
first to my mind, but there are others).

I have just moved, so my stuff is a mess, but if you are interested I
think I can dig up the references for the project.

Evan   ...!sun!plx!evan
I barely have the authority to speak for myself, certainly not anybody else
rjfrey@kepler1.UUCP (Robert J Frey) (12/10/88)
In article <1522@plx.UUCP> evan@plx.UUCP (Evan Bigall) writes:
>
>I was involved in the implementation of a parallel/vector APL compiler at
>IBM's Watson Labs.  One of the largest problems with parallel APL is that
>to really get a speedup, you need a compiler, and we all know how much fun
>compiling APL is...
>

I certainly would be interested in any references you could make available.

Perhaps, in terms of APL, a more modest goal would be appropriate: instead
of thinking of APL as a means for specifying or implementing parallel
algorithms, just try to implement APL in a way that allows it to exploit
the vector/parallel facilities of the target environment.  For example, if
I want to compute the inner product A+.xB given simple numeric matrices A
and B, this seems easy to implement in a way which exploits vector/parallel
facilities.  Another example would be many uses of the "for each" operator;
doesn't this essentially specify independent execution chains?

I can recall one instance at a former employer where we took a "linear
algebra intensive" application in APL2 and quadNA'd some ESSL functions to
handle such things as inner products, pivots, etc., and realized a
thirty-fold increase in speed (from ~5 min to ~9 sec.).  If more of this
sort of thing can be done in ways more transparent to the user, then APL
would be much improved, even though it may not represent a general
solution to specifying parallel algorithms.

==============================================================================
|Dr. Robert J. Frey               | {acsm, sbcs, polyof}!kepler1!rjfrey      |
|Kepler Financial Management, Ltd.|------------------------------------------|
|100 North Country Rd., Bldg. B   | The views expressed are wholly my own    |
|Setauket, NY 11766               | and do not reflect those of the Indepen- |
|(516) 689-6300 x.16              | dent Republic of Latvia.                 |
==============================================================================
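[For readers unfamiliar with the notation: A+.xB is APL's generalized
inner product, which for numeric matrices is ordinary matrix multiply --
each result entry is an elementwise product (x) followed by a plus-
reduction (+/), and every entry is independent of the others.  A Python
sketch of the semantics (the function name is mine; this illustrates the
data parallelism, not any particular APL implementation):]

```python
def inner_product(A, B):
    # APL A +.x B on numeric matrices: for each row i of A and column j
    # of B, multiply elementwise (x) and sum-reduce (+/).  Every (i, j)
    # entry is independent, so all of them could be computed at once on
    # vector/parallel hardware.
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(inner_product(A, B))  # [[19, 22], [43, 50]]
```

[This is exactly the kind of primitive that the ESSL substitution
described above accelerates: the APL expression stays the same while the
implementation underneath is swapped for tuned vector code.]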
jaxon@uicsrd.csrd.uiuc.edu (12/13/88)
>I want to compute the inner product A+.xB given the simple numeric matrices
>A and B, this seems easy to implement in a way which exploits vector/
>parallel facilities.

Done.  Many APLs already do this.  The inner loops of the interpreter are
typically tuned to the machine's pipeline, vector ops, or concurrency
features.  STSC's UNIX APL for the Alliant FX/8 and Analogic's APL Machine
are two popular examples.

>One of the largest problems with parallel APL is that to really
>get a speed up, you need a compiler and we all know how much fun compiling
>APL is...

That is to say that naive interpretation using parallel features will only
improve the "per item" times, not the "per function" or "per line" costs.
Eventually those costs dominate the computation time, especially in APL1
coding styles.  On the APL Machine, the attached low-precision array
processor reduces per-item times so much that "counting the symbols" in an
expression gives a good approximation of the number of milliseconds it
will take to compute, regardless of data size.

There is hope for compilation, however.  Several aspects of APL make it
superior to Fortran 8x:

 - Array temporaries are inaccessible.
 - Loop control information is implicit.
 - Storage cannot be aliased or equivalenced.
 - Data dependence distances (the actual time between side effects) are
   long relative to average Fortran.
 - Arrays are used in their entirety more often than in Fortran, where
   programs declare
       DIMENSION R(100,100)
   but only use the upper left NxM corner.
 - The parallelism in APL does not involve an explicit notion of an
   execution thread.  In 8x extensions (such as PCF Fortran) the program
   text commits to a certain distribution of work.

>Another example would be many uses of the "for each"
>operator; doesn't this essentially specify independent execution chains?

A standard for APL2 could be written to capture that idea, but as it
stands right now all the APL2-like dialects agree that the items of the
arguments are processed in ravel order.
Only a smart compiler could detect that the functional operand of Each has
no non-trivial side effects (fully parallelizable), or has infrequent
enough side effects that a little synchronization is all that's needed.

In my APLB implementation, the operators (Each, Reduce, Scan, Product) are
by far the fastest way to loop through user code.  They can call a user
function repeatedly, yet only set up its stack and enter it once.  I like
to cite the fact that a multi-column numeric grade function, written using
the Reduce operator to accumulate the compound permutation vector from a
vector of the enclosed columns of the data, ran substantially faster than
the best APL1 code for this problem, and only slightly slower than a
primitive implementation.  In fact, we decided to print it as an example
under Grade and Reduce instead of adding more code to the numeric grade
primitive.  These operators should be optimal even on serial machines.
Users will go wherever the code runs the fastest, and in APL they will use
constructs which will usually be parallelizable.

A more serious impediment to parallel/vector APLs is the arithmetic
conventions of current hardware.  Tolerant equality, typeless numerics,
and integer vector math are not well supported by current vendors.  We
have this dream... and someday...

.greg jaxon.   jaxon@uicsrd.uxc.uiuc.edu
UI Center for Supercomputing Research and Development
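[The multi-column grade described above works by reducing over the columns
from least-significant key to most-significant, composing the permutation
produced by a stable grade of each column with the permutation accumulated
so far.  A rough Python sketch of the idea (function names mine; this is
an illustration of the technique, not the APLB code):]

```python
def grade(col):
    # APL "grade up": the permutation that sorts col ascending.
    # Python's sorted() is stable, which the composition relies on.
    return sorted(range(len(col)), key=lambda i: col[i])

def multi_column_grade(columns):
    # Reduce over the columns, least-significant key first: reorder the
    # next column by the permutation so far, grade it, and compose.
    # Stability of grade makes the composition a correct multi-key sort.
    n = len(columns[0])
    perm = list(range(n))
    for col in reversed(columns):
        reordered = [col[i] for i in perm]
        perm = [perm[i] for i in grade(reordered)]
    return perm

# sort rows by column 0, ties broken by column 1
cols = [[2, 1, 2, 1], [0, 3, 1, 2]]
print(multi_column_grade(cols))  # [3, 1, 0, 2]
```

[Each step is itself a grade plus a couple of gathers -- all array
primitives, which is why the Reduce formulation came so close to a
hand-built primitive.]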