[comp.lang.apl] APL for Parallel Algorithms

rjfrey@uunet.UU.NET (Robert J Frey) (12/07/88)

Does anyone know of anyone who is using APL to implement or describe
parallel algorithms?  It would seem that APL, which tends to produce
"loopless" code, would be a natural medium for many applications.

Anyone care to comment about the utility of a "parallel APL" implementation?

==============================================================================
|Dr. Robert J. Frey               | {acsm, sbcs, polyof}!kepler1!rjfrey      |
|Kepler Financial Management, Ltd.|------------------------------------------|
|100 North Country Rd., Bldg. B   | The views expressed are wholly my own and|
|Setauket, NY  11766              | do not reflect those of the Indepen-     |
|(516) 689-6300 x.16              | dent Republic of Latvia.                 |
==============================================================================

eugene@eos.arc.nasa.gov (Eugene Miya) (12/08/88)

I asked several people involved in the use of APL on the STAR-100 at LLNL
in the 1970s why it didn't take off (The "Geez, it's logically parallel
vector [what have you]...")  The inconsistent reply had to do with the
order of evaluation.

Steve Wallach at Convex has also mentioned APL, and I sent him Tim Budd's
APL compiler (it would not quite come up on the C-1; yacc problems).  It will
be interesting to watch the development of parallel languages in coming years:
the semantics of parallelism have changed, how much will look like APL?
How much will differ?  Portability (compiler/interpreter as well as
program) will be an increasing concern.  If APL lags, you won't see it on
parallel machines (APL2 on IBMs may be an exception due to inertia).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."

fpst@hubcap.UUCP (Steve Stevenson) (12/08/88)

>Steve Wallach at Convex has also mentioned APL and I sent him Tim Budd's
>APL compiler (would not quite come up on the C-1, yacc problems).  It will
I would be very grateful if somebody could give me a copy of the
above-mentioned APL compiler.  I am trying to write an APL
compiler/interpreter for the Ametek 2010 multicomputer.

On a related subject: does anybody know if there are libraries of
common subroutines (FFT, sorting, etc.) for hypercube or message-passing
multicomputers?

Thanks

--------------------------------------------------------------------
Sehyo Chang
sec@coby.ics.hawaii.edu
University of Hawaii


-- 
Steve Stevenson                            fpst@hubcap.clemson.edu
(aka D. E. Stevenson),                     fpst@prism.clemson.csnet
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

evan@plx.UUCP (Evan Bigall) (12/09/88)

In article <3779@hubcap.UUCP> kepler1!rjfrey@uunet.UU.NET (Robert J Frey) writes:
>Does anyone know of anyone who is using APL to implement or describe
>parallel algorithms?  It would seem that APL, which tends to produce
>"loopless" code, would be a natural medium for many applications.
>
>Anyone care to comment about the utility of a "parallel APL" implementation?

I was involved in the implementation of a parallel/vector APL compiler at IBM's
Watson Labs.  One of the largest problems with parallel APL is that to really
get a speedup, you need a compiler, and we all know how much fun compiling
APL is.  I left the project with the opinion that while there are a lot of
ideas from APL that apply to parallel programming (explicit vector operations,
large basic blocks, many "heavy" nodes), the current semantics of the language
make it very difficult to access those concepts.  I always find it interesting
how many papers on parallel languages footnote APL without really taking
anything from APL (Steele's C* comes to mind first, but there are others).

I have just moved, so my stuff is a mess, but if you are interested I think
I can dig up the references for the project.

Evan  
...!sun!plx!evan
I barely have the authority to speak for myself, certainly not anybody else

rjfrey@kepler1.UUCP (Robert J Frey) (12/10/88)

In article <1522@plx.UUCP> evan@plx.UUCP (Evan Bigall) writes:
>
>I was involved in the implementation of a parallel/vector APL compiler at IBM's 
>Watson Labs.  One of the largest problems with parallel APL is that to really
>get a speed up, you need a compiler and we all know how much fun compiling 
>APL is...
>

I certainly would be interested in any references you could make available.
Perhaps, in terms of APL, a more modest goal would be appropriate:  instead
of thinking of APL as a means for specifying or implementing parallel
algorithms, just try to implement APL in a way that allows it to exploit
the vector/parallel facilities of the target environment.  For example,
computing the inner product A+.xB for simple numeric matrices A and B
seems easy to implement in a way which exploits vector/parallel
facilities.  Another example would be many uses of the "for each"
operator; doesn't this essentially specify independent execution chains?
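The A+.xB point can be sketched in modern terms.  This is an illustrative
reconstruction, not code from any APL implementation: NumPy's BLAS-backed
matrix product stands in for the "vector/parallel facilities of the target
environment," and the element-wise version stands in for what a naive
interpreter would do item by item.

```python
# Illustrative sketch: the APL inner product A+.xB on simple numeric
# matrices is exactly a matrix product, so an implementation can hand
# the whole operation to vector/parallel hardware as one "heavy" node.
# NumPy's matmul stands in for that hardware path here.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# APL: A +.x B  -- one whole-array operation the runtime may parallelize
C = A @ B

# The generalized inner product f.g reduces with f over the results of g;
# +.x is the f=+, g=x case, written out element-wise for comparison:
C_ref = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
                   for j in range(B.shape[1])]
                  for i in range(A.shape[0])])
assert (C == C_ref).all()
```

Nothing in the APL expression pins down an evaluation order for the
individual multiplications, which is precisely what makes the whole-array
form easy to vectorize.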

I can recall one instance at a former employer where we took a "linear algebra
intensive" application in APL2 and quadNA'd some ESSL functions to handle
such things as inner products, pivots, etc. and realized a thirty-fold increase
in speed (from ~5 min to ~9 sec.).  If more of this sort of thing can be done
in ways more transparent to the user, then APL would be much improved even
though it may not represent a general solution to specifying parallel 
algorithms.
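The quadNA/ESSL trick described above amounts to a dispatch: the APL-level
expression stays the same, but the hot primitive is routed to a tuned
library when the arguments qualify.  A minimal sketch of that idea, with
NumPy's BLAS-backed product standing in for ESSL (function names here are
illustrative, not from the post):

```python
# Hypothetical sketch of transparent library dispatch for a primitive.
import numpy as np

def inner_product_slow(A, B):
    """Naive interpreter-style triple loop (per-item overhead dominates)."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for t in range(k):
                C[i][j] += A[i][t] * B[t][j]
    return C

def inner_product(A, B):
    """Dispatch: simple numeric matrices go to the tuned library path."""
    A, B = np.asarray(A), np.asarray(B)
    if A.dtype.kind in "if" and B.dtype.kind in "if":
        return (A @ B).tolist()                        # "ESSL" path
    return inner_product_slow(A.tolist(), B.tolist())  # general fallback

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert inner_product(A, B) == inner_product_slow(A, B)
```

The user-visible semantics are unchanged; only the implementation route
differs, which is what "transparent to the user" asks for.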

==============================================================================
|Dr. Robert J. Frey               | {acsm, sbcs, polyof}!kepler1!rjfrey      |
|Kepler Financial Management, Ltd.|------------------------------------------|
|100 North Country Rd., Bldg. B   | The views expressed are wholly my own and|
|Setauket, NY  11766              | do not reflect those of the Indepen-     |
|(516) 689-6300 x.16              | dent Republic of Latvia.                 |
==============================================================================

jaxon@uicsrd.csrd.uiuc.edu (12/13/88)

>I want to compute the inner product A+.xB given the simple numeric matrices
>A and B, this seems easy to implement in a way which exploits vector/
>parallel facilities.

Done.  Many APLs already do this: the inner loops of the interpreter are
typically tuned to the machine's pipeline, vector ops, or concurrency
features.  STSC's UNIX APL for the Alliant FX/8 and Analogic's APL Machine
are two popular examples.

> One of the largest problems with parallel APL is that to really
>get a speed up, you need a compiler and we all know how much fun compiling 
>APL is...

That is to say that naive interpretation using parallel features will only
improve the "per item" times, not the  "per function" or "per line" costs.
Eventually these costs dominate the computation time, especially in APL1
coding styles.  On the APL Machine, the attached low-precision array
processor reduces per-item times so much that "counting the symbols" in
an expression gives a good approximation of the number of milliseconds
it will compute, regardless of data size.
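The per-line versus per-item argument can be put in a toy cost model
(the constants below are made up for illustration, not measurements from
the APL Machine): a naive interpreter pays a fixed overhead per symbol
plus a per-element cost, and vector hardware shrinks only the per-element
term.

```python
# Toy cost model: fixed interpreter overhead per symbol plus per-item
# cost.  For small arrays the fixed term dominates, so "counting the
# symbols" predicts runtime almost exactly.  Constants are assumed.

PER_TOKEN_MS = 1.0      # assumed fixed overhead per symbol interpreted
PER_ITEM_US  = 0.01     # assumed per-element cost on the array processor

def runtime_ms(tokens, items):
    return tokens * PER_TOKEN_MS + items * PER_ITEM_US / 1000.0

# A 10-symbol line over 1,000 elements: almost all overhead.
small = runtime_ms(10, 1_000)            # 10.01 ms, ~99.9% fixed cost
# The same line over a billion elements: data size finally dominates.
large = runtime_ms(10, 1_000_000_000)

assert abs(small - 10.01) < 1e-9
assert large > 100 * small
```

Faster vector hardware lowers PER_ITEM_US but leaves PER_TOKEN_MS alone;
only a compiler (fusing lines, removing dispatch) attacks the fixed term.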

There is hope for compilation however.  Several aspects of APL make it
superior to Fortran 8x:

   - Array temporaries are inaccessible
   - Loop control information is implicit
   - Storage cannot be aliased or equivalenced
   - Data dependence distances (actual time between side effects)
     are long relative to typical Fortran
   - Arrays are used in their entirety more often than in Fortran,
     where code declares DIMENSION R(100,100) but then uses only
     the upper-left NxM corner
   - The parallelism in APL does not involve an explicit notion of
     execution thread.  In 8x extensions (such as PCF Fortran) the
     program text commits to a certain distribution of work.
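The no-aliasing and implicit-loop-control points can be made concrete.
In the sketch below (illustrative, not from the post), the same textual
update rule is evaluated once with whole-array value semantics, where the
right-hand side conceptually reads only the old array, and once as a
Fortran-style in-place loop, where each iteration reads an element the
previous one just wrote:

```python
# Why value semantics help a parallelizer: with whole-array evaluation
# any chunk order is legal; the in-place loop carries a dependence.
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

# Whole-array semantics: RHS built entirely from the old values of a,
# so elements of the result are independent and parallelizable.
d = a.copy()
d[1:] = a[1:] + a[:-1]

# Fortran-style in-place loop: same textual update, but iteration i
# reads the value iteration i-1 just stored (loop-carried dependence).
c = a.copy()
for i in range(1, len(c)):
    c[i] = c[i] + c[i - 1]

assert d.tolist() == [1.0, 3.0, 5.0, 7.0]   # old values everywhere
assert c.tolist() == [1.0, 3.0, 6.0, 10.0]  # running prefix sum
```

A Fortran compiler must prove the second pattern is absent (no aliasing,
no EQUIVALENCE) before vectorizing; APL's semantics rule it out by
construction.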


>Another example would be many uses of the "for each"
>operator; doesn't this essentially specify independent execution chains?

  A standard for APL2 could be written to capture that
  idea, but as it stands right now all the APL2-like dialects agree
  that items of the arguments are processed in ravel order.  Only a
  smart compiler could detect that the functional operand of Each has
  no non-trivial side effect (fully parallelizable) or has infrequent
  enough side effects that a little synchronization is all that's needed.
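The legality condition described above, that a side-effect-free operand
makes Each order-independent, can be sketched as follows.  This is an
assumed implementation strategy, not APL2 semantics; Python's thread pool
stands in for whatever work distribution the machine offers:

```python
# If the operand of Each is pure, ravel order and any parallel order
# must agree -- which is exactly what a compiler would have to prove.
from concurrent.futures import ThreadPoolExecutor

def f(x):
    """A pure operand: no side effects, result depends only on x."""
    return x * x + 1

items = [1, 2, 3, 4, 5]

sequential = [f(x) for x in items]             # ravel order
with ThreadPoolExecutor(max_workers=4) as ex:
    parallel = list(ex.map(f, items))          # any internal order

# Purity guarantees agreement; with side effects present, only
# synchronization (or sequential execution) preserves the semantics.
assert parallel == sequential == [2, 5, 10, 17, 26]
```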

  In my APLB implementation, the operators (Each, Reduce, Scan, Product)
  are by far the fastest way to loop through user code.  They can call
  a user fn repeatedly, yet only set up its stack and enter it once.
  I like to cite the fact that a multi-column numeric grade function,
  written using the Reduce operator to accumulate the compound permutation
  vector from a vector of the enclosed columns of the data, ran
  substantially faster than the best APL1 code for this problem and
  only slightly slower than a primitive implementation.  In fact we
  decided to print it as an example under Grade and Reduce instead of
  adding more code to the numeric grade primitive.
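The algorithm behind that grade function, as far as it can be inferred
from the description (this is a reconstruction, not APLB source), is the
classic one: fold a stable per-column grade over the columns, least
significant first, composing one compound permutation as you go.

```python
# Multi-column grade by folding stable single-column grades.
# Reconstruction of the idea described in the post; names are mine.

def grade_up(keys):
    """Stable grade of one vector (APL's monadic grade-up)."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def grade_columns(columns):
    """columns[0] is most significant; fold right-to-left as Reduce would."""
    perm = list(range(len(columns[0])))
    for col in reversed(columns):
        keyed = [col[i] for i in perm]    # apply the permutation so far
        g = grade_up(keyed)               # stable: preserves prior order
        perm = [perm[i] for i in g]       # compose the two permutations
    return perm

# Grade rows (2,1), (1,2), (2,0), (1,1) by column 0, then column 1:
col0 = [2, 1, 2, 1]
col1 = [1, 2, 0, 1]
assert grade_columns([col0, col1]) == [3, 1, 2, 0]
```

Each fold step is one stable whole-vector grade plus two permutations,
so the user function is entered once per column rather than once per
item, which is consistent with the speed claim above.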

  These operators should be optimal even on serial machines.  Users 
  will go wherever the code runs the fastest, and in APL they will
  use constructs which will usually be parallelizable.

  A more serious impediment to parallel/vector APLs is the arithmetic
  conventions of current hardware.  Tolerant equality, typeless numerics,
  and integer vector math are not well supported by current vendors.
  
We have this dream... and someday...

 .greg jaxon.    jaxon@uicsrd.uxc.uiuc.edu
 UI Center for Supercomputing Research and Development