dave@visual1.jhuapl.edu (Dave Weintraub) (06/12/91)
This is being posted on INTERNET by dave@visual1.jhuapl.edu.
I can also be reached as dave@aplvm.bitnet

Dave Weintraub
The Applied Physics Laboratory
The Johns Hopkins University
Johns Hopkins Road 3-145
Laurel MD 20723-6099 USA
(301)-953-5839
=========

I must relate an incident from last week...

I attended a seminar sponsored by CONVEX on issues and directions. One speaker was Ken Kennedy, of Rice University, speaking about language directions for parallel processing. He described intense ongoing work in FORTRAN analysis and modifications for FORTRAN 90/2000 to allow specification of machine-dependent parallel operations.

While I know that FORTRAN is the language of choice for most engineers (I do not *agree*, I just accept this with a sigh), I *had* to ask him about APL as an alternative. I described Bob Bernecky's ACORN work, and pointed out that Bob had done in about a man-week what Rice was spending man-decades trying to do.

His replies (now remember, this is an industry expert speaking at a public forum):

0. "APL is not a good language for doing parallel algorithms." His reasoning? People take huge outer products and then reduce or slice along a subarray, and thus the algorithms are inefficient (I tried to explain about properly-formed arguments for inner product, and about the RANK operator).

1. Bob's work, as described by me, was "an obvious fabrication". No one could build an interlanguage translation scheme that efficient, that quickly. And even if they could, it would be useless because of point 0.

2. APL is difficult to learn (no interest in data indicating otherwise).

)FLAME OFF

as they (sort of) say in the discussion lists... but I am still boiling!

Dave
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/13/91)
Dave, you related three arguments that your antagonist made. The last two are unsubstantiated opinions, and only the first seemed based on facts. And the facts are wrong.

> 0. "APL is not a good language for doing parallel algorithms." His reasoning?
>    People take huge outer products and then reduce or slice along a subarray,
>    and thus the algorithms are inefficient (I tried to explain about
>    properly-formed arguments for inner product, and about the RANK operator).

These so-called "huge outer products" can take CONSTANT time to compute on parallel machines. For example, consider the problem of computing x e. y (x epsilon y) where x and y are vectors of length m and n. One algorithm is

   +./ y =/ x     (or/[0] y jot . = x)

On a serial machine, this takes time O m*n (and IS inefficient). On a SIMD machine (like the Connection Machine), a straightforward translation of the algorithm yields:

   Propagate x and y to m*n processors                 O log m>.n
   Compute the m*n = operations of the outer product   O 1  [!]
   Compute the or-reduction                            O log n

The total time taken is O log m>.n . Log time is "speed of light"; this "huge outer product" algorithm is in fact time optimal.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
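[Editorial sketch: Hui's one-liner +./ y =/ x can be transliterated into plain Python for readers unfamiliar with J/APL notation. This is an illustration of the outer-product membership algorithm only, not a rendering of APL semantics; the function name is mine.]

```python
def member_outer(x, y):
    """For each element of x, report whether it occurs in y,
    via the full m*n comparison table (the +./ y =/ x algorithm)."""
    # Outer product: table[j][i] is (y[j] == x[i])
    table = [[yj == xi for xi in x] for yj in y]
    # Or-reduction down each column: does any y[j] equal x[i]?
    return [any(row[i] for row in table) for i in range(len(x))]

print(member_outer([3, 1, 4], [1, 5, 9]))  # [False, True, False]
```

On a serial machine this performs all m*n comparisons explicitly, which is Hui's O m*n serial cost; the parallel argument is that the table entries are independent and the or-reduction can be done in a log-depth tree.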
tedwards@aplcomm.JHUAPL.EDU (Edwards Thomas G S1A x8297) (06/15/91)
In article <1991Jun13.044828.29504@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes, about outer products in APL in parallel:

> On a serial machine, this takes time O m*n (and IS inefficient).
> On a SIMD machine (like the Connection Machine), a straightforward
> translation of the algorithm yields:
>
>    Propagate x and y to m*n processors                 O log m>.n
>    Compute the m*n = operations of the outer product   O 1  [!]
>    Compute the or-reduction                            O log n
>
> The total time taken is O log m>.n . Log time is "speed of light";
> this "huge outer product" algorithm is in fact time optimal.

I have spent two years programming the Connection Machine, and my favorite choice for a high-level language would be APL by far. Having variables be vectors or matrices as naturally as scalars would be immensely useful. To make it really useful, though, one would have to spend a lot of time carefully looking at where systolic array algorithms would speed it up (such as matrix multiplication). Unfortunately, this will probably never happen due to the MITheadedness at TMC.

I'll admit, I myself had a prejudice against APL until I began using it. But one thing is for sure: APL is no sillier than that weird FORTRAN they have running on the CM.

-Tom
prins@cs.unc.edu (Jan Prins) (06/20/91)
In article <1991Jun13.044828.29504@yrloc.ipsa.reuter.COM>, Roger Hui writes:

> (about a remark of Ken Kennedy:)
>
> > 0. "APL is not a good language for doing parallel algorithms." His reasoning?
> >    People take huge outer products and then reduce or slice along a subarray,
> >    and thus the algorithms are inefficient (I tried to explain about
> >    properly-formed arguments for inner product, and about the RANK operator).
>
> These so-called "huge outer products" can take CONSTANT time to
> compute on parallel machines. For example, consider the problem of
> computing x e. y (x epsilon y) where x and y are vectors
> of length m and n. One algorithm is
>
>    +./ y =/ x     (or/[0] y jot . = x)
>
> On a serial machine, this takes time O m*n (and IS inefficient).
> On a SIMD machine (like the Connection Machine), a straightforward
> translation of the algorithm yields:
>
>    Propagate x and y to m*n processors                 O log m>.n
>    Compute the m*n = operations of the outer product   O 1  [!]
>    Compute the or-reduction                            O log n
>
> The total time taken is O log m>.n . Log time is "speed of light";
> this "huge outer product" algorithm is in fact time optimal.

Unfortunately I don't get to this newsgroup much, and although I'm a big fan of APL and related languages for data-parallel programming, I must raise some flags regarding the "efficiency" claim above.

You speak easily of computing m*n comparison operations in unit time, but of course this presupposes that you have m*n physical processors. The correct time for this operation is O((m*n)/P), where P is the number of actual processors. Note that P is a constant! If you are wealthy you can buy P=64K today, but even this substantial machine runs out of processors at m=n=256 with this algorithm. For longer vectors the point is that this algorithm IS inefficient: its asymptotic time complexity is O(m*n), since P is constant.
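[Editorial sketch: Prins's objection can be put in rough numbers. With P physical processors, the m*n comparison table takes about ceil(m*n/P) "virtual processor" sweeps, while an efficient sequential membership test does on the order of m+n operations. A back-of-the-envelope cost model, assuming unit cost per sweep and per operation (the function names and unit costs are my own illustration, not a benchmark):]

```python
from math import ceil

def outer_product_sweeps(m, n, P=65536):
    """Virtual-processor sweeps needed to evaluate the m*n
    comparison table on P physical processors (cost model only)."""
    return ceil(m * n / P)

def sequential_ops(m, n):
    """Rough operation count for an efficient sequential membership
    test: build a hash set from y (n inserts), then probe it m times."""
    return m + n

# At m = n = 256 the table exactly fits a 64K-processor machine:
print(outer_product_sweeps(256, 256))      # 1 sweep
# At m = n = 20,000 the table needs thousands of sweeps,
# while the sequential algorithm does only 40,000 operations:
print(outer_product_sweeps(20000, 20000))  # 6104 sweeps
print(sequential_ops(20000, 20000))        # 40000
```

Under this model the crossover Prins predicts is plausible: past the point where m*n exceeds P, the "parallel" algorithm's time grows with m*n while the sequential one grows only with m+n.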
For sufficiently large inputs this algorithm will take longer than the efficient sequential algorithm on a sequential machine. In this case I would predict that at n=m=20,000, APL on my workstation would outrun a full-tilt CM-2 using the outer product algorithm. And it is precisely the larger instances of a problem (i.e. longer vectors) which drive people to consider execution on parallel platforms.

I like the big outer products programming style: it is exquisitely expressive. But executed directly on large inputs, it is not efficient.

I don't agree with Kennedy that this is a condemnation of APL's suitability for scientific programming. It suggests that some refinement is needed to move an APL program into production use on very large problem sizes. Perhaps a compiler can do this (this is awfully difficult, and may be the basis of Kennedy's objection) or, more likely, the programmer must sort this out. This latter technique seems to be the one that Kennedy espouses in the Fortran-90 case, and I don't see why it couldn't apply to the APL case as well, with the added advantage of much more flexibility in the initial expression of the algorithm.

  --\--   Jan Prins  (prins@cs.unc.edu)
   /      Computer Science Dept.
  --\--   UNC Chapel Hill
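[Editorial sketch: the kind of refinement Prins describes, whether done by a compiler or by the programmer, amounts to replacing the outer-product formulation with an equivalent but cheaper one. A Python illustration, with both versions producing identical results (the function names are mine):]

```python
def expressive(x, y):
    """Direct transcription of +./ y =/ x: O(m*n) comparisons."""
    return [any(yj == xi for yj in y) for xi in x]

def refined(x, y):
    """Same result via a hash set: O(m + n) expected time."""
    ys = set(y)
    return [xi in ys for xi in x]

x, y = [3, 1, 4, 1, 5], [1, 2, 3]
assert expressive(x, y) == refined(x, y)
print(refined(x, y))  # [True, True, False, True, False]
```

The expressive form is what one writes first; the refined form is what one (or an ambitious compiler) substitutes before running at production problem sizes.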
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/22/91)
Reply to article <4481@borg.cs.unc.edu> by prins@cs.unc.edu (Jan Prins).

Thank you for your comments. In my original post, the first step was:

   Propagate x and y to m*n processors                 O log m>.n

This (obviously) assumes that there are m*n processors, without the lamentable parameter P. Assuming processors as an unbounded resource is analogous to the common practice of assuming space as an unbounded resource. I should have made this assumption more explicit.

Treating processors as an unbounded resource is not unreasonable. (It wasn't long ago that P=64K was a parameter for space ...) With this embarrassment of riches, how fast can one compute? The answer for x epsilon y is O log m>.n time, optimal time, using the profligate outer product algorithm.

Daniel Hillis, speculating on a billion processor machine built with current technology (Scientific American, June 1987), wrote:

   There are technical problems inherent in building such a
   computational engine, but they are soluble. The real problems
   are those of the imagination: conceiving how such power would
   be used. ...

x epsilon y is a complex and recurring theme with many nuances. I do not pretend that a single sentence, even if an APL sentence :-), adequately covers the subject. Consider the outer product algorithm a catalyst for the imagination.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
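[Editorial sketch: the O log n or-reduction step in Hui's analysis is a binary fan-in tree, in which each parallel step halves the vector. A small simulation confirms the step count (the function name is mine; each loop iteration stands for one parallel step, not a sequential cost):]

```python
def or_reduce_steps(vals):
    """Simulate a binary fan-in or-reduction and count the
    number of parallel steps (each step halves the vector)."""
    steps = 0
    while len(vals) > 1:
        # Pair adjacent elements; each pair is combined "in parallel".
        half = [a or b for a, b in zip(vals[0::2], vals[1::2])]
        if len(vals) % 2:          # odd element carries over unpaired
            half.append(vals[-1])
        vals = half
        steps += 1
    return vals[0], steps

result, steps = or_reduce_steps([False] * 1023 + [True])
print(result, steps)  # True 10
```

Reducing 1024 values takes 10 steps, i.e. log2 of the vector length, which is the O log n term in the three-step analysis above.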