dave@visual1.jhuapl.edu (Dave Weintraub) (06/12/91)
This is being posted on INTERNET by dave@visual1.jhuapl.edu.
I can also be reached as dave@aplvm.bitnet

Dave Weintraub
The Applied Physics Laboratory
The Johns Hopkins University
Johns Hopkins Road 3-145
Laurel MD 20723-6099 USA
(301)-953-5839
=========

I must relate an incident from last week...

I attended a seminar sponsored by CONVEX on issues and directions. One speaker was Ken Kennedy, of Rice University, speaking about language directions for parallel processing. He described intense ongoing work in FORTRAN analysis and modifications for FORTRAN 90/2000 to allow specification of machine-dependent parallel operations.

While I know that FORTRAN is the language of choice for most engineers (I do not *agree*, I just accept this with a sigh), I *had* to ask him about APL as an alternative. I described Bob Bernecky's ACORN work, and pointed out that Bob had done in about a man-week what Rice was spending man-decades trying to do.

His replies (now remember, this is an industry expert speaking at a public forum):

0. "APL is not a good language for doing parallel algorithms." His reasoning? People take huge outer products and then reduce or slice along a subarray, and thus the algorithms are inefficient (I tried to explain about properly-formed arguments for inner product, and about the RANK operator).

1. Bob's work, as described by me, was "an obvious fabrication". No one could build an interlanguage translation scheme that efficient, that quickly. And even if they could, it would be useless because of point 0.

2. APL is difficult to learn (no interest in data indicating otherwise).

)FLAME OFF

as they (sort of) say in the discussion lists... but I am still boiling!

Dave
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/13/91)
Dave, you related three arguments that your antagonist made. The last two are unsubstantiated opinions, and only the first seemed based on facts. And the facts are wrong.

> 0. "APL is not a good language for doing parallel algorithms." His reasoning?
>    People take huge outer products and then reduce or slice along a subarray,
>    and thus the algorithms are inefficient (I tried to explain about
>    properly-formed arguments for inner product, and about the RANK operator).

These so-called "huge outer products" can take CONSTANT time to compute on parallel machines. For example, consider the problem of computing x e. y (x epsilon y) where x and y are vectors of length m and n. One algorithm is

   +./ y =/ x     (or/[0] y jot . = x)

On a serial machine, this takes time O m*n (and IS inefficient). On a SIMD machine (like the Connection Machine), a straightforward translation of the algorithm yields:

   Propagate x and y to m*n processors                 O log m>.n
   Compute the m*n = operations of the outer product   O 1  [!]
   Compute the or-reduction                            O log n

The total time taken is O log m>.n . Log time is "speed of light"; this "huge outer product" algorithm is in fact time optimal.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
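[Editorial sketch: Hui's one-liner +./ y =/ x can be transliterated into plain Python for readers unfamiliar with J/APL notation. This is an illustration of the outer-product membership algorithm only, not a rendering of APL semantics; the function name is mine.]

```python
def member_outer(x, y):
    """For each element of x, report whether it occurs in y,
    via the full m*n comparison table (the +./ y =/ x algorithm)."""
    # Outer product: table[j][i] is (y[j] == x[i])
    table = [[yj == xi for xi in x] for yj in y]
    # Or-reduction down each column: does any y[j] equal x[i]?
    return [any(row[i] for row in table) for i in range(len(x))]

print(member_outer([3, 1, 4], [1, 5, 9]))  # [False, True, False]
```

On a serial machine this performs all m*n comparisons explicitly, which is Hui's O m*n serial cost; the parallel argument is that the table entries are independent and the or-reduction can be done in a log-depth tree.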
tedwards@aplcomm.JHUAPL.EDU (Edwards Thomas G S1A x8297) (06/15/91)
In article <1991Jun13.044828.29504@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes, about outer products in APL in parallel:

> On a serial machine, this takes time O m*n (and IS inefficient).
> On a SIMD machine (like the Connection Machine), a straightforward
> translation of the algorithm yields:
>
>    Propagate x and y to m*n processors                 O log m>.n
>    Compute the m*n = operations of the outer product   O 1  [!]
>    Compute the or-reduction                            O log n
>
> The total time taken is O log m>.n . Log time is "speed of light";
> this "huge outer product" algorithm is in fact time optimal.

I have spent two years programming the Connection Machine, and my favorite choice for a high-level language would be APL by far. Having variables be vectors or matrices as naturally as scalars would be immensely useful. To make it really useful, though, one would have to spend a lot of time carefully looking at where systolic array algorithms would speed it up (such as matrix multiplication). Unfortunately, this will probably never happen due to the MITheadedness at TMC.

I'll admit, I myself had a prejudice against APL until I began using it. But one thing is for sure: APL is no sillier than that weird FORTRAN they have running on the CM.

-Tom
prins@cs.unc.edu (Jan Prins) (06/20/91)
In article <1991Jun13.044828.29504@yrloc.ipsa.reuter.COM>, Roger Hui writes:

> (about a remark of Ken Kennedy:)
>
> > 0. "APL is not a good language for doing parallel algorithms." His reasoning?
> >    People take huge outer products and then reduce or slice along a subarray,
> >    and thus the algorithms are inefficient (I tried to explain about
> >    properly-formed arguments for inner product, and about the RANK operator).
>
> These so-called "huge outer products" can take CONSTANT time to
> compute on parallel machines. For example, consider the problem of
> computing x e. y (x epsilon y) where x and y are vectors
> of length m and n. One algorithm is
>
>    +./ y =/ x     (or/[0] y jot . = x)
>
> On a serial machine, this takes time O m*n (and IS inefficient).
> On a SIMD machine (like the Connection Machine), a straightforward
> translation of the algorithm yields:
>
>    Propagate x and y to m*n processors                 O log m>.n
>    Compute the m*n = operations of the outer product   O 1  [!]
>    Compute the or-reduction                            O log n
>
> The total time taken is O log m>.n . Log time is "speed of light";
> this "huge outer product" algorithm is in fact time optimal.

Unfortunately I don't get to this newsgroup much, and although I'm a big fan of APL and related languages for data-parallel programming, I must raise some flags regarding the "efficiency" claim above.

You speak easily of computing m*n comparison operations in unit time, but of course this presupposes that you have m*n physical processors. The correct time for this operation is O((m*n)/P), where P is the number of actual processors. Note that P is a constant! If you are wealthy you can buy P=64K today, but even this substantial machine runs out of processors at m=n=256 with this algorithm. For longer vectors the point is that this algorithm IS inefficient: its asymptotic time complexity is O(m*n), since P is constant.
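[Editorial sketch: Prins's objection can be put in rough numbers. With P physical processors, the m*n comparison table takes about ceil(m*n/P) "virtual processor" sweeps, while an efficient sequential membership test does on the order of m+n operations. A back-of-the-envelope cost model, assuming unit cost per sweep and per operation (the function names and unit costs are my own illustration, not a benchmark):]

```python
from math import ceil

def outer_product_sweeps(m, n, P=65536):
    """Virtual-processor sweeps needed to evaluate the m*n
    comparison table on P physical processors (cost model only)."""
    return ceil(m * n / P)

def sequential_ops(m, n):
    """Rough operation count for an efficient sequential membership
    test: build a hash set from y (n inserts), then probe it m times."""
    return m + n

# At m = n = 256 the table exactly fits a 64K-processor machine:
print(outer_product_sweeps(256, 256))      # 1 sweep
# At m = n = 20,000 the table needs thousands of sweeps,
# while the sequential algorithm does only 40,000 operations:
print(outer_product_sweeps(20000, 20000))  # 6104 sweeps
print(sequential_ops(20000, 20000))        # 40000
```

Under this model the crossover Prins predicts is plausible: past the point where m*n exceeds P, the "parallel" algorithm's time grows with m*n while the sequential one grows only with m+n.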
For sufficiently large inputs this algorithm will take longer than the efficient sequential algorithm on a sequential machine. In this case I would predict that at n=m=20,000, APL on my workstation would outrun a full-tilt CM-2 using the outer product algorithm. And it is precisely the larger instances of a problem (i.e. longer vectors) which drive people to consider execution on parallel platforms.

I like the big outer products programming style: it is exquisitely expressive. But executed directly on large inputs, it is not efficient.

I don't agree with Kennedy that this is a condemnation of APL's suitability for scientific programming. It suggests that some refinement is needed to move an APL program into production use on very large problem sizes. Perhaps a compiler can do this (this is awfully difficult, and may be the basis of Kennedy's objection) or, more likely, the programmer must sort this out. This latter technique seems to be the one that Kennedy espouses in the Fortran-90 case, and I don't see why it couldn't apply to the APL case as well, with the added advantage of much more flexibility in the initial expression of the algorithm.

  --\--   Jan Prins  (prins@cs.unc.edu)
   /      Computer Science Dept.
  --\--   UNC Chapel Hill
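[Editorial sketch: the kind of refinement Prins describes, whether done by a compiler or by the programmer, amounts to replacing the outer-product formulation with an equivalent but cheaper one. A Python illustration, with both versions producing identical results (the function names are mine):]

```python
def expressive(x, y):
    """Direct transcription of +./ y =/ x: O(m*n) comparisons."""
    return [any(yj == xi for yj in y) for xi in x]

def refined(x, y):
    """Same result via a hash set: O(m + n) expected time."""
    ys = set(y)
    return [xi in ys for xi in x]

x, y = [3, 1, 4, 1, 5], [1, 2, 3]
assert expressive(x, y) == refined(x, y)
print(refined(x, y))  # [True, True, False, True, False]
```

The expressive form is what one writes first; the refined form is what one (or an ambitious compiler) substitutes before running at production problem sizes.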
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/22/91)
Reply to article <4481@borg.cs.unc.edu> by prins@cs.unc.edu (Jan Prins).

Thank you for your comments. In my original post, the first step was:

   Propagate x and y to m*n processors                 O log m>.n

This (obviously) assumes that there are m*n processors, without the lamentable parameter P. Assuming processors as an unbounded resource is analogous to the common practice of assuming space as an unbounded resource. I should have made this assumption more explicit.

Treating processors as an unbounded resource is not unreasonable. (It wasn't long ago that P=64K was a parameter for space ...) With this embarrassment of riches, how fast can one compute? The answer for x epsilon y is O log m>.n time, optimal time, using the profligate outer product algorithm.

Daniel Hillis, speculating on a billion processor machine built with current technology (Scientific American, June 1987), wrote:

   There are technical problems inherent in building such a
   computational engine, but they are soluble. The real problems
   are those of the imagination: conceiving how such power would
   be used. ...

x epsilon y is a complex and recurring theme with many nuances. I do not pretend that a single sentence, even if an APL sentence :-), adequately covers the subject. Consider the outer product algorithm a catalyst for the imagination.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
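[Editorial sketch: the O log n or-reduction step in Hui's analysis is a binary fan-in tree, in which each parallel step halves the vector. A small simulation confirms the step count (the function name is mine; each loop iteration stands for one parallel step, not a sequential cost):]

```python
def or_reduce_steps(vals):
    """Simulate a binary fan-in or-reduction and count the
    number of parallel steps (each step halves the vector)."""
    steps = 0
    while len(vals) > 1:
        # Pair adjacent elements; each pair is combined "in parallel".
        half = [a or b for a, b in zip(vals[0::2], vals[1::2])]
        if len(vals) % 2:          # odd element carries over unpaired
            half.append(vals[-1])
        vals = half
        steps += 1
    return vals[0], steps

result, steps = or_reduce_steps([False] * 1023 + [True])
print(result, steps)  # True 10
```

Reducing 1024 values takes 10 steps, i.e. log2 of the vector length, which is the O log n term in the three-step analysis above.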