rjc@CS.UCLA.EDU (Robert Collins) (07/03/89)
First of all, I don't work for TMC, so this is totally unofficial. I simply have had *lots* of experience hacking on the CM2. I don't use *lisp, so I can't give details on *lisp function names. I am posting rather than responding via e-mail so the CM2 doesn't get a bum rap (and a bad reputation) from inexperienced users.

In article <5886@hubcap.clemson.edu> jps@cat.cmu.edu (James Salsman) writes:
>I really hate having to deal with indirect addressing on
>most SIMD machines. I wish someone would build a SIMD array
>using PE's with address buffers. Just one tiny address
>buffer per processor is all I want... nothing fancy. As
>long as *ALL* the memory addresses have to come over the
>global instruction stream and thus are the *SAME* for each
>element, a lot of potential processing power is going to
>waste!
>
>For example, on the Connection Machine in *Lisp, indirect
>aref!!'s take FOR EVER. This is SERIOUSLY slowing down the
>Production System that I wrote in *Lisp (regardless, it's
>faster than CMU/Soar's Production System Machine or any
>other implementation of a production system that I've heard
>about.) TMC has added somthing called "sideways arrays" to
>help indirect addressing, but the *Lisp manual is totally
>obscure (so what else is new) and from what I can tell, it
>looks like "sideways" means "spread out over several
>physical processors." Ack/Pft!
> [ ... ]

I'd tell you to RTFM, but I realize TFM can be hard to R (and understand)! :-)

Yes, your basic CM array reference takes a long time. Everyone gets bitten by this one :-). It runs in time O(n), where n is the **number of elements in your array**. This is *bad* for big arrays.

The aref!! function does not do any actual indirect addressing. What it does is loop through all indices in your array. On each iteration, it turns on every processor that has an index equal to the loop index, does a move operation, and goes on to the next element of the array.
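To make the cost concrete, here is a toy serial model (in Python, and purely illustrative; the names and structure are mine, not TMC's microcode) of the loop-over-every-index scheme just described. Each pass activates only the processors whose requested index matches the loop counter, which is why the whole operation is O(n) in the array length regardless of how many processors actually want each element.

```python
def slow_aref(arrays, indices):
    """Model of a 'slow' per-processor array reference on a SIMD machine.

    arrays[p]  -- processor p's local array (all the same length)
    indices[p] -- the element processor p wants from its own array
    Returns results[p] == arrays[p][indices[p]].
    """
    n_elements = len(arrays[0])
    results = [None] * len(arrays)
    for k in range(n_elements):          # one pass per array element: O(n)
        for p in range(len(arrays)):     # all PEs step in lockstep
            if indices[p] == k:          # context flag: only matching PEs act
                results[p] = arrays[p][k]
    return results
```

Note that the work done is proportional to the array length times the (emulated) broadcast cost, even if every processor wants the same element.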
With the release of version 5.0 of the CM system software, we now have `fast' indirect addressing. It is performed by the chip that does serial-to-parallel conversion (for feeding the FPA chips). Since this hardware works on 32-bit chunks (and there is one for every 32 processing elements), you are stuck with fast arrays with 32-bit elements (or multiples of 32, I guess). This is a small price to pay for the speed. Using the fast (sideways) arrays, an array reference takes on the order of 2-3 times as long as a 32-bit in-processor move.

They are sometimes called `sideways' arrays because each element in the per-processor arrays is spread across the memory of 32 processors (so they can be accessed in parallel). If you have a lookup table that is the same for all processors (or at least within groups of 32), you can minimize your memory requirements by physically sharing the array.

James, I am sure that TMC's customer support group will be able to give you details about how to access both fast arrays and fast shared arrays from *lisp. Their email address is csg@think.com.

I completely agree that indirect addressing is an extremely important aspect of a SIMD architecture. In fact, I have yet to write a program for the CM that didn't beg for indirect addressing. Although TMC perhaps could have handled indirect addressing in a more general way, I feel that their efforts in this direction have been quite adequate.

I have done some pretty bizarre things with the CM2, and it has been a great piece of hardware. My first project was to study the performance issues involved in the emulation of MIMD-style computation on the CM (a SIMD machine). This type of emulation is critically dependent on indirect addressing. By the way, I realize that MIMD on the CM is a really brain-damaged thing to do, but it works *very* well. I still haven't figured out many MIMD applications that can use a million or more processors :-).
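One way to picture the sideways layout described above: if a 32-bit word per PE is stored one bit per processor across a 32-PE group, then converting between the ordinary and sideways layouts is a 32x32 bit-matrix transpose. The sketch below is my own toy model of that transpose, not the actual conversion hardware or any TMC API; the real chip's details may well differ.

```python
def to_sideways(words):
    """Transpose 32 32-bit words: bit j of input word i becomes
    bit i of output word j. Applying it twice recovers the input,
    since transposing a square bit matrix is its own inverse."""
    assert len(words) == 32
    out = [0] * 32
    for i, w in enumerate(words):
        for j in range(32):
            if (w >> j) & 1:
                out[j] |= 1 << i
    return out
```

In this picture, "element e of every PE's array" lives in one transposed word, which is why the group of 32 processors can fetch 32 different elements in parallel.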
I am convinced that the reverse (emulation of a SIMD machine on a MIMD machine) would be much more difficult and orders of magnitude slower.

My latest project has been the simulation of populations of artificial animals (yes, Artificial Life). This is also quite dependent on indirect addressing. There is no way that an existing MIMD computer can match the CM for this type of application!

When I first started working on the CM, I had trouble thinking of ways to use all those processors. Now, I have trouble thinking of ways to make my problems small enough to fit on the CM! Oh well....

I hope this has helped to clear up the confusion about indirect addressing on the CM.

Rob
-------------------------------------------------------------------------------
rjc@cs.ucla.edu          C++/Paris on the CM2:  Object Oriented SIMD madness
-------------------------------------------------------------------------------
aklietz@ncsa.uiuc.edu (Alan Klietz) (07/05/89)
In article <5902@hubcap.clemson.edu> rjc@CS.UCLA.EDU (Robert Collins) writes: >Yes, your basic CM array reference takes a long time. Everyone gets >bitten by this one :-). It runs in time O(n), where n is the >**number of elements in your array**. This is *bad* for big arrays. If "indirect addressing" means what I think it does -- each PE grabs a value from an arbitrary address in an arbitrary PE -- then there exists an algorithm in time O(k log n) to do the equivalent operation by repeatedly using the send operator. Algorithm: Given a set of PEs { P(i) | 0 <= i < n }, we want each PE to fetch a word from the PE at f(P(i)). The inverse of f is undefined. Input: f(i), 0 <= i < n, indirection function x(i), set of values to dereference T(i), temporary index array Output: { y | y(i) = f(x(i)) } Without loss of generality, we assume i = P(i) and that there exists a value called nil not in the domain or range of any function. for all i y(i) := nil, T(i) := nil repeat more := false for all f(i) | f(i) is not nil, do T(f(i)) := i // send-with-overwrite for all T(i) | T(i) is not nil, do more := true y(T(i)) := x(i) // send T(i) := nil for all y(i) | y(i) is not nil, do f(i) := nil until more = false Note that f(x(i)) never appears on the RHS of any assignment. Assuming the forall statements are O(log n), the total time required is O(k log n) where k = max( ||{{i,j} | f(i) = f(j)}|| ) + 1 i,j The best case is when f is invertible, where it is O(log n). The worst case is when f is constant, where it is O (n log n). -- Alan E. Klietz University of Illinois at Urbana-Champaign National Center for Supercomputing Applications 152 Computing Applications Building 605 E. Springfield Avenue Champaign, IL 61820 Ph: +1 217 244 8024 ARPA: aklietz@ncsa.uiuc.edu
rjc@CS.UCLA.EDU (Robert Collins) (07/06/89)
In article <5918@hubcap.clemson.edu> aklietz@ncsa.uiuc.edu (Alan Klietz) writes:
>In article <5902@hubcap.clemson.edu> rjc@CS.UCLA.EDU (Robert Collins) writes:
>>Yes, your basic CM array reference takes a long time. Everyone gets
>>bitten by this one :-). It runs in time O(n), where n is the
>>**number of elements in your array**. This is *bad* for big arrays.
>
>If "indirect addressing" means what I think it does -- each PE
>grabs a value from an arbitrary address in an arbitrary PE -- then
>there exists an algorithm in time O(k log n) to do the equivalent
>operation by repeatedly using the send operator.
>
>Algorithm: [ deleted ]

On the CM2, you can do it a lot faster than Alan's algorithm if you use the "fast" arrays described in my article (from which the above quote is extracted--the quote is about "slow" arrays).

The indirect addressing that I spoke of was an arbitrary index into an array within the processor. There are also send/get operations that write to/read from an arbitrary index of such arrays on an arbitrary processor. These operations allow the sort of indirect addressing that is described above (well, actually only to a close approximation--but probably good enough for all practical purposes). I do not know the algorithm used by the router microcode, but one would hope it is faster than the (deleted) algorithm. If you have really nasty collision patterns, things can slow down considerably. It seems like I usually end up with nasty collision patterns :-).

Rob
-------------------------------------------------------------------------------
rjc@cs.ucla.edu          C++/Paris on the CM2:  Object Oriented SIMD madness
-------------------------------------------------------------------------------