reinhard@rice.edu (Reinhard von Hanxleden) (05/30/91)
Dear netters,

I would be grateful for comments/references on two issues regarding the Connection Machine.

1. IRREGULAR PROBLEMS (PIC codes, sparse matrices, _your favorite_, ...)
   - Which problems have been tackled? How did the implementations differ from equivalent MIMD implementations?
   - Have "classic" (MIMD) load balancing paradigms (e.g. recursive bisection) been applicable?
   - Any experiences in using indirection vectors to distribute a problem evenly?

2. GENERATING EFFICIENT CM-FORTRAN (CMF) FROM F77 / F90
   - Who has experience in tackling this with standard vectorizers? (One report I am aware of is by J. Myczkowski and R. Shapiro, trying the KAP vectorizer on 4 small samples; J. Mellor-Crummey has used PFC on a larger weather code.)
   - Have any vectorizers been modified for this task yet?
   - Any attempts to generate Layouts/Alignments, Foralls, or other CMF specialties automatically from either F77 or F90? (A short illustrative sketch of these constructs follows below.) Have any major problems been identified yet? (Now, THIS should cause an avalanche, right?)

Any piece of information will be appreciated, collected, and - if sufficient interest arises - posted.

Thanks in advance,

Reinhard v. Hanxleden                reinhard@rice.edu
Department of Computer Science, Rice University
P.O. Box 1892, Houston, TX 77251, (713) 527-4834

--
=========================== MODERATOR ==============================
Steve Stevenson                      {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906     (803)656-5880.mabell
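For readers who have not seen CM Fortran, here is a minimal sketch of the "CMF specialties" mentioned above - LAYOUT and ALIGN directives plus a FORALL. It is purely illustrative: the array names and sizes are made up, and the directive spelling is from memory, so check the CMF manual rather than treating this as definitive.

C     Purely illustrative CM Fortran fragment (names and sizes made up;
C     directive spelling from memory -- check the CMF manual).
      PROGRAM LAYDEMO
      REAL A(1024,1024), B(1024), W(1024,8)
C     Spread both axes of A across the machine; keep the short second
C     axis of W within each processor (W is unused below, it is only
C     here to show a :SERIAL axis); co-locate B(I) with A(I,1).
CMF$  LAYOUT A(:NEWS, :NEWS)
CMF$  LAYOUT W(:NEWS, :SERIAL)
CMF$  ALIGN  B(I) WITH A(I,1)

      A = 1.0
      B = 2.0
C     Elementwise FORALL; any communication follows from the layouts.
      FORALL (I = 1:1024, J = 1:1024) A(I,J) = A(I,J) + B(I)
      PRINT *, SUM(A)
      END

The FORALL and the array syntax are what a vectorizer would have to produce from F77 DO loops; the layout and alignment directives are what it would have to invent on its own.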
reinhard@rice.edu (Reinhard von Hanxleden) (06/07/91)
Here is a summary of the responses to my posting last week. Thanks to everybody who answered!

Reinhard v. Hanxleden                reinhard@rice.edu
Department of Computer Science, Rice University
P.O. Box 1892, Houston, TX 77251, (713) 527-4834

###########################################################################
From: ajclear@cs.sandia.gov (Andrew J. Cleary)

Re your posting to comp.parallel on CM Fortran: Steve Kratzer at the Supercomputing Research Center (kratzer@super.org) implemented a sparse QR factorization algorithm a year ago in Paris (the CM's low-level parallel instruction set) that he is almost done porting to CM Fortran. Indeed, the algorithm uses a very different methodology than MIMD sparse factorization algorithms: it forgoes the usual coarse-grained parallelism used on MIMD machines entirely in favor of fine-grained parallelism, with the benefit that more "regularity" is introduced. Gilbert and Schreiber published a tech report on sparse Cholesky that takes a more MIMD-like approach, but got crappy results.

Kratzer and I are currently in the process of designing and coding (yes, in CM Fortran) a sparse Cholesky code based on his QR code, but with some major differences. While it is still pretty different from MIMD codes, with the new slicewise compiler we are using the CM in a more coarse-grained fashion, so the new algorithm/code is looking more like MIMD codes than Steve's original code did. It remains to be seen whether any of these approaches will utilize the CM at anywhere near acceptable speeds (so far, none has, with Steve's QR code coming in at about 25 Mflops).

There are various quirks associated with the slicewise compiler. Some are bugs that get fixed occasionally. Some are more basic to the compiler (especially those involving performance and axes laid out serially), though rumor has it that the next version of the compiler is due out in months and will be more clever with serial axes.

I know nothing about attempts at automatic generation of CM Fortran code from old code.

###########################################################################
From: Harry Berryman <berryman-harry@cs.yale.edu>

You should ask Seema Hiranindani (now Mirchindani) about the work by Saltz, Berryman, etc. I'm assuming that you're with Ken K.'s group. What kind of stuff do you intend to do? We've done a fair amount of irregular stuff on both the 'cube and the CM.

###########################################################################
From: @duncan.cs.utk.edu:clay@cs.utk.edu (Clay P. Breshears)

Although I have had no direct experience in using the methods that you mention, I do know that Thinking Machines Corp. has a package that maps irregular problems to the Connection Machine quite efficiently. It is an unofficial product from TMC called the Routing Compiler, written by Denny Dahl. It takes a communication graph of your problem and generates a mapping of the problem onto the CM architecture that minimizes the distance communications must travel. As of mid-January this year, the Routing Compiler was out of beta test and went out with the 6.0 release of the system. A CM Fortran interface is available. If you want more information, I would suggest contacting TMC directly, since the above is just about everything that I have on it.

###########################################################################
From: David B. Serafini <serafini@amelia.nas.nasa.gov>

Denny Dahl at Thinking Machines has been doing some work on how to allocate points in a computational grid to virtual processors based on analysis of communication patterns. This may be applicable to sparse matrices too.
It produces significant reductions in communication time by balancing the communication bandwidth available and required.

###########################################################################
From: Steve Hammond <hammonds@riacs.edu>

I have written two papers covering different aspects of working with irregular or unstructured problems on the CM:

S. W. Hammond and T. J. Barth, "An Efficient Massively Parallel Euler Solver for 2D Unstructured Grid Problems", to appear in the AIAA Journal; also AIAA paper 91-0441, AIAA Aerospace Sciences Meeting, Reno, Nevada, Jan. 1991.

S. W. Hammond and Robert Schreiber, "Mapping Unstructured Grid Problems to the Connection Machine", RIACS Technical Report 90.22, October 1990.

The problem is that very little can be done by the system at "compile" time, since the irregularity is not known until run time. Check the proceedings of the 1990 hypercube conference for a paper by Denny Dahl of Thinking Machines. He has done work on a communication compiler that stores the communication pattern at runtime and then uses it over and over again for irregular communications. If you want either of the two papers listed above, send me some mail with your address.

> Have you heard of the "PARTY" system being worked on by Joel Saltz and some others at ICASE?

I don't know too much about it because it is on the fringe of my interests, but it is supposed to be a runtime system to manage unstructured communications.

###########################################################################
From: hugo@griggs.dartmouth.edu (Peter Su)

I'm not too familiar with these kinds of problems, but I have been doing a lot of CM work lately. I think that much of this is doable. In particular, load balancing and the like is pretty easy. There is a paper in the upcoming SPAA conference that describes an implementation of a sorting algorithm that does a kind of distributed load balancing. There is also some code floating around that implements quad trees, I believe, which is basically recursive bisection, right?

I don't know of any systems that do vectorization for the CM. In principle, it shouldn't be all that much different from vectorizing on a Cray (say), because the architectures are actually pretty similar. Just think of the CM as being a big, huge, wide vector register. Rather than one or two pipelined floating point units, the CM has up to 2048! This is the model that CM Fortran uses to generate code...

There are a couple of papers from Compass around that talk about automatically generating alignments from F90. See the 1990 Frontiers conference and:

%A Kathleen Knobe
%A Joan Lucas
%A Guy Steele
%T Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines
%D 1990
%J Journal of Parallel and Distributed Computing
%V 8
%P 102-118
%K steele90

I'm not sure if the current CMF does any of this automatically; the docs don't mention it. This is really the heart of the whole problem of writing good compilers for machines where access to memory isn't uniform. This includes machines like the CM, the Cray (uniform access, like stride-1, is much faster), multiprocessors with caches, and NUMA MIMD machines. I think compilers can concentrate on alignment, but layout is a tougher problem. Let me know what you find out.

> I think the code I have in mind was part of a ray tracer.

Irregular distributions aren't as bad as you might think, because the CM has facilities for per processor indirect addressing. Thus, there is a bit of flexibility with regard to how you access per processor data. Now, if the computations that you have to do are wildly irregular (i.e. not SPMD), then you have trouble.
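To make the "one big, wide vector register" picture above concrete, here is a small hypothetical before/after (mine, not taken from any of the codes or reports mentioned in this summary) of the kind of rewrite a CM vectorizer would have to perform: an F77 DO loop versus the equivalent CM Fortran array statement.

      PROGRAM SAXPY
C     Hypothetical before/after sketch; names and sizes are made up.
      INTEGER N, I
      PARAMETER (N = 65536)
      REAL A, X(N), Y(N)
      A = 2.0
      X = 1.0
      Y = 1.0

C     F77 style: an ordinary stride-1 DO loop.  On the CM this runs
C     element-at-a-time from the front end unless a vectorizer (or a
C     human) rewrites it.
      DO 10 I = 1, N
         Y(I) = A * X(I) + Y(I)
   10 CONTINUE

C     CM Fortran style: the same update as a single array statement,
C     which the CMF compiler spreads across all the virtual processors.
C     (Applied a second time here, purely for illustration.)
      Y = A * X + Y

      PRINT *, Y(1), Y(N)
      END

The harder part, which the Compass papers address, is choosing the layouts and alignments so that statements like the last one do not imply communication.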
###########################################################################
From: nr1886@euclid.math.colostate.edu (Nenad Rijavec)

Regarding your posting <1991May30.163634.17582@rice.edu>, my experience might be somewhat relevant, even though I am using C* as opposed to CM Fortran. I am implementing a multidimensional assignment problem solver on a CM. The problems are extremely sparse, and the algorithms used come from combinatorial and nonsmooth optimization. My conclusion is that the CM is not particularly well suited to solving such problems, mainly because array indices pretty much *have* to be serial. Sure, the CM does allow parallel indices, but they are actually emulated in software and thus (at least in my case) usually prohibitively expensive.

What I do depends heavily on load balancing. My inquiries did not unearth anything useful, so I have had to work out a dynamic load balancing scheme myself. I don't know how good it really is (I am a mathematician and not a computer scientist), but I'll be glad to tell you about it if you are interested.

--
MY SIGFILE HERE
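The serial-versus-parallel index distinction Rijavec describes is the same issue the indirection-vector question in the original posting runs into. Below is a rough, hypothetical sketch (not Rijavec's code, and not tied to any of the codes above) of a sparse matrix-vector product in an ELLPACK-like format that shows both kinds of subscript: the loop over the :SERIAL axis is cheap in-processor addressing, while the data-dependent subscript into X is the "parallel index" that has to be emulated through the router.

      PROGRAM SPMV
C     Hypothetical sketch: N rows, at most NZMAX nonzeros per row,
C     stored ELLPACK-style.  VAL(I,K) holds the K-th nonzero of row I,
C     PTR(I,K) its column index.  All names and data are made up.
      INTEGER N, NZMAX, K
      PARAMETER (N = 8192, NZMAX = 16)
      REAL VAL(N,NZMAX), X(N), Y(N)
      INTEGER PTR(N,NZMAX)
C     Rows spread across the machine, the short nonzero axis serial.
CMF$  LAYOUT VAL(:NEWS, :SERIAL)
CMF$  LAYOUT PTR(:NEWS, :SERIAL)

C     Dummy data: each row holds NZMAX copies of 1/NZMAX at its own
C     diagonal position, so Y should come out as all ones.
      VAL = 1.0 / NZMAX
      FORALL (I = 1:N, J = 1:NZMAX) PTR(I,J) = I
      X = 1.0
      Y = 0.0

      DO 10 K = 1, NZMAX
C        K indexes the :SERIAL axis -- cheap, local addressing.
C        X(PTR(I,K)) is the data-dependent parallel subscript that the
C        system has to emulate with general (router) communication.
         FORALL (I = 1:N) Y(I) = Y(I) + VAL(I,K) * X(PTR(I,K))
   10 CONTINUE

      PRINT *, Y(1), Y(N)
      END

Whether such a kernel is viable is exactly the open question: if the layouts (or a tool like Dahl's Routing Compiler) keep most PTR references on-processor, the gather is cheap; if not, the router cost dominates, which matches Rijavec's experience.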