ds@hollin.prime.com (10/25/89)
/* Written 5:06 pm Oct 24, 1989 by hsu@uicsrd.csrd.uiuc.edu in comp.arch */ In article <6655@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: > >Also, the job of programming a shared memory system is a lot harder than >programming a system with messages as the communication medium. Umm, have you tried? When I took a parallel programming class a couple years ago, everybody griped about the problems of coding Gaussian elimination for the hypercube. Hardly anybody griped about implementing it for the Sequent balance. (This single example of course doesn't prove anything, but I would like to see some examples of problems that are clearly easier to code for a message-passing system.) Bill /* End of text from hollin:comp.arch */ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ An example of a problem that is clearly easier to code in a message-passing system than in a shared-memory MIMD system like the Sequent is ordinary array (matrix) cross-product. The Sequent offers no particular support, according to its programming literature, other than doing unrelated cross-products simultaneously on different 386s. However, true data-parallel computers (defined as computers that are powerful enough to simulate a systolic 2-d grid efficiently) can implement the cross-product using an elegant linear-time algorithm (measured using the word, not bit, model) that simply feeds each array element into adjacent grid edges, one array into the rows and the other into the columns, with a delay of one time unit for each succeeding row (and the same for the columns). Each processor does an add and a multiply. Let me know if you want to know the details; you can work it out for yourself easily by examining the terms of a written-out symbolic cross-product. (Furthermore, this algorithm can also "pipeline" unrelated cross-products to reduce the multiplicative overhead to close to 1, although this is a tiny effect compared with the improvement from quadratic time to linear). David Spector Prime Computer, Inc. ds@primerd.prime.com (until the layoff, soon) (The above reflects my very scanty knowledge about parallel programming. Please let me know if I've misunderstood something, and thanks...)
kale@m.cs.uiuc.edu (10/26/89)
According to NCUBE glossies in front of me: "512 billion bytes of memory capacity" So up to 64 Mbytes/processor.. about price: "... at less than $1000/MFLOP and $1000/Mbyte" and elsewhere: ".. varying number of processors - from 16 to 8192 - that deliver from 8 to 27,000 scalar MFLOPS" (So, price can be deduced.. But I believe it will be highly non-linear..) On another point: Yes it is more difficult (but not too difficult) to program distributed memory machines. But the rewards are great, and the challenge is stimulating. I believe they will be useful for a large fraction of applications. When people claim that "they are good for some applications, but not most", they sometimes may mean a particular algorithm or a particular way of doing things may be difficult on message passing machines. But not the end application - it can often be reformulated so it works fine on them.