eugene@pioneer.UUCP (02/25/87)
In article <1210@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>Distributed memory networks have been used for multi-user systems for several
>years now - cf. Apollo networks. Some, at least, would argue they have been
>used successfully. However, machines like the iPSC were designed to do heavy
>computing, and NOT a lot of resource sharing. The cube manager is too much of
>a bottleneck to be used as a resource server to the tower.
> . . .
>The hypercube is set up to do number crunching, with lots of operations per
>byte of I/O.
>--
>Doug Pase -- ...ucbvax!tektronix!ogcvax!pase or pase@Oregon-Grad

Hypercubes were not designed to do a lot of heavy computing. That would put
them into the Cray class of processor, and everyone's experience has been to
the contrary. Heavy computing requires a well thought out (balanced)
structure to prevent things like an I/O bottleneck. A hypercube is far from
a typical end-user machine.

The marketing hype which has surrounded hypercubes astounds me. The ONLY
person I have heard a level-headed response from is Justin Rattner of Intel,
who stated that these machines are research machines to give people exposure
to the problems of parallel programming. They are not designed to replace
Crays or to compete with them. To believe so would involve a great deal of
misunderstanding. There are now five (if not more) companies selling
hypercube architectures out there; I doubt any will survive in the long term
(in the hypercube market). Don't hold your breath for software either. Don't
expect to take your dusty deck C or Fortran and have it automatically
parallelized (when that works, we will have achieved true AI 8-).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

p.s. The only buzzword I didn't use was A*a.
miner@ulowell.UUCP (02/25/87)
In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>Don't expect to take your dusty deck C or Fortran and have it automatically
>parallelized (when that works, we will have achieved true AI 8-).
>--eugene miya
>  NASA Ames Research Center

The Alliant (a supermini parallel/vector machine made in MA) has a decent
parallelizing FORTRAN compiler, and they are working on a C compiler. Many
of the companies coming out with parallel computer systems are doing
parallelizing FORTRAN as the first language. It is still not the best way to
express things, but it allows you to port code. And even with compilers like
Alliant's, it is still good to go through the code and modify loops to
remove dependencies between variables, etc.; it helps the compiler perform
better.
--
Rich Miner !ulowell!miner Cntr for Productivity Enhancement 617-452-5000x2693
pase@ogcvax.UUCP (02/26/87)
In article <1210@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>. . . However, machines like the iPSC were designed to do heavy computing,
>and NOT a lot of resource sharing. . . . The hypercube is set up to do
>number crunching, with lots of operations per byte of I/O.

In article <ames.362> eugene@pioneer.UUCP (Eugene Miya N.) responds:
>Hypercubes were not designed to do a lot of heavy computing. You would
>be putting them into the Cray class of processor, and everyone's
>experience has been to the contrary.

Agreed, the iPSC is not intended to compete with the Cray - at ~1 Mflop per
32-node tower, it hasn't near the horsepower of a Cray. Crays are especially
good at problems which are vectorizable and require huge amounts of memory.
The iPSC is not. (However, just try to pick up a Cray for under $200,000.)

(Miya)
>Heavy computing requires a well thought out (balanced) structure to
>prevent things like an I/O bottleneck. A hypercube is far from a
>typical end-user machine.

Again I agree, but whether a machine architecture is "balanced" or not
depends a lot on its intended application. If a huge volume of
communication is required, the iPSC is probably not appropriate. Geoffrey
Fox of CalTech presented a paper in one of the 1984 conferences extolling
some of the virtues of a hypercube architecture (NOTE: NOT an iPSC - the
iPSC is based on Fox's design) for computing. The article was called
"Concurrent Processing for Scientific Calculations". It was in an IEEE
conference, but I don't remember which one. BTW, just about any new
architecture is "far from a typical end-user machine."

(Miya)
>The marketing hype which has surrounded hypercubes astounds me. It turns
>out the ONLY person I have heard a level-headed response from was Justin
>Rattner of Intel who stated that these machines are research machines to
>provide exposure to people on the problems of doing parallel programming.
Perhaps you're too easily astounded, or maybe you think it's of no use
because it's of no use to you...

(Miya)
>They are not designed to replace Crays or compete with them. To believe
>so would involve a great deal of misunderstanding.

Again, I don't disagree - it was never my contention.

(Miya)
>There are now five (if not more) companies selling hypercube
>architectures out there, I doubt if any will survive in the long term
>(in the hypercube market). Don't hold your breath for software either.
>Don't expect to take your dusty deck C or Fortran and have it
>automatically parallelized (when that works, we will have achieved true
>AI 8-).

No question but that algorithms for the iPSC require a different approach
than von Neumann style machines; hence dusty decks won't work. This is no
surprise to me, as there is a big difference between an MIMD architecture
and a SISD architecture, and only a little difference between vector/scalar
and SISD architectures. Does that mean they'll never succeed? Well, we'll
see...

One Last Word: I'm glad you subscribe to this newsgroup, Eugene; I enjoy
your postings. Please keep them coming.
--
Doug Pase -- ...ucbvax!tektronix!ogcvax!pase or pase@Oregon-Grad
ron@brl-sem.UUCP (02/27/87)
In article <1216@ogcvax.UUCP>, pase@ogcvax.UUCP (Douglas M. Pase) writes:
> Agreed, the iPSC is not intended to compete with the Cray - at ~1 Mflop per
> 32 node tower, it hasn't near the horsepower of a Cray. Crays are especially
> good at problems which are vectorizable and require huge amounts of memory.
> The iPSC is not. (However, just try to pick up a Cray for under $200,000.)

Granted, ~1 MFlop per tower is not even rivaling a good mini these days.
The nodes in the iPSC are entirely underwhelming. This is not a
condemnation of hypercubes in general, just the Intel one.
news@cit-vax.UUCP (02/27/87)
Organization : California Institute of Technology
From: jon@oddhack.Caltech.Edu (Jon Leech)
Path: oddhack!jon

In article <1216@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>required, the iPSC is probably not appropriate. Geoffrey Fox of CalTech
>presented a paper in one of the 1984 conferences extolling some of the virtues
>of a hypercube architecture (NOTE: NOT an iPSC - the iPSC is based on Fox's
>design) for computing.

This is a common misconception which I will attempt to correct. The
original Caltech Cosmic Cubes (sitting not 20 feet from me) were put
together by a team led by two professors - Fox and Chuck Seitz of CS - and
a number of students from both Physics and CS. The two hypercube groups
have split up and gone their separate ways since then, but please give
credit where it's due, to Seitz. Fox might like to think he did it all by
himself, but that's not the case.

I do generally agree with Eugene Miya's assessment of hypercubes, though. I
think it is a big mistake for people to attempt to do practical work using
machines which are still very much research projects themselves (as I am
finding in attempting to do my MS work on the cubes here). The biggest
problems from my point of view are the terribly immature software
environments (debugging? what's that?) and extremely slow communications to
the cube hosts.

-- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
   Caltech Computer Science Graphics Group
   __@/
eugene@pioneer.UUCP (02/28/87)
In article <1881@cit-vax.Caltech.Edu> jon@oddhack.UUCP (Jon Leech) writes:
>>Geoffrey Fox of CalTech
>>presented a paper in one of the 1984 conferences extolling some of the virtues
>>of a hypercube architecture (NOTE: NOT an iPSC - the iPSC is based on Fox's
>>design) for computing.
>
>This is a common misconception which I will attempt to correct.
>The original Caltech Cosmic Cubes (sitting not 20 feet from me),
>-- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
>   Caltech Computer Science Graphics Group

As I mailed to the OGC, by chance I was at that meeting where Fox spoke. It
was the IEEE COMPCON, and I won't forget it. It was in a different hotel
than usual in SF because the usual hotel (Cathedral Hill, where it is
running as I write) had had a fire. One week before, the LA Times article
about hypercubes as possible future supercomputers had appeared. Fox (a
physicist) got up before these EEs who had read the article. He was not
well received. Geoffrey left the stage saying, "I'm not responsible for
what people say about us." Not one of the EEs' brighter moments. Fox is
basically a good guy (also known in the CS community as part of the Caltech
SMP project).

%A Geoffrey C. Fox
%T Concurrent Processing for Scientific Calculations
%J Digest of Papers COMPCON, Spring 84
%r Hm62
%I IEEE
%D Feb. 1984
%P 70-73
%K Super scientific computers
%X An introduction to the current 64-PE Caltech hypercube. Based on the
dissertation by Lang (Caltech, 1982) on the `Homogeneous machine.' Bart
Locanthi also gets credit for the original Homogeneous Machine thesis
(Caltech, 1980).

Oh, where is Eugene Brooks III, arguing for shared memory hypercubes, when
you need him? Sorry, I should summarize more and follow up less.

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
turner@uicsrd.UUCP (03/01/87)
() Written by miner@ulowell.cs.ulowell.edu in uicsrd:comp.arch
()In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
()>Don't expect to take your dusty deck C or Fortran and have it automatically
()>parallelized (when that works, we will have achieved true AI 8-).
()>--eugene miya
()> NASA Ames Research Center
()
()The Alliant (a supermini parallel/vector machine made in MA) has a
()decent parallelizing FORTRAN compiler, and they are working on a C.
()Many of the companies coming out with parallel computer systems are
()doing parallelizing FORTRAN as the first language. Still not the best
()way to express things, but it allows you to port code. And even with
()compilers like Alliant's it is still good to go through the code and
()modify loops to remove dependencies between variables etc, it helps
()the compiler perform better.
()Rich Miner !ulowell!miner Cntr for Productivity Enhancement
There is a BIG difference between parallelizing for a machine like the
Alliant, with lots of nice loop-type synchronization built in (not to
mention a shared memory architecture), and automatic parallelization
for a HYPERCUBE. When, and if, we achieve that, it will involve far
more than the simple kinds of data dependence analysis that are the
basis for the || compilers that are beginning to appear commercially.
If the differences are not bloody obvious, mail me and I'll be more
than happy to share the troubles I've seen in writing for the iPSC;
compared to an FX/8, it's a totally different ball game. Fun, but tough.
---------------------------------------------------------------------------
Steve Turner (on the Si prairie - UIUC CSRD)
UUCP: {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!turner
ARPANET: turner%uicsrd@a.cs.uiuc.edu
CSNET: turner%uicsrd@uiuc.csnet
BITNET: turner@uicsrd.csrd.uiuc.edu
fay@encore.UUCP (03/02/87)
In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>
>Hypercubes were not designed to do a lot of heavy computing. You would
>be putting them into the Cray class of processor, and everyone's
>experience has been to the contrary. Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>
>The marketing hype which has surrounded hypercubes astounds me.
>...
>They are not designed to replace Crays or compete with
>them. To believe so would involve a great deal of misunderstanding.
>There are now five (if not more) companies selling hypercube
>architectures out there, I doubt if any will survive in the long term
>(in the hypercube market).

I agree with your criticism of marketing hype, but not at all with your
prognosis. At least one major oil company is buying large numbers of
hypercubes (not Intel's) to do seismological work (replacing very expensive
on-line time with their Crays). Some hypercubes have made very substantial
improvements on Intel's design. Granted, one company using them for one
narrow application doesn't make hypercubes the final word in computing, but
neither are hypercubes doomed just because Intel made a mistake using
Ethernet chips to communicate between the nodes. That problem has been
ameliorated (though not solved). The cost benefits are incredible when one
realizes that these programs actually get better turn-around time on a
$12,000 hypercube than a batch-processing multi-million-dollar Cray.

{linus,talcott,decvax,ihnp4,allegra,necis,compass}!encore!fay
ram@nucsrl.UUCP (03/04/87)
Eugene wrote:
>experience has been to the contrary. Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>A hypercube is far from a typical end-user machine.

Quite true.

>The marketing hype which has surrounded hypercubes astounds me.

[Talk of marketing hype. The hype reminds me of the AI field. "If you want
to do serious work in AI you have to have a Symbolics or LMI or...". BS. I
can do as well or better with a SUN. I have even heard interviewers telling
me this. Do these guys start projects after reading the ad pages of AI
magazine?]

Why doesn't anybody talk of any other network? I for one like the De Bruijn
interconnection network. Any flames/appreciation related to this? True, the
hypercube is by far the best studied, most easily VLSIable, extensible
network. But is an interconnection network the be-all and end-all of
multi-processors? No. It is far too early to judge that. Whether you choose
hypercubes, deltas, banyans, or whatever network, I/O bandwidth, routing
protocols, and network connectivity will limit the number of problems that
run faster (relative to the number of available processors) on them. I am
not discounting cubes altogether (the TMI guys have shown some interesting
pieces of jugglery with cubes). I guess we can agree that problems that are
communication bound are bound to have problems with any sort of network.

Alternate solution: have enormous shared memory - giga giga bytes. Here
cache coherence is a major stumbling block.

What classes of problems are more suitable for network based machines? If
we assume that a problem is decomposed into a number of sub-problems/
processes, problems that are embarrassingly parallelizable and have little
inter-process communication would be best for the network class of
machines. Heavily communication bound problems would be suited to shared
memory. As shared memory is not viable for more than a single digit
(optimistic) number of PEs, what's the alternative?
Solution: mix the two within one framework (I think Cedar has such
characteristics), so that a few PEs share a common pool of gigabytes of
memory and such clusters are interconnected. It is wasteful to set up
communication (broadcast is a different issue altogether) just to transfer
a few KBs from A to B; a framework like this reduces the communication
overhead relative to the overall transfer time. This is probably what
Eugene means by a balanced design. It has a few plus points:
fault-tolerance is improved, along with alternate communication channels.

Another common misconception is about vectorization. Vectorization does
not mean speed-ups for numerical calculations alone. Chaining,
short-stopping, and overlapping provide considerable speed-ups in the form
of reduced memory access cycles. Until today only FORTRAN programmers had
access to such machines (probably the CRAYs were dedicated to this saintly
sect), hence the usage pattern. Although vectorizing a language like C is
not as easy as FORTRAN, vectorization is certainly a lot easier than
auto-parallelization. [Gould had done some work on vectorizing C (I wonder
what happened to it), as has Kuck & Associates.]

I had done some research as part of a team analysing a class of problems
(ranging from bit manipulation, searching/sorting, and tree manipulation
to Fortran-like number juggling). Disregarding the underlying
architecture, the analyses were separated into parallelizable sections and
vectorizable sections. Some problems were embarrassingly parallelizable
and some had little parallelism (a solution tree - a typical Prolog search
tree), but the speed-up from vectorization is considerable in almost all
problems. (No wonder there are so many vector CPU designs in the works
today.) If we build huge CPUs that crunch data at an alarming rate,
communication latencies are going to limit their loading capabilities.
If we build small CPUs that overlap communication with computation, there
is effectively a speed-up (CM), but there is a limit to the CPU size and
number, dictated by the problems and the interconnection complexity. Thus
there is also a tradeoff between CPU power and interconnection type. In
retrospect, the choice of Intel chips for the Caltech machine was probably
not the best.

Another problem for these network based multi-processors is the initial
distribution of data and the final collection. Almost everybody ignores
these in the analysis, and I think they are significant and have to be
included.

renu raman			....ihnp4!nucsrl!ram
Northwestern Univ. Comp. Sci. Res. Lab
Evanston IL 60201

Thanks to Ollie, Iran has agreed to spend zillions in supercomputer
research. How philanthropic of those guys.

Why is it that people who have used the iPSC have either turned HYPER or
are in a COSMIC trance :-)
ram@nucsrl.UUCP (03/07/87)
While talking about interconnection networks: I just received my regular TI
semiconductor newsletter. TI announced a 32-bit "shuffle exchange network"
on a chip. Called the AS8839, it can perform:

	o Perfect Shuffle
	o Inverse Shuffle
	o Upper Broadcast
	o Lower Broadcast
	o Bit Exchange

Anybody know of any other chip(s) for other network(s)?

renu raman	...ihnp4!nucsrl!ram

Eugene: It's time you changed your ".signature".
ram@nucsrl.UUCP (03/14/87)
Fay wrote:
>The cost benefits are incredible when one realizes that these programs
>actually get better turn-around time on a $12,000 hypercube than a
                                              ^
                                              +- where did a '0' go?
Last time I heard (yesterday somebody at Argonne told me), it was $125,000
for a d4 machine (d = dimension, and 4 is 4). That price is without the
vector processors in them. Could somebody from Intel clarify?

>batch-processing multi-million-dollar Cray.

Watch out for Cray discounts. Wait (not too long) for the competition to
build and see the prices tumble.
fay@encore.UUCP (Peter Fay) (03/18/87)
In article <3810018@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
>Fay wrote:
>
>>The cost benefits are incredible when one realizes that these programs
>>actually get better turn-around time on a $12,000 hypercube than a
>                                              ^
>                                              +- where did a '0' go?
>Last time I heard (yesterday somebody at Argonne told me), it was $125,000
>for a d4 machine (d = dimension, and 4 is 4). That price is without the
>vector processors in them. Could somebody from Intel clarify?

I wasn't referring to Intel's hypercube. In fact, I believe I said Intel's
wasn't the greatest implementation. My rough pricing came from a cube
configuration of Ncube Corp.'s boards for a total of 16 nodes. (The exact
price is from memory - it may have gone down.)

My only 'real' comparison of cubes was at the ICPP conference last summer,
viewing both the Intel and the Ncube running the same Mandelbrot program
(what else?). The Ncube was (very roughly) ten times faster. The Intel
people explained this by saying their machine was still 'experimental',
while Ncube's was a commercial product.

Maybe that's why Ncube's is being used in commercial applications.

- peter fay
{linus,talcott,decvax,ihnp4,allegra,necis,compass}!encore!fay
eugene@pioneer.arpa (Eugene Miya N.) (03/18/87)
In article <1150@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>In article <3810018@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
>>Fay wrote:
>>
>>>The cost benefits are incredible when one realizes that these programs
>>>actually get better turn-around time on a $12,000 hypercube than a
>>                                             ^
>>                                             +- where did a '0' go?
>
>I wasn't referring to Intel's hypercube.

I thought you were referring to a 4-node cube. In fact, I saw John Palmer's
4-node system when he brought it last year to COMPCON. Speedups of 4 over
microprocessors, even 16 over micros, are dumb. You want speedups of
hundreds of times to compete with larger mainframe-class CPUs, which have
much faster busses for I/O.

>My only 'real' comparison of cubes was at the ICPP conference
>last summer, viewing both the Intel and the Ncube running the same
>Mandelbrot program (what else?). The Ncube was (very roughly) ten times
>faster. The Intel people explained this by saying their machine was
>still 'experimental', while Ncube's was a commercial product.
>
>Maybe that's why Ncube's is being used in commercial applications.
>	- peter fay

What "commercial" application is running on a cube? Any? [Send mail when
answering this; don't clutter the net.] I am under the impression (from
the cube conferences) that we are all dealing with reduced (read: toy)
problems. I don't know of a scientist anywhere running "production code,"
including at Caltech - code development, sure, and other experiments.
That's quite an investment of a scientist's time, to write something with
no guarantee that a line of machines is going to continue (like writing
HEP applications).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
news@cit-vax.Caltech.Edu (Usenet netnews) (03/19/87)
Organization : California Institute of Technology
From: jon@oddhack.Caltech.Edu (Jon Leech)
Path: oddhack!jon

In article <1150@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>My only 'real' comparison of cubes was at the ICPP conference
>last summer, viewing both the Intel and the Ncube running the same
>Mandelbrot program (what else?). The Ncube was (very roughly) ten times
>faster. The Intel people explained this by saying their machine was
>still 'experimental', while Ncube's was a commercial product.
>
>Maybe that's why Ncube's is being used in commercial applications.

Based on talking to an NCUBE salesman at the Oak Ridge Hypercube
Conference last September, you can't get enough memory on one of their
nodes (128K, I think) to do the sorts of things I want. Does anyone know
if this will change? 4 MB/node seems like a reasonable number to me. Other
than this major problem I was very impressed by the NCUBE. Obviously some
people can get good use out of them. But commercial applications !=
Mandelbrot sets!

-- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
   Caltech Computer Science Graphics Group
   __@/
neighorn@qiclab.UUCP (Steven C. Neighorn) (05/11/87)
In article <3810015@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
: While talking about interconnection networks, I just recd my
:regular TI semiconductor newsletter. TI announced a 32 bit "shuffle
:exchange network" on a chip. Called AS8839, it can perform
:
: o Perfect Shuffle
: o Inverse Shuffle
: o Upper Broadcast
: o Lower Broadcast
: o Bit Exchange
:
: Anybody know of any other chip(s) for other network(s).
: renu raman
I believe shuffle-exchange functions were implemented and discussed by D. H.
Lawrie in the paper "Access and Alignment of Data in an Array Processor,"
IEEE Transactions on Computers, C-24, no. 12, Dec. 1975, pp. 1145-1155. The
particular implementation the shuffle/exchange functions were used in
was a multistage Omega network.
---
Steven C. Neighorn tektronix!{psu-cs,reed}!qiclab!neighorn
Portland Public Schools "Where we train young Star Fighters to defend the
(503) 249-2000 ext 337 frontier against Xur and the Ko-dan Armada"