segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) (01/07/88)
One of the strong selling points (or claims, anyway) for many 'parallel' computers, especially shared memory multiprocessors and vector processors with vectorizing compilers, is that these are much easier to program than distributed memory multiprocessors. I have observed the degree of difficulty that experienced sequential programmers have in dealing with a hypercube-based distributed-memory computer. I would like to know whether that is a universal experience, and if not, which distributed-memory systems are easier to program (especially message-passing architectures), and what makes them easier.

The reason for this query is that I am studying the programming environment for such machines, and I'm trying to come up with ways to improve it. I have a few ideas, and I'd like to see how they compare with the experiences of others. Any information you may have on systems, languages, hardware or environments that make a significant difference to the development process would be useful. I am especially interested in your feelings as to what _issues_ are most relevant, though even information such as "Machine Y in language X is awful; Machine A in language B is much easier to develop (and debug) on" is somewhat interesting.

I realize that there are a few tradeoffs here - the newest hardware is unlikely to have the most bug-free software, for example, and software environments with fewer bugs will obviously be easier to use. I am certainly looking for information about that tradeoff, but I'm much more interested in others. For example, are there environments which are easy to use but don't produce the most efficient programs? I am aware that there are a few levels of OS support for some of the Caltech cubes, giving varying degrees of speed v. difficulty in coding. This is closer to my goal, though I am also interested in experience with even higher-level development support. I will gladly summarize the information I receive to the net.
Perhaps this will open up an interesting discussion. (More importantly, perhaps it will help to get some user-friendly software written.)

Please reply by email, as this is a moderated newsgroup, and I am interested in as many responses as possible.

Thank you,
Ed Segall
--
uucp: ...{harvard, ut-sally, sri-iu, ihnp4!packard}!topaz!caip!segall
arpa: SEGALL@CAIP.RUTGERS.EDU
brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) (02/11/88)
In article <832@hubcap.UUCP> segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) writes:
>One of the strong selling points (or claims, anyway) for many 'parallel'
>computers, especially shared memory multiprocessors and vector
>processors with vectorizing compilers, is that these are much easier
>to program than distributed memory multiprocessors.

Take it from someone with a lot of experience with hypercubes, shared memory multiprocessors, and vector processors: shared memory multiprocessors are much easier to program. With the right language support you can have common source code between shared memory multiprocessors and serial machines, and be efficient on both. The program architecture required for a distributed system is VERY DIFFERENT from that of a serial program, and you can't slowly evolve a serial program into a parallel program for a distributed architecture, as you can for a shared memory machine. In terms of programming ease the shared memory machine, with a uniform and finely interleaved shared memory, is the hands down winner.

	Eugene Brooks
-------------------------------------------------------------------
P.S. Before you hypercube types get your backs up, check the history
johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) (02/24/88)
In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>The program architecture required for a distributed system is VERY
>DIFFERENT from that of a serial program, and you can't slowly evolve
>a serial program into a parallel program for a distributed architecture,
>as you can for a shared memory machine.
>
>	Eugene Brooks
>-------------------------------------------------------------------

I disagree that programs have to be different. The program architecture must be different on sequential and distributed memory machines. Many of the problems one encounters are due to a flawed model for using the so-called "host" processor on the commercial hypercubes. There is a large and growing body of hypercube software that runs UNCHANGED on sequential machines. For a fuller description than I feel like typing in at the moment, see "Cubix: Programming Hypercubes without Programming Hosts," in the Proceedings of the 1987 Knoxville Hypercube Conference, or send me email.

John Salmon
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (02/25/88)
In article <1010@hubcap.UUCP> elroy!johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) writes:
>
>In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>>The program architecture required for a distributed system is VERY
>>DIFFERENT from that of a serial program, and you can't slowly evolve
>>a serial program into a parallel program for a distributed architecture,
>>as you can for a shared memory machine.
>>
>>	Eugene Brooks
>>-------------------------------------------------------------------
>
>I disagree that programs have to be different.
>The program architecture must be different on sequential
>and distributed memory machines.

You simply reiterate my point after disagreeing. But since your response seems to dispute my main point, that shared memory machines are easier to program than distributed memory machines and easier to extract good performance from as well, I propose a test.

I have a shared memory parallel program which simulates a packet switched network. This program is written in PCP, an explicitly parallel extension of C for shared memory machines. The parallel program was created from its serial starting point in one week. The parallel implementation also runs on a uniprocessor with all of the parallel runtime overhead stripped out. I propose the following.

1) That you port the program to the hypercube. I do not doubt that it is possible (I know and love hypercubes, and have had something to do with the "hype" that surrounds them), but I would like to know just how long it will take you to create the distributed memory version of this program. Your distributed memory version must produce results that are identical to those of the serial version of the program. It may not be a "subset" of what the shared memory version does, i.e. you cannot simulate only a "limited" class of networks.
I do not care what the hypercube code looks like, only that the program deliver the same outputs, given the same inputs, as the serial and shared memory parallel versions of the code.

2) That you measure the speedups and, more importantly, the ABSOLUTE performance obtained with the hypercube version. I will be interested in just how large the problem size must be made for a given number of processors before good efficiency is obtained, and would like to compare this to the shared memory version.

3) That you then run the "hypercubeized" version of the code on a serial machine using the same processor technology as the hypercube you run on, as you claim is possible, and measure the extra overhead (my favorite catch phrase these days is "Compiler, algorithm and architectural inefficiency is 100% parallelizable"), so that we can get a feel for how "inefficient" the message passing version of the code is.

Are you up to the Brooks challenge? Is anyone out there who is "hyping" hypercubes these days willing to accept the "Brooks challenge" and report the results in the open literature?