segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) (01/07/88)
One of the strong selling points (or claims, anyway) for many 'parallel'
computers, especially shared memory multiprocessors and vector
processors with vectorizing compilers, is that these are much easier
to program than distributed memory multiprocessors.
I have observed the degree of difficulty that experienced sequential
programmers have in dealing with a hypercube-based distributed-memory
computer. I would like to know if that is a universal experience, and
if not, what distributed-memory systems are easier to program
(especially message-passing architectures), and what makes them
easier.
The reason for this query is that I am studying the programming
environment for such machines, and I'm trying to come up with ways to
improve it. I have a few ideas, and I'd like to see how they compare
with the experiences of others.
Any information you may have on systems, languages, hardware or
environments that make a significant difference to the development
process would be useful. I am especially interested in your feelings
as to what _issues_ are most relevant, though even info such as "Machine
Y in language X is awful; machine A in language B is much easier to
develop (and debug) on" is somewhat interesting.
I realize that there are a few tradeoffs here - the newest hardware is
unlikely to have the most bug-free software, for example, and
software environments with fewer bugs will obviously be easier to
use. Certainly, I am looking for information about this tradeoff, but
I'm much more interested in others. For example, are there some
environments which are easy to use but don't produce the most
efficient programs? I am aware that there are a few levels of OS
support for some of the Caltech cubes, giving varying degrees of speed
v. difficulty in coding. This is closer to my goal, though I am also
interested in experience with even higher-level development support.
I will gladly summarize information I receive to the net. Perhaps
this will open up an interesting discussion. (More importantly,
perhaps it will help to get some user-friendly software written.)
Thank you,
Ed Segall
Please reply by email, as this is a moderated newsgroup, and I am
interested in as many responses as possible.
--
uucp: ...{harvard, ut-sally, sri-iu, ihnp4!packard}!topaz!caip!segall
arpa: SEGALL@CAIP.RUTGERS.EDU

brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) (02/11/88)
In article <832@hubcap.UUCP> segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) writes:
>One of the strong selling points (or claims, anyway) for many 'parallel'
>computers, especially shared memory multiprocessors and vector
>processors with vectorizing compilers, is that these are much easier
>to program than distributed memory multiprocessors.

Take it from someone with a lot of experience with hypercubes, shared
memory multiprocessors, and vector processors: shared memory
multiprocessors are much easier to program. With the right language
support you can have common source code between shared memory
multiprocessors and serial machines, and be efficient on both. The
program architecture required for a distributed system is VERY
DIFFERENT from that of a serial program, and you can't slowly evolve
a serial program into a parallel program for a distributed
architecture, as you can for a shared memory machine. In terms of
programming ease the shared memory machine, with a uniform and finely
interleaved shared memory, is the hands down winner.

			Eugene Brooks
-------------------------------------------------------------------
P.S. Before you hypercube types get your backs up, check the history
johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) (02/24/88)
In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>The program architecture required for a distributed system is VERY
>DIFFERENT than that of a serial program, and you can't slowly evolve
>a serial program into a parallel program for a distributed architecture,
>as you can for a shared memory machine.
>
> Eugene Brooks
>-------------------------------------------------------------------

I disagree that programs have to be different. The program
architecture must be different on sequential and distributed memory
machines. Many of the problems one encounters are due to a flawed
model for using the so-called "host" processor on the commercial
hypercubes. There is a large and growing body of hypercube software
that runs UNCHANGED on sequential machines. For a fuller description
than I feel like typing in at the moment, see "Cubix: Programming
Hypercubes without Programming Hosts," in the Proceedings of the 1987
Knoxville Hypercube Conference, or send me email.

John Salmon
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (02/25/88)
In article <1010@hubcap.UUCP> elroy!johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) writes:
>In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>>The program architecture required for a distributed system is VERY
>>DIFFERENT than that of a serial program, and you can't slowly evolve
>>a serial program into a parallel program for a distributed architecture,
>>as you can for a shared memory machine.
>
>I disagree that programs have to be different.
>The program architecture must be different on sequential
>and distributed memory machines.

You simply reiterate my point after disagreeing, but given that your
response seems to dispute my main point, namely that shared memory
machines are easier to program than distributed memory machines, and
easier to extract good performance from as well, I propose a test.

I have a shared memory parallel program which simulates a packet
switched network. This program is written in PCP, an explicitly
parallel extension of C for shared memory machines. The parallel
program was created from its serial starting point in one week. The
parallel implementation also runs on a uniprocessor with all of the
parallel runtime overhead stripped out. I propose the following.

1) That you port the program to the hypercube. I do not doubt that it
is possible; I know and love hypercubes, and have had something to do
with the "hype" that surrounds them, but I would like to know just how
long it will take you to create the distributed memory version of this
program. Your distributed memory version must produce results that
are identical to those of the serial version of the program. It may
not be a "subset" of what the shared memory version does, i.e. you
cannot simulate a "limited" class of networks. I do not care what the
hypercube code looks like, only that the program deliver the same
outputs, given the same inputs, as the serial and shared memory
parallel versions of the code.

2) That you measure the speedups and, more importantly, the ABSOLUTE
performance obtained with the hypercube version. I will be interested
in just how large the problem size must be made for a given number of
processors before good efficiency is obtained, and would like to
compare this to the shared memory version.

3) That you then run the "hypercubeized" version of the code on a
serial machine using the same processor technology as the hypercube
you run on, as you claim is possible, and measure the extra overhead
(my favorite catch phrase these days is "Compiler, algorithm and
architectural inefficiency is 100% parallelizable") so that we can get
a feel for how "inefficient" the message passing version of the code
is.

Are you up to the Brooks challenge? Is anyone out there who is
"hyping" hypercubes these days willing to accept the "Brooks
challenge" and report the results in the open literature?