[comp.hypercube] Difficulty of programming in parallel

segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) (01/07/88)

One of the strong selling points (or claims, anyway) for many 'parallel'
computers, especially shared memory multiprocessors and vector
processors with vectorizing compilers, is that these are much easier
to program than distributed memory multiprocessors.

I have observed the degree of difficulty that experienced sequential
programmers have in dealing with a hypercube-based distributed-memory
computer.  I would like to know if that is a universal experience, and
if not, what distributed-memory systems are easier to program
(especially message-passing architectures), and what makes them
easier.

The reason for this query is that I am studying the programming
environment for such machines, and I'm trying to come up with ways to
improve it. I have a few ideas, and I'd like to see how they compare
with the experiences of others.

Any information you may have on systems, languages, hardware, or
environments that make a significant difference to the development
process would be useful.  I am especially interested in your feelings
as to what _issues_ are most relevant, though even information such as
"Machine Y in language X is awful; machine A in language B is much
easier to develop (and debug) on" is somewhat interesting.


I realize that there are a few tradeoffs here - the newest hardware is
unlikely to have the most bug-free software, for example, and obviously
software environments with fewer bugs will be easier to use.
Certainly, I am looking for information about this tradeoff, but
I'm much more interested in others.  For example, are there some
environments that are easy to use but don't produce the most
efficient programs?  I am aware that there are a few levels of OS
support for some of the Caltech cubes, giving varying tradeoffs between
speed and coding difficulty.  This is closer to my goal, though I am also
interested in experience with even higher-level development support.


I will gladly summarize the information I receive to the net.  Perhaps
this will open up an interesting discussion.  (More importantly,
perhaps it will help to get some user-friendly software written.)


Thank you,


Ed Segall


Please reply by email, as this is a moderated newsgroup, and I am
interested in as many responses as possible.
-- 

uucp:   ...{harvard, ut-sally, sri-iu, ihnp4!packard}!topaz!caip!segall
arpa:   SEGALL@CAIP.RUTGERS.EDU

brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) (02/11/88)

In article <832@hubcap.UUCP> segall%clash.rutgers.edu@RELAY.CS.NET (Ed Segall) writes:
>One of the strong selling points (or claims, anyway) for many 'parallel'
>computers, especially shared memory multiprocessors and vector
>processors with vectorizing compilers, is that these are much easier
>to program than distributed memory multiprocessors.

Take it from someone with a lot of experience with hypercubes,
shared memory multiprocessors, and vector processors: shared
memory multiprocessors are much easier to program.  With the right
language support you can have common source code between shared
memory multiprocessors and serial machines, and be efficient on both.
The program architecture required for a distributed system is VERY
DIFFERENT from that of a serial program, and you can't slowly evolve
a serial program into a parallel program for a distributed architecture,
as you can for a shared memory machine.  In terms of programming ease,
the shared memory machine, with a uniform and finely interleaved
shared memory, is the hands-down winner.
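
To make the "common source" claim concrete, here is a minimal sketch of
the idea, with an OpenMP pragma standing in for a PCP-style language
extension (the substitution and the toy loop are illustrative
assumptions, not code from any of the machines under discussion).  The
serial loop is annotated rather than rewritten, and a compiler without
parallel support simply ignores the annotation, so one source file
serves both the serial and the shared memory machine.

    /* Common-source sketch: the only parallel artifact is the pragma. */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double dot = 0.0;
        int i;

        for (i = 0; i < N; i++) {      /* set up some data */
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        /* The serial code gains one annotation; a compiler that does
           not understand it simply compiles the loop serially. */
        #pragma omp parallel for reduction(+:dot)
        for (i = 0; i < N; i++)
            dot += a[i] * b[i];

        printf("dot = %f\n", dot);
        return 0;
    }

A distributed memory version of the same loop would have to partition
a[] and b[] across the nodes explicitly and combine the partial sums
with messages, which is exactly the kind of restructuring that cannot
be reached by small steps from the serial code.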

						Eugene Brooks
-------------------------------------------------------------------
P.S.  Before you hypercube types get your backs up, check the history

johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) (02/24/88)

In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>The program architecture required for a distributed system is VERY
>DIFFERENT from that of a serial program, and you can't slowly evolve
>a serial program into a parallel program for a distributed architecture,
>as you can for a shared memory machine.
>
>						Eugene Brooks
>-------------------------------------------------------------------

I disagree that programs have to be different.
The program architecture must be different on sequential
and distributed memory machines.  Many of the problems
one encounters are due to a flawed model for using the so-called
"host" processor on the commercial hypercubes.  
There is a large and growing body of hypercube software that
runs UNCHANGED on sequential machines.  For a fuller description
than I feel like typing in at the moment, see "Cubix: Programming
Hypercubes without Programming Hosts," in the Proceedings of the
1987 Knoxville Hypercube Conference or send me email.
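
Roughly, a node program in this style has the following shape (MPI
calls are used here only as a stand-in for the CrOS/Cubix library, and
the toy computation is an illustrative assumption, not an example from
the paper).  Every node runs the same source, works on its own slice of
the problem, and does its own I/O, so run on a single process it is
just a serial program.

    /* SPMD node program: same source on every node and on a serial
       machine; no separate "host" program. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, i;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each node takes its own slice of the index space; with one
           process the slice is the whole problem. */
        for (i = rank; i < 1000; i += nprocs)
            local += 1.0 / (1.0 + i);

        /* Combine partial results; a no-op when there is only one node. */
        MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* The node program does its own I/O. */
        if (rank == 0)
            printf("sum = %f on %d node(s)\n", total, nprocs);

        MPI_Finalize();
        return 0;
    }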

John Salmon

brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (02/25/88)

In article <1010@hubcap.UUCP> elroy!johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) writes:
>
>In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>>The program architecture required for a distributed system is VERY
>>DIFFERENT from that of a serial program, and you can't slowly evolve
>>a serial program into a parallel program for a distributed architecture,
>>as you can for a shared memory machine.
>>
>>						Eugene Brooks
>>-------------------------------------------------------------------
>
>I disagree that programs have to be different.
>The program architecture must be different on sequential
>and distributed memory machines.
You simply reiterate my point after disagreeing with it.  But since your
response seems to dispute my main point, that shared memory machines
are easier to program than distributed memory machines and easier to
extract good performance from as well, I propose a test.  I have a
shared memory parallel program which simulates a packet switched network.
This program is written in PCP, an explicitly parallel extension of C for
shared memory machines.  The parallel program was created from its serial
starting point in one week.  The parallel implementation also runs on a
uniprocessor with all of the parallel runtime overhead stripped out.
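
For concreteness, a guess at the general shape of such a simulator
follows; the names, the structure, and the numbers are illustrative
assumptions only, and an OpenMP pragma stands in for the PCP parallel
construct.  The point is that the serial time-stepping loop is
annotated, not restructured, so the same source still runs on a
uniprocessor.

    /* Sketch: time-stepped network simulation, parallelized over
       switches.  All state lives in shared memory. */
    #include <stdio.h>

    #define NSWITCH 256
    #define NSTEPS  100

    static int queue_len[NSWITCH];  /* packets queued at each switch */

    int main(void)
    {
        int i, step;
        long total = 0;

        for (i = 0; i < NSWITCH; i++)
            queue_len[i] = i % 4;          /* arbitrary initial load */

        for (step = 0; step < NSTEPS; step++) {
            /* The serial loop body is unchanged; only the annotation
               is added, so the same source runs on one processor. */
            #pragma omp parallel for
            for (i = 0; i < NSWITCH; i++)
                if (queue_len[i] > 0)
                    queue_len[i]--;        /* service one packet */
        }

        for (i = 0; i < NSWITCH; i++)
            total += queue_len[i];
        printf("packets still queued: %ld\n", total);
        return 0;
    }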

I propose the following.
	1) That you port the program to the hypercube.  I do not doubt
	that it is possible; I know and love hypercubes and have had
	something to do with the "hype" that surrounds them, but I would
	like to know just how long it will take you to create the distributed
	memory version of this program.  Your distributed memory version must
	produce results that are identical to the serial version of the program.
	It may not be a "subset" of what the shared memory version does, i.e.,
	you cannot simulate a "limited" class of networks.  I do not care what
	the hypercube code looks like, only that the program deliver the same
	outputs given the same inputs as the serial and shared memory parallel
	versions of the code.
	
	2) That you measure the speedups and, more importantly, the ABSOLUTE
	performance obtained with the hypercube version (see the speedup and
	efficiency sketch after this list).  I will be interested
	in just how large the problem size must be made for a given number
	of processors before good efficiency is obtained, and would like to
	compare this to the shared memory version.
	
	3) I would then like you to run the "hypercubeized" version of the
	code on a serial machine using the same processor technology as the
	hypercube you run on, as you claim is possible, and measure the
	extra overhead (my favorite catchphrase these days is "Compiler,
	algorithm and architectural inefficiency is 100% parallelizable"),
	so that we can get a feel for how "inefficient" the message passing
	version of the code is.
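
As a yardstick for items 2 and 3, the comparison intended is the usual
one: speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p, computed
from measured wall-clock times.  A trivial bit of bookkeeping along
these lines (the timings in it are placeholders, not measurements from
either machine) would be:

    /* Report speedup and parallel efficiency from wall-clock timings. */
    #include <stdio.h>

    static void report(const char *label, double t1, double tp, int p)
    {
        double speedup    = t1 / tp;       /* S(p) = T(1) / T(p) */
        double efficiency = speedup / p;   /* E(p) = S(p) / p    */
        printf("%-14s p=%3d  speedup=%6.2f  efficiency=%5.1f%%\n",
               label, p, speedup, 100.0 * efficiency);
    }

    int main(void)
    {
        /* Placeholder numbers only. */
        report("shared memory", 120.0, 8.3, 16);
        report("hypercube", 120.0, 9.7, 16);
        return 0;
    }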
	
Are you up to the Brooks challenge?  Is anyone out there who is "hyping"
hypercubes these days willing to accept the "Brooks challenge" and report
the results in the open literature?