[net.arch] Transputer and occam

kevin@cornell.UUCP (Kevin Karplus) (03/22/85)

Occam is not a completely idiosyncratic language.  It is an
implementation of CSP (Communicating Sequential Processes), which has
been used by theorists for years as a simple, concise notation for
discussing concurrent programs.

Occam is clearly not a programming language to end all programming
languages, but it provides a simple way to talk about communication
between processes (something NOT provided by C, PASCAL, FORTRAN, PL/I,
...).
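
To make the channel idea concrete: below is a rough sketch, in C, of
what a single Occam channel amounts to.  Since C has no channel
construct (the point above), the sketch has to fall back on Unix
pipe() and fork() to get two communicating processes; in Occam the
same thing is one PAR with "chan ! 42" on one side and "chan ? x" on
the other.

    /* Rough C analogue of one Occam channel: a sender process passes an
     * integer to a receiver process.  The language gives us nothing here,
     * so we lean on Unix pipe() and fork().  Illustrative sketch only.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int chan[2];              /* chan[0] = read end, chan[1] = write end */
        if (pipe(chan) == -1) {
            perror("pipe");
            return 1;
        }

        if (fork() == 0) {        /* child process: the sender */
            int msg = 42;
            close(chan[0]);
            write(chan[1], &msg, sizeof msg);   /* Occam: chan ! 42 */
            _exit(0);
        } else {                  /* parent process: the receiver */
            int x;
            close(chan[1]);
            read(chan[0], &x, sizeof x);        /* Occam: chan ? x */
            printf("received %d\n", x);
            wait(NULL);
        }
        return 0;
    }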

I've not programmed in Occam, but I have been considering it seriously.
Occam is available cheaply to universities (call INMOS for details), and
runs on VAXes as well as Transputers.

The transputer chip looks ideal for systolic array designs.  It lacks
enough channels for a hypercube, but as other people have pointed out,
it isn't clear that a small processor can handle lots of channels.
The transputer does not look very good as a part of a "pseudo-node" of
a hypercube (each chip has one edge of the cube, and shares a bus with
the other processors at the same node), since it does not have a
shared-bus communication port.
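
As a back-of-the-envelope check on the channel count: a hypercube of
2^d nodes needs d links per node, so a chip with four links (my
understanding of the transputer) tops out at a 16-node cube unless
links are multiplexed.  A trivial check of that arithmetic:

    /* Back-of-the-envelope: a hypercube of N = 2^d nodes needs d links
     * per node, so a processor with L links caps a pure hypercube at 2^L
     * nodes.  Four links per chip is an assumption about the part.
     */
    #include <stdio.h>

    int main(void)
    {
        int links = 4;                      /* links per processor (assumed) */
        long nodes;
        int d;
        for (d = 1, nodes = 2; d <= 10; d++, nodes *= 2)
            printf("%2d-cube: %5ld nodes, needs %2d links/node %s\n",
                   d, nodes, d, d <= links ? "(fits)" : "(too many)");
        return 0;
    }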

I'm a little dubious about the value of hypercubes, as most big
programs have a 5% to 20% purely serial component.  Even doing the
parallel part in zero time is only a 5 to 20 times speedup.  Any
machine that hopes for bigger speedups must have at least one extremely
fast serial processor.  (Note: this only applies to general-purpose
machines.  Obviously, certain problems can have the serial part reduced
to a tiny fraction.)
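
The arithmetic behind the 5-to-20-times figure is the usual
fixed-serial-fraction argument: with serial fraction s and P
processors the speedup is 1/(s + (1-s)/P), which approaches 1/s no
matter how large P gets.  A small program to check the numbers quoted
above:

    /* Speedup with a fixed serial fraction s on P processors.
     * As P grows, the speedup approaches 1/s, so "5% to 20% serial"
     * caps the speedup at 20x to 5x regardless of processor count.
     */
    #include <stdio.h>

    double speedup(double s, double p)
    {
        return 1.0 / (s + (1.0 - s) / p);
    }

    int main(void)
    {
        double fractions[] = { 0.05, 0.20 };
        int procs[] = { 16, 256, 1024, 1000000 };
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 4; j++)
                printf("s = %.2f, P = %7d: speedup = %6.2f  (limit 1/s = %.0f)\n",
                       fractions[i], procs[j],
                       speedup(fractions[i], procs[j]), 1.0 / fractions[i]);
        return 0;
    }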

I like the idea of including a disk drive at every node of a cube.  It
is incredibly expensive, but many disk drives with a processor for each
is the only way to get fast databases.  I'm not convinced a cube is the
right architecture, though.  There are networks with fewer links that
are almost as fast, and much cheaper to build.

				Kevin Karplus

[Standard disclaimers]

gottlieb@cmcl2.UUCP (Allan Gottlieb) (03/27/85)

In article <440@cornell.UUCP> kevin@gvax.UUCP (Kevin Karplus) writes:
>I'm a little dubious about the value of hypercubes, as most big
>programs have a 5% to 20% purely serial component.
>(Note: this only applies to general-purpose
>machines.  Obviously, certain problems can have the serial part reduced
>to a tiny fraction.)

Have you any data to support this claim?
At NYU, our Ultracomputer project has (very extensive) experience with
a wide range of important scientific applications and we have NEVER
found the serial code to be k% for fixed k.  Instead, each of these problems
is invariably a class of problems parameterized by some "size"
variables (often the number of mesh points) and the serial portion
approaches 0 as the size increases.  Thus, for large enough problems
the potential for parallelism can be made arbitrarily large.
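
To show the shape of that argument (with completely invented costs,
not actual measurements): if the serial part is a fixed setup cost
while the parallel work grows with the number of mesh points n, then
the serial fraction s(n) shrinks toward zero and the speedup ceiling
1/s(n) rises with problem size.

    /* Toy model of the scaled-problem argument: serial work is a fixed
     * setup cost, parallel work grows with the number of mesh points n.
     * Then s(n) = T_serial / (T_serial + T_parallel(n)) -> 0 as n grows,
     * so the speedup ceiling 1/s(n) rises with problem size.
     * The costs below are invented purely for illustration.
     */
    #include <stdio.h>

    int main(void)
    {
        double t_serial = 1000.0;        /* fixed setup cost (arbitrary units) */
        for (long n = 1000; n <= 100000000; n *= 100) {
            double t_par = 50.0 * n;     /* work proportional to mesh points (assumed) */
            double s = t_serial / (t_serial + t_par);
            printf("n = %9ld: serial fraction = %.6f, speedup ceiling 1/s = %.0f\n",
                   n, s, 1.0 / s);
        }
        return 0;
    }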

This raises the question of "how much is enough": that is, how big
must a problem be for 1000 processors to be used effectively?  We have
numerous simulation results on this question.  The NASA (GISS)
"weather code" (a three-dimensional atmospheric simulation), when
executed using meshes appropriate for an Amdahl V7 or V8, can get high
efficiency (above 70%) with a few hundred processors but not
thousands.  However, when meshes separated by about 1 degree of arc
(more desirable from a numerical-analysis point of view) are used,
thousands of processors can be efficiently employed.  Thus for this
problem class, thousands (but not millions) of processors would be
useful.  I should note that we parallelized this program without
excessive effort, using techniques that Kuck (Illinois), Kennedy
(Rice), and their colleagues believe can be done automatically.
Perhaps using more sophisticated parallelization techniques or by
employing a new algorithm, more processors could be used.  I do not
believe that just refining the mesh enough to utilize a million
processors is justified from a numerical analysis point of view -- but
here I am on shaky ground.
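
One way to make "how much is enough" concrete: efficiency is speedup
divided by the number of processors, so one can ask how large a
problem must be before, say, 1000 processors stay above 70%
efficiency.  The sketch below uses the same invented cost model as in
the sketch above; the real answer, of course, comes from simulations
like the ones described.

    /* "How much is enough": efficiency E = speedup / P.  Using the same
     * invented cost model (fixed serial setup, parallel work ~ n), find
     * the smallest n at which P = 1000 processors exceed 70% efficiency.
     */
    #include <stdio.h>

    int main(void)
    {
        double t_serial = 1000.0, per_point = 50.0;  /* arbitrary illustrative costs */
        double P = 1000.0, target = 0.70;
        for (long n = 1000; n <= 100000000; n *= 2) {
            double s = t_serial / (t_serial + per_point * n);
            double speedup = 1.0 / (s + (1.0 - s) / P);
            double eff = speedup / P;
            if (eff >= target) {
                printf("n = %ld mesh points: efficiency = %.2f with %.0f processors\n",
                       n, eff, P);
                break;
            }
        }
        return 0;
    }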

Caltech has also reported on many scientific problems (using their
real hardware) and again the serial portion drops with problem size.
-- 
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el

henry@utzoo.UUCP (Henry Spencer) (04/02/85)

> >I'm a little dubious about the value of hypercubes, as most big
> >programs have a 5% to 20% purely serial component.
> >(Note: this only applies to general-purpose
> >machines.  Obviously, certain problems can have the serial part reduced
> >to a tiny fraction.)
> 
> Have you any data to support this claim?

There is one very large piece of supporting data:  CDC's incredibly
expensive failure, the Star-100 supercomputer, which ran vectors very
quickly and scalars very slowly.  They had hoped for automatic vectorizers
that would be good enough to make the thing work, and had also hoped that
most problems would vectorize simply and almost completely.  No such luck.
One of the reasons the Cray-1 was such a success, by comparison, was that
the Cray does scalar operations at blazing speed too.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

gottlieb@cmcl2.UUCP (Allan Gottlieb) (04/04/85)

>> >I'm a little dubious about the value of hypercubes, as most big
>> >programs have a 5% to 20% purely serial component.
>> >(Note: this only applies to general-purpose
>> >machines.  Obviously, certain problems can have the serial part reduced
>> >to a tiny fraction.)
>> 
>> Have you any data to support this claim?
>
>There is one very large piece of supporting data:  CDC's incredibly
>expensive failure, the Star-100 supercomputer, which ran vectors very
>quickly and scalars very slowly.  

Vectorizable does not equal parallelizable.  We have good
evidence that the percentage of inherently serial code goes down with
problem size for a number of important applications.  There
still remains the question of whether sufficiently high cache hit
ratios will occur to keep the processor-memory network from being
saturated.  I/O is also a serious question, and there are others.  But
I repeat: our simulation data seem to show clearly that there is no
absolute limit on speedup caused by a fixed percentage of inherently
serial code (or at least any such limit is VERY high; thousand-fold
speedup is definitely possible).
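
A small illustration of the distinction (not tied to any particular
machine): the inner loop below is a recurrence, so a vectorizer can do
nothing with it, yet the rows are independent of one another, so an
MIMD machine can still hand each processor its own set of rows.

    /* Vectorizable != parallelizable: each row is a serial recurrence
     * (x[i][j] depends on x[i][j-1]), so the inner loop will not
     * vectorize.  But the rows are mutually independent, so the outer
     * loop can be split across the processors of an MIMD machine.
     * Illustrative sketch only.
     */
    #include <stdio.h>
    #define ROWS 4
    #define COLS 8

    int main(void)
    {
        static double x[ROWS][COLS];

        /* Outer loop: independent rows -- parallelizable across processors. */
        for (int i = 0; i < ROWS; i++) {
            x[i][0] = i + 1.0;
            /* Inner loop: a recurrence -- not vectorizable as written. */
            for (int j = 1; j < COLS; j++)
                x[i][j] = 0.5 * x[i][j-1] + 1.0;
        }

        for (int i = 0; i < ROWS; i++)
            printf("row %d final value: %f\n", i, x[i][COLS-1]);
        return 0;
    }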
-- 
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el

brooks@lll-crg.ARPA (Eugene D. Brooks III) (04/04/85)

> > >I'm a little dubious about the value of hypercubes, as most big
> > >programs have a 5% to 20% purely serial component.
> > >(Note: this only applies to general-purpose
> > >machines.  Obviously, certain problems can have the serial part reduced
> > >to a tiny fraction.)
> > 
> > Have you any data to support this claim?
> 
> There is one very large piece of supporting data:  CDC's incredibly
> expensive failure, the Star-100 supercomputer, which ran vectors very
> quickly and scalars very slowly.  They had hoped for automatic vectorizers
> that would be good enough to make the thing work, and had also hoped that
> most problems would vectorize simply and almost completely.  No such luck.
> One of the reasons the Cray-1 was such a success, by comparison, was that
> the Cray does scalar operations at blazing speed too.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

One should be careful about claiming that the failure of a machine with a
high penalty for scalar processing relative to vector processing transfers
to MIMD machines in general.  Vectorization is a form of parallelism, but a
rather restricted and highly regular form.  An application may have
parallelism that can't be exploited with vectorization.  The most attractive
parallel processor architectures allow the exploitation of both forms.  The
idea is to have many CPUs sharing memory and to give each CPU vector
instructions.
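
A sketch of the loop shape such a machine is after: the outer
iterations are divided among the shared-memory CPUs, while each CPU's
inner loop is a clean stride-1 vector operation.  The hand
partitioning below is only meant to show the structure; it runs
serially as written.

    /* The loop structure a "many CPUs, each with vector instructions"
     * machine wants: outer iterations divided among CPUs (MIMD
     * parallelism), inner loop a stride-1 vector operation (vector
     * parallelism).  The hand partitioning only shows the shape; this
     * program runs serially.
     */
    #include <stdio.h>
    #define NCPU 4
    #define ROWS 8
    #define COLS 16

    int main(void)
    {
        static double a[ROWS][COLS], b[ROWS][COLS], c[ROWS][COLS];

        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++) { a[i][j] = i; b[i][j] = j; }

        for (int cpu = 0; cpu < NCPU; cpu++) {      /* MIMD: a set of rows per CPU */
            for (int i = cpu; i < ROWS; i += NCPU) {
                for (int j = 0; j < COLS; j++)      /* vector: c[i][*] = a[i][*] + b[i][*] */
                    c[i][j] = a[i][j] + b[i][j];
            }
        }

        printf("c[%d][%d] = %f\n", ROWS-1, COLS-1, c[ROWS-1][COLS-1]);
        return 0;
    }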