[mod.ai] Fed up with all this `talk' about parallelism

eugene@AMES-AURORA.ARPA (Eugene miya) (06/20/86)

The following are some ideas I have been thinking about with the help
of one co-worker.  I plan to post this to several groups where I
regard parallelism discussions as significant, such as parsym, ailist,
and net.arch.  The ideas are still in formation.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  {hplabs,hao,dual,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene


draft:
The Mythical MIPS (MegaFLOPS)
(pardons to Fred Brooks)

"Introduction"

That's it!  I'm tired of hearing about all this parallelism out there.
Too many people are talking about parallelism without truly understanding
what it is.  There appear to be conceptual as well as syntactic and
semantic problems.

One problem is that the world is not always parallel.  "Thinking" is
not always parallel: dependency "chains" are created in logic for instance.
Another problem is that we think much of the world is parallel,
but some "levels" of parallelism are not interchangeable.  It appears
there are serially connected parallel processes with serial bottlenecks
between processes (not necessarily Petri nets).

Looking at snapshots,
	<Blind men ("including" me) trying to describe the elephant>
I see two communities that are not communicating.
Physical scientists see "spatial" parallelism: all those difference
equations over a given space; they see meshes.  The computer science people
(typically the AI and compiler people) see "syntactic" parallelism;
they tend to see syntax trees and data flow graphs, for instance.
[I do note that rectangular meshes turned on their corners do represent
`trees.']

"The Concept"

Parallelism is a geometric concept: lines not meeting and equidistant (well..).
Parallelism is not a given.  `Dependence' prevents `decomposition.'
From Fred Brooks:
If it takes a female 9 months to have offspring, then nine females can
have one in one month.  If a computation takes 9 units of time,
then . . .  Are the units interchangeable or should we make a distinction
in unit type?  Are we confusing work and effort?
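
To make the work/effort distinction concrete, here is a minimal C sketch
(my own illustration, not anyone's benchmark): a nine-step dependency chain
versus nine independent steps.  The second loop could, in principle, be
handed to nine processors; the first has a critical path nine units long no
matter how many processors are available.

/* A hedged sketch of the distinction: nine dependent "units" form a
 * chain and cannot be divided, while nine independent units could,
 * in principle, go to nine workers at once. */
#include <stdio.h>

#define UNITS 9

int main(void)
{
    int i;
    double chain = 1.0, independent[UNITS];

    /* Dependent: each step needs the previous result; nine workers
     * cannot shorten the critical path below nine steps. */
    for (i = 0; i < UNITS; i++)
        chain = chain * 2.0 + 1.0;

    /* Independent: no iteration reads another's result; the nine
     * iterations could run on nine processors simultaneously. */
    for (i = 0; i < UNITS; i++)
        independent[i] = (double)i * 2.0 + 1.0;

    printf("chain = %f, independent[0] = %f\n", chain, independent[0]);
    return 0;
}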

"Terminology"

Consider the terminology: parallelism, concurrency, multiprocessing,
multitasking (this one is really loaded), nonsequential (non-von), etc.
There is a lot of different terminology to describe parallelism.
I don't think it's necessary to standardize the terminology, but
perhaps we should?  For instance:

	Would you place a "tightly-coupled problem" on a
	"loosely-coupled" multiprocessor?

The first obvious question is "what's a `tightly coupled problem?'"
How do you measure the parallelism?  Is it just the count of the number
of parallel execution streams?
A problem of parallelism is just the degree of decomposability:
even in uniprocessor computer systems, there is such a degree of
asynchronous inefficiency (CPUs wait) that work is really distributed
all over the place.

Let's change the terminology for a moment to try to better understand
the issues.  Rather than using parallel and multiprocess (or concurrent),
let's try "cooperative" and "coordinated."  As we would take regions
around a point, we might be able to study the neighborhood around the word
`parallel.'  Is there a difference between the two?  Diane Smith
asserts there is.  I think there may be.

Cooperative computing implies working together to achieve a single goal.
Coordinated computing implies, rather, that processes don't bump heads
(separate goals) but work in a common environment (coordinate).
There is the obvious third factor of communications.  There may also be
levels and different types of communications, such as control interaction
versus bulk transfer.  Better adjectives might exist, and perhaps changing
the words would do better, but history effects will bias those of us working
on this.

"Classifications of parallelism"

There are an obscene number of classifications:
Flynn's system: SISD, SIMD, MIMD...
Kuck's modification: execution streams distinct from instruction
streams: SIME(MD), MIME(MD), etc.
Händler's commentary that there were lots of commentaries and little work.
Prolog et al.: AND-parallelism and OR-parallelism.
Then there is temporal parallelism (pipelining): parallelism, but different.

Parallelism is all cars starting forward the moment the light turns
green (without regard for any cars ahead).  Pipelining is waiting
for the car ahead of you to start rolling.
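
A rough C sketch of the car analogy (my own, purely illustrative): in the
parallel case every car starts the moment the light changes, while in the
pipelined case car i enters the first stage one time step after car i-1,
so at any instant the stages hold different cars.

/* Parallelism vs. pipelining, in the terms of the car analogy above.
 * CARS and STAGES are made-up numbers for illustration only. */
#include <stdio.h>

#define CARS   4
#define STAGES 3   /* e.g. fetch, compute, store */

int main(void)
{
    int t, i, stage;

    /* Parallelism: every car moves at t = 0; with CARS independent
     * lanes (processors) the whole batch takes one time step. */
    printf("parallel: all %d cars start at t = 0\n", CARS);

    /* Pipelining: car i enters stage 0 at time i; total time is
     * CARS + STAGES - 1 steps, but the stages overlap. */
    for (t = 0; t < CARS + STAGES - 1; t++) {
        printf("t=%d:", t);
        for (i = 0; i < CARS; i++) {
            stage = t - i;
            if (stage >= 0 && stage < STAGES)
                printf("  car %d in stage %d", i, stage);
        }
        printf("\n");
    }
    return 0;
}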

I also see three conditions of parallelism: parallelism is not constant.
It's like snow and its many forms: powder, neve, firn, sastrugi, and
the Eskimo words.  I see

	Constant parallelism: spatial parallelism is a good example;
	the number of parallel streams does not basically change
	through time.  Gauss-Seidel and other iterative solutions
	to systems of equations?  AND-parallelism (can be coarse or
	fine grained (whatever grain means)).

	Converging parallelism:  The number of parallel streams
	is reducing, perhaps approaching serial: data flow graphs
	of dot products, of the summation step of a matrix multiply;
	Gauss-Jordan (elimination, or direct solution) is another example.
	Must be fine-grained.  (A small sketch follows this list.)

	Diverging parallelism: (perhaps several types): many forks,
	OR-parallelism, fractal.  Like divergent series, this type of
	parallelism has problems.  (Can be fine or coarse grained?)
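
As promised above, a hedged C sketch of converging parallelism (again my
own illustration): the summation step of a dot product written as a tree
reduction.  The multiply phase has N independent operations; each reduction
pass halves the number of independent additions until a single stream remains.

/* Converging parallelism: dot product with a tree-style summation.
 * N is assumed to be a power of two purely to keep the sketch short. */
#include <stdio.h>

#define N 8

int main(void)
{
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    double p[N];
    int i, stride;

    /* Fully parallel phase: N independent multiplies. */
    for (i = 0; i < N; i++)
        p[i] = a[i] * b[i];

    /* Converging phase: log2(N) passes; the pass with a given stride
     * has N/(2*stride) independent additions, ending in one sum. */
    for (stride = 1; stride < N; stride *= 2)
        for (i = 0; i + stride < N; i += 2 * stride)
            p[i] += p[i + stride];

    printf("dot product = %f\n", p[0]);
    return 0;
}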

The real key is the quantitative characterization (at some level)
of parallelism.  Are we to only count streams?

While it is largely a matter of communications/coupling, how do
we evaluate the communications needs of an algorithm as opposed to an
architecture?

What are we going to do with 2-D flow charts where we need to 
express forking and branching on the same 2-D plane?

Oh well! Still searching for an honest definition.

"Socio/politico/economic commentary"

Recent economically based events in parallel processing are amazing.
The number of companies actively marketing hypercube architectures
and Crayettes is staggering.  Machines with Cray-class power are not
surprising; this is inevitable.  Cray instruction-set compatible machine
offerings are what is surprising about this.  There are so few Crays (100)
out there that the half dozen or more companies who say they are
offering such guarantee failure.

More surprising are the number of hypercube architectures.  Admittedly,
hypercubes offer very nice connectivity features, but only one person
has a good perspective: Justin Rattner, Intel, who offered the
machine as an experimental testbed not a Cray alternative.
What with all this talk about parallelism, it is surprising there are not
more companies marketing, designing, etc., mesh-type architectures
a la the ILLIAC IV.  That spatial model of parallelism (SIMD)
is probably the easier to build, if not to program.  This latter point is worth
some debate, but as noted, many models of parallelism are spatially based.
Only the MPP, the DAP, and, it seems, the Connection Machine to a somewhat lesser
extent are based this way (albeit with more connections).
It would be argued by some that this is for more limited applications,
but again, spatially based problems tend to dominate.  Why no
68Ks or 32Ks in a mesh?  Is it all marketing hype?  How could the money be
better directed (for research purposes, since obviously some of this
money is bound to go into failed experiments [necessitated by
empirical work])?  Can we spread out the "cost" to develop new architectures?

Ah yes, reinventing the ILLIAC again.

"A few asides:" [From S. Diane Smith]
 
  When asked about the interconnection network in MPP compared to
that of the STARAN, Ken Batcher replied, "We learned that you didn't
need all that (the multistage cube/omega) for image processing, that
a mesh was good enough."
   You could look at MPP as a second generation parallel processor,
even if the processors are only 1 bit wide.  They got rid of
a number of "mistakes" that they learned about through STARAN.

    The "tightly coupled" .vs. "loosely coupled" debate went on
7-8 years ago before everyone got tired of it.  It was sort of 
the analog of the RISC vs. CISC debates of today.  The net result 
was sort of an agreement that there was a spectrum, not a dicotomy.
There have been one or two papers on classification, none very satisfying.
I'll see if I can't find them.

    The latest thing you see in parallel processing is the "real"
numerical analysts who are actually putting the problems on 
machines.  Until very recently, with a few exceptions from the ILLIAC
days, most parallel numerical analysis has been theoretical.

Diane. . .