[net.arch] Additional followups to "The Mythical MIPS"

eugene@ames.UUCP (07/17/86)

Two other followups from Tom DeBoni and Ian Kaplan came to me.  I used
to work with Tom and have respect for his ideas, and I have also talked
a lot with Ian.  My comments are indented.

From: deboni@pp.mcc.com (Tom DeBoni)
Subject: Comments on the latest flaming on parallelism

You notice I do not head this as "flaming on the latest comments..." There's a
reason for that. Let's cut through the bafflegab and get at the heart of the
matter. What do we really want of the ultimate parallel/concurrent/coordinate/
cooperative/multiprocessor (choose one) computer systems we all like to dream
about and (hopefully) work toward? We want one that's smart enough to know what
we want it/them to do without our having to exhaust ourselves saying so. In
short words, we want ease of programming. That is serious. Programming anything
is hard - so hard most of us aren't really sure what it means to do it "right".
Programming a multiprocessor system can be as exponentially explosive in 
difficulty as it is in physical components, and possibly even worse (you will
likely have to use all of those myriad components many times in the course of a
program execution).

Picture yourself as a dance-hall caller: everybody obeys your
	Ah! the classic dance-hall model!
calls and dances the steps you specify in unison. If you have only a few dancers,
this is analogous to programming a uniprocessor. Now imagine being each of the 
dancers simultaneously, do-se-do-ing like crazy, sometimes waiting, sometimes
reaching, shuffling, twirling, etc., and with all the strange rhythms you can
imagine. This is programming the contemporary multiprocessor system. Now try
imagining this all going on asynchronously - without a controlling bass beat;
this is what we're in store for in the near future (to an extent it's here
already), when more difficult problems and more complex algorithms start
forcing us out of lock-step mode. Every instruction, wait, send, receive, resume,
variable, message, file, OS call, and hardware glitch will be the responsibility
of the already up-to-his-neck-in-mud programmer. Do you want the job? I sure
don't.

So then what? Well, how about establishing some discipline about the use of
these complicated systems, in much the same way that modern languages and OS
practice did for programming contemporary uni-processors. We use block-structured
languages with lexical scoping and rational control structures because
they ease the pain of thinking about our programs, and seldom care about the 
inefficiencies they involve. We use virtual systems and virtual memory for the
economic and temporal convenience they afford, and almost never care about the
mechanical "man behind the curtain" that gives us the illusion but not the
reality; because we find the illusion just dandy, and the reality impractical
anyhow. (Sure we'd all love to have a 100-MIPS workstation with 1 GB of RAM,
but who could afford one?)

Avoiding the inefficiencies is a problem we'd rather not deal with until we are
ultimately backed into that corner. No self-respecting programmer today
starts out by playing tricks with machine instruction sequences to save clocks
(such as loading a small constant and shifting it, to save floating an integer,
which may take slightly more time), just as no one would purposely spaghettify
code to avoid the use of null else clauses and thus minimize useless branching.
These pathological practices are saved for the really tough situations, and
go in when the rest of the code is well structured for understanding, use,
and maintenance. And then only when they're needed. Optimization is a chimera
that most of us don't care to chase, and with good reason: life's too short, and
such obsessive-compulsive behavior is seldom rewarding enough to justify itself.
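
One reading of the kind of clock-counting trick meant above (shifting instead
of going through floating point) -- purely illustrative C, not anybody's
production code:

    #include <stdio.h>

    int main(void)
    {
        int x = 37;

        /* The "obvious" way to scale x by 8: convert to floating point,
           multiply, convert back.  On many machines the conversions alone
           can cost more cycles than the arithmetic itself. */
        int slow = (int)((double)x * 8.0);

        /* The trick: 8 is a power of two, so a left shift by 3 does the
           same job in a single integer instruction, no conversions. */
        int fast = x << 3;

        printf("%d %d\n", slow, fast);    /* both print 296 */
        return 0;
    }

Nobody in his right mind writes the second form first; it is the sort of thing
you reach for only after the straightforward code is working and has been
measured to be too slow.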

So why not try to smooth out some of the rough edges and fill in some of the
crevices in the parallel/call-it-what-you-will processing world? What am I
talking about? About languages with clean semantics, with which automatic
tricks can be played, and which give predictable, if inefficient, results.
This may sound strange, coming from an architecture person, but I've been
down there on the "killing floor", and I "ain't gonna study war no more". I
became dissatisfied with advanced architectures when I realized how hard they
were for humans to deal with. I thought the answer might be in new execution 
models, but had no idea what they might be; so, I started thinking about better
languages and programming environments, believing that once we've figured out
the details of what we really want to do, the rest will be engineering and
technology. 

I'll take a language with a few good high-level mathematical constructs (vector,
array, set, stream, etc.), and a mathematically water-tight interpretation
(applicative semantics), and let the post processor worry about optimization.
Not fair to assume magic on the part of the compiler or its author, you say.
Exactly. We won't need magic - the functionality will save us. And how do we
get function definitions to map smoothly and efficiently onto finite state
machines? We may not; but new execution models are what we're really hoping
for; at best, they will take us the next steps, and at worst, we may be stuck
with running functions on state-based machinery. But even in the latter case, 
I'm willing to bet we can smooth out the interface between the functions and
the registers - make it smaller, more compact, and better behaved - and do so
much more cost-effectively than we can continue to try to use dance-hall
architectures by being the entire dance-hall ourselves.
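
To sketch, in plain C only because everyone can read it, what "clean semantics
with which automatic tricks can be played" buys you: when the per-element
computation is a pure function of its arguments, nothing in the program text
constrains the order of evaluation, so the post processor is free to spread
the elements over as many processors as it likes.  (The names below are made
up for illustration; a real example would be written in SISAL or the like.)

    #include <stddef.h>

    /* A pure function: its result depends only on its argument, and it
       touches no shared state. */
    static double f(double x)
    {
        return x * x + 1.0;
    }

    /* An applicative-style "map": out[i] = f(in[i]) for every i.  Because
       f has no side effects, the iterations are independent; a compiler or
       runtime may evaluate them in any order, in chunks, or all at once,
       and the answer is the same. */
    void map_f(const double *in, double *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = f(in[i]);
    }

The point is that the independence is visible in the program text itself,
rather than something a heroic compiler has to rediscover.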

Humbly submitting the above, /* editing */
Thomas M. DeBoni
Graduate of the Famous Flamers Institute, and Doctor of Sourcery
(deboni@mcc, deboni@sally, cs.deboni@utexas-20, fill in the domains later...)


From: nike!ll-xn!s3sun!sdcsvax!loral!ian (Ian Kaplan)

Eugene:

Some notes on your article ("The Mythical MIPS (MegaFLOPS)"):

>
>One problem is that the world is not always parallel.

  You make use of parallelism where you find it.
  In some problems there is only obvious pipeline parallelism; in others one
  can get lots of horizontal parallelism.  Part of the problem is that
  computer science people have been writing about "order" speed ups (i.e.,
  the problem can be sped up by some factor that is a function of the
  number of processors).  In real life (at least at the present time) the 
  way things work is that you have a problem that does not run fast enough
  on the uni-processor you can afford (there are only a handful of people
  that have problems that are too slow for Crays).  You then look at
  putting the problem on a parallel processor.  You can either figure out
  how to do it or you can't.
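
        To put one number on the "order of" speed-up remark -- this is
        just the standard back-of-the-envelope formula, not anything Ian
        is citing: if a fraction s of the work is unavoidably serial,
        then p processors can give at most 1/(s + (1-s)/p).

            #include <stdio.h>

            /* Best-case speedup on p processors when a fraction s of
               the work is unavoidably serial (Amdahl's argument). */
            static double speedup(double s, int p)
            {
                return 1.0 / (s + (1.0 - s) / p);
            }

            int main(void)
            {
                /* With only 10% serial work, 16 processors give about
                   6.4x, and the limit as p grows is 10x no matter how
                   much hardware you buy. */
                printf("%.2f %.2f\n", speedup(0.1, 16),
                                      speedup(0.1, 1000000));
                return 0;
            }

        Which is one reason the practical question -- does it run fast
        enough on a machine I can afford -- matters more than the
        asymptotic factor.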

  Part of the problem is that many people in the computer science community
  are looking for general solutions.  I suspect that we do not know enough
  yet about parallel processing to build general purpose machines.  To be
  general purpose a parallel processor should not force the programmer to
  be aware of the physical architecture.  To me this means small grain
  dataflow machines running something like SISAL, ID or Lucid.
        As you point out later, there are economics involved.  There
        is great motivation for general purpose architectures
        and I get this high-level argument every week.  I dare say that
        many special purpose parallel machines do not work `efficiently.'

>I see two communities who are not communicating:
>physical scientists see "spatial" parallelism: all those difference
>equations over a given space, they see meshes, but the computer science people
>(typically the AI and compiler people) see "syntactic" parallelism,
>they tend to see syntax trees like data flow graphs, for instance.

	Comment: not a single letter came from the AI community.  I
	suspect the AI people (generally) don't know what they will do with
	"parallelism."  The vision people are an exception.  I think
	we might call the AI bluff soon.

  What do you mean by a mesh?  Any network connected system (like a binary
  n-cube) is, as I understand it, referred to as a "mesh connected" system.
  One form of a mesh is a rectangular grid.  While grids are good for
  solving problems that need only nearest neighbor communication, the
  message passing overhead is such that a grid is not very good for any
  general class of problems.

	Mesh: as you say: nearest spatial neighbor.  Message passing is
	lower level, an implementation detail.  What about the higher
	level?  The general notation is still poor.  We had some SISAL
	discussions about the annotation of boundary conditions
	(not PDE, but array boundaries).  The concurrent constructs
        most recently surveyed by Andrews and Schneider in ACM Computing
        Surveys are OS oriented and typically deal with small numbers of
        processes, like forking.
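
        A toy version of the kind of mesh computation I have in mind --
        one Jacobi-style relaxation sweep, made up for illustration, not
        any particular code of ours:

            #define N 64

            /* Replace every interior point by the average of its four
               nearest neighbors.  The boundary rows and columns are left
               alone -- they hold the boundary condition -- and saying
               that cleanly in a language is exactly the annotation
               problem mentioned above. */
            void sweep(double old[N][N], double out[N][N])
            {
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        if (i == 0 || i == N-1 || j == 0 || j == N-1)
                            out[i][j] = old[i][j];   /* boundary: copy */
                        else
                            out[i][j] = 0.25 * (old[i-1][j] + old[i+1][j] +
                                                old[i][j-1] + old[i][j+1]);
            }

        Every interior point can be updated at the same time, and a point
        assigned to one processor needs data only from the processors
        holding its spatial neighbors -- that is all I mean by "mesh."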

>"The Concept"

   It seems to me that you ignore the importance of pipeline parallelism
   for a large class of problems.  Many biological systems are pipelined,
   although baby production is not one of them.  Dependence does not always
   mean that a problem cannot benefit from a parallel machine.
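
   (A toy illustration, not the Loral machine: even when every item must go
   through stages A, then B, then C in that order, three processors stay
   busy once the pipe fills, because A can start item t+1 while B works on
   item t and C on item t-1.

      #include <stdio.h>

      /* Three dependent stages: every item must go A -> B -> C in order. */
      static int stage_a(int x) { return x + 1; }
      static int stage_b(int x) { return x * 2; }
      static int stage_c(int x) { return x - 3; }

      int main(void)
      {
          enum { ITEMS = 8 };
          int a_out[ITEMS], b_out[ITEMS], c_out[ITEMS];

          /* At time step t, "processor A" runs stage_a on item t, "B" runs
             stage_b on item t-1, and "C" runs stage_c on item t-2.  This
             loop is the sequential analogue; on a pipeline the three calls
             in one iteration happen concurrently. */
          for (int t = 0; t < ITEMS + 2; t++) {
              if (t < ITEMS)                 a_out[t]   = stage_a(t);
              if (t >= 1 && t - 1 < ITEMS)   b_out[t-1] = stage_b(a_out[t-1]);
              if (t >= 2 && t - 2 < ITEMS)   c_out[t-2] = stage_c(b_out[t-2]);
          }

          for (int i = 0; i < ITEMS; i++)
              printf("%d ", c_out[i]);       /* 2*(i+1) - 3 for each item i */
          printf("\n");
          return 0;
      }

   Throughput nearly triples even though no single item finishes any sooner;
   dependence within an item does not rule out parallelism across items.)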

	Name some (preferably at least 3).  That was my point about
	using Brooks' example.  We confuse work and effort.

>It would be argued by some that this is for more limited applications
>but again those spatially based problems tend to dominate.  Why no
>68Ks or 32Ks in a mesh?  

  Again, I am not sure what you mean by mesh.  The N-Cube system is the
  equivalent of a bunch of 68000's connected in a binary n-cube mesh.
  The Loral dataflow system uses a bunch of 32000's connected by
  segmented busses.  Off the shelf micro-processors like these are not
  very good at doing high speed message passing.  This is an important
  component of system performance.  If you use a standard processor you
  must provide front end hardware to support message passing.  This is what
  we do.

>Is it all marketing hype?   How could the money be
>better directed (for research purposes)?

   Experimenting with computer architectures is a very expensive business.
   Historically most of it has been done in the commercial world.  This is
   not to say that research does not play an important part, but only in
   the commercial world will people do the dirty grunt work that is needed
   to produce a system that is really usable.

>    The "tightly coupled" .vs. "loosely coupled" debate went on
>7-8 years ago before everyone got tired of it.  It was sort of 
>the analog of the RISC vs. CISC debates of today.  The net result 
>was sort of an agreement that there was a spectrum, not a dichotomy.

  Well there may be a spectrum, but I still do not understand why
  presumably intelligent researchers believe that shared memory machines
  (e.g., tightly coupled) like the NYU Ultracomputer or the RP3 are real
  large scale parallel processors.  (As you might guess I work on a
  distributed memory, or loosely coupled, system).
        You make a good point here.  Our researchers (numerical
        analysts), like the guys at NYU, have been accused of just
        extending serial algorithms; few truly parallel algorithms exist
        (though more are coming).  Dongarra was one of the people recently
        making this point.  Are tightly coupled machines MISD?
 
>    The latest thing you see in parallel processing is the "real"
>numerical analysts who are actually putting the problems on 
>machines.  Until very recently, with a few exceptions from the ILLIAC
>days, most parallel numerical analysis has been theoretical.
>
>Diane. . . 

   There are those that point out that it took a long time for people to
   learn to use vector processors and that it will take at least as long
   for people to learn to use parallel processors.

		     Ian Kaplan
		     Loral Dataflow Group
	     USENET: {ucbvax,decvax,ihnp4}!sdcsvax!sdcc6!loral!ian
	     ARPA:   sdcc6!loral!ian@UCSD

	Here is that distinction between parallel and vector again.
	It did not take long to learn.  I would argue that vectorization
	in some ways is too easy.  Everybody can think in terms of vectors
	and that's why I think machines like Crays, Convexes, and Alliants
	are doing so well.  That's why the Hypercubes are adding vector
	options and you will see the older companies doing the same (oh I
	forgot IBM with the 3090, I left out CDC/ETA intentionally, could
        change).  Vectors are very regular; should not parallel be?
        Well yes, until you get to MIMD.  Most of the big users would be happy
        with vectorizable loops with conditionals; that's MIMD to them
	for now.  You have convinced me more than ever that we have a
	terminology problem.
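
        The sort of thing I mean by "a vectorizable loop with a
        conditional" -- a made-up fragment, not anyone's benchmark:

            /* No iteration depends on any other, so a vectorizing
               compiler can run this under a mask, or the iterations can
               simply be dealt out to separate processors. */
            void clip(const double *x, double *y, int n, double lim)
            {
                int i;
                for (i = 0; i < n; i++) {
                    if (x[i] > lim)
                        y[i] = lim;            /* clamp the big ones   */
                    else
                        y[i] = x[i] * 0.5;     /* scale the small ones */
                }
            }

        To a Cray user that is just a vector loop under a mask; to the
        MIMD crowd each iteration is an independent task.  Same loop, two
        vocabularies.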

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  com'on do you trust Reply commands with all these different mailers?
  {hplabs,ihnp4,dual,hao,decwrl,tektronix,allegra}!ames!aurora!eugene
  eugene@ames-aurora.ARPA