[comp.arch] chewing up mips

eugene@pioneer.arpa (Eugene Miya N.) (06/25/87)

<Ignore if you don't like performance.> (yellow diamond)

In article <5793@think.UUCP> bradley@godot.think.com.UUCP (Bradley Kuszmaul) writes:
>>person determined that the average code spends 90% of its time
>>executing 10% of its code.
>It is also likely that the dependencies between
>data are something smaller than "completely connected", and so it is
>likely that different parts of the data can be processed in parallel.
>. . . .
>The lesson
>is "Rules of thumb, such as Amdahl's Law, don't have much to say about
>parallel computing".

Knuth (Software Practice and Experience, 1971) collected empirical data
to "verify" this suggestion.

I think you should note that Amdahl, who spoke in the Valley recently,
points out this "Law" [he does not call it a Law] represents a useful
simple mathematical bound.  A more difficult problem is HOW we say
something is "80%" parallel.  "80%" of what? Source code? Run time
object code?  No one ever defined that.
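
For what it's worth, the bound itself is trivial to write down; the hard
part is the p in it, which is exactly the quantity nobody defines.  A
sketch in throwaway code (my notation, not Amdahl's):

    # Amdahl's bound, taking p as the fraction of serial run time that
    # parallelizes perfectly over n processors (my notation, not Amdahl's).
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    print(amdahl_speedup(0.80, 100))    # ~4.81 -- "80% parallel", 100 CPUs
    print(amdahl_speedup(0.80, 10**9))  # -> 5.0, the cap is 1/(1 - p)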

>As far as I know, no one has seriously tried to solve the Karp problem.
>I think it is only a matter of time.  The problem is that in order to
>even have a chance at solving the Karp problem, you have to have at least
>100 processors in your parallel machine (otherwise no way are you going
>to get a factor of 100 speedup).
>
>Company plug:  The Connection Machine, which has 64000 processors in it,
>probably has the best chance of solving the Karp problem, but it seems a
>little unfair to compare 64000 one bit processors to a single one bit
>processor (for a performance improvement of 64000) or to compare a 64000
>processor Connection Machine to a Cray (which is made out of much more
>expensive technology).  My feeling is that the Karp problem should have
>been phrased as "compare your parallel machine with a single processor
>serial machine made out of about the same amount of hardware" (or perhaps
>the same cost?).  This would pit a Connection Machine against about a
>third of a Cray 1, or perhaps a couple of Vax 8800's or a high end IBM mainframe.
>
>Bradley C. Kuszmaul, Thinking Machines Corporation, Cambridge, MA

Permit me to speak on behalf of Alan Karp of the IBM Scientific Center
who does not read the Usenet and is not a big Unix fan (You can reach
him at karp@ibm.com).  First, the factor of performance gain is 200
(not 100, so a slight factor off ;-).  Second, the CM does not currently
meet the minimum criteria for an MIMD machine (SIMD machines don't count
as Alan regards them as somewhat specialized architectures).  Alan's
intent WAS to compare to a single processor of "the same amount of
hardware."  Problem is what amount?  How does one measure this [me]?
Rate?  Physical volume?  There are two subproblems: 1) (very hard)
determining comparable situations (a null hypothesis), 2) finding
distinct differences after "apples have been compared to apples."

One problem with the first Connection Machine was that the need for
numerical precision (64-bit minimum) crippled it to about 70 MFLOPS
(not my favorite measure), making it slower than a Cray-1.  The excuse
"many problems can get by with 32-bit data" does not fly.  The bit-serial
processors are a poor fit for precision-hungry problems like structural
simulations and the like.  The audience for such machines is folks like
Jack Dongarra and Alan: numerical analysts.
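
As a back-of-envelope sketch of why the precision hit is so steep (the
quadratic cost of a serial shift-and-add multiply is the only fact used
here; no real CM cycle counts):

    # Rough scaling only: a bit-serial multiply of w-bit words takes on the
    # order of w*w single-bit operations, so doubling the word width costs
    # roughly 4x per multiply.  No actual CM timings are assumed.
    def relative_multiply_cost(bits):
        return bits * bits

    print(relative_multiply_cost(64) / relative_multiply_cost(32))  # 4.0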

I'm just relating to you the climate of measurement.  Trying to
rationalize a machine (e.g., okay in 32-bit) (p.s. generically,
not TMC's, but including Alliants, Convexes, FLEXen, Sequents, Cubes,
Elxsi, SCS, [should I be using real names??? possibly not], etc. etc.)
does not cut slack at the National Labs, Research Centers, and other
purchasers of big machines.  What it does do is alienate users after the
purchase: they find, after the fact, that the machine does not run as fast
as they would like, and that prevents the purchase of a follow-on machine.
Perhaps this is where my saturation measure might come in handy
[time(saturation) > time(frustration) ? next(machine) : another(machine)]?

Can we move on?

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

bradley@think.uucp (Bradley Kuszmaul) (06/26/87)

In article <1908@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:

>In article <5793@think.UUCP> bradley@godot.think.com.UUCP (Bradley Kuszmaul) writes:
>>>person determined that the average code spends 90% of its time
>>>executing 10% of its code.
>>It is also likely that the dependencies between
>>data are something smaller than "completely connected", and so it is
>>likely that different parts of the data can be processed in parallel.
>>...The lesson
>>is "Rules of thumb, such as Amdahl's Law, don't have much to say about
>>parallel computing".
>...
>I think you should note that Amdahl, who spoke in the Valley recently,
>points out this "Law" [he does not call it a Law] represents a useful
>simple mathematical bound.  A more difficult problem is HOW we say
>something is "80%" parallel.  "80%" of what? Source code? Run time
>object code?  No one ever defined that.

Somehow, I get the feeling we are not talking about the same thing.  The
original article, to which I was responding, used the 90-10 rule to try
to get a handle on how much parallelism is in typical programs.  My point
is that the rule does not have anything to do with how much parallelism
is in a program.  There exist programs (such as a batch of 1000 independent
runs of an Ada compiler) which can achieve a 1000-fold speedup, and there
may be programs which cannot be sped up at all by parallel hardware, yet
both of those classes of programs spend 90% of their time executing 10% of
their code.
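
To put toy numbers on that contrast (illustration only, not measurements
of anything): the two workloads below can have identical 90/10 profiles,
yet one parallelizes perfectly and the other not at all.

    # Toy contrast, not a benchmark: same 90/10 execution profile,
    # completely different available parallelism.
    import math

    def batch_speedup(n_jobs, n_procs):
        # 1000 independent compiler runs: just spread the jobs around.
        return n_jobs / math.ceil(n_jobs / n_procs)

    def chain_speedup(n_procs):
        # One long chain of dependent steps: nothing can overlap.
        return 1.0

    print(batch_speedup(1000, 1000))  # 1000.0
    print(chain_speedup(1000))        # 1.0
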
>>...Company plug:  The Connection Machine, ...
>Permit me to speak on behalf of Alan Karp of the IBM Scientific Center
>who does not read the Usenet and is not a big Unix fan (You can reach
>him at karp@ibm.com).  ... the CM does not currently
>meet the minimum criteria for an MIMD machine (SIMD machines don't count
>as Alan regards them as somewhat specialized architectures).

It has been several weeks since I read the text of Karp's rules, so off I
go into deep water....

It seems kind of self-serving to EXPLICITLY exclude SIMD machines just
because they are "regarded as somewhat specialized".  It is another
matter altogether if he is saying "I don't think it could possibly win,
but go ahead and try if you want".  The reason I make this distinction is
that users have been remarkably successful at getting a wide variety of
programs to run on the Connection Machine.  I still don't understand what
would keep the Connection Machine from being considered a general purpose
parallel computer, especially now that the Connection Machine has
floating point hardware and indirect addressing.

>  Alan's
>intent WAS to compare to a single processor of "the same amount of
>hardware."  Problem is what amount?  How does one measure this [me]?
>Rate?  Physical volume?  There are two subproblems: 1) (very hard)
>determining comparable situations (a null hypothesis), 2) finding
>distinct differences after "apples have been compared to apples."

I don't really think there is such a big problem as long as everyone
agrees that we should be talking about something that uses "about the
same amount of hardware".  We could use list price, wholesale price, cost
of parts and labor, mass, volume, or whatever.  We pick a reasonable cost
measure (I don't think it matters which), then we determine the cost of
the parallel machine, then we pick the best serial machine with that
cost, then we guess whether we solved the Karp problem.  This strategy
will work since we can probably agree on several "reasonable" cost
measures, and we only need one to serve our purpose.
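
As a sketch of that procedure (the machines, costs, and run times below
are placeholders I made up, not anybody's price list or benchmark):

    # Sketch of a cost-matched Karp comparison; every figure is invented.
    KARP_FACTOR = 200

    parallel_machine = {"cost": 5.0e6, "runtime": 30.0}   # invented figures
    serial_candidates = [
        {"name": "serial A", "cost": 4.5e6, "runtime": 7000.0},
        {"name": "serial B", "cost": 5.1e6, "runtime": 5500.0},
    ]

    # "About the same amount of hardware": the serial machine closest in
    # cost, under whichever cost measure we agreed on.
    rival = min(serial_candidates,
                key=lambda m: abs(m["cost"] - parallel_machine["cost"]))
    speedup = rival["runtime"] / parallel_machine["runtime"]   # ~183x
    print(rival["name"], speedup >= KARP_FACTOR)               # serial B False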

>One problem with the first Connection Machine was that the need for
>numerical precision (64-bit minimum) crippled it to about 70 MFLOPS
>(not my favorite measure), making it slower than a Cray-1.  The excuse
>"many problems can get by with 32-bit data" does not fly.  The bit-serial
>processors are a poor fit for precision-hungry problems like structural
>simulations and the like.  The audience for such machines is folks like
>Jack Dongarra and Alan: numerical analysts.
>
>I'm just relating to you the climate of measurement.  Trying to
>rationalize a machine (e.g., okay in 32-bit) (p.s. generically,
>not TMC's, but including Alliants, Convexes, FLEXen, Sequents, Cubes,
>Elxsi, SCS, [should I be using real names??? possibly not], etc. etc.)
>does not cut slack at the National Labs, Research Centers, and other
>purchasers of big machines.  What it does do is alienate users after the
>purchase: they find, after the fact, that the machine does not run as fast
>as they would like, and that prevents the purchase of a follow-on machine.

Was anyone given a false impression of the speed of the Connection
Machine?  Thinking Machines was very careful not to oversell the
capabilities of the machine because of exactly the problem you describe.
I would appreciate any leads on "who got what bad data from whom and
when".

Bradley C. Kuszmaul, Thinking Machines Corporation, Cambridge, MA
 bradley@think.com
 bradley@think.uucp (i.e. seismo!think!bradley)