[net.arch] I don't believe your statements about multiprocessors

reid@Glacier.ARPA (Brian Reid) (05/08/85)

I've been watching people make claims about the wonderfulness of
multiprocessors for 20 years. I have also watched a much smaller number of
people actually build and measure the performance of multiprocessors. This
newsgroup has recently been alive with various informed and uninformed
metaphysical babbling about multiprocessors. I don't believe most of it.

I have never seen a single particle of evidence, not one number, that says
that a tightly-coupled (e.g. shared-memory) multiprocessor is in any way
better than a uniprocessor of the equivalent aggregate speed. If you know
how to build a 100-MIP uniprocessor CPU, or 10 10-MIP processors for the
same instruction set, or 100 1-MIP processors, then it is always much better
to have the uniprocessor. It might be cheaper to build the multiprocessor,
but the uniprocessor is a better computer.

For loosely-coupled architectures there are sometimes arguments about
reliability through redundancy, though they tend not to hold water in
practice because of peripherals. But for a shared-memory machine, the only
reason to build a multiprocessor instead of a uniprocessor is to make it
cheaper. Otherwise, the uniprocessor is easier to program, faster (no
synchronization cost), has a higher burst speed, and can perform any
parallel computation that the multiprocessor can perform.
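
As a back-of-the-envelope sketch of the burst-speed point (the job size
and MIP ratings below are made up purely to keep the arithmetic visible),
consider ten independent jobs run either on one 100-MIP CPU or on ten
10-MIP CPUs:

    /* Made-up numbers, for illustration only: ten independent jobs of
     * 1000 million instructions each.  The whole batch finishes at the
     * same time either way, but the uniprocessor hands back the first
     * finished job ten times sooner.
     */
    #include <stdio.h>

    int main(void)
    {
        double job   = 1000.0;            /* millions of instructions/job */
        int    njobs = 10;
        double uni   = 100.0, mp = 10.0;  /* MIPS: big CPU vs each of 10  */

        printf("100-MIP uniprocessor: first job at %g s, all jobs at %g s\n",
               job / uni, njobs * job / uni);
        printf("10 x 10-MIP MP:       first job at %g s, all jobs at %g s\n",
               job / mp, job / mp);
        return 0;
    }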

Of course it is not always possible to build a uniprocessor as fast as one
would like, so multiprocessors and vector machines have always been at the
leading edge of the speed wars, but this is not because they are better
computers but because people know how to build them.

I am always interested in seeing hard data about computer architecture (or
anything else, for that matter). I invite any of the proponents of radical
architecture multiprocessors to show me numbers demonstrating their
superiority over uniprocessors.
-- 
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA

pop@mtu.UUCP (Dave Poplawski) (05/09/85)

> I have never seen a single particle of evidence, not one number, that says
> that a tightly-coupled (e.g. shared-memory) multiprocessor is in any way
> better than a uniprocessor of the equivalent aggregate speed. If you know
> how to build a 100-MIP uniprocessor CPU, or 10 10-MIP processors for the
> same instruction set, or 100 1-MIP processors, then it is always much better
> to have the uniprocessor. It might be cheaper to build the multiprocessor,
> but the uniprocessor is a better computer.

I don't think you will get any arguments about this - anybody would rather
have a 10,000 MFLOPS uniprocessor than a 10,000 MFLOPS multiprocessor
(shared memory, hypercube, mesh or whatever) - I would if the prices were
comparable, or even if the uniprocessor were more expensive.  Avoiding the
effort of reprogramming existing sequential programs, written in sequential
languages, to express their parallelism would probably make up the
difference in cost.  Even for new programs, most of us still find it easier
to write a sequential program than a parallel (especially massively
parallel) program, and in many cases there isn't that much parallelism in
the problem in the first place.

> Of course it is not always possible to build a uniprocessor as fast as one
> would like, so multiprocessors and vector machines have always been at the
> leading edge of the speed wars, but this is not because they are better
> computers but because people know how to build them.

Exactly - you answered your own question!  As long as the fastest
uniprocessor available is a couple of orders of magnitude slower than
available multiprocessors, and there are people who want (need) to solve
problems that are adaptable to the multiprocessor, then the multiprocessor
will be better (for those people and those applications).  There is no
religion here, just technology.  As soon as you can build a uniprocessor
that is as fast as any multiprocessor, the multiprocessors will go away.
-- 
Dave Poplawski
Michigan Technological University
uucp: {lanl, ihnp4, glacier}!mtu!pop
arpa/csnet:  pop%mtu@csnet-relay

chuck@dartvax.UUCP (Chuck Simmons) (05/10/85)

> ...But for a shared-memory machine, the only
> reason to build a multiprocessor instead of a uniprocessor is to make it
> cheaper....
> 
> Of course it is not always possible to build a uniprocessor as fast as one
> would like...
>
> Brian Reid

Seems like that makes 2 very good reasons to build hypercubes and friends.

Chuck

nather@utastro.UUCP (Ed Nather) (05/11/85)

> As soon as you can build a uniprocessor
> that is as fast as any multiprocessor, the multiprocessors will go away.
> -- 
> Dave Poplawski

This makes no sense to me.  If you can build a really fast uniprocessor,
why can't you run a bunch in parallel and get more throughput?  Are you
suggesting there may be a computer so fast no problem can keep it busy?
My friends, the astrophysicists, don't believe that for a minute.

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather

sambo@ukma.UUCP (Inventor of micro-S) (05/13/85)

In article <7202@Glacier.ARPA>, reid@Glacier.ARPA (Brian Reid) writes:
> I invite any of the proponents of radical
> architecture multiprocessors to show me numbers demonstrating their
> superiority over uniprocessors.

Perhaps you ought also to be interested in multiprocessors that equal
uniprocessors in performance but cost less.

jans@mako.UUCP (Jan Steinman) (05/13/85)

In article <7202@Glacier.ARPA> reid@Glacier.ARPA (Brian Reid) writes:
>For loosely-coupled architectures there are sometimes arguments about
>reliability through redundancy, though they tend not to hold water in
>practice because of peripherals.

There are companies making piles of money doing exactly this.  Most
notable is Tandem.  (I'm a former employee.)  The peripherals in a Tandem
(and, I assume, in their recent competitors) follow the scheme, with
performance benefits in the case of disks.  Except for some magic concerning
defect mapping, the mirrored disks have identical images.  Writes are
performed in parallel, but the task of reading is given to whichever disk is
currently positioned closest to the desired data, effectively cutting the
average read access time in half.
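
A minimal sketch of the read half of that scheme (this is not Tandem's
actual code; the structure and function names are invented, and the
cylinder-distance rule is just the idea described above):

    /* Illustrative only: reads go to whichever mirror copy currently
     * has its head nearest the requested cylinder; writes would be
     * issued to both copies in parallel to keep the images identical.
     */
    struct mirror {
        int head_cyl;           /* cylinder the head is parked over now */
    };

    static int cyl_dist(int a, int b)
    {
        return a > b ? a - b : b - a;
    }

    /* Return 0 or 1: which of the two identical copies services the read. */
    int pick_read_copy(const struct mirror *m0, const struct mirror *m1,
                       int cyl)
    {
        return cyl_dist(m0->head_cyl, cyl) <= cyl_dist(m1->head_cyl, cyl)
               ? 0 : 1;
    }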

Some of Tandem's recent upstart competitors have tried to spread this
philosophy to other peripherals, but it was determined that the difficulty
of reading the alternating-line listings produced by parallel printers
offset the speed advantage.  (I can't believe I wrote that! :-)

-- 
:::::: Jan Steinman		Box 1000, MS 61-161	(w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans	Wilsonville, OR 97070	(h)503/657-7703 ::::::

doug@terak.UUCP (Doug Pardee) (05/13/85)

[I consider the word "always" as a personal challenge...]

> If you know
> how to build a 100-MIP uniprocessor CPU, or 10 10-MIP processors for the
> same instruction set, or 100 1-MIP processors, then it is always much better
> to have the uniprocessor.

It's not what everyone else is discussing, but there is *too* an
application where 10 10-MIPS CPUs (MIMD) will beat 1 100-MIPS CPU.  That's
where there are 10 totally independent jobs to be done.  For example,
multi-user operating systems.  A single CPU would have to deal with the
overhead of context switching.
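
A crude way to put a number on that overhead (the quantum length and
switch cost below are invented, not measured on any machine):

    /* Invented numbers, for illustration only: if the single 100-MIP
     * CPU time-slices with a 10 ms quantum and each context switch
     * throws away 100 microseconds of work, about 1% of its time goes
     * to switching; ten CPUs each running one of ten jobs lose none.
     */
    #include <stdio.h>

    int main(void)
    {
        double quantum     = 0.010;   /* seconds of useful work per slice  */
        double switch_cost = 0.0001;  /* seconds burned per context switch */

        printf("uniprocessor switching overhead: %.1f%%\n",
               100.0 * switch_cost / (quantum + switch_cost));
        printf("one job per CPU on the MP:       0.0%%\n");
        return 0;
    }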

Which leaves me kinda confused...

I gather from the preceding discussion that the Cray is a SIMD (vector)
machine, and does quite nicely on achieving high performance working on
a single job.  So why then would anyone want to bog it down with a
multi-user operating system?

Wouldn't it make more sense to build a multi-micro system to run the
operating system and for program development (one CPU for each user),
thereby freeing up the vector CPU to actually *run* jobs (one after
the other)?
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp

pop@mtu.UUCP (Dave Poplawski) (05/14/85)

> > As soon as you can build a uniprocessor
> > that is as fast as any multiprocessor, the multiprocessors will go away.
> > -- 
> > Dave Poplawski
> 
> This makes no sense to me.  If you can build a really fast uniprocessor,
> why can't you run a bunch in parallel and get more thoughput?  Are you
> suggesting there may be a computer so fast no problem can keep it busy?
> My friends, the astrophysicists, don't believe that for a minute.
> 
> -- 
> Ed Nather

The statement was made in a whimsical voice (couldn't you hear it?).  I
don't think that anybody will ever do it, probably because it is impossible
for the reason you stated.  However, don't count out very fast
uniprocessors - I wouldn't want to try to get a 100-fold speedup on
something like troff by putting it on a 100 (or even 200, or 300, or ...)
cpu multiprocessor.  Some problems just don't seem to be very amenable to
parallel solution, at least not that parallel.

An interesting question is whether the throughput you mention is realized
on a single problem, or several independent ones.  On a single problem that
must be broken into cooperating processes, it is possible that a
multiprocessor would be slower than the uniprocessor because of contention,
communication costs, synchronization overhead and delay, etc.  It all
depends on the problem, the algorithm, the program, the multiprocessor, ...
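
To make the possibly-slower case concrete, here is a toy model (both the
serial run time and the coordination-cost term are invented; real machines
and real algorithms will differ): a 100-second job split n ways speeds up
for a while and then actually falls below the single processor.

    /* Toy model, made-up numbers: run time on n processors is T/n plus
     * a coordination term that grows with n.  Past some n the "parallel"
     * version is slower than one processor -- the situation above.
     */
    #include <stdio.h>

    int main(void)
    {
        double T = 100.0;               /* serial run time, seconds  */
        int n;

        for (n = 1; n <= 256; n *= 2) {
            double coord = 0.5 * n;     /* made-up coordination cost */
            double t = T / n + coord;
            printf("n=%3d  time=%7.2f s  speedup=%5.2f\n", n, t, T / t);
        }
        return 0;
    }
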
-- 
Dave Poplawski
Michigan Technological University
uucp: {lanl, ihnp4, glacier}!mtu!pop
arpa/csnet:  pop%mtu@csnet-relay

wcs@ho95b.UUCP (Bill Stewart) (05/14/85)

Multiprocessing has a number of advantages over uniprocessing, which
in many circumstances outweigh the disadvantages.  The primary
reason for multiprocessor machines is of course technology - it's a
lot easier to combine 100 10-MFLOP processors than to build a 1-GFLOP
processor  (and if you build one, you can combine 100 of THEM.)

If the processing you really want to do is true uniprocessing, then
probably the fast uniprocessor will win.  But most computing is
inherently multiprocessing - either there are multiple users, or
the problem has a reasonable degree of parallel structure that can
better be exploited on a multiprocessor (e.g. finite element
calculations, network modelling, etc.)  On the multiprocessor, each
processor can potentially take its one job and grind away; the
uniprocessor incurs a lot of overhead doing process switches,
swapping and paging.  On the other hand, the multiprocessor wastes
power when there are idle processors, whereas the uniprocessor goes
faster if it has fewer jobs to do.  The value of either approach
really depends on the application environment and the tradeoffs you
have to make; neither can be ruled out.

			Bill Stewart
-- 
			Bill Stewart	1-201-949-0705
			AT&T Bell Labs, Room 4K-435, Holmdel NJ
			{ihnp4,allegra,cbosgd,vax135}!ho95c!wcs

henry@utzoo.UUCP (Henry Spencer) (05/15/85)

> >For loosely-coupled architectures there are sometimes arguments about
> >reliability through redundancy, though they tend not to hold water in
> >practice because of peripherals.
> 
> There are companies who are making piles of money doing exactly this.  Most
> notable is Tandem.  (I'm a former employee.)  The peripherals in a Tandem
> (and I assume, their recent competitors) follow the scheme, with performance
> benefits in the case of disks.  Except for some magic concerning defect
> mapping, the mirrored disks have identical images...

Yes, but most multiprocessor systems do *not* duplicate all the peripherals.
Replication of peripherals is unusual except on systems (like Tandem's)
whose major goal in life is high reliability.  I think Brian's point was
that "reliability through redundancy" is not a significant advantage for
multiprocessors unless peripherals are duplicated too, which they usually
aren't.  I know that C.mmp -- one of the multiprocessor systems that Brian
has worked on -- was plagued by fast-but-unreliable disks and insufficient
funding for full replication.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

jqj@cornell.UUCP (J Q Johnson) (05/16/85)

I guess I believe that, at least in principle, a multiprocessor could
be preferable to an equivalent uniprocessor (= same aggregate throughput,
same total cache, etc.).  The argument is that both multiprocessor and
uniprocessor suffer scheduler overhead, but that in some cases the
overhead on a multiprocessor will be less.  A trivial example is a
uniprocessor versus an n-fold multiprocessor running n completely
independent tasks; presumably the scheduler overhead in that case will be
linear in the length of the task for the uniprocessor, but sublinear
(perhaps even a constant) for the multiprocessor (the assumption here is
that the scheduler only gets invoked when someone isn't currently running
but wants to be).
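
A quick count makes the linear-versus-constant claim concrete (the task
length and quantum below are hypothetical, chosen only for illustration):

    /* Hypothetical numbers: ten compute-bound tasks of 60 CPU-seconds
     * each.  The time-slicing uniprocessor makes a scheduling decision
     * at the end of every quantum, so the count grows with total work;
     * under the stated assumption the n-way MP dispatches each task once.
     */
    #include <stdio.h>

    int main(void)
    {
        int    n        = 10;      /* independent tasks                */
        double job_secs = 60.0;    /* CPU time each task needs         */
        double quantum  = 0.01;    /* uniprocessor time slice, seconds */

        printf("uniprocessor:  ~%.0f scheduler invocations\n",
               n * job_secs / quantum);
        printf("n-way MP:       %d scheduler invocations (one per task)\n",
               n);
        return 0;
    }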

mash@mips.UUCP (John Mashey) (05/17/85)

J. Q. Johnson (..!cornell!jqj) writes:
> I guess I believe that, at least in principle, a multiprocessor could
> be preferable to an equivalent uniprocessor (= same aggregate throughput,
> same total cache, etc.).  The argument is that both multiprocessor and
> uniprocessor suffer scheduler overhead, but that in some cases the
> overhead on a multiprocessor will be less.  A trivial example is a
> that scheduler only gets invoked when someone isn't currently running
> but wants to be)....... 

One can certainly find examples of this; in particular it is true if one
has n independent tasks that 1) are very compute bound 2) are small enough
that paging traffic is minimal.  Otherwise, what you find is that the OS
has to pay the price not in scheduling overhead, but in other coordination
overhead.  For example, either you use snoopy caches [and chew up bandwidth
and basic cycle time] or handle cache consistency by various software
mechanisms.  Next, you have to handle TLB consistency, and then you must
interlock terminal I/O, disk cache I/O, etc.  In every MP implementation I've
seen, you always had to add code to interlock against rare events.
Hence, as long as you stay in user programs, you can be OK, but as soon as
you spend significant time in the kernel, you pay some coordination price.
[Note: the above applies to general-use, not special-case systems.]
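
A generic illustration of that kind of added interlock (this is not code
from any particular MP kernel; the shared counter and the function name
are made-up stand-ins for a terminal or disk-cache structure):

    /* Generic sketch, not from any real kernel: on a uniprocessor the
     * kernel could simply bump the counter, but on a shared-memory MP
     * every caller acquires and releases a lock, even though actual
     * contention may be rare -- the coordination price described above.
     */
    #include <stdatomic.h>

    static atomic_flag queue_lock = ATOMIC_FLAG_INIT;
    static long        chars_queued;

    void queue_tty_char(void)
    {
        while (atomic_flag_test_and_set(&queue_lock))
            ;                   /* spin: cost that exists only on the MP */
        chars_queued++;         /* the useful work is a single increment */
        atomic_flag_clear(&queue_lock);
    }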

Complexity is like garbage.  Hard work can keep the amount down, but
won't make it go away.  If you sweep it under the rug you'll be sorry.
At best, you can at least choose a good location for the garbage dump.
-- 
-john mashey
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash
DDD:  	415-960-1200
USPS: 	MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043