[comp.arch] Hypercubes

eugene@pioneer.UUCP (02/25/87)

In article <1210@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>Distributed memory networks have been used for multi-user systems for several
>years now - cf. Apollo networks.  Some, at least, would argue they have been
>used successfully.  However, machines like the iPSC were designed to do heavy
>computing, and NOT a lot of resource sharing.  The cube manager is too much of
>a bottleneck to be used as a resource server to the tower.
> . . .
>The hypercube is set up to do number crunching, with lots of operations per
>byte of I/O.
>-- 
>Doug Pase   --   ...ucbvax!tektronix!ogcvax!pase   or   pase@Oregon-Grad

Hypercubes were not designed to do a lot of heavy computing.  You would
be putting them into the Cray class of processor, and everyone's
experience has been to the contrary.  Heavy computing requires a well
thought out (balanced) structure to prevent things like an I/O bottleneck.
A hypercube is far from a typical end-user machine.

The marketing hype which has surrounded hypercubes astounds me.  It
turns out the ONLY person from whom I have heard a level-headed
response is Justin Rattner of Intel, who stated that these machines are
research machines to give people exposure to the problems of doing
parallel programming.  They are not designed to replace Crays or compete
with them.  To believe so would involve a great deal of misunderstanding.
There are now five (if not more) companies selling hypercube
architectures out there; I doubt any will survive in the long term
(in the hypercube market).  Don't hold your breath for software either.
Don't expect to take your dusty-deck C or Fortran and have it
automatically parallelized (when that works, we will have achieved true AI 8-).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

p.s. the only buzzword I didn't use was A*a.

miner@ulowell.UUCP (02/25/87)

In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>Don't expect to take you dusty deck C or Fortran and have it automatically
>parallelize it (when it does, we will have achieved true AI 8-).
>--eugene miya
>  NASA Ames Research Center
The Alliant (a supermini parallel/vector machine made in MA) has a decent 
parallelizing FORTRAN compiler, and they are working on a C compiler.  Many of 
the companies coming out with parallel computer systems are doing parallelizing
FORTRAN as the first language.  It is still not the best way to express things,
but it allows you to port code.  And even with compilers like Alliant's it is
still worthwhile to go through the code and modify loops to remove dependencies
between variables, etc.; it helps the compiler perform better.


-- 
Rich Miner !ulowell!miner Cntr for Productivity Enhancement 617-452-5000x2693

pase@ogcvax.UUCP (02/26/87)

    In article <1210@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:

	. . . However, machines like the iPSC were designed to do heavy
	computing, and NOT a lot of resource sharing.  . . .
	The hypercube is set up to do number crunching, with lots of
	operations per byte of I/O.

    In article <ames.362> eugene@pioneer.UUCP (Eugene Miya N.) responds:

	Hypercubes were not designed to do a lot of heavy computing.  You would
	be putting them into the Cray class of processor, and everyone's
	experience has been to the contrary.

Agreed, the iPSC is not intended to compete with the Cray - at ~1 Mflop per
32 node tower, it hasn't near the horsepower of a Cray.  Crays are especially
good at problems which are vectorizable and require huge amounts of memory.
The iPSC is not.  (However, just try to pick up a Cray for under $200,000.)

(Miya)	Heavy computing requires a well thought out (balanced) structure to
	prevent things like an I/O bottleneck.  A hypercube is far from a
	typical end-user machine.

Again I agree, but whether a machine architecture is "balanced" or not depends
a lot on its intended application.  If a huge volume of communication is
required, the iPSC is probably not appropriate.  Geoffrey Fox of Caltech
presented a paper at one of the 1984 conferences extolling some of the virtues
of a hypercube architecture (NOTE: NOT an iPSC - the iPSC is based on Fox's
design) for computing.  The article was called "Concurrent Processing for
Scientific Calculations".  It was at an IEEE conference, but I don't remember
which one.  BTW, just about any new architecture is "far from a typical
end-user machine."

(Miya)	The marketing hype which has surrounded hypercubes astounds me.  It
	turns out the ONLY person I have heard a level-headed response from was
	Justin Rattner of Intel who stated that these machines are research
	machines to provide exposure to people on the problems of doing parallel
	programming.

Perhaps you're too easily astounded, or maybe you think it's of no use because
it's of no use to you...

(Miya)	They are not designed to replace Crays or compete with
	them.  To believe so would involve a great deal of misunderstanding.

Again, I don't disagree - it was never my contention.

(Miya)	There are now five (if not more) companies selling hypercube
	architectures out there, I doubt if any will survive in the long term
	(in the hypercube market).  Don't hold your breath for software either.
	Don't expect to take you dusty deck C or Fortran and have it
	automatically parallelize it (when it does, we will have achieved true
	AI 8-).

No question but that algorithms for the iPSC require a different approach than
von Neumann style machines; hence dusty decks won't work.  This is no surprise
to me, as there is a big difference between an MIMD architecture and a SISD
architecture, and only a little difference between vector/scalar and SISD
architectures.  Does that mean they'll never succeed?  Well, we'll see...

One Last Word:
I'm glad you subscribe to this newsgroup Eugene; I enjoy your postings.  Please
keep them coming.
-- 
Doug Pase   --   ...ucbvax!tektronix!ogcvax!pase   or   pase@Oregon-Grad

ron@brl-sem.UUCP (02/27/87)

In article <1216@ogcvax.UUCP>, pase@ogcvax.UUCP (Douglas M. Pase) writes:
> Agreed, the iPSC is not intended to compete with the Cray - at ~1 Mflop per
> 32 node tower, it hasn't near the horsepower of a Cray.  Crays are especially
> good at problems which are vectorizable and require huge amounts of memory.
> The iPSC is not.  (However, just try to pick up a Cray for under $200,000.)
Granted, ~1 MFlop per tower is not even rivaling a good mini these days.
The nodes in the iPSC are entirely underwhelming.  This is not a condemnation
of hypercubes in general, just the Intel one.

news@cit-vax.UUCP (02/27/87)

Organization : California Institute of Technology
Keywords: 
From: jon@oddhack.Caltech.Edu (Jon Leech)
Path: oddhack!jon

In article <1216@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>required, the iPSC is probably not appropriate.  Geoffrey Fox of CalTech
>presented a paper in one of the 1984 conferences extolling some of the virtues
>of a hypercube architecture (NOTE: NOT an iPSC - the iPSC is based on Fox's
>design) for computing.  

	This is a common misconception which I will attempt to correct.
The original Caltech Cosmic Cubes (sitting not 20 feet from me) were
put together by a team led by two professors - Fox & Chuck Seitz of CS -
and a number of students from both Physics & CS. The two hypercube groups
have split up and gone their separate ways since then, but please give credit
where it's due, to Seitz. Fox might like to think he did it all by himself, 
but that's not the case.

	I do generally agree with Eugene Miya's assessment of hypercubes,
though. I think it is a big mistake for people to attempt to do practical
work using machines which are still very much research projects themselves
(as I am finding in attempting to do my MS work on the cubes here). The 
biggest problems from my point of view are the terribly immature software 
environments (debugging? what's that?) and extremely slow communications
to the cube hosts.

    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
    Caltech Computer Science Graphics Group
    __@/

eugene@pioneer.UUCP (02/28/87)

In article <1881@cit-vax.Caltech.Edu> jon@oddhack.UUCP (Jon Leech) writes:
>>Geoffrey Fox of CalTech
>>presented a paper in one of the 1984 conferences extolling some of the virtues
>>of a hypercube architecture (NOTE: NOT an iPSC - the iPSC is based on Fox's
>>design) for computing.  
>
>	This is a common misconception which I will attempt to correct.
>The original Caltech Cosmic Cubes (sitting not 20 feet from me),
>    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
>    Caltech Computer Science Graphics Group

As I mailed to the OGC, by chance I was at that meeting where Fox spoke.
It was the IEEE COMPCON, and I won't forget it.  It was in a different
hotel than usual in SF because the usual hotel (Cathedral Hill, where it
runs as I write) had a fire.  It was held one week before the LA Times
article about hypercubes as possible future supercomputers.

Fox (a physicist) got up before these EEs who had read this article.  He
was not well received.  Geoffrey left the stage saying, "I'm not
responsible for what people say about us."  Not one of the EEs' brighter
moments.  Fox is basically a good guy (also known in the CS community as
part of the Caltech SMP project).

%A Geoffrey C. Fox
%T Concurrent Processing for Scientific Calculations
%J Digest of Papers COMPCON, Spring 84
%r Hm62
%I IEEE
%D Feb. 1984
%P 70-73
%K Super scientific computers
%X An introduction to the current 64 PE Caltech hypercube.  Based
on the dissertation by Lang (Caltech 1982) on the `Homogeneous Machine.'

Bart Locanthi also gets credit for the original Homogeneous Machine
thesis (Caltech, 1980).

Oh where is Eugene Brooks, III arguing for shared memory hypercubes when
you need him? Sorry, I should summarize more and follow up less.

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

turner@uicsrd.UUCP (03/01/87)

() Written  by miner@ulowell.cs.ulowell.edu in uicsrd:comp.arch 
()In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
()>Don't expect to take you dusty deck C or Fortran and have it automatically
()>parallelize it (when it does, we will have achieved true AI 8-).
()>--eugene miya
()>  NASA Ames Research Center
()
()The Alliant (a supermini parallel/vector machine made in MA) has a
()decent parallelizing FORTRAN compiler, and they are working on a C.
()Many of the companies coming out with parallel computer systems are
()doing parallelizing FORTRAN as the first language.  Still not the best
()way to express things, but it allows you to port code.  And even with
()compilers like Alliant's it is still good to go through the code and
()modify loops to remove dependencies between variables etc, it helps
()the compiler perform better.
()Rich Miner !ulowell!miner Cntr for Productivity Enhancement

There is a BIG difference between parallelizing for a machine like the
Alliant, with lots of nice loop-type synchronization built in (not to
mention a shared memory architecture), and automatic parallelization
for a HYPERCUBE.  When, and if, we do achieve that it will involve far
more than the simple kinds of data dependence analysis that are the
basis for the || compilers that are beginning to appear commercially.
If the differences are not bloody obvious, mail me and I'll be more
than happy to share the troubles I've seen in writing for the iPSC;
compared to an FX/8 it's a totally different ball game.  Fun, but tough.
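The contrast Turner describes can be sketched in C.  The first loop is the
kind an Alliant-style shared-memory compiler can split across processors by
dependence analysis alone; the hypercube version must be decomposed by hand,
and the message-passing step alluded to in the comment stands in for whatever
primitives the cube provides - it is not real iPSC API, just an illustration.

```c
#define N 16

/* No cross-iteration dependences: a shared-memory parallelizing
 * compiler can hand out iterations to processors automatically,
 * since iteration i touches only index i of each shared array. */
void saxpy(double a, const double *x, const double *y, double *z) {
    for (int i = 0; i < N; i++)
        z[i] = a * x[i] + y[i];
}

/* On a hypercube there is no shared z[]: the programmer decides
 * which block of indices lives on which node and moves the data
 * explicitly.  Each node computes only its own chunk; collecting
 * the z_local blocks afterwards requires explicit messages. */
void saxpy_node(int node, int nodes, double a,
                const double *x_local, const double *y_local,
                double *z_local) {
    int chunk = N / nodes;   /* node owns [node*chunk, (node+1)*chunk) */
    for (int i = 0; i < chunk; i++)
        z_local[i] = a * x_local[i] + y_local[i];
}
```

The compiler can discover the first form; nothing in 1987-era dependence
analysis can invent the data distribution the second form requires.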
---------------------------------------------------------------------------
 Steve Turner (on the Si prairie  - UIUC CSRD)

UUCP:	 {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!turner
ARPANET: turner%uicsrd@a.cs.uiuc.edu
CSNET:	 turner%uicsrd@uiuc.csnet
BITNET:	 turner@uicsrd.csrd.uiuc.edu

fay@encore.UUCP (03/02/87)

In article <362@ames.UUCP> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>
>Hypercubes were not designed to do a lot of heavy computing.  You would
>be putting them into the Cray class of processor, and everyone's
>experience has been to the contrary.  Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>
>The marketing hype which has surrounded hypercubes astounds me. 
>...
>  They are not designed to replace Crays or compete with
>them.  To believe so would involve a great deal of misunderstanding.
>There are now five (if not more) companies selling hypercube
>architectures out there, I doubt if any will survive in the long term
>(in the hypercube market).

I agree with your criticism of marketing hype, but not at all with your
prognosis. At least one major oil company is buying large numbers of
hypercubes (not Intel's) to do seismological work (replacing very
expensive on-line time with their Crays). Some hypercubes have made very
substantial improvements on Intel's design.

Granted, one company using them for one narrow application doesn't make
hypercubes the final word in computing, but neither are hypercubes doomed
just because Intel made a mistake using ethernet chips to communicate
between the nodes.  That problem has been ameliorated (though not
solved).

The cost benefits are incredible when one realizes that these programs
actually get better turn-around time on a $12,000 hypercube than a
batch-processing multi-million-dollar Cray.

{linus,talcott,decvax,ihnp4,allegra,necis,compass}!encore!fay

ram@nucsrl.UUCP (03/04/87)

Eugene Wrote:

>experience has been to the contrary.  Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>A hypercube is far from a typical end-user machine.

  Quite True:

>The marketing hype which has surrounded hypercubes astounds me.  It

  [Talk of marketing hype. The hype reminds me of the AI field.  "If you want
to do serious work in AI you have to have a Symbolics or LMI or...". BS.
I can do as well or better with a SUN.  I have even heard interviewers
telling me these things.  Do these guys start projects after reading the ad
pages of AI magazine?] 

  Why doesn't anybody talk of any other network?  I for one like the De Bruijn
interconnection network.  Any flames/appreciation related to this?
True, the hypercube is by far the best-studied, most easily VLSI-able,
extensible network.  But is an interconnection network the be-all and end-all
of multi-processors?  No.  It is far too early to judge that.
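One reason the hypercube is so well studied is that routing in it has a
one-line description: XOR the source and destination labels and fix the
differing bits one dimension at a time (the classic "e-cube" order).  A
minimal sketch - the function names are mine, not any vendor's API:

```c
/* In a d-dimensional hypercube, node labels are d-bit numbers and
 * neighbors differ in exactly one bit.  The hop count between two
 * nodes is therefore the number of bits in which they differ. */
int hops(unsigned src, unsigned dst) {
    unsigned diff = src ^ dst;
    int h = 0;
    while (diff) {           /* count set bits of src XOR dst */
        h += (int)(diff & 1u);
        diff >>= 1;
    }
    return h;
}

/* First intermediate node on the e-cube route: flip the lowest
 * differing bit of the source label. */
unsigned next_hop(unsigned src, unsigned dst) {
    unsigned diff = src ^ dst;
    return src ^ (diff & (~diff + 1u));  /* isolate lowest set bit */
}
```

So in a d4 cube the worst-case route is only 4 hops, which is much of the
appeal relative to other networks of 16 nodes.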

   Hypercubes/delta/banyans or whatever network you choose, I/O bandwidth,
routing protocols, and network connectivity will limit the number of
problems that run faster (relative to the number of available processors)
on these.  I am not discounting cubes altogether (the TMI guys have shown
some interesting pieces of jugglery with cubes).
I guess we can agree that problems that are communication bound are
bound to have problems with any sort of network.  Alternate solution:
have enormous shared memory - giga giga bytes.  Here cache coherence
is a major stumbling block.  

   What classes of problems are more suitable for network based machines?
If we assume that a problem is decomposed into a number of sub-problems/
processes, problems that are embarrassingly parallelizable and with little
inter-process communication would be best for the network class of machines.
Heavily communication bound problems would be suited for shared memory.  As
shared memory is not viable for more than a single digit (optimistic) # of
PEs, what's the alternative?

   Solution: mix these two within a framework (I think Cedar has such
characteristics) so that a few PEs share a common few gigabytes of memory and
such clusters are interconnected.  It is wasteful to set up communication
(broadcast is a different issue altogether) to transfer just a few KBs from
A to B.  Such a framework would reduce the communication overhead relative
to the overall transfer time.  This is probably what Eugene means by a
balanced design.  It has a few plus points: fault-tolerance is improved,
along with alternate communication channels. 

   Another common misconception is about vectorization.  Vectorization does
not mean speed-ups for numerical calculations alone.  
Chaining, short-stopping, and overlapping provide considerable speed-ups in
the form of reduced memory access cycles.  Until today only FORTRAN
programmers had access to such machines (probably the CRAYs were dedicated
to this saintly sect), hence the usage.  Although vectorizing a language
like C is not as easy as FORTRAN, vectorization is certainly a lot easier
than auto-parallelization. 
[Gould had done some work on vectorizing C (wonder what happened to it),
as has Kuck & Associates.]  
    
   I had done some research as part of a team analysing a class of
problems (ranging from bit manipulation, searching/sorting, and tree
manipulation to Fortran-like number juggling).
Disregarding the underlying architecture, the analyses were separated
into parallelizable sections and vectorizable sections.  Some problems
were embarrassingly parallelizable and some had little
parallelization (a solution tree - a typical Prolog search tree),
but the amount of speed-up from vectorization is considerable in almost all
problems.  (No wonder there are so many vector CPU designs in the works today.)

   If we build huge CPUs that crunch data at an alarming rate, communication 
latencies are going to limit their loading capabilities.  If we build small
CPUs that overlap communication with computation, effectively there is a
speed-up (CM), but there is a limit to CPU size and number, dictated by the
problems and interconnection complexity.  Thus there is also a tradeoff
between CPU power and interconnection type.  In retrospect, the choice of
Intel chips for the Caltech machine was probably not the best.
Another problem for these network based multi-processors is the initial
distribution of data and final collection.  Almost everybody ignores these
in the analysis; I think they are significant and have to be included.


				      renu raman
				....ihnp4!nucsrl!ram
				Northwestern Univ. Comp. Sci. Res. lab
				   Evanston  IL  60201


Thanks to Ollie, Iran has agreed to spend zillions in supercomputer research.
How philanthropic of those guys.

Why is it that people who have used the iPSC have either turned HYPER or are
in a COSMIC trance :-)

ram@nucsrl.UUCP (03/07/87)

	While talking about interconnection networks, I just received my
regular TI semiconductor newsletter.  TI announced a 32-bit "shuffle
exchange network" on a chip.  Called the AS8839, it can perform:

	   o Perfect Shuffle
	   o Inverse Shuffle
	   o Upper Broadcast
	   o Lower Broadcast
	   o Bit Exchange

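For reference, the perfect-shuffle permutation on N = 2^k lines just rotates
the k-bit line index left by one (the card-deck interleave), and exchange
flips the low bit.  A small sketch with my own helper names - this models the
permutation itself, not the AS8839's pin-level behavior:

```c
/* Perfect shuffle of N = 2^k lines: line i moves to the position
 * given by rotating its k-bit index left by one, which interleaves
 * the lower half of the lines with the upper half. */
unsigned shuffle(unsigned i, int k) {
    unsigned top = (i >> (k - 1)) & 1u;          /* high bit of index */
    return ((i << 1) | top) & ((1u << k) - 1u);  /* rotate left, mask */
}

/* Inverse shuffle: rotate the index right by one. */
unsigned inv_shuffle(unsigned i, int k) {
    unsigned low = i & 1u;
    return (i >> 1) | (low << (k - 1));
}

/* Exchange: flip the low bit, pairing adjacent lines. */
unsigned exchange(unsigned i) { return i ^ 1u; }
```

Log2(N) shuffle-exchange passes suffice to realize any of the standard
multistage permutations, which is why one chip covering both operations is
a useful building block.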

	 Anybody know of any other chip(s) for other network(s)?



						     renu raman
						     ...ihnp4!nucsrl!ram

Eugene: It's time you changed your ".signature".  

ram@nucsrl.UUCP (03/14/87)

 Fay wrote:

>The cost benefits are incredible when one realizes that these programs
>actually get better turn-around time on a $12,000 hypercube than a
				                  ^
						  +- where did a '0' go.
    Last time I heard (yesterday somebody at Argonne told me) it was
    $125,000 for a d4 machine (d for dimension, and the dimension is 4).
    That price is without the vector processors in them.  Could somebody
    from Intel clarify?

>batch-processing multi-million-dollar Cray.

    Watch out for Cray discounts.  Wait (not too long) for the competition
    to build and watch the prices tumble.

>{linus,talcott,decvax,ihnp4,allegra,necis,compass}!encore!fay
>----------

fay@encore.UUCP (Peter Fay) (03/18/87)

In article <3810018@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
>
> Fay wrote:
>
>>The cost benefits are incredible when one realizes that these programs
>>actually get better turn-around time on a $12,000 hypercube than a
>				                  ^
>						  +- where did a '0' go.
>    Last time I heard (yesterday somebody at Argonne told me) it was
>    125000 for a d4 machine (d-dimension & 4 is 4).  That price is without
>    the vector processors in them.  Could somebody from Intel Clarify.


I wasn't referring to Intel's hypercube. In fact I believe I said Intel's
wasn't the greatest implementation. My rough pricing came from a cube
configuration of Ncube Corp.'s boards for a total of 16 nodes. (The
exact $ price is from memory - it may have gone down.)

My only 'real' comparison of cubes was at the ICPP conference
last summer, viewing both the Intel and Ncube running the same 
Mandelbrot program (what else?).  The Ncube was (very roughly) ten times
faster. The Intel people explained this by saying their machine was
still 'experimental', while Ncube's was a commercial product.

Maybe that's why Ncube's is being used in commercial applications.
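Mandelbrot is the stock hypercube demo because each point's escape count
depends only on that point's own coordinates, so the image partitions across
nodes with essentially no communication until the final collection.  A
minimal per-point kernel - an illustrative sketch, not either vendor's demo
code:

```c
/* Iterate z = z^2 + c from z = 0 and return the step at which |z|
 * first exceeds 2 (or max_iter if it never does).  Every pixel is an
 * independent call, which is why the image maps trivially onto
 * hypercube nodes: give each node a block of pixels and collect. */
int mandel_iters(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;   /* real part of z^2 + c */
        zi = 2.0 * zr * zi + ci;             /* imaginary part */
        zr = t;
        n++;
    }
    return n;
}
```

A demo like this flatters any message-passing machine, since the compute-to-
communication ratio is enormous; it says little about communication-bound
workloads.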


			- peter fay


{linus,talcott,decvax,ihnp4,allegra,necis,compass}!encore!fay

eugene@pioneer.arpa (Eugene Miya N.) (03/18/87)

In article <1150@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>In article <3810018@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
>>
>> Fay wrote:
>>
>>>The cost benefits are incredible when one realizes that these programs
>>>actually get better turn-around time on a $12,000 hypercube than a
>>				                  ^
>>						  +- where did a '0' go.
>
>I wasn't refering to Intel's hypercube.

I thought you were referring to a 4 node cube.  In fact I saw John
Palmer's 4 node system when he brought it last year to COMPCON.
Speedups of 4 over microprocessors, even 16 over micros, are dumb.  You
want speedups of hundreds of times to compete with larger mainframe-class
CPUs, which have much faster busses for I/O.

>My only 'real' comparison of cubes was at the ICPP conference
>last summer, viewing both the Intel and Ncube running the same 
>Mandelbrot program (what else? ). The Ncube was (very roughly) ten times
>faster. The Intel people explained this by saying their machine was
>still 'experimental', while Ncube's was a commercial product.
>
>Maybe that's why Ncube's is being used in commercial applications.
>			- peter fay

What "commercial" application is running on a Cube? Any?  I am under the
	send mail when answering this, don't clutter the net
impression (from the Cube conferences) that we are all dealing with
reduced (read: toy) problems.  I don't know of a scientist anywhere
running "production code," including Caltech.  Code development, sure, and
other experiments.  That's quite an investment of a scientist's time
to write something for a line of machines that has no guarantee of
continuing (like writing HEP applications).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

news@cit-vax.Caltech.Edu (Usenet netnews) (03/19/87)

Organization : California Institute of Technology
Keywords: 
From: jon@oddhack.Caltech.Edu (Jon Leech)
Path: oddhack!jon

In article <1150@encore.UUCP> fay@encore.UUCP (Peter Fay) writes:
>My only 'real' comparison of cubes was at the ICPP conference
>last summer, viewing both the Intel and Ncube running the same 
>Mandelbrot program (what else? ). The Ncube was (very roughly) ten times
>faster. The Intel people explained this by saying their machine was
>still 'experimental', while Ncube's was a commercial product.
>
>Maybe that's why Ncube's is being used in commercial applications.

	Based on talking to an NCUBE salesman at the Oak Ridge Hypercube
Conference last September, you can't get enough memory on one of their
nodes (128K, I think) to do the sorts of things I want. Does anyone know 
if this will change? 4 MB/node seems like a reasonable number to me.
	Other than this major problem I was very impressed by the NCUBE.
Obviously some people can get good use out of them. But commercial 
applications != Mandelbrot sets!
 
    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
    Caltech Computer Science Graphics Group
    __@/

neighorn@qiclab.UUCP (Steven C. Neighorn) (05/11/87)

In article <3810015@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
:	While talking about interconnection networks, I just recd my
:regular TI semiconductor newsletter.  TI announced  a 32 bit "shuffle
:exchange network" on a chip. Called AS8839, it can perform
:
:	   o Perfect Shuffle
:	   o Inverse Shuffle
:	   o Upper Broadcast
:	   o Lower Broadcast
:	   o Bit Exchange
:
:	 Anybody know of any other chip(s) for other network(s).
:						     renu raman

I believe shuffle-exchange functions were implemented/discussed by D. H.
Lawrie in the paper "Access and Alignment of Data in an Array Processor,"
IEEE Transactions on Comp., C-24, no. 12, Dec 1975, pages 1145-1155. The
particular implementation the shuffle/exchange functions were used in
was a multistage Omega network.
---
Steven C. Neighorn                tektronix!{psu-cs,reed}!qiclab!neighorn
Portland Public Schools      "Where we train young Star Fighters to defend the
(503) 249-2000 ext 337           frontier against Xur and the Ko-dan Armada"