[comp.arch] Killer Micro Ensembles

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (06/28/91)

In article <1991Jun23.012644.12449@en.ecn.purdue.edu> 
	wailes@en.ecn.purdue.edu (Tom S Wailes) writes:
>Assume you are designing a massively parallel computer system made of
>many commodity microcomputers.  To make this interesting, assume that
>partitioning can be made to allow differing classes of users to coexist on
>one machine.

Actually, that more-or-less follows, since SIMD is the organization
that doesn't want to be timeshared, and an ensemble of commodity
chips is most naturally MIMD. (MIMDs, for their part, do "spacesharing.")

You went on to ask about word size. I have been telling people for
some time now that we should leverage commodity technology into
ensemble machines, but I left word size out of the argument.
Instead, my major points were risk avoidance, and bandwagon effect.

Many people have rolled their own instruction set: it's outright
popular. However, any large effort is risky - people leave,
innovative ideas fail to mesh, unexpected demands arrive after it's
too late to make changes, and so on. It's said in Hollywood that no
one actually sets out to make a bad movie. But it happens. A
commodity is the beneficiary, not only of good engineering, but of
luck.

The bandwagon effect isn't just sleazy marketing. If you choose a
chip that some reasonable number of programmers are using, then
there's stuff to buy. For instance, wouldn't it be nice if a good
kernel, and a whole bunch of compilers, had already been ported to
the chip? Is your design so unique that you expect funding to do all
that stuff yourself?

That said, **someone** has to innovate. I'm glad that iWarp and the
J-Machine are being fabbed. Plus, there are compromises, as witness
the Alewife project, which is getting a slightly-modified SPARC from
LSI Logic. Their papers talk about doing thread switches in perhaps
ten clocks, in order to continue computing during the slower cache
faults.
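To see why a ten-clock switch is interesting, here is a toy steady-state
model (my own back-of-the-envelope sketch, not Alewife's actual mechanism,
and all the numbers below are made up):

```python
def utilization(miss_latency, switch_cost, run_len, n_threads):
    """Fraction of clocks spent computing, in a toy steady-state
    model: each hardware thread runs run_len clocks, takes a cache
    miss costing miss_latency clocks, and the processor switches to
    the next thread in switch_cost clocks.  While one thread waits
    on its miss, the others cover the latency."""
    # Clocks of useful work the other threads supply during a miss:
    cover = (n_threads - 1) * (run_len + switch_cost)
    stall = max(0, miss_latency - cover)   # uncovered stall clocks
    cycle = run_len + switch_cost + stall  # clocks per thread turn
    return run_len / cycle
```

With a 100-clock miss, a 10-clock switch, and 20-clock run lengths, one
thread keeps the pipeline busy only about 15% of the time, while four
threads get it to 50%; a cheaper switch raises the ceiling further.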

>Do you distribute memory among the processors or do you create a
>large banked shared memory?  

The virtue of a banked shared memory is that memory cards can exist.
People can just buy another card and drop it in. Machines which
distribute the RAM among the PEs (MasPar, CM-2, NCUBE-2) don't tell
as good a story. Mostly, they sell more-memory as more-nodes.

The virtue of distribution is that it reduces the amount of data
motion (for programs with suitable locality).  The further a path
goes, the slower, and the more it costs, because it starts to involve
connectors and drivers. (In a staged interconnect, further ==> each
message occupies more resources.) The unresolved argument here is
whether distributed caches should front for distributed RAM or for
centralized RAM.
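The distance argument fits in one line. Here is a toy cost model (my own
sketch; the constants are placeholders, not measurements) in which a
message pays a startup charge, a per-stage charge for the connectors,
drivers, and switch resources along its path, and a per-word charge:

```python
def transfer_cost(hops, words, startup=1.0, per_hop=1.0, per_word=0.5):
    """Toy cost of moving 'words' of data 'hops' stages through a
    staged interconnect.  The per-hop term stands in for the extra
    drivers, connectors, and switch resources a longer path ties up;
    a purely local reference has hops == 0 and pays only the startup
    and per-word terms."""
    return startup + hops * per_hop + words * per_word
```

Under this model, a program whose references mostly stay local pays only
the startup and per-word terms, which is the whole case for distribution.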

>A shared memory would offer better
>utilization in my opinion, but then it would not be local.

First, multiprocessor support seems to be the coming thing. I have a
nice quad 88000 on my desk, the new i860 has MESI logic, and so on.
So, it's now interesting to consider an ensemble of multiprocessor
nodes.  Having N*M RAM chips spread over only N/4 nodes should help
utilization, and boost the aggregate PE interconnect bandwidth.  It
also reduces the non-custom penalty. By this, I mean that an NCUBE-2
node is exactly one chip, plus DRAMs. By using commodity silicon,
which lacks the interconnect support (and possibly other things), we
lose board density. However, if the glue can be amortized over four
CPUs, we may not have lost as badly.
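The amortization arithmetic can be made concrete with a toy part count
(my own sketch; the chip counts are invented, and "glue" lumps together
the interconnect interface and other support that an NCUBE-2-style
custom node folds into its one chip):

```python
def board_parts(n_cpus, ram_per_cpu, cpus_per_node, glue_per_node):
    """Total chip count for an ensemble: the CPUs themselves, their
    DRAMs, plus glue chips paid once per node.  A custom node has
    glue_per_node == 0; commodity CPUs need external glue, but
    packing several CPUs behind one set of glue amortizes it."""
    nodes = n_cpus // cpus_per_node
    return n_cpus + n_cpus * ram_per_cpu + nodes * glue_per_node
```

With 64 CPUs, 8 DRAMs each, and 4 glue chips per node, quad-CPU nodes
need 640 parts against 832 for one-CPU nodes and 576 for a fully custom
design: not the custom node's density, but much of the gap recovered.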

Second, the important kind of sharing is the logical kind. One can
imagine caches that sent messages to each other, for instance.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute