lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (06/28/91)
In article <1991Jun23.012644.12449@en.ecn.purdue.edu> wailes@en.ecn.purdue.edu (Tom S Wailes) writes:
>Assume you are designing a massively parallel computer system made of
>many commodity microcomputers.  To make this interesting, assume that
>partitioning can be made to allow differing classes of users to coexist on
>one machine.

Actually, that more-or-less follows, since SIMD is the organization that
doesn't want to be timeshared, and an ensemble of commodity chips is most
naturally MIMD.  (Actually, MIMDs do "spacesharing.")

You went on to ask about word size.  I have been telling people for some
time now that we should leverage commodity technology into ensemble
machines, but I left word size out of the argument.  Instead, my major
points were risk avoidance and the bandwagon effect.

Many people have rolled their own instruction set: it's outright popular.
However, any large effort is risky - people leave, innovative ideas fail
to mesh, unexpected demands arrive after it's too late to make changes,
and so on.  It's said in Hollywood that no one actually sets out to make
a bad movie.  But it happens.  A commodity is the beneficiary not only of
good engineering, but of luck.

The bandwagon effect isn't just sleazy marketing.  If you choose a chip
that some reasonable number of programmers are using, then there's stuff
to buy.  For instance, wouldn't it be nice if a good kernel, and a whole
bunch of compilers, had already been ported to the chip?  Is your design
so unique that you expect funding to do all that stuff yourself?

That said, **someone** has to innovate.  I'm glad that iWarp and the
J-Machine are being fabbed.  Plus, there are compromises, as witness the
Alewife project, which is getting a slightly-modified SPARC from LSI
Logic.  Their papers talk about doing thread switches in perhaps ten
clocks, in order to continue computing during the slower cache faults.

>Do you distribute memory among the processors or do you create a
>large banked shared memory?
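The Alewife trade mentioned above - switching to another thread in about
ten clocks rather than stalling through a cache fault - can be put in
back-of-the-envelope terms.  A sketch in Python; all cycle counts here
are assumptions for illustration, not Alewife's measured figures:

```python
# Toy model of hiding cache-fault latency with fast thread switches.
# All cycle counts are assumptions, not Alewife's measured figures.

def utilization(run_cycles, overhead_cycles):
    """Fraction of time spent on useful work, given a fixed overhead
    paid after every run of useful cycles."""
    return run_cycles / (run_cycles + overhead_cycles)

RUN = 50      # assumed useful cycles between cache faults
FAULT = 100   # assumed cycles to service a slow cache fault
SWITCH = 10   # "thread switches in perhaps ten clocks"

# Blocking PE: stall through the whole fault.
blocking = utilization(RUN, FAULT)
# Switching PE: swap in a ready thread and keep computing.  This assumes
# enough ready threads that the fault is fully overlapped by their work.
switching = utilization(RUN, SWITCH)

print(f"blocking:  {blocking:.0%} busy")   # 33% busy
print(f"switching: {switching:.0%} busy")  # 83% busy
```

With these numbers the switch cost is still paid on every fault, but the
machine computes through the fault instead of idling - which is the whole
point of spending silicon on a ten-clock switch.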
The virtue of a banked shared memory is that memory cards can exist.
People can just buy another card and drop it in.  Machines which
distribute the RAM among the PEs (MasPar, CM-2, NCUBE-2) don't tell as
good a story.  Mostly, they sell more-memory as more-nodes.

The virtue of distribution is that it reduces the amount of data motion
(for programs with suitable locality).  The further a path goes, the
slower it is and the more it costs, because it starts to involve
connectors and drivers.  (In a staged interconnect, further ==> each
message occupies more resources.)  The unresolved argument here is
whether distributed caches should front for distributed RAM or for
centralized RAM.

>A shared memory would offer better
>utilization in my opinion, but then it would not be local.

First, multiprocessor support seems to be the coming thing.  I have a
nice quad 88000 on my desk, the new i860 has MESI logic, and so on.  So,
it's now interesting to consider an ensemble of multiprocessor nodes.
Having N*M RAM chips spread over only N/4 nodes should help utilization,
and boost the aggregate PE interconnect bandwidth.

It also reduces the non-custom penalty.  By this, I mean that an NCUBE-2
node is exactly one chip, plus DRAMs.  By using commodity silicon, which
lacks the interconnect support (and possibly other things), we lose
board density.  However, if the glue can be amortized over four CPUs, we
may not have lost as badly.

Second, the important kind of sharing is the logical kind.  One can
imagine caches that sent messages to each other, for instance.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute
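The N*M-chips-over-N/4-nodes arithmetic can be made concrete with toy
numbers (the values of N and M below are assumptions, not figures from
any of the machines named):

```python
# Toy comparison: N single-CPU nodes with M RAM chips each, versus
# N/4 quad-CPU nodes with 4*M chips each.  The totals are identical,
# but each CPU now draws on a pool four times larger, so uneven
# per-process memory demand is less likely to overflow any one node.
# N and M are assumed values, chosen only for illustration.

N, M = 256, 16

configs = [
    ("single-CPU nodes", N,      1, M),
    ("quad-CPU nodes",   N // 4, 4, 4 * M),
]

for name, nodes, cpus, ram in configs:
    print(f"{name}: {nodes} nodes, {nodes * cpus} CPUs total, "
          f"{nodes * ram} RAM chips total, {ram} chips per node")

# A process needing up to 4*M chips fits within a quad node's local
# memory, where on a single-CPU node it would spill into remote RAM.
```

Same CPU count, same RAM chip count - the only change is how the chips
are pooled, which is where the utilization and bandwidth claims come from.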