wjafyfe@watmath.UUCP (Andy Fyfe) (03/04/85)
Several concerns about the cube have been raised by Ian Kaplan & Rob Warnock.

Network Routing. The communication channels are set up so that 2 processors communicate via channel i if their binary node ids differ only in bit position i. As an example, to go from node 19 to node 42 in a 6-cube, one path is

    010011  19  (via channel 0 to)
    010010  18  (via channel 3 to)
    011010  26  (via channel 4 to)
    001010  10  (via channel 5 to)
    101010  42

Network routing is easy, and the further a message has to go, the greater the number of shortest routes (the differing bits can be corrected in any order). As Ian states, processor intervention is required to transfer the message from one channel to another. With some care the re-routing load could be, on average, well distributed. (There are separate transmit and receive lines connecting the node-to-node ethernet chips, so there is no contention there. There may be contention getting from memory to the chip, though.)

Broadcasting. If the host needs to broadcast a message, the global ethernet could be used. Suppose a node wants to broadcast a message. It too could use the global channel. However, I would expect, in most cases, that if one node wants to broadcast a message, all of the nodes would. There are then 2 choices. The broadcast could be serialized, with each processor in turn sending its message to all of the nodes. Alternatively, it is very easy to map the cube onto a ring (via Gray codes) and have each processor send its message to the next one in the ring, in parallel, and have the messages circulate. In either case, one iteration is required for each node. The latter turns out to be very natural in loop-type algorithms.

Software. Software will be a problem, at least for a while. Again, via Gray codes, it is very easy to map the cube onto a ring, a 2D mesh, etc., so software designed to run on such a machine can be ported quickly to the cube. Library routines will help. I haven't seen the Intel software, so I don't know exactly what will be provided.
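
As a rough illustration of the routing rule and the Gray-code ring mapping described above, here is a minimal sketch in Python. It is not from the cube's software; the names route, gray and ring are made up for this example, and the routing shown simply corrects the differing bits from channel 0 upward, which is only one of the d! shortest routes between nodes that differ in d bits.

```python
def route(src, dst):
    """Nodes visited going from src to dst, correcting differing bits
    from channel 0 upward (one shortest route among several)."""
    path = [src]
    node = src
    diff = src ^ dst              # set bits mark the channels to cross
    channel = 0
    while diff:
        if diff & 1:
            node ^= 1 << channel  # hop across this channel: flip that bit
            path.append(node)
        diff >>= 1
        channel += 1
    return path


def gray(i):
    """i-th reflected binary Gray code."""
    return i ^ (i >> 1)


def ring(dim):
    """Order the 2**dim cube nodes so that consecutive entries
    (and the last and first) differ in exactly one bit."""
    return [gray(i) for i in range(1 << dim)]


if __name__ == "__main__":
    # The 6-cube example from the text: node 19 to node 42.
    print([format(n, "06b") for n in route(19, 42)])
    # A ring embedded in a 3-cube.
    print(ring(3))
```

Here route(19, 42) visits 19, 18, 26, 10, 42, the same path as in the example, and every neighbouring pair in ring(dim) is directly connected in the cube, so a message passed around the ring never needs re-routing.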
This machine, like any other, will need a lot of useful software if it is to be successful (in my opinion, at least). Time will tell. As part of the Waterloop project at Waterloo, some research has been done on loop algorithms for such things as matrix manipulation, Monte Carlo simulations, etc., and all of these algorithms will run on the cube.

Bus Contention. This was raised by Rob Warnock. It's something we're concerned about. There are 8 ethernet (dma based) controllers and a cpu on the local bus. There is not enough bandwidth for all of them to be active at the same time. How bad the problem will be is another question, though. Again, I expect that the same code will run on all nodes, and will tend to be synchronous (all nodes will be doing the same thing at about the same time), with the synchronization being encouraged by message passing (one node may have to busy wait because it's waiting for a message, for example). And the more message passing that goes on, the less efficient the cube will be. It may be that in practice one typically has at most 3 of the ethernet channels active at any point in time: one send channel and one receive channel (for inter-node communication), plus the global receive channel accepting a message from the host.

I like Rob's hyper-point (alias fat point) though. It's worth a serious look (maybe Intel is listening). Of course, we (Waterloop people) would just map the machine onto a loop anyway! :-)

-----------------------------------------------------------------------
The cube is not a replacement for a Cray. But multi-processors are the way I believe we have to go. Even on a Cray, to be efficient, you need to have "vectors" in your algorithm, which is just another way of saying "parallelism". And even the Crays are becoming multi-processor machines.

--Andy Fyfe
  ...!{decvax, allegra, ihnp4, et. al}!watmath!wjafyfe
  wjafyfe@waterloo.csnet