slackey@bbn.com (Stan Lackey) (03/20/90)
In article <45408@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>Speaking of architectural issues, how is the BBN TC 2000 working out?
>It should be a perfect example of Killer Micros in action. But,
>I was rather surprised that the TC 2000 Butterfly switch is only 8 bits (!)
>wide and only supports a maximum memory bandwidth of 2.4 GBytes/sec
>for a 63 processor system. A Cray Y-MP has about 40 GBytes/sec of total
>memory bandwidth, for reference.

The peak bandwidth of the 63-node TC2000 depends upon where you measure it. The memory has a 3-level hierarchy: 1) cache, 2)local memory, and 3)global memory. The Cray has no cache, but the 88000 chip set does; the appropriate place to measure would probably be at the busses between the CPU chip and the cache chips.

The combined instruction cache and data cache busses peak at 160 MB/s per processor; times 63 processors, that is about 10 GB/s. Local memory speed is in the neighborhood of 25 MB/s, times 63, or about 1.5 GB/s. Global memory is 8 MB/s per processor, for an aggregate of about 500 MB/s. Your mileage will be somewhere between 10 GB/s and 500 MB/s, depending upon cache hit rate and the mixture of accesses between local and global memory.

The 8-bit switch path clocks at 38 MHz, so the raw bandwidth of the media is 38 MB/s per path. Times 63 paths, that is a peak media speed of 2.4 GB/s.

Not to mislead: the above describes the performance model, with its speed differential between local and global memory. The programming model is a single globally addressed memory space.

-Stan
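The arithmetic in the post above can be checked with a short sketch. The per-processor figures and the processor count are the approximate peak values quoted in the post. The blended_mb_per_s function is not from the post: it is a hypothetical model that assumes each byte is served from exactly one level and that access times simply add, so it only illustrates how the hit rate and the local/global mix move the result between the two extremes.

```python
# Approximate peak figures quoted in the post (MB/s per processor).
PROCESSORS = 63
PEAK_MB_PER_S = {
    "cache": 160,   # combined instruction + data cache busses
    "local": 25,    # local memory
    "global": 8,    # global memory, reached through the switch
}
SWITCH_PATH_MB_PER_S = 38  # 8-bit path clocked at 38 MHz = 38 MB/s per path


def aggregate_gb_per_s(per_processor_mb_per_s, processors=PROCESSORS):
    """Aggregate peak bandwidth in GB/s, taking 1 GB = 1000 MB."""
    return per_processor_mb_per_s * processors / 1000.0


def blended_mb_per_s(cache_fraction, local_fraction, global_fraction):
    """Hypothetical per-processor model (an assumption, not from the post).

    Each byte comes from one level, in the given fractions, and the time
    per byte is the fraction-weighted sum of 1/bandwidth for each level.
    """
    fractions = {
        "cache": cache_fraction,
        "local": local_fraction,
        "global": global_fraction,
    }
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    seconds_per_mb = sum(f / PEAK_MB_PER_S[level] for level, f in fractions.items())
    return 1.0 / seconds_per_mb


if __name__ == "__main__":
    for level, rate in PEAK_MB_PER_S.items():
        print(f"{level:>6}: {aggregate_gb_per_s(rate):.2f} GB/s aggregate")
    print(f"switch: {aggregate_gb_per_s(SWITCH_PATH_MB_PER_S):.2f} GB/s aggregate")
    # Assumed mix, for illustration only: 90% cache, 8% local, 2% global.
    print(f"blended: {blended_mb_per_s(0.90, 0.08, 0.02):.1f} MB/s per processor")
```

The aggregates come out near the rounded figures in the post (10 GB/s, 1.5 GB/s, 500 MB/s, and 2.4 GB/s for the switch media). The blended figure depends entirely on the assumed mix and should not be read as a measurement.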
slackey@bbn.com (Stan Lackey) (03/21/90)
In article <53795@bbn.COM> slackey@BBN.COM I responded to a posting comparing TC2000 and Cray memory bandwidths:
>The peak bandwidth of the 63-node TC2000 depends upon where you
>measure it. The memory has a 3-level hierarchy: 1) cache, 2)local
>memory, and 3)global memory.

I included a set of approximate peak bandwidths at the various levels, commenting on what I felt was an apples-to-oranges comparison with the Cray. I erroneously left out the disclaimer:

These are approximate peak values given for comparison with other architectures only. Although these values can be achieved under certain circumstances, delivered averages will vary depending upon the application.

-Stan
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/21/90)
In article <53795@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>In article <45408@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>>Speaking of architectural issues, how is the BBN TC 2000 working out?
>The peak bandwidth of the 63-node TC2000 depends upon where you
>measure it.

I agree. Of course, Crays have no caches, but some Crays have local memory and all Crays have vector registers and fairly numerous scalar registers. You could call registers "programmable caches" to compare bandwidths :-)

My question was intentionally brief, but to be more specific: the architecture obviously depends on the ability to parallelize in such a way that global memory bandwidth is not the bottleneck. How well is this working out?

etc. etc. etc.

  Hugh LaMaster, M/S 233-9,     UUCP ames!lamaster
  NASA Ames Research Center     ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035       Phone: (415)604-6117
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/21/90)
In article <45490@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>I agree. Of course, Crays have no caches, but some Crays have local memory
             "         "    "   "  ^ DATA caches ^

I should have said. Pardon.

  Hugh LaMaster, M/S 233-9,     UUCP ames!lamaster
  NASA Ames Research Center     ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035       Phone: (415)604-6117
crowl@cs.rochester.edu (Lawrence Crowl) (03/23/90)
In article <45490@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>My question was intentionally brief, but to be more specific: the [BBN TC
>2000 Multiprocessor] architecture obviously depends on the ability to
>parallelize in such a way that global memory bandwidth is not the bottleneck.
>How well is this working out?

My experience has been with the first Butterfly, based on the 68000. On this system, contention for the "inter-node" communication network was negligible. You are far more likely to limit performance through contention for a specific memory module than through contention for the communication network. I expect (but do not know) that the same is true for the TC 2000.

--
  Lawrence Crowl                        716-275-9499   University of Rochester
                              crowl@cs.rochester.edu   Computer Science Department
                  ...!{ames,rutgers}!rochester!crowl   Rochester, New York, 14627
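The point about contention for a specific memory module can be illustrated with a toy model. Nothing in it is measured on a Butterfly or a TC 2000. It assumes that each processor issues one request per cycle, that each memory module serves at most one request per cycle, and that the switch itself never blocks. The function names are made up for this sketch.

```python
def expected_busy_modules(processors, modules):
    """Expected number of distinct modules addressed in one cycle when each
    processor independently picks a module uniformly at random.

    Under the toy assumptions this is also the expected number of
    requests served per cycle.
    """
    if processors < 0 or modules <= 0:
        raise ValueError("processors must be >= 0 and modules must be > 0")
    # A given module is idle with probability (1 - 1/modules) ** processors.
    return modules * (1.0 - (1.0 - 1.0 / modules) ** processors)


def hot_spot_busy_modules(processors):
    """If every processor addresses the same module, at most one request
    is served per cycle, however many processors there are."""
    return 1.0 if processors > 0 else 0.0


if __name__ == "__main__":
    n = 63  # processor and module count, matching the 63-node system discussed
    print(f"uniform references: {expected_busy_modules(n, n):.1f} requests/cycle")
    print(f"single hot module:  {hot_spot_busy_modules(n):.1f} requests/cycle")
```

In this model, spreading references evenly keeps most modules busy, while a single heavily shared location serializes every processor behind one module, independent of how much bandwidth the network has.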