sundar@cwruecmp.UUCP (05/03/83)
Summary of the responses for the Systolic Array on a Chip Query:

Thanks to all of you for responding to my query. Frankly speaking, there weren't very many responses. I am sure there are more people working in this area than the number of responses would lead me to believe. Here I have attempted to summarize the responses. I tried to find out if anyone had implemented the idea of systolic arrays on a chip and, if so, what problems they had.

At MIT, as part of the 6.371 project, a systolic Stack/Queue was designed and fabricated. However, these haven't been tested and I have no idea of the problems they encountered. Also, a systolic priority queue was designed last year. It is not known if this has been fabricated and tested.

There are a couple of groups working on using the concept of systolic arrays to connect many chips externally (as opposed to many components within a chip). There is also some interest in chips that are specially designed to facilitate such an external interconnection.

TRW makes an FIR (Finite Impulse Response) filter chip called the TDC1028. It is not known if it is systolic.

The NYU Ultracomputer uses a systolic structure that performs 2x2 switching, queues messages in a congested network, and combines messages destined for the same memory address. The implementation is yet to be done. The interested reader is referred to the Feb 83 issue of IEEE Transactions on Computers.

At CMU, 88-pin chips have been fabricated that contain a 64x64 microstore, some random logic and RAM. I am told that they had a 100% failure rate after the first fabrication attempt. However, all errors seem to have been detected and corrected, and a second fabrication order has already been sent to MOSIS. The idea is to use these chips in a systolic array for solving partial differential equations.

The University of Waterloo has developed a Chess Legal Move Generator using the concept of systolic arrays.
A description of the chip can be found in the proceedings of the 3rd Caltech Conference on VLSI (1983). A more readable version of the paper can be found in the May-June 1983 issue of VLSI Design.

Though a lot of good work has been reported in these responses, I feel that my original question is still left unanswered. I was (and still am) looking for problems at the implementation stage (e.g. the well-known global clock distribution problem) and design alternatives to correct them (e.g. use of asynchronous modules in the chip, tree distribution of the clock, etc.) that have actually been proven feasible in practice.

If you would like to contribute to this list, please mail me the information. I will post a second summary to the net.

Sundararavarathan R Iyengar (sundar)
Case Western Reserve University
decvax!cwruecmp!sundar, sundar.Case@UDEL-RELAY