sundar@cwruecmp.UUCP (05/03/83)
Summary of the responses for the Systolic Array on a Chip Query:
Thanks to all of you for responding to my query. Frankly
speaking there weren't very many responses. I am sure there
are more people working in this area than I am forced to
believe by counting the number of responses. Here I have
attempted to summarize the responses.
I tried to find out whether anyone had implemented systolic
arrays on a chip and, if so, what problems they had.
At MIT, as part of the 6.371 project, a systolic Stack/Queue
was designed and fabricated. However, it hasn't been tested,
and I have no idea what problems the designers encountered. Also,
a systolic priority queue was designed last year. It is
not known if this has been fabricated and tested.
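
For readers who have not seen one, here is a minimal Python sketch of
the idea behind a systolic priority queue. It is not the MIT design,
whose details I do not have. The class and method names are
hypothetical, and the ripple through the cells is modelled
sequentially, whereas real hardware would let successive ripples
overlap so that cell 0 can accept a new operation every few clock
ticks.

```python
class SystolicPriorityQueue:
    """Software model of a linear array of compare-and-pass cells.

    Assumption: cell 0 is the only cell the host talks to. Each cell
    keeps the smaller of (its own value, the incoming value) and hands
    the larger one to its right-hand neighbour.
    """

    def __init__(self, n_cells):
        self.cells = [None] * n_cells  # None marks an empty cell

    def insert(self, value):
        # Cells stay sorted with empties on the right, so a non-empty
        # last cell means the array is full.
        if self.cells[-1] is not None:
            raise OverflowError("all cells are occupied")
        carry = value
        for i in range(len(self.cells)):
            if self.cells[i] is None:
                self.cells[i] = carry      # ripple stops at first empty cell
                return
            if carry < self.cells[i]:
                # Keep the smaller value, pass the larger one rightwards.
                self.cells[i], carry = carry, self.cells[i]

    def extract_min(self):
        if self.cells[0] is None:
            raise IndexError("queue is empty")
        smallest = self.cells[0]
        # Every cell takes its right-hand neighbour's value.
        self.cells = self.cells[1:] + [None]
        return smallest
```

Inserting 5, 1 and 3 into a four-cell queue and then extracting three
times returns the values in ascending order. The smallest value is
always in cell 0, which is why the host never has to wait for a ripple
to finish before reading the minimum.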
There are a couple of groups working on using the concept of
systolic arrays to connect many chips externally (as opposed
to many components within a chip). There is also some interest
in chips that are specially designed to facilitate such
an external interconnection.
TRW makes an FIR (Finite Impulse Response) filter chip called
the TDC1028. It is not known if it is systolic.
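
To make concrete what a systolic FIR filter would look like, here is a
clock-by-clock Python sketch of one textbook arrangement: the weights
stay in place while the samples and partial sums both move left to
right, the samples at half speed. This says nothing about how the
TDC1028 is built. The function name and register names are
hypothetical.

```python
def systolic_fir(weights, samples):
    """Simulate a linear systolic FIR array, one loop pass per clock tick.

    Assumptions: cell c holds weights[c]; each cell delays x by two
    ticks and y by one tick; all registers start at zero. With K cells
    the result y[n] = sum_j weights[j] * samples[n - j] leaves the last
    cell K ticks after samples[n] enters the first one.
    """
    k = len(weights)
    if k == 0:
        raise ValueError("need at least one weight")
    x1 = [0] * k   # first x delay register in each cell
    x2 = [0] * k   # second x delay register (feeds the next cell)
    y = [0] * k    # partial-sum register in each cell
    out = []
    for t in range(len(samples) + k):
        x_in = samples[t] if t < len(samples) else 0  # flush with zeros
        if t >= k:
            out.append(y[-1])  # value leaving the last cell this tick
        # Compute every cell's next state from the current state only,
        # so that all cells update "simultaneously".
        new_x1, new_x2, new_y = [0] * k, [0] * k, [0] * k
        for c in range(k):
            cell_x = x_in if c == 0 else x2[c - 1]
            cell_y = 0 if c == 0 else y[c - 1]
            new_y[c] = cell_y + weights[c] * cell_x
            new_x1[c] = cell_x
            new_x2[c] = x1[c]
        x1, x2, y = new_x1, new_x2, new_y
    return out
```

Feeding a unit impulse, systolic_fir([1, 2, 3], [1, 0, 0, 0]), returns
[1, 2, 3, 0], i.e. the weights themselves. Each cell talks only to its
immediate neighbours, which is the property that makes the structure
attractive for a chip.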
The NYU Ultracomputer uses a systolic structure that performs
2x2 switching, queuing messages in a congested network and
combining messages destined for the same memory address.
The implementation is yet to be done. The interested reader
is referred to the Feb 83 issue of IEEE Transactions on Computers.
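
The three functions named above (2x2 switching, queuing and combining)
can be sketched in Python as follows. This is only a rough
illustration and not the Ultracomputer design: the message format, the
routing on a single address bit and the restriction to combining plain
read requests are all my assumptions, and every name is hypothetical.

```python
from collections import deque


class CombiningSwitch:
    """Toy 2x2 switch with one queue per output port."""

    def __init__(self, bit):
        self.bit = bit                      # address bit used for routing
        self.queues = (deque(), deque())    # queued messages per output

    def accept(self, address, requester):
        """Queue a read request and return the output port chosen."""
        port = (address >> self.bit) & 1
        queue = self.queues[port]
        for msg in queue:
            if msg["address"] == address:
                # A request for this address is already waiting:
                # combine instead of queuing a second message.
                msg["requesters"].append(requester)
                return port
        queue.append({"address": address, "requesters": [requester]})
        return port

    def forward(self, port):
        """Send on the oldest queued message, or None if the queue is empty."""
        queue = self.queues[port]
        return queue.popleft() if queue else None
```

If two processors request address 6 while the output is congested, only
one message travels on towards memory, carrying both requester names so
that the reply can be fanned out again on the way back.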
At CMU, 88-pin chips have been fabricated that contain a 64x64
microstore, some random logic and RAM. I am told that they had
a 100% failure rate after the first fabrication attempt.
However, all errors seem to have been detected and corrected,
and a second fabrication order has already been sent to MOSIS.
The idea is to use these chips in a systolic array for solving
partial differential equations.
The University of Waterloo has developed a Chess Legal Move
Generator using the concept of systolic arrays.
A description of the chip can be found in the proceedings of the
3rd Caltech Conference on VLSI (1983). A more readable version
of the paper can be found in the May-June 1983 issue of VLSI Design.
Although a lot of good work has been reported in these
responses, I feel that my original question is still
unanswered. I was (and still am) looking for problems at the
implementation stage (e.g. the well-known global clock distribution
problem) and design alternatives to correct them (e.g. use of
asynchronous modules in the chip, tree distribution of the clock,
etc.) that have actually been proven feasible in practice.
If you would like to contribute to this list, please mail me
the information. I will post a second summary to the net.
Sundararavarathan R Iyengar (sundar)
Case Western Reserve University
decvax!cwruecmp!sundar, sundar.Case@UDEL-RELAY