G.L.Gould@durham.ac.uk (03/22/91)
In response to Buff Whelan's comments concerning transputer/DSP hybrids:
* I feel that it is more of a case of what the transputer and dsp can do
for each other. Obviously, the dsp can easily outperform the transputer
in applications that involve multiplication operations etc, and so it would
be useful as a specialised 'accelerator' for the transputer. The transputer,
on the other hand, offers the dsp a simple means of co-operating in a
parallel system, without the need for extra control hardware, bus arbitration
etc. Other transputer based modules may also be easily utilised by the dsp -
graphics boards, disk controllers, image capture, adc/dac boards.
* Many dsp applications/algorithms may easily be ported to parallel
environments, especially the more complex applications.
* It is true that, say, if the dsp was performing some filtering operation
then the transputer may not be able to cope with the processing speed, but
that is not important. It just needs to be able to match the io bandwidth
of the dsp. The transputer is quite capable of keeping up with most dsps,
whilst leaving enough time to do something else - which is, after all, good
parallel programming practice.
* The main problem with the transputer/dsp hybrid systems that I have seen
is the method of inter-processor communication. They use low bandwidth
methods such as the transputer links or other byte wide transfer methods.
Some that I have seen do use shared memory, but only single-ported. I readily
admit that I have not done an exhaustive study, but isn't there anyone using
dual-ported memory?
Here at Durham we are in the process of constructing a transputer/dsp hybrid
system. The system is built from modules, each consisting of a single T801
that acts as a master to any number of dsp slaves. The dsp chip is the
DSP56001, simply because it suits our purpose and it is cheap - any dsp
could be used in the system.
Each dsp possesses its own block of dual-ported RAM (DPR). The transputer
is connected to each DPR block. Data is transferred through the DPR, and data
integrity is assured by the use of semaphores.
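The semaphore protocol itself is not spelled out above, so the following is
only a minimal sketch of one common arrangement: a single flag word in the DPR
block records whether the data buffer currently belongs to the writer or the
reader. The names (`DualPortMailbox`, `EMPTY`, `FULL`, `try_write`, `try_read`)
are hypothetical, and Python stands in for what would be assembly code on both
the T801 and the 56001.

```python
# Toy model of one dual-ported RAM (DPR) block guarded by a semaphore word.
# Assumption: one flag with two states; the real protocol is not described.

EMPTY = 0  # buffer is free; the producer may write
FULL = 1   # buffer holds fresh data; the consumer may read


class DualPortMailbox:
    def __init__(self, size):
        self.semaphore = EMPTY
        self.buffer = [0] * size

    def try_write(self, samples):
        """Producer side: write only if the consumer has released the buffer."""
        if self.semaphore != EMPTY:
            return False              # still owned by the consumer; retry later
        if len(samples) > len(self.buffer):
            raise ValueError("block larger than DPR buffer")
        self.buffer[:len(samples)] = samples
        self.semaphore = FULL         # set the flag last, once the data is in place
        return True

    def try_read(self, count):
        """Consumer side: read only if the producer has published data."""
        if self.semaphore != FULL:
            return None               # nothing new yet
        data = list(self.buffer[:count])
        self.semaphore = EMPTY        # release the flag last, after copying out
        return data


mailbox = DualPortMailbox(size=4)
first_write = mailbox.try_write([1, 2, 3, 4])    # accepted
second_write = mailbox.try_write([5, 6, 7, 8])   # refused: block not yet read
received = mailbox.try_read(4)                   # the first block
```

The ordering is the point of the sketch: the flag changes only after the data
transfer completes, so neither side ever sees a half-written block.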
Since the transputer controls the flow of data around the dsp network, the
dsp topology is effectively software defined. The network may be reconfigured
'on the fly'. Furthermore, program code may also be downloaded from the
transputer, again, 'on the fly'. Thus the dsp network is dynamically
reprogrammable and reconfigurable. These properties are advantages in large
scale, multi-algorithmic signal processing systems.
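To illustrate what a software defined topology could look like, the sketch
below models the transputer as a router that copies blocks between per-dsp
buffers according to a table. Changing the table rewires the network without
touching any hardware. The routing-table representation and the names
`route_once`, `outboxes` and `inboxes` are assumptions made for illustration,
not details of the Durham system.

```python
# Assumption: each dsp has an outgoing and an incoming buffer in its DPR block,
# and the transputer moves data between them according to a routing table.

def route_once(routes, outboxes, inboxes):
    """Move every pending block from its source dsp to its routed destination."""
    for src, dst in routes.items():
        block = outboxes.get(src)
        if block is not None:
            inboxes[dst] = block
            outboxes[src] = None      # source buffer is free again


# Pipeline topology: dsp0 -> dsp1 -> dsp2
routes = {0: 1, 1: 2}
outboxes = {0: [1.0, 2.0], 1: None, 2: None}
inboxes = {0: None, 1: None, 2: None}
route_once(routes, outboxes, inboxes)
pipeline_result = dict(inboxes)

# Reconfigure 'on the fly': dsp0 now feeds dsp2 directly
routes = {0: 2}
outboxes[0] = [3.0, 4.0]
inboxes = {0: None, 1: None, 2: None}
route_once(routes, outboxes, inboxes)
rerouted_result = dict(inboxes)
```

Here the same three dsps form a pipeline in the first pass and a direct
dsp0-to-dsp2 connection in the second, with only the table changed in between.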
Although the transputer was not designed to communicate over its external
memory interface (EMI),
the initial problems encountered, which impaired performance, have been
solved and the transputer is now capable of accessing and testing the
semaphore in the same number of machine cycles as the dsp. This is due to
a fair amount of somewhat tricky assembly coding.
We have shown that the T801 can cope with six 56001 devices at full
memory bandwidth. This, however, is assuming that each dsp is running a
single biquad filter section - one of the simplest dsp functions - which
represents a gross underuse of resources. Any realistically sized code
segment running on the dsps does more computation per word transferred, so
the number of 56001s supported by the T801 would increase accordingly.
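For readers unfamiliar with the benchmark, a biquad section is a second-order
recursive filter. A minimal sketch in Direct Form I follows. The coefficient
naming (`b0`, `b1`, `b2`, `a1`, `a2`, with `a0` normalised to 1) is the common
textbook convention and is assumed here; which form was actually run on the
56001 is not stated.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]"""
    x1 = x2 = y1 = y2 = 0.0           # delayed inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x                # shift the input delay line
        y2, y1 = y1, y                # shift the output delay line
        out.append(y)
    return out


# Impulse response of a simple one-pole case (b1 = b2 = a2 = 0)
impulse = [1.0, 0.0, 0.0, 0.0]
response = biquad(impulse, b0=1.0, b1=0.0, b2=0.0, a1=-0.5, a2=0.0)
```

Each output sample costs five multiplies for one word in and one word out, so
the ratio of communication to computation is high, which fits the point that
this is a demanding case for the T801.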
It should be stressed that each dsp runs at full speed, all the time.
Communication overhead for the dsp is negligible.
Work is also being carried out on the task allocation of such a system.
Hope this has been some help. For further information, contact me or see
the proceedings of ICASSP '91.
Was there really a need for the boat-car ?
Lee Gould
SEAS, University of Durham, Durham, DH1 3LE, England.
G L Gould @ uk.ac.dur.mts
usual disclaimers