[comp.parallel] Shared Memory over a Bus?

folta@tove.cs.umd.edu (Wayne Folta) (11/21/90)

I am not a hardware guy, but I have been asked to make some suggestions
regarding a multi-processor system's design. This system will be pulling
in data at a high rate, and it should allow for many processors to work
on this data at once (each processor independent of the others). It also
should use off-the-shelf components, and it must be fairly rugged and
small (portable).

So... I have come up with a crazy idea, using a NuBus, say, to act as a
broadcasting medium, broadcasting the incoming data to multiple CPUs.
Could anyone please comment on the feasibility of doing this:

1. There is 36 Mbyte/sec of incoming data. It will be read by a CPU plugged
   into, say, a NuBus, and that CPU will write the data onto the bus.
2. All of the other processors on the bus will have their own memory, but
   all of the memories will have the same address space. Thus, one write to
   the bus would "copy" the data to N CPUs' memories at once(?). The CPUs will
   read/write only their own local memory, so the address collisions won't
   matter--no one will attempt to read across the bus.
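
To make point 2 concrete, here is a minimal Python sketch of the
mirrored-memory idea. It is a software model only, not a hardware design:
the class names (LocalMemory, BroadcastBus), the 1 KB memory size, and the
four-board count are all assumptions made for illustration. It shows one
bus write landing in every board's RAM while reads stay local.

```python
class LocalMemory:
    """One CPU board's RAM, decoded at the same bus addresses as its peers."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def bus_write(self, addr, data):
        # Every board snoops the bus and latches the write into its own RAM.
        if addr < 0 or addr + len(data) > len(self.cells):
            raise IndexError("write outside this board's memory")
        self.cells[addr:addr + len(data)] = data

    def local_read(self, addr, length):
        # Reads never cross the bus; each CPU sees only its own copy.
        return bytes(self.cells[addr:addr + length])


class BroadcastBus:
    """Hypothetical bus model: a single write cycle is seen by all boards."""

    def __init__(self, boards):
        self.boards = boards

    def write(self, addr, data):
        for board in self.boards:
            board.bus_write(addr, data)


# Assumed configuration: four boards with 1 KB of memory each.
boards = [LocalMemory(1024) for _ in range(4)]
bus = BroadcastBus(boards)

# The data-acquisition CPU performs one write...
bus.write(0x10, b"\x01\x02\x03\x04")

# ...and every board can then read the same bytes locally.
copies = [board.local_read(0x10, 4) for board in boards]
```

In real hardware the loop in BroadcastBus.write would be replaced by every
board decoding the same address in the same bus cycle; the model says nothing
about bus loading, acknowledge signalling, or timing.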

* Is this possible?
* Is NuBus (or other off-the-shelf bus) fast enough for, say, 50 Mbytes/sec of
  throughput?
* Would multiple memory boards at the same address create a mirrored-memory
  effect like I want?
* At these speeds, how many boards could I fit on the bus?
* Could the CPUs be reading their memories while the broadcasting CPU is writing
  to them? (I have heard of "dual-ported" memory. Would this do it? If so, does
  it come as fast as 25ns?)
* Would it be relatively easy for a CPU to disable its local memory, so that it
  ignores the bus temporarily? (This would allow more leisurely processing of
  some data.)
* What if I wanted to have each CPU's memory divided in two: one part shared
  as above, and one part with a unique address, for communication? Is this
  possible?
* Is there a much better way to do what I want to do?
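
The questions about temporarily ignoring the bus and about splitting each
memory into a shared part and a uniquely addressed part can be pictured as
an address-decoding rule. The Python sketch below is only a model under
assumed values: the window addresses and sizes, the slot-based private
window, and the "listening" flag are all hypothetical and do not come from
any particular bus specification.

```python
SHARED_BASE = 0x0000   # assumed: window decoded identically by every board
SHARED_SIZE = 0x8000
PRIVATE_BASE = 0x8000  # assumed: start of the per-board private windows
PRIVATE_SIZE = 0x1000


class Board:
    """Hypothetical board with a mirrored window and a private window."""

    def __init__(self, slot):
        self.slot = slot
        self.listening = True  # cleared to ignore broadcast writes for a while
        self.shared = bytearray(SHARED_SIZE)
        self.private = bytearray(PRIVATE_SIZE)

    def bus_write(self, addr, value):
        """Latch one byte if this board's decoder claims the address."""
        if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
            if self.listening:
                self.shared[addr - SHARED_BASE] = value
                return True
            return False
        my_base = PRIVATE_BASE + self.slot * PRIVATE_SIZE
        if my_base <= addr < my_base + PRIVATE_SIZE:
            self.private[addr - my_base] = value
            return True
        return False


boards = [Board(slot) for slot in range(3)]
boards[1].listening = False  # board 1 is busy and ignores the broadcast

# A write into the shared window reaches every listening board.
hits_shared = [b.bus_write(0x0040, 0xAB) for b in boards]

# A write into board 2's private window reaches only board 2.
target = PRIVATE_BASE + 2 * PRIVATE_SIZE + 5
hits_private = [b.bus_write(target, 0x7F) for b in boards]
```

A board that stops listening simply misses whatever is broadcast in the
meantime, so some way of marking or re-sending skipped data would still be
needed.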

Thanks for your help on a crazy idea.
--


Wayne Folta          (folta@cs.umd.edu  128.8.128.8)

wangjw@usceast.cs.scarolina.edu (Jingwen Wang) (11/28/90)

Dear Mr. Folta,
   Your intuition is correct. In fact, we built an 8-processor multiprocessor
with a similar architecture in China using the TMS320C25. The broadcast bus
is a 16-bit parallel bus linking all the processors' communication memories.
The difference is that in our system each processor can broadcast messages
to all the others. It is designed to meet the communication requirements
of continuous system simulation applications. Each processor uses a
dual-port memory as the communication memory attached to the bus.
   We have run simulations of this system, and the results indicated
very attractive performance compared with a shared global memory architecture.
The system has a PC-AT computer as the front end host together with a
graphics terminal for dynamic visual display.
   Although your basic idea is wonderful, there is still a problem in meeting
the time constraints of your system. An incoming data rate of 36 Mbytes/sec
can hardly be managed by even the fastest processors to date. You simply
cannot execute that many instructions per second. Even if DMA transfers
are used, it is still a headache.
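
The instruction-rate concern can be checked with a rough budget. The Python
sketch below is back-of-the-envelope arithmetic only: it takes 36 Mbytes/sec
as 36e6 bytes per second, uses the 16-bit word width of the broadcast bus
described above, and assumes a processor rate of 10 million instructions per
second, which is an illustrative figure and not one stated in this thread.

```python
def budget(data_rate_bytes_per_s, word_bytes, instr_per_s):
    """Return the time and instruction budget available per incoming word."""
    words_per_s = data_rate_bytes_per_s / word_bytes
    return {
        "words_per_s": words_per_s,
        "ns_per_word": 1e9 / words_per_s,
        "instr_per_word": instr_per_s / words_per_s,
    }


# One CPU: 36e6 bytes/s, 2-byte words, assumed 10e6 instructions/s.
single = budget(36e6, 2, 10e6)

# Eight such CPUs, if the stream could somehow be divided evenly among them.
split = budget(36e6, 2, 8 * 10e6)
```

Under these assumptions a single processor has well under one instruction
per incoming word, and even an ideal eight-way split leaves only a handful,
which is the sense in which the rate is hard to manage in software.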
   The design would not be trivial, since high data rates will cause
many reliability problems.
   This is only for your reference. I hope it helps a little.

Jingwen Wang

Department of Electrical & Computer Engineering
University of South Carolina
Columbia, SC 29208