[net.micro.68k] Multiple 68020's on VME ?

dan@rna.UUCP (Dan Ts'o) (09/27/85)

	We would like to hear from people who know about or who have
used multiple 680XX's on a bus. We are tentatively considering a ~10
processor machine using 68020's on VME for a particular real-time
data collection, analysis and display application.

	I know that some people at Calgary have a similar project,
using the Harmony OS. Anybody know more about that project ?

	Is such a machine really easy to build with off-the-shelf
boards ? I would assume that each processor's instruction space should
reside in local memory and preferably most of its data requirements as
well. What is the practical VME memory bandwidth of a typical VME system
using standard memory boards and backplanes ?

	Thanks.

					Cheers,
					Dan Ts'o
					Dept. Neurobiology
					Rockefeller Univ.
					1230 York Ave.
					NY, NY 10021
					212-570-7671
					...cmcl2!rna!dan
					rna!dan@cmcl2.arpa

cleary@calgary.UUCP (John Cleary) (09/30/85)

> 
> 	We would like to hear from people who know about or who have
> used multiple 680XX's on a bus. ..

> 	I know that some people at Calgary have a similar project,
> using the Harmony OS. Anybody know more about that project ?
> 

I am at Calgary and yes we have a multiprocessor 68000 system going -
called Calgary Mesh Machine - CM^2.
It is a mesh connected torus with each machine connected to 4 nearest 
neighbours. Each machine has an independent clock.  Communication between
neighbours is via a 4K block of dual ported memory.  This last allows
very high speed transfer without interrupting the destination processor
until the entire message is there.  Currently we use the shared memory as
a fast message-passing device, but are thinking hard about how to use it
for a more flexible Concurrent Prolog implementation.
We use a home-grown kernel with Thoth-like message passing between processes
-- this is the JADE system, inherited from a distributed systems and monitoring
project here at Calgary.
(It knows about Unix and can use Unix I/O via a host.)
Current main applications (still being worked on) are ray tracing
and Time Warp-based simulation.
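A minimal sketch of the two ideas above, in Python and for illustration only. It assumes nodes on an R x C torus are numbered row-major from 0, and it models each 4K dual-ported block as a mailbox whose ready flag the sender raises only after the whole message has been copied in, so the destination processor is disturbed once per message. The names torus_neighbours and DualPortMailbox are hypothetical; this is not the JADE kernel's actual interface.

```python
# Hypothetical model of a mesh-connected torus and a dual-ported mailbox.
# Layout and names are illustrative assumptions, not the CM^2 design.

BLOCK_SIZE = 4096  # one 4K dual-ported RAM block per neighbour link


def torus_neighbours(node, rows, cols):
    """Return (north, south, west, east) for a node on a rows x cols torus,
    with nodes numbered row-major from 0; edges wrap around."""
    r, c = divmod(node, cols)
    north = ((r - 1) % rows) * cols + c
    south = ((r + 1) % rows) * cols + c
    west = r * cols + (c - 1) % cols
    east = r * cols + (c + 1) % cols
    return north, south, west, east


class DualPortMailbox:
    """One shared block: the sender fills it completely, then signals once."""

    def __init__(self):
        self.buffer = bytearray(BLOCK_SIZE)
        self.length = 0
        self.ready = False   # stands in for the interrupt to the receiver
        self.interrupts = 0  # how many times the receiver was disturbed

    def send(self, message: bytes) -> None:
        if self.ready:
            raise RuntimeError("previous message not yet consumed")
        if len(message) > BLOCK_SIZE:
            raise ValueError("message larger than the shared block")
        # Copy the entire message first; the receiving CPU is not involved.
        self.buffer[:len(message)] = message
        self.length = len(message)
        # Only now signal the destination processor, once per message.
        self.ready = True
        self.interrupts += 1

    def receive(self) -> bytes:
        if not self.ready:
            raise RuntimeError("no message waiting")
        data = bytes(self.buffer[:self.length])
        self.ready = False
        return data
```

On a 3x3 array, for example, torus_neighbours(4, 3, 3) gives (1, 7, 3, 5) for the centre node, and the corner node 0 wraps around to (6, 3, 2, 1).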

> 	Is such a machine really easy to build with off-the-shelf
> boards ? I would assume that each processor's instruction space should
> reside in local memory and preferably most of its data requirements as
> well. What is the practical VME memory bandwidth of a typical VME system
> using standard memory boards and backplanes ?

It was easy to build.  Current component costs are approx. $800/board.
We have a 3x3 prototype array almost going and have had a 2x2 array going for
some time.  We designed the boards and did the artwork ourselves.
Off-the-shelf boards are expensive and tend to have a lot of things that
are irrelevant in a very simple environment such as this.
The current hardware configuration per board is:
	68010 - 8MHz clock
	512KB local RAM
	2x4K dual ported RAM 
		plus two off board connectors to RAM on other boards
	2xRS232C connectors
	timer and interrupts from each of four neighbours.

	One of the boards has a 1Mb/sec network connection (Omninet).
If I ever get money to build the Mark II, it will have more memory,
floating point support, and an Ethernet connection.

	John G. Cleary,
	Dept. Computer Science,
	The University of Calgary,
	2500 University Dr.,
	N.W. Calgary,
	Alberta,
	CANADA T2N 1N4.

	Ph. (403)284-6015
	Usenet: ...{ubc-vision,ihnp4}!alberta!calgary!cleary
	CRNET (Canadian Research Net): cleary@calgary

baba@spar.UUCP (Baba ROM DOS) (09/30/85)

> 
> 	We would like to hear from people who know about or who have
> used multiple 680XX's on a bus. We are tentatively considering a ~10
> processor machine using 68020's on VME for a particular real-time
> data collection, analysis and display application.
> 
> 
> 					Cheers,
> 					Dan Ts'o

The August 1, 1985 issue of Computer Design contains a rather interesting
article on metastability problems in multiprocessor VME systems.  It appears
to be a little trickier than expected to design VME bus and memory arbitration 
logic that is not vulnerable to metastability problems in synchronization.  In
particular, some folks at CMU's robotics lab discovered that "as few as two"
8-MHz Motorola VM02 68000 boards would lock up "within 4 to 10 minutes".

						Baba ROM DOS

jbn@wdl1.UUCP (10/01/85)

      We have a system with several bus masters on one VMEbus, and ran into
the following problems:

	1)  The Omnibyte CPU card incorrectly performed its slot 1 bus
	    arbitration function, due to a problem with the Motorola
	    bus arbiter chip.  Omnibyte replaced the chip with a small
	    daughter board with two chips, which fixed the problem.

	2)  The Ironics RAM card was discovered to raise DTACK before it
	    was done with the data; as long as the cycle came from a
	    M68000 this was OK, because the M68000 kept the lines up just
	    long enough for it to work, but our own DMA peripheral didn't
	    and the RAM board would write bad parity into memory.  No fix;
	    we've switched to DY-4 RAM cards.

Moral: ask your board vendor ``Have you run this thing with multiple bus
masters?''  Both these cards work beautifully until you put on the second
bus master.

				John Nagle

witters@fluke.UUCP (John Witters) (10/01/85)

> 
> 	We would like to hear from people who know about or who have
> used multiple 680XX's on a bus. We are tentatively considering a ~10
> processor machine using 68020's on VME for a particular real-time
> data collection, analysis and display application.
> 

I'd suggest reading the August 1st 1985 issue of Computer Design before you
rush off and do this.  The article of interest is titled "Metastability haunts
VME bus and Multibus II system designers" on page 29.  I'll quote the relevant
section below.  I haven't looked too closely at this, but it seems to me that
the interrupt daisy chain scheme should suffer from the same problem.  If you
can't overcome the metastability problems, maybe you could loosely couple the
processors using the VMSbus instead of the VMEbus.  The VMSbus is a synchronous
high speed serial bus, so by definition you shouldn't have metastability
problems.  Another solution is to use a different bus request level for each
board in your system, but this limits one to only four bus masters since the
VMEbus has only four bus request levels.
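The one-master-per-level workaround can be sketched as follows. This is a hypothetical Python model, not any vendor's arbiter: it assumes a slot-1 arbiter in priority mode, where BR3 wins over BR0, and it also shows a generic round-robin alternative whose rotation order is an assumption, not taken from the VMEbus specification.

```python
# Hypothetical sketch: one bus master per VMEbus request level (BR0..BR3),
# so no daisy-chained grant ever has to be resolved between two boards
# sharing a level. Function names are illustrative assumptions.

NUM_LEVELS = 4  # BR0* .. BR3*


def assign_levels(boards):
    """Give each board its own request level; fails past four masters."""
    if len(boards) > NUM_LEVELS:
        raise ValueError(
            f"only {NUM_LEVELS} request levels for {len(boards)} masters")
    return {board: level for level, board in enumerate(boards)}


def priority_grant(pending_levels):
    """Fixed priority: the highest pending level (BR3) wins."""
    return max(pending_levels) if pending_levels else None


def round_robin_grant(pending_levels, last_granted):
    """Generic round-robin: search upward (wrapping) from the level after
    the last one granted. The rotation direction is an assumption."""
    for step in range(1, NUM_LEVELS + 1):
        level = (last_granted + step) % NUM_LEVELS
        if level in pending_levels:
            return level
    return None
```

With levels 0 and 2 both requesting, priority_grant picks 2 every time, while round_robin_grant alternates between them; a fifth master has nowhere to go, which is the limit noted above.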

Multiprocessor system fails

An early victim of metastability in a VMEbus product was John Willis, head of
the Rapid Bus multiprocessor project at Carnegie Mellon's Robotics Laboratory
(Pittsburgh, PA).  Willis had planned to use Motorola's VM02 cards as
68000-based CPU nodes in a multiprocessor design, but eventually discarded the
VM02 parts.  According to Willis, a VMEbus-based system using as few as two
8-MHz VM02 cards would lock up within 4 to 10 minutes.  The Motorola
specification states that up to 16 VM02 boards can operate reliably in a
multimaster configuration.

Willis was able to isolate the failure to the VM02 card's dual-port arbiter,
bus request arbiter, and bus requester.  His biggest source of trouble was the
dual-port arbiter, which controls access to each VM02 card's dual-port memory.
The dual-port arbiter decides whether the VM02's onboard 68000 processor will
access memory, or whether a processor on another board will access the memory
via the VMEbus.

Metastability problems arose in the arbiter's synchronization device, which was
responsible for synchronizing the two bus-request sources with its own clock.
(The arbiter's clock is synchronous with the 68000's clock.)  Because the
arbiter makes its arbitration decisions in about 20ns, the output of its
synchronizer has only 20 ns to settle to a stable state, but needs at least 50
ns to ensure reliable operation.  The VM02's bus-grant arbiter proved unreliable
for the same reason -- it was trying to make dual-port arbitration decisions in
only 20 ns.
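The 20 ns versus 50 ns figures matter because a synchronizer's failure rate falls off exponentially with the settling time it is given. A common textbook estimate is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the settling time allowed, tau the flip-flop's resolution time constant and T_w its metastability window. The sketch below evaluates it in Python; the device parameters are assumed, illustrative values, not measurements of the VM02 arbiter.

```python
import math


def synchronizer_mtbf(t_resolve_s, tau_s, t_window_s, f_clock_hz, f_data_hz):
    """Textbook synchronizer MTBF estimate, in seconds:
        MTBF = exp(t_r / tau) / (T_w * f_clock * f_data)
    t_r: settling time allowed before the output is used
    tau: flip-flop resolution time constant
    T_w: flip-flop metastability window"""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clock_hz * f_data_hz)


# Assumed, illustrative device parameters (NOT measured VM02 values).
TAU = 1.5e-9     # resolution time constant
T_W = 100e-12    # metastability window
F_CLK = 8e6      # arbiter clock, taken to match the 8-MHz 68000
F_DATA = 1e6     # rate of asynchronous requests from the other port

for t_r in (20e-9, 50e-9):
    mtbf = synchronizer_mtbf(t_r, TAU, T_W, F_CLK, F_DATA)
    print(f"settling time {t_r * 1e9:.0f} ns -> MTBF about {mtbf:.3g} s")
```

With these assumed numbers the 20 ns case fails on the order of minutes, while the 50 ns case runs for thousands of years. Whatever tau really is, the ratio between the two is exp(30 ns / tau), independent of the clock and request rates.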

The bus requester's function is to issue bus requests and sample the bus-grant
signal.  Each master on the VMEbus has a requester, and the requesters are
arranged in a daisy chain fashion, such that the bus-grant signal issued by the
arbiter passes serially through each requester.  Each requester decides whether
to intercept the bus-grant signal and access the bus, or pass the signal on to
the next requester on the daisy chain.  If the master associated with a
particular requester has a request pending, the requester will intercept the
bus-grant signal.  If the master has made no request, it will pass the grant
signal along to the next requester.
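The daisy-chained grant described above can be modelled in a few lines. This Python sketch is idealized: it assumes every requester's request line is already stable when the grant arrives, which is exactly the assumption Willis says fails in real hardware. The function name is hypothetical.

```python
def propagate_grant(pending):
    """Pass a bus grant down an idealized daisy chain of requesters.

    pending[i] is True if the master at chain position i (position 0 is
    nearest the slot-1 arbiter) has a request outstanding. The first
    requester with a pending request keeps the grant; every other one
    passes it on. Returns the winning position, or None if the grant
    falls off the end of the chain.
    """
    for position, wants_bus in enumerate(pending):
        if wants_bus:
            return position  # intercept: the grant goes no further
        # no request here: pass the grant on to the next board
    return None
```

Because the first pending board always keeps the grant, propagate_grant([True, False, True]) returns 0 however often it is called: a busy board near the arbiter can starve boards further down the chain.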

This scheme is unreliable, Willis claims, because of the asynchronous
relationships between the bus-grant and bus-request signals.  He points out
that the device in each requester that synchronizes bus grants and bus requests
is not allowed enough time to settle.  If the rising edge of a request signal
at one synchronizer's input coincides with the rising edge of a grant signal at
the other synchronizer's input, the requester will behave unpredictably, and
may grant the bus to two masters at once.

Support for Willis' argument comes from Dave Barr of Indocomp (Drayton Plains,
MI), the designer of that company's line of VMEbus-based multiprocessor
systems.  During a test in which two masters on the VMEbus periodically wrote
data to a global memory (also on the VMEbus), Barr found that the bus
collisions between requesting masters caused incorrect data to be written to
memory.  To avoid bus-collision problems on the VMEbus, Barr scrapped the
VMEbus' arbitration strategy in favor of a synchronous arbitration scheme
which uses a single 4-MHz clock to coordinate bus requests and bus grants.
This solution places an upper limit on arbitration speed, but guarantees
system reliability.
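A synchronous arbiter in the spirit of Barr's scheme can be sketched like this. It assumes every master raises its request in step with the same 4-MHz clock, so sampled requests are always settled; the fixed lowest-id priority and the function name are hypothetical choices, not details from the article. The price is visible in the constants: at most one decision per 250 ns clock period.

```python
ARB_CLOCK_HZ = 4e6                   # single arbitration clock, per the article
ARB_PERIOD_NS = 1e9 / ARB_CLOCK_HZ   # 250 ns between arbitration decisions


def synchronous_arbiter(request_times_ns, num_cycles):
    """Grant the bus only on arbitration clock edges.

    request_times_ns maps master id -> time its request was raised.
    Requests raised before an edge are seen at that edge; one pending
    master (lowest id, an assumed fixed priority) is granted per edge.
    Returns a dict of master id -> grant time in ns.
    """
    grants = {}
    for cycle in range(1, num_cycles + 1):
        edge = cycle * ARB_PERIOD_NS
        pending = [m for m, t in request_times_ns.items()
                   if t < edge and m not in grants]
        if pending:
            grants[min(pending)] = edge
    return grants
```

For example, synchronous_arbiter({0: 10.0, 1: 20.0}, 4) grants master 0 at 250 ns and master 1 at 500 ns: no two masters can ever be granted on the same edge, but the second one waits a full clock period.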

Barr could have modified the VMEbus' arbitration logic, but admits that his
lack of familiarity with the problem of metastability led him to avoid it
altogether.  A potentially simple fix was not implemented because of a lack of
understanding.
							Ken Marrin
							Senior Editor
-- 

						John Witters
						John Fluke Mfg. Co.  Inc.
						P.O.B. C9090 M/S 243F
						Everett, Washington  98206

						(206) 356-5274

jack@boring.UUCP (10/04/85)

I think that putting multiple CPU's on a VME bus isn't going to
do you a lot of good, unless you either have a lot of local
memory, or a very large cache.

Also, the bus master arbitration of the VME bus is very poor, I think.
Although it is a nice and quite general scheme, performance looks
awful.
It's OK for a DMA device wanting to do a block transfer, but if you
are in a system with 4-8 CPU's and you're all trying to execute
off the bus, you'll probably spend most of your time arbitrating.

At the HTS"A", we're working on a project to put multiple 32016s (with
a lot of cache) on the same VME bus.
What we did is abuse one of the BREQ/BG pairs for bus arbitration:
If you want to do a bus request, you wait until you get the BG bit.
Then, you do your request, and pass the BG on to your neighbour.
If you don't want to do anything, you pass the bit on immediately.
The CPU that's having the BG at the moment pulls down BREQ, so as soon
as BG 'falls out' of the cardcage, you notice it and generate a new
BG at the beginning of the bus.

This gives a very efficient and fair scheme if you have a reasonable
number of bus masters, who are expected to do large numbers of small
transfers, instead of small numbers of large ones.
Also, you can still use ordinary VME boards.
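The BG-passing scheme above can be sketched as a token walking down the chain. This Python model rests on simplifying assumptions: one pass of BG down the chain is a "round", each CPU with pending work makes exactly one small transfer when BG reaches it, and the regeneration of BG at the start of the chain (triggered by the holder pulling BREQ) is reduced to starting the next round. The function name is hypothetical.

```python
def token_rounds(queues, rounds):
    """Simulate BG used as a token passed down the chain.

    queues[i] is the number of small transfers CPU i still wants to make
    (position 0 is nearest the start of the chain). In each round BG walks
    the whole chain: a CPU with work does ONE transfer and passes BG on;
    a CPU with nothing to do passes it on immediately. When BG falls out
    of the end, a new one is generated at the start (the next round).
    Returns the CPU ids in the order their transfers happened.
    """
    remaining = list(queues)
    order = []
    for _ in range(rounds):
        for cpu, work in enumerate(remaining):
            if work > 0:
                order.append(cpu)  # this CPU's single transfer this round
                remaining[cpu] -= 1
            # either way, BG is passed on to the neighbour
        if not any(remaining):
            break
    return order
```

For example, token_rounds([2, 1, 3], 10) interleaves the CPUs as 0, 1, 2, 0, 2, 2: no CPU makes a second transfer until every other CPU with pending work has had one, which is the fairness described above.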
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

fred@mot.UUCP (Fred Christiansen) (10/04/85)

> The August 1, 1985 issue of Computer Design contains a rather interesting
> article on metastability problems in multiprocessor VME systems.  It appears
> to be a little trickier than expected to design VME bus and memory arbitration
> logic that is not vulnerable to metastability problems in synchronization.  In
> particular, some folks at CMU's robotics lab discovered that "as few as two"
> 8-MHz Motorola VM02 68000 boards would lock up "within 4 to 10 minutes".
> 
> 						Baba ROM DOS

I read the article and found the line of reasoning a little weak in places.
VME was being blamed, yet the example cited used VM02 boards which are Versabus
not VMEbus based.  So, I dropped in on a hardware sharpie and asked what the
scoop really was.  The gist of his statement was that the problem was one
found in all the popular busses, although more particularly the asynch busses.
HW designers are aware of the problem and have been successful in working
around it.
-- 
<< Generic disclaimer >>
Fred Christiansen ("Canajun, eh?") @ Motorola Microsystems, Tempe, AZ
UUCP:  {seismo!terak, trwrb!flkvax, utzoo!mnetor, ihnp4!btlunix}!mot!fred
ARPA:  oakhill!mot!fred@ut-sally.ARPA             Telephone:  +1 602-438-3472

jxw@fas.ri.cmu.edu.ARPA (John Willis) (10/05/85)

	The reference in Computer Design, August 1 to VMEbus problems
is (loosely) based on problems we experienced several years ago using
Motorola's VM02 cards on a Versabus.  We bought the first and second
cards off the production line for use in prototyping a very early
version of our RapidBus multiprocessors.

	Despite these cards being recommended for use in a multiple
processor configuration, I do not believe that Motorola Microsystems
actually tried to use two in the same backplane for many months after
their release.  The designers had ignored the asynchronous relationship
between local and Versabus requests to the dual port arbiter, resulting
in frequent maloperation (4 to 20 minutes under heavy load).

	It required nearly two years and several very pointed questions
to get Motorola to produce an ECO.  As far as I know, later versions of
the card increased the arbiter resolution time up to fifty nanoseconds,
dramatically improving reliability.  We have since switched to processor
cards from IBM Instruments (CS-9000) and BioResearch for later and much
larger prototypes.

	In talking with Ken Marrin for the article, we tried to point
out the importance of recognizing and intelligently handling each of the
many asynchronous interfaces designed into both VMEbus and Versabus, with
Motorola as one example.  The VM02 is only one of several cards we have
run into with problems correctly handling the asynchronous interface in
MSI logic.

	It is disappointing that Motorola has not coupled their support
for asynchronous bus design with responsible literature helping designers
to handle asynchronous interface designs correctly.  I believe that many
of their claimed performance figures for VMEbus ignore synchronizer
resolution delays, resulting in either unreliable or slower systems.

	VMEbus multiprocessors can be built reliably, but the bus
specifications don't tell you everything you need to know.  For further
information, I urge you to read some of the good papers coming out of
the Washington University Asynchronous Systems Group (T. Chaney et al),
or a very practical article by Stoll at Intel in VLSI Design several
years ago.

					-John