[comp.arch] VME Bus Standard

dakramer@phoenix.Princeton.EDU (David Anthony Kramer) (11/28/89)

I am looking for references on the VME bus and board standard. Firstly, what 
is the official (IEEE?) standard? Secondly, are there any books/papers that 
detail the standard but are a little more readable than the official 
standard documentation?

I will post a summary to the net if there is sufficient interest.

Thanks in advance
David

David Kramer
Department of Electrical Engineering
Princeton University
Princeton, NJ 08544

Internet: dakramer@olympus.princeton.edu

linimon@attctc.Dallas.TX.US (Mark Linimon) (11/28/89)

In article <11759@phoenix.Princeton.EDU> dakramer@phoenix.Princeton.EDU (David Anthony Kramer) writes:
>I am looking for references on the VME bus and board standard. Firstly, what 
>is the official (IEEE?) standard? Secondly, are there any books/papers that 
>detail the standard but are a little more readable than the official 
>standard documentation?

VMEbus has its own non-profit trade group: VITA, the VMEbus International
Trade Association.  (10229 N. Scottsdale Road, Suite E, Scottsdale AZ 85253
602-951-8866; P.O. Box 192, 5300 AD Zaltbommel, The Netherlands 31.4180.14661).
They offer both the reference specifications and vendor guides:
   VMEbus Revision C.1
   VMSbus Revision C.2
   VMXbus Revision B
   VSBbus Revision C
   VMEbus Compatible Products Directory (twice yearly, a recent one was 300 pp+)
   VMEbus Software Source Directory (somewhat oriented to the European market;
          the edition I have may only be available in Europe)

(Disclaimer: I work for a VMEbus manufacturer but otherwise have no connection
 with VITA).

Copies of the VMEbus specification are also usually available from VMEbus
manufacturers; in particular Mizar and Motorola.  The Motorola book is
published by Micrology pbt, 2618 S. Shannon, Tempe, Arizona, 85282,
602-966-5936.

VMEbus is also known as the IEC 821 bus and IEEE standard P1014/D1.2; see below
for the address of the IEEE.  However, there may be slight changes in these
standards versus the C.1 spec.

David Pointer (dpointer@uicsrd.csrd.uiuc.edu) contributed the following:
------------------------------------------------------------------
I pulled this info from the 1989 IEEE Publications catalog:

ANSI/IEEE Std 1014-1987, Standard for a Versatile Backplane Bus:
VMEbus
Order code: SH11544   List: $42.00  IEEE Members: $21.00

It looks like credit card orders can be placed with:

     IEEE Service Center
     Cash Processing Sales Dept.
     445 Hoes Lane
     P.O. Box 1331
     Piscataway, NJ  08855-1331
     (201) 562-5346

The IEEE can be reached at 345 East 47th Street, New York, NY 10017 USA.
Sorry, I don't have their phone number handy.

------------------------------------------------------------------

I've seen an "Introduction to VMEbus" book but don't have a pointer
to it.

Disclaimer: I believe all this information to be accurate.  However, if anyone
with more up-to-date information emails me, I will post an updated message.

Mark Linimon
Mizar, Inc.
linimon@mizarvme
{attctc, sun!texsun, convex, texbell}!mizarvme!linimon

afgg6490@uxa.cso.uiuc.edu (11/29/89)

IEEE Standard for a Versatile Backplane Bus: VMEbus
ANSI/IEEE Std 1014-1987
Published by the Institute of Electrical and Electronics Engineers, Inc.
	345 East 47th Street, New York, NY 10017 USA
Distributed in cooperation with Wiley-Interscience, 
	a division of John Wiley & Sons, Inc.
ISBN 0-471-61601-X
Library of Congress Catalog Number 87-46413


I believe that Motorola distributes a version, possibly more recent than
what I finally sent off for, as part of their technical publications.

You'd probably want a VSBbus standard as well.


I know of no "easy" books on the VMEbus,
but the standard isn't so difficult to read.
Reading a few other bus standards at the same time
for comparison helps.
Also try to get some tech info for the VME bus controller chips
(Motorola's, Force's, or the recent VIC chip)
as well as typical boards - since many systems do not stretch
the VME features to the limit, it's more a question of
what is done than what is permitted.

afgg6490@uxa.cso.uiuc.edu (11/29/89)

A while back M. Faiman of the UIUC said something to me
that crystallized my feelings on the subject of busses:

	It's time we got some RISC busses.

I'd like to start a conversation string on this.

Topics:
	What (CISCy) features are there in existing busses that
	could be eliminated?
		VMEbus?
		Multibus?
		FUTUREbus?
	What is the set of "good ideas" that should be in new busses?

alvitar@weasel.austin.ibm.com (Phillip L. Harbison) (11/30/89)

In article <112400007@uxa.cso.uiuc.edu>, afgg6490@uxa.cso.uiuc.edu writes:
> A while back M. Faiman of the UIUC said something to me
> that crystallized my feelings on the subject of busses:
> 	It's time we got some RISC busses.
> I'd like to start a conversation string on this.
> Topics:
> 	What (CISCy) features are there in existing busses that
> 	could be eliminated?

[1] No justified data busses!

A justified data bus requires byte shuffling such that a byte transfer
always takes place on the first byte path and a word transfer takes place
on the first two byte paths.  I suppose the same concept can be extended
to 32-bit transfers over a 64-bit bus.  Multibus I, Multibus II, and most
PC busses use a justified data bus.

An unjustified data bus always transfers bytes within a "stripe" of memory
space over the same byte path.  For example, on a 32-bit unjustified data
bus, the byte path is the address ANDed with 11 (binary).  Therefore, all
bytes whose addresses end in ...00 transfer over byte path 0, those ending
in ...01 over byte path 1, etc.  Only one transceiver is required for each
byte path.  Nubus uses an
unjustified data bus, and I believe Futurebus does too.  VME is inconsistent
since it uses unjustified transfers on the 16-bit bus but justified transfers
on the 32-bit bus.
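
Here's a quick C sketch of the lane selection just described (the function
name and the aligned-transfer assumption are mine, for illustration only):

#include <stdint.h>

/* Unjustified 32-bit bus: a byte's lane is fixed by its low address
 * bits (addr AND 11 binary), independent of transfer size.  Returns a
 * 4-bit mask of enabled byte lanes, assuming aligned transfers. */
uint8_t lane_enables(uint32_t addr, unsigned size_bytes)
{
    uint8_t first = 1u << (addr & 0x3);    /* lane of the first byte  */
    switch (size_bytes) {
    case 1:  return first;                 /* byte transfer           */
    case 2:  return first | (first << 1);  /* aligned 16-bit transfer */
    case 4:  return 0xF;                   /* full 32-bit transfer    */
    default: return 0;                     /* unsupported size        */
    }
}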

These transceivers use a lot of board space (especially on tiny Eurocards),
waste power, and add extra delay.  The following table shows the number of
transceivers required to implement justified and unjustified busses.

          Number of Transceivers
Bus Size  Justified  Unjustified  Transfer Sizes
--------  ---------  -----------  --------------
  8-bit       1           1        8 bits
 16-bit       3           2        8, 16 bits
 32-bit       8           4        8, 16, 32 bits
 64-bit      20           8        8, 16, 32, 64 bits

Keep in mind that this circuitry must be repeated on every card.  For a
16 card system using a 32-bit justified data bus, that's 64 extra chips!
Of course the shuffle network has to be implemented somewhere, but why
not do it on the CPU card? Most modern micros implement this on the chip
anyway (at least the 68020, 68030, 88200, 386, and probably many more).

[2] Get rid of daisy-chained signal lines!

I'm talking about signals that go into a board on one pin and out on some
other pin.  In other words, a board must occupy every slot or you must
have a jumper connection to maintain continuity.  Not only is this a con-
figuration hassle for the user, but it makes it difficult to implement hot
card replacement (removing or inserting cards with power on).  This is very
important in many fault-tolerant systems.

[3] Get rid of centralized resource managers.

Picture this: your $5000 CPU card, $4000 DRAM card, $2000 disk controller
and $2000 serial port controller are all working fine; however, because a
$500 system controller card is broken, the whole system is useless.  Most
older busses used centralized bus arbiters.  NuBus, FutureBus, and
MultiBus2 all use distributed arbiters.

> 	What is the set of "good ideas" that should be in new busses?

[1] Geographic Addressing

Have some pins on each connector that identify the slot number.  The board
can use this information to select an address range.  Given that each board
can be uniquely addressed automatically, it is possible to design systems
with few or no jumpers and DIP switches.  The system configuration software
can probe each slot to determine which resources are present, then install
the appropriate drivers, kernel options, device inodes, etc.   This feature
is supported by NuBus, FutureBus, and MultiBus2.
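
As a rough C sketch of the idea (the 16 MB-per-slot window and the names
are hypothetical, not taken from any of these standards):

#include <stdint.h>

/* Geographic addressing: the backplane wires a slot-number code onto
 * dedicated pins; each board derives its address window from it. */
#define SLOT_WINDOW  0x01000000u        /* assumed 16 MB per slot */

uint32_t board_base_address(unsigned slot_id)
{
    return slot_id * SLOT_WINDOW;       /* slot 0 at 0, slot 1 at 16 MB, ... */
}

/* Configuration software can then probe each slot's window for an ID
 * register to see which boards are present - no jumpers needed. */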

[2] Try-Again Signal

There should be a signal or combination of signals that indicates the pending
bus cycle should be retried at a later time.  This provides a hook for some
cache consistency protocols, and makes it easier to implement bus couplers
between two busses.  NuBus has this, and I believe FutureBus and Multibus2
also have it.  This is a glaring deficiency of VME bus since the only option
is to complete a bus cycle or issue a bus error.  The latter is a severe
reaction to what may be a temporary resource conflict.
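
In C terms, a master's cycle loop might look like this (the status names
and the cycle primitive are hypothetical, purely for illustration):

#include <stdint.h>

/* Hypothetical completion codes for one bus cycle. */
typedef enum { CYCLE_OK, CYCLE_TRY_AGAIN, CYCLE_BUS_ERROR } cycle_status;

/* 'cycle' abstracts the actual hardware access.  A temporary conflict
 * just loops; only a genuine fault escalates to an error. */
int read_with_retry(cycle_status (*cycle)(uint32_t, uint32_t *),
                    uint32_t addr, uint32_t *data)
{
    cycle_status s;
    do {
        s = cycle(addr, data);
    } while (s == CYCLE_TRY_AGAIN);   /* retry later, as the signal requests */
    return (s == CYCLE_OK) ? 0 : -1;  /* -1 only on a real bus error */
}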

[3] Cache Consistency (or Coherency)

This is a must for multi-processor systems, or even single processor systems
where there are other bus masters (DMA devices) and the processor has cache.
Support should be provided for snooping the bus, preempting a cycle, and
performing a write-back operation.  FutureBus has already done a great job
defining a family of consistency protocols.  NuBus has most of the hooks,
but last I heard, no standardized protocol.
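
A tiny C sketch of the snoop decision (state and action names are
illustrative only, not any standard's protocol):

/* On an observed bus read: a dirty copy must preempt the cycle and be
 * written back; a clean shared copy just flags "shared". */
enum line_state   { INVALID, SHARED, MODIFIED };
enum snoop_action { SNOOP_IGNORE, SNOOP_SHARE, SNOOP_PREEMPT_WRITEBACK };

enum snoop_action snoop_read(enum line_state st)
{
    switch (st) {
    case MODIFIED: return SNOOP_PREEMPT_WRITEBACK; /* write back dirty data */
    case SHARED:   return SNOOP_SHARE;             /* assert shared status  */
    default:       return SNOOP_IGNORE;
    }
}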

[4] Pin & Socket Connectors

Use decent connectors, like the DIN-41612 connectors used in VME, NuBus,
FutureBus, and MultiBus2.  I hate the edge connectors used in MultiBus I
and the PC.  Those infernal EISA multi-level connectors are even worse:
all the cost of pin & socket connectors without the reliability.  Pin &
socket connectors have good density and superior retention force and
resistance to contaminants.  Another good selection would be those 3- and
4-row modular connectors made by Amp.  Talk about pin density!

[5] Support Redundant Busses

Allow two busses to be operated in parallel, sharing the work load, with
a clean mechanism for switching all the traffic to one bus if the other
bus breaks.

[6] Hardware Semaphores

Provide semaphore support for multiple processors.  This should work much
like bus arbitration, with waiting processors spinning on the busy semaphore
without using any bus cycles.
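
A C sketch of how that might look to software (the register layout is
hypothetical):

#include <stdint.h>

/* The requester posts a claim once, then spins on a board-local status
 * bit - no bus cycles are consumed while waiting. */
typedef volatile struct {
    uint32_t request;   /* write semaphore number to post a claim */
    uint32_t granted;   /* local bit set when the claim succeeds  */
    uint32_t release;   /* write semaphore number to release      */
} sem_regs;

void sem_acquire(sem_regs *r, uint32_t sem)
{
    r->request = sem;
    while (!(r->granted & (1u << sem)))
        ;               /* board-local spin, no bus traffic */
}

void sem_release(sem_regs *r, uint32_t sem)
{
    r->release = sem;
}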

Just a few ideas to get things started.  This is one of my favorite subjects,
as if you couldn't tell by the size of this article.  :-)  If you need to
reply to me, use the address below instead of the one in the header since
I'll be long gone from IBM (thank God!) before most of the net gets this.

----
Live: Phil Harbison,  soon to be at Xavax, Inc.
Mail: alvitar@uahcs1.UUCP or alvitar@xavax.UUCP

"Skin it back!" - The Unknown Blues Band

davidb@brad.inmos.co.uk (David Boreham) (11/30/89)

There is a book on the VMEbus.  It is written by one Wade Peterson
and is published by VITA (I think).  If I can find the flyer Wade
gave me at Buscon, I'll post the full details.

CISCness of VMEbus: How about

1) Address Modifiers (remove)
2) Address only cycles. (remove)
3) D16 data width (remove)
4) A24 address size (remove)
5) Most of the arbitration options (use ROR and pri, say)
6) ROAK Interrupts
7) Interrupt Vectors (remove)

That would do me for starters.
I don't think there's any hope for Multibus :)

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

linimon@attctc.Dallas.TX.US (Mark Linimon) (12/01/89)

In article <3070@cello.UUCP> alvitar@weasel.austin.ibm.com (Phillip L. Harbison) writes:
>Of course the shuffle network has to be implemented somewhere, but why
>not do it on the CPU card? Most modern micros implement this on the chip
>anyway (at least the 68020, 68030, 88200, 386, and probably many more).

Note that at least one of the RISC designs does _not_, and many others may not
either.  I will agree, however, that this is one area where one might
reasonably expect an MPU to perform this function, given how inefficient the
implementation is otherwise.

Mark Linimon
Mizar, Inc.
linimon@mizarvme

disclaimer: Mizar neither knows nor cares that I have opinions.

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (12/02/89)

In article <3070@cello.UUCP> alvitar@weasel.austin.ibm.com (Phillip L. Harbison) writes:
>[1] No justified data busses!
>[2] Get rid of daisy-chained signal lines!
>[3] Get rid of centralized resource managers.
>[1] Geographic Addressing
>[2] Try-Again Signal
>[3] Cache Consistency (or Coherency)
>[4] Pin & Socket Connectors
>[5] Support Redundant Busses
>[6] Hardware Semaphores

To which I would add:

[7] Power Mate Before Signal Mate

This allows a board to be inserted into a running system, without
(necessarily) glitching the system.  The logic gets to power up
before the signal lines are connected.

[8] Bus Isolation Mode

A freshly-inserted/reset board should be in BI mode. This means that
it "isn't there", electrically and logically. To the rest of the
system, the slot appears to be empty/jumpered. Other boards are free
to respond to addresses which the isolated board would normally have
responded to. This feature is useful for physically adding/removing
boards to/from a running system; for electronically deleting failed
boards; for electronic swap-in of spare boards; and for "blocking"
during self-test.

[9] Built In Self Test ( Built In Test Equipment )

Each board should have BIST (BITE) features. The bus should be
involved: the system should be able to force a board into BIST mode.
This also puts the board's interface into BI mode, so that boards
which temporarily/permanently aren't useful, won't be called upon.
There is a question as to how a board comes back out of BIST mode,
depending on why it went in, and what the result was. Also, if the
board has failed, how much can the system find out about the failure?
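
In C terms, a board's visibility could be modelled as a little state
machine (states and events are mine, purely illustrative):

/* Only ONLINE responds to bus addresses; fresh or failed boards stay
 * isolated ("not there") until explicitly enabled. */
enum board_state { BOARD_ISOLATED, BOARD_SELFTEST, BOARD_ONLINE };
enum board_event { EV_RESET, EV_BIST_REQ, EV_BIST_DONE, EV_ENABLE };

enum board_state next_state(enum board_state s, enum board_event e)
{
    switch (e) {
    case EV_RESET:     return BOARD_ISOLATED;  /* insertion/reset -> BI mode */
    case EV_BIST_REQ:  return BOARD_SELFTEST;  /* interface stays isolated   */
    case EV_BIST_DONE: return BOARD_ISOLATED;  /* await explicit enable      */
    case EV_ENABLE:    return (s == BOARD_ISOLATED) ? BOARD_ONLINE : s;
    }
    return s;
}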

[10] No Byzantine Agreement

When a system comes up, there should be one board which knows that it
is in charge of bringing up the rest (and the rest should agree).
Software ("byzantine") schemes don't cut it. The initial board should
run BIST, then start putting the other boards through their BIST. The
simplest scheme is just the board in slot zero, but perhaps something
more robust can be found that isn't too complicated.  (Maybe we can
assume that the operator put a board in slot zero, but we certainly
can't assume that that board will never fail.)
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

lamaster@athena.arc.nasa.gov (Hugh LaMaster) (12/04/89)

Are there any comments on how well FutureBus meets these improvements? 
There have been several articles describing FutureBus in IEEE Micro,
as well as the draft standard itself.  My understanding is that the
32-bit standard is now complete, and 64-bit+ extensions are now being
drafted.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

afgg6490@uxa.cso.uiuc.edu (12/04/89)

..> Distributed arbitration...

I go both ways on this. 
I have worked with several systems where bus arbitration time
was one of the principal bottlenecks.  I.e., full bus bandwidth
could never be achieved in a real system because (1) all those
asynch signals were being synchronized, and (2) the system
spent much of its time asking "please can I use the bus"
instead of actually using it.

Nubus distributed arbitration seems simple enough, but its wave-like
form of scheduling probably is not acceptable for hard real-time
(fixed-priority) scheduling (see Sha: a bus access should be
scheduled at the same priority as the task that issues it).

Futurebus+ distributed arbitration seems to further complicate things.
Potentially all of the bits in the arbitration code can flip
- at least O(N).  All are asynch signals.  Because it's distributed, it
seems to me that you are limited by the response time of the *slowest*
arbitration logic in the system.  I shudder at the thought of trying
to assemble a system for a customer, customized with a single
board, available from only one vendor, and finding out that this crucial
board synchronizes all bus signals to a 4MHz clock. Ouch!
Other things that scare me about FUTUREBUS+ distributed arbitration
include the long and short arbitration cycles.  And another arbitration
mode is proposed as an improvement! And all of the priority manipulation
rules...

What we need is a distributed arbitration scheme that
(1) is positive - instead of waiting for a potentially long time
interval for stability of the arbitration code, grants when
all contestants acknowledge the same winner (ok, that's already been
done - see the sketch below)

(2) is fast in special circumstances - like idle bus (without requiring
a special arbitration mode)

(3) supports stripped down priority manipulation (like, none)
in systems where it is appropriate.
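
For reference, here's the core of the existing parallel-contention scheme
in C (a simulation; on the real open-collector lines the "max" falls out
of the wired-OR settling rather than a loop):

#include <stdint.h>
#include <stddef.h>

/* Parallel contention: every competitor drives its code onto shared
 * wired-OR lines and withdraws low-order bits wherever it sees a higher
 * bit it isn't driving.  Once the lines settle, exactly the highest
 * code remains - equivalent to taking the max of the competing codes. */
uint8_t arbitration_winner(const uint8_t *codes, size_t n)
{
    uint8_t winner = 0;
    for (size_t i = 0; i < n; i++)
        if (codes[i] > winner)
            winner = codes[i];  /* with fixed codes this IS fixed priority */
    return winner;
}

Note that with fixed numeric codes this already gives the stripped-down
fixed-priority arbitration of point (3); the cost is the settling time of
the lines, which is what point (1) is about.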

afgg6490@uxa.cso.uiuc.edu (12/04/89)

>[9] Built In Self Test ( Built In Test Equipment )
>
>Each board should have BIST (BITE) features. The bus should be
>involved: the system should be able to force a board into BIST mode.
>This also puts the board's interface into BI mode, so that boards
>which temporarily/permanently aren't useful, won't be called upon.
>There is a question as to how a board comes back out of BIST mode,
>depending on why it went in, and what the result was. Also, if the
>board has failed, how much can the system find out about the failure.

Don't oblige potentially expensive features.
Encourage them. Support them.
But do not oblige boards to support a complicated BIST protocol.
I've seen boards spend a lot of logic supporting a BIST interface,
only to have no test hardware at all. The logic spent on the BIST interface
could have been used more productively on more basic things, like ECC.

Put another way: accept "No" responses to external BIST requests.
A simple No.
Not a No that needs to be bundled in some complicated packet and returned.

davidb@braf.inmos.co.uk (David Boreham) (12/04/89)

I said that I'd post the full title of Wade Peterson's book on  VMEbus:


"The VMEbus Handbook" by Wade D. Peterson

Available from:

VITA
10229 N. Scottsdale Road, Suite E
Scottsdale, Arizona 85253 USA
+1 (602) 951-8866

Although I've not read this book in detail, it looks
like a good treatment of the VMEbus and is useful for
prospective VMEbus card designers.

It costs $39.95 plus $5 postage.

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

dave@dtg.nsc.com (David Hawley) (12/09/89)

> From: lamaster@athena.arc.nasa.gov (Hugh LaMaster)
> Message-ID: <5693@eos.UUCP>
>
> Are there any comments on how well FutureBus meets these improvements? 
[elimination of CISC features; good ideas that should be in new busses]
> There have been several articles describing FutureBus in IEEE Micro,
> as well as the draft standard itself.  My understanding is that the
>32-bit standard is now complete, and 64-bit+ extensions are now being
> drafted.

The new Futurebus+ development supersedes, as well as extends, the old
IEEE 896.1 Futurebus standard.  Here are some comments:

> From: alvitar@weasel.austin.ibm.com (Phillip L. Harbison)
> (soon to be alvitar@uahcs1.UUCP or alvitar@xavax.UUCP)
> Message-ID: <3070@cello.UUCP>

> [1] No justified data busses!
> [2] Get rid of daisy-chained signal lines!
> [3] Get rid of centralized resource managers.

Futurebus has none of these.  32-bit transfers are standard; wider data
paths and byte lane enables are supported.

> [1] Geographic Addressing
> [2] Try-Again Signal
> [3] Cache Consistency (or Coherency)
> [4] Pin & Socket Connectors
> [5] Support Redundant Busses
> [6] Hardware Semaphores

Futurebus has explicit support for all of these except redundant busses.
Futurebus is the only standard bus I know of that can support copy-back
cache protocols without using busy-retry.  It now uses a Metral pin and
socket connector (4-row, 2mm grid, 192 pins).

> From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
> Message-ID: <7172@pt.cs.cmu.edu>
> [7] Power Mate Before Signal Mate
> [8] Bus Isolation Mode
> [9] Built In Self Test ( Built In Test Equipment )
> [10] No Byzantine Agreement

All that is required for live insertion is that the board be brought to
the same ground as the system (static discharge) before insertion, as
long as power is sequenced correctly on the board to ensure that the bus
drivers are disabled.  The live insertion facility is available in
Futurebus+, but not required; many systems do not need it.  Bus
isolation, built-in self-test, and "monarch" selection on power-up are
mentioned in Futurebus+, but an exact implementation is not specified.

Some things Futurebus+ provides that are required in any bus, but that
have not been mentioned yet, are:

[11] Clean Electrical Environment

Futurebus+ uses BTL (backplane transceiver logic), which is designed
to operate in a backplane's transmission line environment.  This allows
reliable incident-wave switching (no reflections) on all signals,
preventing the settling-time performance loss typical of TTL systems.

[12] High-Performance Block Data Transfers

Futurebus+ source-synchronous/packet-mode transfers allow data to be
transferred on the backplane at the physical limits of the media, which
is somewhere between 50 and 100 MHz for BTL.  Unfortunately, specialized
silicon must be developed to take advantage of this protocol.  Futurebus
also specifies a fully handshaken "compelled" transfer mode for more
traditional implementations, up to 25 mega-transfers/second.  A true
RISC bus would probably allow only one type and size of block transfer.

[13] Split Transactions

A write-only protocol has significant advantages in system-wide access
latency.  Of course, the design of all boards, especially memory boards,
gets more complicated.  Futurebus+ supports (but does not require) split
transactions - a "CISC" compromise.

Futurebus meets most of these "requirements" for a RISC bus, but it is
still CISC (and still valuable, I believe).  So what really defines a
RISC bus?  Think about your system implementation requirements, and
strip anything whose removal does not compromise your cost and
performance goals.
This will vary for I/O, memory, multiprocessing, real-time, or fault-
tolerant busses.  If you are designing for one of these, you can build a
RISC bus.  If you want to support more than a few of these, look at CISC
Futurebus.

Dave Hawley                                 National Semiconductor Corp.
dave@dtg.nsc.com                                           (408)721-6742

Disclaimer: I do not represent the IEEE or the P896.x Futurebus+ committee.

des@dtg.nsc.com (Desmond Young) (12/15/89)

One comment: there is a group of (future) Futurebus+ users in Europe
that have sent out fliers (should that be flyers?) encouraging
people to join their effort to:
 "define useable subsets of Futurebus+".
This does hint at the very (very) CISC nature of Futurebus+.  It has
(almost) every facility ever dreamt of.  As an analogy to CISC
processors, it almost has a single instruction to compile a program
:-).  (Well, a wee bit of an exaggeration.)
Anyway, if you want to go fast, it has too much baggage.
My opinion, etc.

johnt@opus.WV.TEK.COM (John Theus;685-2564;61-183;625-6654;hammer) (12/15/89)

In article <411@blenheim.nsc.com> des@dtg.nsc.com (Desmond Young) writes:
>
>One comment: there is a group of (future) Futurebus+ users in Europe
>that have sent out fliers (should that be flyers?) encouraging
>people to join their effort to:
> "define useable subsets of Futurebus+".
>This does hint at the very (very) CISC nature of Futurebus+.  It has
>(almost) every facility ever dreamt of.  As an analogy to CISC
>processors, it almost has a single instruction to compile a program
>:-).  (Well, a wee bit of an exaggeration.)
>Anyway, if you want to go fast, it has too much baggage.
>My opinion, etc.

I'm not sure what flyer Des has seen, but the one I have was not produced by
the FMUG group in England, but by a couple of guys located in the northeast.
They are trying to form FIA, the Futurebus Implementors Association.  In my
opinion they are trying to make a buck off the Futurebus bandwagon by setting
up this group and then charging management fees.

They held their first meeting last week at the site of the Futurebus+ meeting.
I should point out that this activity is in no way associated with the
IEEE-sponsored Futurebus effort, and that the two principals have not been
contributors to the development of the standard.

The stated purpose of FIA is to produce a subset document of the Futurebus+
standard and then lead the development of a silicon implementation.  They
passed out a strawman proposal to show us how they would subset the documents.
Basically, their proposal went down in flames, for several reasons.

The simple ones were: hey, we're already doing all that; and technically, you
don't know what you are talking about.  In the latter case, their subset
would have broken several of the protocols.  Futurebus+ has a 4-bit transaction
command code that is transmitted with each address.  In their subset, they
picked some of the codes but not others that are required to use the first set. 
For example, they included some of the cache coherence codes, but left out
the codes for invalidate and copyback.  In addition, their subset would not
have supported any of the lock facilities.

In the first case, several commercial silicon companies are working on
chip designs that, assuming they follow the spec, will work together properly.
At the board level we realize this is a much harder problem.  The boards
for this bus will occupy a very large design space, much larger than any
previous standard bus - the CISC factor.

Futurebus+ as a whole provides a large number of facilities that no single
board will fully implement.  As examples: coherent caches, message passing,
live insertion, dual redundant buses, split transactions (write-only protocols),
etc.  At the physical layer there are issues of board size, electrical
environment, and physical environment.  The range extends from the Navy putting
Futurebus+ boards in combat vehicles to HP talking about Futurebus+ boards
in a desktop PC.

To focus this range of applications into distinct groups, the 896.2 standard
(Futurebus is the IEEE 896.x family of standards) will include chapters
called profiles.  At present two profiles are being developed: one for
general-purpose computing (the VMEbus replacement), and the other for I/O
applications, which is being led by DEC.
applications, these profiles may merge.  Clearly the Navy's profile for
their combat applications will be very different.  The profiles specify the
physical, electrical and protocol decisions that are required to ensure
interoperability among all boards built to the profile.

The question is whether RISC vs. CISC is a meaningful discussion for a bus,
especially an industry standard bus.  Probably most well-designed proprietary
buses are RISC, simply because a company usually designs a bus for a specific
application and it's not cost-effective to carry excess baggage.

I don't expect to ever see an industry-standard RISC bus, simply because the
market for it will be too small.  In fact I think Futurebus+ will be the last
great bus.  As more and more systems require greater than a few Gbytes/sec of
bandwidth, buses will be replaced by switch-based interconnect schemes
such as SCI, and I wouldn't call switch routing RISC.

Finally, as you might expect from my title, I have a different opinion about
the obtainable performance with Futurebus+.  Fast is of course a relative
term, so I'll just state that I expect to be building hardware next year
that can sustain more than 500 MBytes/sec. on a 64 bit wide Futurebus+.

John Theus				johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations

afgg6490@uxa.cso.uiuc.edu (12/15/89)

>Nubus, Multibus II, and Futurebus all use the same basic parallel
>contention logic for resolving multiple requests.  A "fairness"
>protocol is layered over this logic.  In order to allow real-time
>priority scheduling, a priority protocol also must be layered.
>None of this is easy; all of it takes time, bus lines, or both.

Does the "distributed arbitration" protocol not implicitly
provide priority arbitration directly?  I.e., cannot the arbitration code
be a fixed numerical priority?  
    I think that the problem is that when non-priority arbitration codes
get layered on top of the distributed arbitration scheme (ones in which
you effectively manipulate the level of the arbitration code) you have to
devote sufficient logic that the simplicity of fixed priority arbitration
gets lost.

>Some aspects of Futurebus+ arbitration are limited by the slowest logic
>in the system, some by the slowest in a group of competitors, and some
>only by the speed of the winning board.  This makes the implementation
>of the protocol critical.  Synchronizing to any clock is suicide.  The
>protocol is complex, but the committee had a number of historical,
>political, and schedule constraints, as well as functional ones (eg,
>real-time priority scheduling).  Bus interface silicon should eventually
>hide some of this complexity, as well as improve speed.

(1) Could you list, for the benefit of readers, which actions are
limited by what? (Else I'll have to dig through my spec).

(2) It would be interesting to see what bus interface implementations for
existing busses, both discrete and integrated, are truly asynchronous
versus synchronous, synchronizing the asynch bus signals to an internal
clock.
    Saying "bus interface silicon should hide complexity" is a bit of a
cop out - yeah, sure, but look how long it took for decent VME bus interface
chips to come out (particu;larly the VITA chip, for people who didn't
want to buy Motorola's VME chip).  I do not know of any VME chip 
interface that is truly asynchronous.  It sure would be nice if
the spec made real asynch implementations easier.

(Of course, there is some hope that the resurgence of asynch techniques,
a la Sutherland, may make asynch implementations easier for the average
Joe designer of the bus interface logic.  That's been the real bottleneck
- not that asynch is terribly hard, just that it's less well known).

Note that I am not against FUTUREBUS and FUTUREBUS+ -- the fact that
they are becoming standards is great.
    What I am asking about is what a RISC bus would look like,
faster than FUTUREBUS+, or less complex.

afgg6490@uxa.cso.uiuc.edu (12/15/89)

OK, let me make some suggestions for a RISC bus:

(1) All transactions are disconnected or split.
    Possibly an arbitration preemption line if the response is
    immediately available. (IE. you don't assume connected
    and then change over to split depending on ACK. You assume split.
    Connected = split with immediate response separate).

(2) Throw out all the fancy synchronization operations.
    Provide (i) a LOCK signal that can be applied only to a single
    resource of less than bus width. Let software protocols handle
    multiple resource locking - don't require the bus interfaces
    to track it.
      If you feel adventurous, provide (ii) a remote load-store-fixed
    or compare-and-swap, or (iii) a remote fetch-and-add.  These
    because they possibly permit combining.
      Probably provide only one of them.


Now I'll go out on a limb.

(N->infinity)  Forget about arbitration fairness.  Software can implement
    fairness at the process level (e.g. by counting blocked bus cycles and
    scheduling processes to even them out).

afgg6490@uxa.cso.uiuc.edu (12/15/89)

>One comment: there is a group of (future) Futurebus+ users in Europe
>that have sent out fliers (should that be flyers?) encouraging
>people to join their effort to:
> "define useable subsets of Futurebus+".

Can you give me any pointers to these folks? I'd like to contact them.

afgg6490@uxa.cso.uiuc.edu (12/15/89)

Another RISC bus suggestion:

-> Don't have combined A/D lines.

Although it is very attractive to take your 32 address lines
and your 32 data lines and combine them for a 64 bit wide data path,
it is a lot sillier when you have 256 lines in total (an extra 32
address lines gets lost in the shuffle).

There are a lot of address-only transactions in a cache-coherent system.

Conversely, the block data transfers that we are trying to optimize
with 256 bit transfers would tend to use the data bus for a long time.
(Is this true? I/O transfers use blocks >> 256 bits, but do (should) cache 
systems...)
So address-only transactions for processors get blocked by block transfers
for I/O.  Let 'em pass.

jjg@walden.UUCP (John Grana) (12/17/89)

In article <112400018@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>
>Another RISC bus suggestion:
>
>
>Although it is very attractive to take your 32 address lines
>and your 32 data lines and combine them for a 64 bit wide data path,
>it is a lot sillier when you have 256 lines in total (an extra 32
>address lines gets lost in the shuffle).
>
Speaking of combining the address and data lines for a 64 bit data
path, the latest VMEbus specification (rev. D?) will define a new
type of block transfer mode - BLT64 or VME64 (I'm not sure what they
plan on calling it). It is like the present Block Mode (address/data
cycle then data only) except that:

	1) The first cycle is an address only cycle.
	2) All cycles after that are 64 bits (both the address and
	   data lines transfer the data).
	3) 1 or 2 new timing parameters have been added (I don't recall
	   what they are...)

John Peters from Performance Technologies Inc in Rochester NY came up with
the initial timing and byte lane ordering. He also designed a "proof of
concept" board set and is running > 60 Mbytes/sec on various VMEbus
backplanes.


John Grana
jjg@walden.UUCP

johnt@opus.WV.TEK.COM (John Theus;685-2564;61-183;625-6654;hammer) (12/19/89)

In article <112400016@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>
>OK, let me make some suggestions for a RISC bus:
>
>(1) All transactions are disconnected or split.
>    Possibly an arbitration preemption line if the response is
>    immediately available.  (I.e., you don't assume connected
>    and then change over to split depending on ACK. You assume split.
>    Connected = split with immediate response separate).
>
>(2) Throw out all the fancy synchronization operations.
>    Provide (i) a LOCK signal that can be applied only to a single
>    resource of less than bus width. Let software protocols handle
>    multiple resource locking - don't require the bus interfaces
>    to track it.
>      If you feel adventurous, provide (ii) a remote load-store-fixed
>    or compare-and-swap, or (iii) a remote fetch-and-add.  These
>    because they possibly permit combining.
>      Probably provide only one of them.

I think this is a good example of why designing a RISC bus is difficult.  If
you did item (1) and item (2)(i) you would have a flawed lock operation.
A split transaction interconnect requires at least one of item (2)(ii) to have
a true atomic operation.  Which one?  That's why both Futurebus+ and SCI
(Scalable Coherent Interface) have these lock operations.

Before I launch into a long lock discussion, I want to point out that making
the basic decision to use split transactions adds several times to the
complexity of the bus interface logic over a connected protocol.  So even if
the bus protocols are RISC, the interface implementation is very complex.

For those of you that haven't thought about this first problem, let me try to
explain locks in a split transaction environment.  A split transaction
consists of a request transaction, e.g. a processor requesting a read from
memory (the requester), followed eventually by a response transaction from memory
that returns the requested data (the responder).  Only writes are performed
on the bus since memory becomes a bus master as the responder.  Several
other transactions can occur on the bus between a request and its response.

For the typical processor generated semaphore of a read followed
by a write (swap, test-and-set, etc.), the processor might simply make a
read request followed by a write request with its accompanying data.  The
responder would return the requested read data as the first response, and a
write acknowledge as the second response.  Besides being very inefficient,
there is no guarantee the responder will not receive a request from another
party between the two semaphore requests.  If this occurred, the semaphore
would not be atomic.  The lock protocols might require the "bus" to prevent
another request from being issued, or they might prevent the responder from
acting on another request, but this would largely defeat the purpose of
using split transactions, especially in a switch environment.

Split response transactions require a different technique to lock the read
and write operations.  The solution is to perform the lock operation with
a single request.  Accompanying this request is a command for the responder
to execute.

For example, to perform a swap operation, the requester becomes master and
addresses the responder with a swap lock transaction command and sends the
data to be written.  Then the master disconnects.

When the responder acts upon the request, it executes the command by
first reading the addressed data and storing it in a temporary buffer.
The responder then writes to memory the data that was sent along with
the request.  The responder atomically executes the read and write memory
operations.  The buffered data is sent back to the requester in
the form of a response transaction.

The fetch-and-add command is executed by the requester sending the value to add
in the request.  The responder returns the original unmodified value to the
requester, and then stores the sum in the addressed location.

The compare-and-swap is executed by the requester sending the compare
value and the swap value in the request.  The responder returns the original
unmodified value to the requester, and if the compare value is equal to
the original value, it then stores the swap value in the addressed location.
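
In C, the responder-side execution of these three commands amounts to the
following (a sketch only, not the actual Futurebus+ encoding; a real
responder serializes these in hardware):

#include <stdint.h>

/* Each lock command reads the addressed location, conditionally writes
 * it, and returns the ORIGINAL value in the response transaction.  The
 * read-modify-write is atomic because the responder performs it locally,
 * between bus transactions. */
enum lock_cmd { LOCK_SWAP, LOCK_FETCH_ADD, LOCK_COMPARE_SWAP };

uint32_t execute_lock(enum lock_cmd cmd, uint32_t *loc,
                      uint32_t operand, uint32_t compare)
{
    uint32_t old = *loc;            /* read, buffered for the response */
    switch (cmd) {
    case LOCK_SWAP:
        *loc = operand;             /* unconditional store             */
        break;
    case LOCK_FETCH_ADD:
        *loc = old + operand;       /* store sum, return original      */
        break;
    case LOCK_COMPARE_SWAP:
        if (old == compare)
            *loc = operand;         /* store only on match             */
        break;
    }
    return old;                     /* sent back to the requester      */
}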

So, the next problem for the bus designer is deciding which lock operations
to support.  Most processors can generate a swap of some form, but the load
occurs first, followed by the store, which is great for a connected bus but
backwards for a split environment.

A lot of system designers would like a fetch-and-add since it is a more
powerful operation than swap, but as far as I know, only Intel has produced a
mainstream processor with this instruction.  Right now I know of no processor
that can directly generate a lock operation for a split transaction
interconnect.  Fetch-and-add allows combining in switch environments and
allows the return of many (one per bit of data width) unique values in a
single transaction on a bus with broadcast.
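
To illustrate combining in C (a sketch; it assumes the switch node
serializes the first request ahead of the second):

#include <stdint.h>

/* Two fetch-and-adds to the same location, with operands a and b,
 * combine into ONE memory request with operand a+b.  When the memory's
 * response v (the original value) comes back, the switch splits it. */
void split_response(uint32_t v, uint32_t a,
                    uint32_t *resp_first, uint32_t *resp_second)
{
    *resp_first  = v;      /* first requester sees the original value   */
    *resp_second = v + a;  /* second sees the value after the first add */
}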

Futurebus+ provides 3 bits for lock command encoding.  Four codes are
reserved for the future, and the others are nop, swap, compare-and-swap, and
fetch-and-add.

SCI provides 4 bits for lock command encoding.  Eight codes are
reserved for the future, and the 3 bits that make up the other 8 are used
to directly control the hardware facilities that would be required if you
implemented all of the above Futurebus+ commands.  With this approach you
can generate all the lock permutations that the hardware could support.
In my opinion this goes beyond CISC, since no one knows how to use most of
the lock operations SCI implements.

On your split transaction RISC bus, how would you do locks?

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations