[comp.arch] Futurebus+ @ 500MBytes/sec

mark@mips.COM (Mark G. Johnson) (12/19/89)

In article <276@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
  >
  >As I stated in a previous posting, we expect to be building Futurebus+
  >hardware in the coming year that can sustain 500 Mbytes/sec on a 64 bit
  >wide data path.  The questions that immediately come to mind are either
  >this doesn't meet your definition for "fast", or you don't believe we can
  >deliver this bandwidth.

Could you explain what is meant by "sustain" above?  At least two
really cool things might be implied by this:

  (1)  In the coming year you'll be building Futurebus+ hardware that
       can transfer 512 bytes (i.e. 128 words, or 64 bus-widths of data)
       in 1024 nanoseconds.  This'd be a cache refill from main memory.

  (2)  In the coming year you'll be building Futurebus+ hardware that
       can do a DMA transfer of 50 Megabytes in 0.1 second.

Are either of the assertions above correct?  {they're based on the assumption
that (500 MB/sec  /  8 bytes/transfer) = 62.5 Mtransfers/sec = 16ns/transfer
is the bus cycle time in "sustained" operation}
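
The arithmetic behind that assumption can be checked with a short sketch.
The function name is illustrative only, and the sketch assumes that
1 MByte = 10**6 bytes and that one 8-byte bus-width moves per cycle.

```python
def cycle_time_ns(bandwidth_mbytes_per_s, bus_width_bytes):
    """Bus cycle time implied by a sustained rate on a bus of the given width.

    Assumes 1 MByte = 10**6 bytes and one bus-width of data per cycle.
    """
    transfers_per_s = bandwidth_mbytes_per_s * 1e6 / bus_width_bytes
    return 1e9 / transfers_per_s


cycle = cycle_time_ns(500, 8)     # 500 MB/sec on a 64 bit wide path
refill_ns = (512 // 8) * cycle    # case (1): 512 bytes = 64 bus-widths
dma_seconds = 50e6 / 500e6        # case (2): 50 Megabytes at 500 MB/sec

print(f"{cycle:.1f} ns/transfer, {refill_ns:.0f} ns refill, {dma_seconds:.1f} s DMA")
# prints "16.0 ns/transfer, 1024 ns refill, 0.1 s DMA"
```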

Also, folks might think it was a bit smelly to claim high throughput rates
for a "bus" that only has two or three slots, or for a "bus" that doesn't
involve a printed circuit backplane board having connectors or sockets for
separate daughterboards.  Just to lay these types of silly "oh-yeah?"
questions to rest permanently,

   Does next-years-500MB/sec-Futurebus+ have a mother/daughterboard
   construction with N>6 connectors and card slots, and will it run
   at 500 MBytes/sec when fully populated?

Thanks.
-- 
 -- Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	(408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}

johnt@opus.WV.TEK.COM (John Theus) (12/20/89)

In article <33845@mips.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>
>Could you explain what is meant by "sustain" above?  At least two
>really cool things might be implied by this:
>
>  (1)  In the coming year you'll be building Futurebus+ hardware that
>       can transfer 512 bytes (i.e. 128 words, or 64 bus-widths of data)
>       in 1024 nanoseconds.  This'd be a cache refill from main memory.
>
>  (2)  In the coming year you'll be building Futurebus+ hardware that
>       can do a DMA transfer of 50 Megabytes in 0.1 second.
>
>Are either of the assertions above correct?  {they're based on the assumption
>that (500 MB/sec  /  8 bytes/transfer) = 62.5 Mtransfers/sec = 16ns/transfer
>is the bus cycle time in "sustained" operation}
>

By sustained, I mean that over a period of 0.1 second, the combined traffic
of cache lines (64 bytes on Futurebus+) and large DMA blocks will move more
than 50 Megabytes.  The burst rate, the transfer rate within a single
transaction, will be slightly higher.

>
>Also, folks might think it was a bit smelly to claim high throughput rates
>for a "bus" that only has two or three slots, or for a "bus" that doesn't
>involve a printed circuit backplane board having connectors or sockets for
>separate daughterboards.  Just to lay these types of silly "oh-yeah?"
>questions to rest permanently,
>
>   Does next-years-500MB/sec-Futurebus+ have a mother/daughterboard
>   construction with N>6 connectors and card slots, and will it run
>   at 500 MBytes/sec when fully populated?
>
>Thanks.
>-- 
> -- Mark Johnson	
> 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
>	(408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}

Yes, I'm talking about a more-or-less standard Futurebus+ backplane
environment with more than 6 populated slots.  Futurebus+ uses BTL (Backplane
Transceiver Logic), made by NSC, TI and Signetics.  BTL on a daughtercard
drives a 50 to 60 ohm stripline backplane that is terminated in 39 ohms to
2 volts at both ends.  As long as the edge time stays longer than 1 nsec.,
this electrical environment is good for data periods down to about 10 nsec.
on a 19 inch rack length backplane with 1 inch board spacing.

The high speed data transfer protocol Futurebus+ uses is called packet mode
and it was invented by Emil Hahn of Signetics.  This protocol uses source
synchronous transmission without transmitting any clock.  Since there is
no clock, there are no bus level set-up or hold times.  The protocol is
also not limited by signal skew, which turns out to be the biggest source
of delay in more standard protocols.  The bottom line is this protocol will
not be a limiting factor in ultimate performance.

The Futurebus+ spec requires a packet implementor to support a minimum
packet speed of 60 MTransfers/sec or 480 MBytes/sec on a 64 bit bus.

As I've tried to show, the electrical environment and the protocol will
both support better than the 16 nsec/transfer rate that Mark asked about.  The
limitation on our performance this next year will be the silicon implementation.

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations

yodaiken@freal.cs.umass.edu (victor yodaiken) (12/21/89)

In article <278@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
>The high speed data transfer protocol Futurebus+ uses is called packet mode
>and it was invented by Emil Hahn of Signetics.  This protocol uses source
>synchronous transmission without transmitting any clock.  Since there is
>no clock, there are no bus level set-up or hold times.  The protocol is
>also not limited by signal skew, which turns out to be the biggest source
>of delay in more standard protocols.  The bottom line is this protocol will
>not be a limiting factor in ultimate performance.

How exactly does this work?  References?

johnt@opus.WV.TEK.COM (John Theus) (01/04/90)

In article <7863@dime.cs.umass.edu> yodaiken@freal.cs.umass.edu (victor yodaiken) writes:
>In article <278@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
>>The high speed data transfer protocol Futurebus+ uses is called packet mode
>>and it was invented by Emil Hahn of Signetics.  This protocol uses source
>>synchronous transmission without transmitting any clock. ...
>
>How exactly  does this work? References?

The only references are the Futurebus+ spec itself, and the published
working group meeting minutes where Emil presented the papers on his
protocol.

The packet data transport protocol was designed to move data as fast as
possible with a minimum feature set.  The protocol does not allow sub-word
operations, only 32, 64, 128 or 256 bit wide words can be transferred.  No
lock operations can be done when using this protocol.  Blocks of 2, 4, 8,
16, 32 or 64 words are transferred.  The block length is signalled
at the start of the transfer.

The transfer protocol is very similar to the asynchronous protocol used on
RS-232.  If we just think about an individual bit for now, the sender
transmits its data at the frequency of an on-board clock.  As with RS-232,
the frequency must be known by both the transmitter and the receiver in
advance.  The Futurebus+ protocols provide a mechanism for selecting one
of two such frequencies on a transaction by transaction basis.

To start data transmission, the sender transmits a sync bit which is a
logic one.  The data is encoded using NRZI, where a logic one is represented
by an edge transition during a datum cell, and a logic zero is represented by
no transition.  Therefore to start a packet, an edge is sent followed by the
encoded data, and concluded by an even longitudinal parity bit.  When
parity is correct, the signal line is left in the logic zero state.
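
The framing of a single signal line can be sketched as follows.  This is
only an illustration under stated assumptions, not the encoding as the
spec defines it: it assumes the line idles at logic zero and that the
even parity is computed over the sync bit plus the data bits, which is
what makes the line return to zero after a good packet.  The function
names are made up for the example.

```python
def nrzi_encode_packet(data_bits):
    """Return the line level in each datum cell: sync, data..., parity.

    Assumption: even parity covers the sync bit and the data bits, so the
    total number of transitions in a packet is even.
    """
    parity = (1 + sum(data_bits)) % 2       # the sync bit counts as a one
    logical = [1] + list(data_bits) + [parity]
    level = 0                               # assumed idle state: logic zero
    levels = []
    for bit in logical:
        if bit:                             # NRZI: a one is a transition
            level ^= 1
        levels.append(level)                # a zero leaves the level alone
    return levels


def nrzi_decode_packet(levels):
    """Recover the data bits from the per-cell line levels."""
    prev = 0
    logical = []
    for level in levels:
        logical.append(level ^ prev)        # transition -> 1, none -> 0
        prev = level
    if logical[0] != 1:
        raise ValueError("missing sync edge")
    if levels[-1] != 0:
        raise ValueError("parity error: line did not return to zero")
    return logical[1:-1]                    # strip sync and parity


sent = [0, 1, 1, 0, 1]
line = nrzi_encode_packet(sent)
print(line, nrzi_decode_packet(line) == sent)
```

A single flipped cell leaves the line at logic one at the end of the
packet, so the decoder in this sketch reports a parity error.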

The receiver has its own on-board clock that runs at the same frequency as
the sender.  Both sender and receiver must have clock frequency tolerances
of 0.01% or better.  When the receiver sees the sync bit at the start of a
packet, its logic sets a precision delay equal to the phase difference
between the sync bit and its on-board clock.  Thereafter, the logic uses
the on-board clock plus the delay to define the datum cell positions for
sampling the rest of the data.  The maximum packet length is limited by
the drift that occurs between the 2 clock sources.

Now multiply the sending and receiving circuitry by the number of bits in
a parallel word.  Note that there is only 1 on-board clock source, but N
(where N equals the number of bits/word) independently settable delays in the
receiver.  After the individual bits are captured in the receiver, additional
stages of logic are used to synchronize the bits into a parallel word.
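
A toy model may help show why an independent delay per bit removes the
dependence on skew between lines.  Everything here is an assumption made
for illustration: the 60 MHz cell rate, the idealized waveform, the
skew values, and the idea that the receiver's delay for a line is simply
the arrival time of that line's sync edge.

```python
import math

PERIOD_NS = 1e3 / 60  # one datum cell at an assumed 60 MHz


def line_level(levels, skew_ns, t_ns):
    """Level seen at the receiver at time t on a line delayed by skew_ns."""
    cell = math.floor((t_ns - skew_ns) / PERIOD_NS)
    return levels[cell] if 0 <= cell < len(levels) else 0


def receive(lanes, skews_ns, per_lane_delay=True):
    """Sample every line in mid-cell and assemble parallel words.

    With per_lane_delay each line is sampled relative to its own sync
    edge; otherwise one common delay (the earliest edge) is used.
    """
    words = []
    for k in range(len(lanes[0])):
        word = []
        for levels, skew in zip(lanes, skews_ns):
            delay = skew if per_lane_delay else min(skews_ns)
            word.append(line_level(levels, skew, delay + (k + 0.5) * PERIOD_NS))
        words.append(tuple(word))
    return words


lanes = [[1, 1, 0, 0], [1, 0, 0, 1]]   # per-cell levels on two lines
skews = [0.0, 12.0]                    # second line arrives 12 ns late
sent = list(zip(*lanes))

print(receive(lanes, skews) == sent)                        # per-bit delays
print(receive(lanes, skews, per_lane_delay=False) == sent)  # common delay
```

With per-bit delays the words come out as sent; with one common sampling
point the late line is sampled in the wrong cells.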

Clearly this is not a protocol to implement in discrete logic, and silicon
companies are hard at work building the parts necessary to run this
protocol.  The Futurebus+ spec requires a minimum clock frequency of 60
MHz, which translates to 60 Mtransfers/sec.  We expect the first silicon
to do better than this.

The bandwidth utilization efficiency of this protocol varies greatly based on
the packet length, from 50% for a 2 word packet to 97% for a 64 word packet.
It is possible to sustain the 97% efficiency over transfers that are much
longer than 64 words by using multiple packet mode.
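
The two efficiency figures are consistent with a fixed overhead of two
datum cells per packet (the sync bit and the parity bit).  The sketch
below assumes that model; it is not taken from the spec.

```python
def packet_efficiency(n_words, overhead_cells=2):
    """Fraction of datum cells carrying data, assuming sync + parity overhead."""
    return n_words / (n_words + overhead_cells)


for n in (2, 4, 8, 16, 32, 64):
    print(f"{n:2d} words: {packet_efficiency(n):.0%}")
```

Under this assumption a 2 word packet comes out at 50% and a 64 word
packet at 64/66, or about 97%.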

This protocol allows packets to be chained together back-to-back with no
lost clocks, as long as a single source is transmitting all the packets.
While a packet is being transmitted, the command, status and compelled
handshake signals are used to request new packets and acknowledge new
packets, including their cache attributes.  The requesting process can
occur asynchronously with respect to the packet currently being
transmitted and also out of phase.  By this I mean requests can be either
in lock step with their packet transfer, or 1 or more packets ahead.

Cache coherence is maintained during multiple packet mode and intervention
is also supported.  During a single transaction there can be multiple
packet sources due to intervention.  When a packet source change is made,
at least 1 clock is lost in the change-over.  A good example of multiple
packet sources during a single transaction would be flushing a dirty page
back to a disk subsystem that has dirty lines in several different caches.
This protocol allows a single transaction to remove the page from memory and
the caches, and invalidate the caches.

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations

filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (01/08/90)

In article <280@leia.WV.TEK.COM> John Theus writes:
>The transfer protocol is very similar to the asynchronous protocol used on
>RS-232.  If we just think about an individual bit for now, the sender
>transmits its data at the frequency of an on-board clock.  As with RS-232,
>the frequency must be known by both the transmitter and the receiver in
>advance.  The Futurebus+ protocols provide a mechanism for selecting one
>of two such frequencies on a transaction by transaction basis.
> [...]
>The receiver has its own on-board clock that runs at the same frequency as
>the sender.  Both sender and receiver must have clock frequency tolerances
>of 0.01% or better.  When the receiver sees the sync bit at the start of a
>packet, its logic sets a precision delay equal to the phase difference
>between the sync bit and its on-board clock.  Thereafter, the logic uses
>the on-board clock plus the delay to define the datum cell positions for
>sampling the rest of the data.  The maximum packet length is limited by
>the drift that occurs between the 2 clock sources.

Why isn't one more line used to transmit the sender's idea of the data
clock?

  clock  +++---+++---+++---+++  "111111"
  data0  +++++++++---+++------  "001110"
  data1  +++------+++------+++  "101101"
    :
  dataN  ++++++------------+++  "010001"

The receiver could still choose to ignore the clock line and use the
above method.  It could also use the above method, but dynamically
adjust the delay, tracking the guaranteed transition of the clock line
at the start of each cell.  This would eliminate the requirement for
closely matched clock frequencies and would seem to provide much better
reliability.

I'm not a bus designer; not even a hardware person.  Maybe I'm missing
something really obvious.  If so, how about explaining it instead of
flaming me to toast?  ;-}

Bela Lubkin    * *    //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
     @       * *     //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino    *   \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945

johnt@opus.WV.TEK.COM (John Theus) (01/11/90)

In article <136.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>In article <280@leia.WV.TEK.COM> John Theus writes:
>>The receiver has its own on-board clock that runs at the same frequency as
>>the sender.  Both sender and receiver must have clock frequency tolerances
>>of 0.01% or better.  When the receiver sees the sync bit at the start of a
>>packet, its logic sets a precision delay equal to the phase difference
>>between the sync bit and its on-board clock.  Thereafter, the logic uses
>>the on-board clock plus the delay to define the datum cell positions for
>>sampling the rest of the data.  The maximum packet length is limited by
>>the drift that occurs between the 2 clock sources.
>
>Why isn't one more line used to transmit the sender's idea of the data
>clock?
>[...]
>

There are at least 2 major reasons why we don't ship a clock signal with
the data.  One is a fundamental performance limiter, while the other is
related to the data encoding scheme we use.  However, we didn't get to
where we are today overnight, and in fact a little over a year ago we started
out with a separate clock signal when I wrote the first non-compelled
protocol proposal.

What we've learned from evaluating transfer protocols is that the fundamental
performance limiter is caused by signal skew (assuming a clean electrical
environment).  Skew is the difference in time between the arrival of two
signals from a common source.  The major sources of skew are variations in
the propagation delay through logic and through the physical environment.

In the Futurebus+ environment, it takes several bus transceiver chips to
make a 32 bit wide data path.  The limiting factor here is power
dissipation.  9 bits is near the limit for present BTL transceivers with
normal commercial cooling practices.  The skew through these chips is their
spec'd maximum propagation delay minus their minimum propagation delay.
The best BTL transceivers available today have a skew of 5 nsec.  So just
accounting for getting on and off the bus introduces 10 nsec of skew,
which is all lost time.  In addition, the bus itself introduces skew due
mainly to differences in capacitive loading on each line.  After including
the skews from all the other parts in the logic path, you're left with pretty
poor performance.  Also notice that there is no difference here based on
signal type.  The skews exist for both clock to data and data to data.
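
To put the transceiver numbers in perspective, the sketch below subtracts
the two 5 nsec skews from one datum cell.  Comparing against the 60 MHz
minimum packet rate, and treating skew as a simple linear deduction from
the cell, are assumptions made for this illustration.

```python
def remaining_cell_ns(transfer_rate_mhz, skews_ns):
    """Cell time left after deducting the listed skews (simple linear model)."""
    cell_ns = 1e3 / transfer_rate_mhz
    return cell_ns - sum(skews_ns)


# 5 nsec getting on the bus plus 5 nsec getting off it.
left = remaining_cell_ns(60, [5.0, 5.0])
print(f"{left:.1f} ns left of a {1e3 / 60:.1f} ns cell")
# prints "6.7 ns left of a 16.7 ns cell"
```

Bus skew and the skews of the other parts in the logic path still have to
come out of what is left.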

We identified 2 classes of skew elimination techniques, which I'll call
chip localized and bit independent.  The chip localized technique takes
advantage of the fact that you can hold skews to a much smaller value on
a single chip than across multiple chips.  A proposal was made to have
a clock signal per transceiver (8 bits + parity + clock), which localizes
the skew to what can be done on a single chip.  Numbers in the range of
1 nsec. of skew were believed possible.

This technique was eventually discarded primarily due to its physical
overhead.  Although the silicon was very simple for this technique, the
cost in power, pins and real estate was judged too high. We agreed that
complex silicon was better than a more complex physical environment.
Farther down the list was that this technique did not account for bus
skew.

The bit independent techniques evolved a little more slowly.  The first
idea was to use an embedded clock such as one of the run length limited
encodings.  This idea didn't last long when people started thinking about
building a phase locked loop per bit at several times the bit frequency.
Eventually, Emil Hahn of Signetics realized that you don't need a clock in
any form on the bus and he proposed the scheme that's in the Futurebus+ spec
and which I talked about in an earlier posting.

The other point I want to make about transmitting the clock concerns the
required bandwidth and signal fidelity.  When I previously talked about
our minimum required clock rate of 60 MHz, that's the rate at which data
is clocked onto the bus.  The bandwidth of the data itself is one-half
this frequency.  I also previously stated that the limit for our packet
protocol is the electrical environment, and somewhere below 10 nsec per
word things start to fall apart.  Putting these 2 bits of information
together says you don't ship a single edge clock with the data or you have
to halve your data bandwidth due to the electrical limitations.

As your example showed, you can use a two edge clock, which we do for our
slower compelled protocol.  However, at high speeds the variation in a
signal's propagation delay between its zero and one levels becomes very
significant.  This skew within the clock signal, or more precisely its
duty cycle precision becomes a limiting factor.  The precision required by
the Futurebus+ packet protocol prevents the use of a 2 edge clock.  There
are several approaches to solving this including differential and 2 half
frequency 180 degrees out of phase clocks, but each has its own set of
problems.

One final point: a 0.01% clock oscillator tolerance is an industry
standard, and it's not a big deal.

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations

pauls@apple.com (Paul Sweazey) (01/13/90)

FUTUREBUS EMBEDDED CLOCK: THE REAL SCOOP FROM A FALLEN ANGEL

We each have different views of history, and I have tried to stay out of 
these discussions, but the Futurebus discussion has led to issues that I 
used to live and breathe for a living.

In article <285@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
> However, we didn't get to
> where we are today overnight, and in fact a little over a year ago we
> started out with a separate clock signal when I wrote the first
> non-compelled protocol proposal.

The parallel protocol spec that I wrote, which was based directly on your 
first non-compelled proposal, is dated 7 July 88.

> A proposal was made to have
> a clock signal per transceiver (8 bits + parity + clock), which localizes
> the skew to what can be done on a single chip.

I believe that this was first seriously and publicly proposed by RV 
Balakrishnan and Dave Hawley during the summer of 1988.

> The bit independent techniques evolved a little more slowly.  The first
> idea was to use an embedded clock such as one of the run length limited
> encodings.  This idea didn't last long when people started thinking about
> building a phase locked loop per bit at several times the bit frequency.
> Eventually, Emil Hahn of Signetics realized that you don't need a clock
> in any form on the bus and he proposed the scheme that's in the
> Futurebus+ spec and which I talked about in an earlier posting.

RV Balakrishnan suggested embedded-clock synchronization as the ultimate 
solution to skew in February 1988.

I devised and proposed embedded-clock synchronization to the SuperBus Study Group (now SCI) in March 1988, privately to Futurebus Committee members in
May 1988, and at various times in Futurebus public forums through December 1988.

Emil Hahn devised a feasible implementation of embedded-clock
synchronization between November 1988 and January 1989.

A HISTORY/PolySci LESSON:
In the fall of 1987 the Futurebus (IEEE896.1-1987) was just being 
finished.  I was serving as Coordinator of the Futurebus Cache Coherence 
Task Group.  There was little active interest in speeding it up, but I 
could see that the real-world performance would not match the idealized 
theory or the marketing hype, so I started another IEEE project called 
the SuperBus Study Group.

In February 1988, before SuperBus had become SCI and when it was still 
assumed to be a bus, I proposed the use of a synchronizer (clock) per 
transceiver to eliminate interdevice skew.  RV Balakrishnan of National 
Semiconductor (Balu, the inventor of BTL logic) was in attendance, and he
said (half in jest) that the only way to do better would be to encode a 
clock in every bit.  Until that day, this alternative had only been 
mentioned, along with optical fibers and radiation baths, as an 
unrealistic solution for a parallel bus.  Since the stated bandwidth goal 
of SuperBus was 1 gigabyte per second, I began to pursue embedded clocking 
seriously.

(SuperBus is now IEEE P1596 Scalable Coherent Interconnect (SCI), chaired by 
Dave Gustavson of SLAC and co-chaired by Dave James of Apple.  It is now a 
point-to-point interconnect of arbitrary topology, and it REALLY WILL 
reach 1 gigabyte per second.)

On April 22 I published a memo inside National Semiconductor (I worked
there at the time), which I copied to some Futurebus committee members
including the Futurebus committee chairman (also then a National 
employee).  In it I described the theory, benefits, and implementation of 
embedded clock data transmission in an enhanced Futurebus.  One week later 
I published an expanded report on the subject, entitled "NSC 
Multiprocessing Performance Roadmap".  The report described stages of 
enhancements to Futurebus that would allow the real-world performance to 
achieve the marketing hype.  In it I estimated that burst rates of 250 to 
300 megabytes per second (32 bits wide) would be achievable with the first 
generation of embedded clock silicon.  While the proposal was accepted as 
credible and viable within the NSC technical community, it was determined 
by the Futurebus Committee contingent at National to be heretical--"a 
threat to all that we have worked for"--because it implied that 
Futurebus-1987 could not reach those speeds without further enhancement 
(which, of course, was quite true).

My proposal for embedded-clock transceivers involved the use of precision
delay elements and quadrature sampling of each bit stream, which did not 
require PLL locking to the bit streams.

By the Fall of 1988 I no longer held any committee office, and I was no 
longer directly involved in Futurebus product planning at work, leaving me 
free to concentrate on technical issues without regard to politics.  I 
discussed technology freely, including embedded-clock data transfer, with
many people, including Emil Hahn of Signetics.  Meanwhile the US Navy began a
process of adopting Futurebus, pushing the need for it to become real 
SOON, and for it to deliver all of its promises.

In the December 1988 Futurebus meeting in San Diego, I gave a presentation 
offering two proposals:  either (1) backward-compatible enhancements to 
Futurebus-1987 as Theus had proposed, or (2) more aggressive enhancements 
using either clock-per-chip or embedded-clock techniques.  Because of new 
industry pressure that the Navy created, any changes had to be finalized 
within 8 weeks, so alternative (1) was chosen.  Nevertheless, Hahn of 
Signetics and Balu of National agreed in that meeting to analyze both 
techniques and report back at a later meeting.  At the Santa Clara meeting 
in January 1989 they came back with two different answers, and Signetics
won, based on a data recovery method, similar to but different from mine,
that Emil was confident he could implement.  I was not involved in the
decision making or analysis process; two weeks after the San Diego
meeting I went to work for Apple Computer.

Emil's solution involves the use of dynamically settable delay elements, 
also uses no PLL locking to the bit streams, and may need as little as 1/4 
of the FIFO storage of my proposed implementation.

So why bring this all up now?  I didn't get a patent for my 
embedded-clocking contributions, or a bonus check, or stock options, or a 
raise.  So I'll settle for glory. Embedded clocking is debatably the 
breakthrough performance feature of the "last great backplane bus", and I 
would hope that the gang remembers that I helped get it started.

To those of you with radical breakthrough ideas: be persistent but be very 
patient.  To the receivers of those ideas:  File, don't trash.  There are 
gems among the gravel.

Greetings to Theus, Balu, Hahn, Hawley, Gustavson, James, and the rest.
They are the best in the bus business!

Paul Sweazey
Apple Computer, Inc.
pauls@apple.com
(408)-974-0253

johnt@opus.WV.TEK.COM (John Theus) (01/16/90)

In article <6149@internal.Apple.COM> pauls@apple.com (Paul Sweazey) writes:
>FUTUREBUS EMBEDDED CLOCK: THE REAL SCOOP FROM A FALLEN ANGEL
>
>We each have different views of history, and I have tried to stay out of 
>these discussions, but the Futurebus discussion has led to issues that I 
>used to live and breathe for a living.
>
> [...]
>RV Balakrishnan suggested embedded-clock synchronization as the ultimate 
>solution to skew in February 1988.
>
>I devised and proposed embedded-clock synchronization to the SuperBus Study
>Group (now SCI) in March 1988, privately to Futurebus Committee members in
>May 1988, and at various times in Futurebus public forums through December
>1988.
>
>Emil Hahn devised a feasible implementation of embedded-clock
>synchronization between November 1988 and January 1989.
>
> [...]
>
This article along with a follow-up phone call to Paul cleared up some
confusion I've had about who did what and when.  Unfortunately, a lot of
the events that Paul related were never published in the Futurebus
minutes.

Part of my confusion comes from the term "embedded-clock" and I want to
make sure this doesn't mislead anyone else.  The Futurebus+ packet mode
protocol does not actually use an embedded-clock protocol.  At one time Paul
said he called it "implied embedded-clock", which I think would have been more
accurate.  Paul is very good at inventing names and techniques to describe
new concepts.  Another name he had for this protocol that caught my ear
was "packet beaming".

Typically, an embedded-clock protocol has the originating clock encoded
into the data stream.  A receiver is then capable of extracting the
clock from data following transmission.  Good examples are the encoding
schemes used for disk drives such as MFM.

The Futurebus+ packet protocol does not encode the clock into the data
stream, but instead uses a starting sync bit to synchronize the receiver
with the sender.  Both the sender and receiver have previously agreed upon
the transmission frequency.

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations

rpw3@rigden.wpd.sgi.com (Robert P. Warnock) (01/16/90)

In article <286@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
+---------------
| Part of my confusion comes from the term "embedded-clock" and I want to
| make sure this doesn't mislead anyone else.  The Futurebus+ packet mode
| protocol does not actually use an embedded-clock protocol...
| The Futurebus+ packet protocol does not encode the clock into the data
| stream, but instead uses a starting sync bit to synchronize the receiver
| with the sender.  Both the sender and receiver have previously agreed upon
| the transmission frequency.
+---------------

Then shouldn't this be called "embedded phase"???   ;-}  ;-}

For those with a bit of deja vu about now: Yes, RS-232 async works this
way, only slower...

-Rob

p.s. Or for another way to look at it, think of each bus line as having
a >60 megabaud "UART" on it, with *big* "bytes". Note that the .01% clock
spec means that practically you're limited to about 2000 bits per start bit
(assuming 20% skew is acceptable, which it probably is if you have at least
5 clock phases to choose from -- that gives you a total of 40% skew), or
about 16K bytes per burst on the bus (64-bit-wide bus). That's probably
enough.  ;-}
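
The back-of-the-envelope numbers above can be reproduced with a short
sketch.  It assumes, as the estimate does, that the 0.01% figure is the
relative drift between sender and receiver and that the sampling point
may wander by 20% of a cell; the function name is illustrative.

```python
def max_cells_per_sync(allowed_skew_percent, relative_tolerance_ppm):
    """Cells that can follow one sync bit before drift uses up the margin.

    1% of a cell is 10_000 ppm, so integer arithmetic is exact here.
    """
    return allowed_skew_percent * 10_000 // relative_tolerance_ppm


bits = max_cells_per_sync(20, 100)   # 0.01% = 100 ppm relative drift
burst_bytes = bits * 64 // 8         # 64-bit-wide bus
print(bits, burst_bytes)             # prints "2000 16000"

# If each clock can be off by 0.01% in opposite directions, the relative
# drift doubles and the limit halves.
print(max_cells_per_sync(20, 200))   # prints "1000"
```

Either figure is far longer than the 64 word maximum packet length
mentioned earlier in the thread.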

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311