[net.arch] Glitch Phenomenon

hugh@hcrvx1.UUCP (Hugh Redelmeier) (10/03/85)

About 10 years ago, I first learned of what was called the Glitch
Phenomenon.  It was described (if I remember correctly) in a tech report
from MIT.  They showed theoretically that asynchronous systems could not
be synchronized in *any* bounded amount of time!  They then showed some
practical examples with real TTL and oscilloscope traces.  If I remember
correctly, it is possible to build a circuit that synchronizes, and says
when it has done so (after an unbounded but short amount of time).  I
also seem to remember that a project attempting to build a machine that
was internally very asynchronous ended up having to invent equivalents
for TTL MSI so they wouldn't get bit by the glitch (perhaps Al Davis's
Data-Driven machine).

In article <1192@vax1.fluke.UUCP> witters@fluke.UUCP (John Witters) writes:
>I'd suggest reading the August 1st 1985 issue of Computer Design before you
>rush off and do this.  The article of interest is titled "Metastability haunts
>VME bus and Multibus II system designers" on page 29. ...
>Because the
>arbiter makes its arbitration decisions in about 20ns, the output of its
>synchronizer has only 20 ns to settle to a stable state, but needs at least 50
>ns to ensure reliable operation.  

Theoretically, any finite bound is not good enough.  Perhaps the
probability of metastability extending past 50ns should be calculated
*and stated*.  Of course, maybe the journal article did (I don't have
access to it), but even the net article should qualify these bald
numbers. The danger at 50ns might well be acceptably unlikely (the
probability exponentially decreases with time) but it depends very much
on the circuit technology and design -- not too nice for an interface
standard.  As a software-type, I like things to be right or wrong,
but I understand engineers live in another universe (perhaps the real
one).
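
As a rough sketch of the "exponentially decreases" remark: a common first-order model (an assumption here, not taken from the thread or the cited article) treats the chance that a flip-flop is still unresolved after a wait of t as decaying with a device-dependent time constant. The symbols P_0 (chance of entering metastability at all) and \tau (resolution time constant) are illustrative names.

```latex
P(\text{unresolved at } t) \approx P_0 \, e^{-t/\tau},
\qquad
\frac{P(t + \Delta)}{P(t)} = e^{-\Delta/\tau}
```

Under this model the probability never reaches exactly zero for any finite t, which is the sense in which no finite bound is strictly enough. A figure such as 50 ns is only meaningful alongside the value of \tau for the particular part.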

Hugh Redelmeier (416) 922-1937
{utzoo, ihnp4, decvax}!hcr!hugh

kds@intelca.UUCP (Ken Shoemaker) (10/07/85)

> About 10 years ago, I first learned of what was called the Glitch
> Phenomenon.  It was described (if I remember correctly) in a tech report
> from MIT.  They showed theoretically that asynchronous systems could not
> be synchronized in *any* bounded amount of time!  They then showed some

Actually, there is a parameter that can be measured for your everyday
flip flop to indicate mean time between synchronization failures.
Note that by their very nature synchronization circuits WILL fail
every so often, but usually the average time between synchronizer
failures is over 20 years, which probably makes a synchronizer failure
less likely than a failure in most of the other circuits in the computer.

A typical synchronization circuit, in TTL, looks like two back-to-back
flip flops, again, typically clocked by the same clock.  The idea
is that you won't always meet the setup/hold times of the first FF,
but you will of the second.  This works, since most FFs when their
setup/hold times are not met ultimately decide to drive their outputs
either high or low.  The time it takes this to happen when setup/hold times
are not met is related to the speed of the FF circuit, and is in direct 
proportion to the size of the real sampling window of the device, i.e., the 
size of the area where the output starts to switch, and the specified
output delays are no longer met.  Note that the closer you get to the 
center of the sampling window, the worse the output delays
become.  Anyway, by measuring the size of the sampling window and
the delta between the time the first FF is clocked and the time the
second FF is clocked, you can calculate the mean time between synchronization
failures (obviously, the number of synchronizations you do per second
also must be taken into consideration).  Intuitively, this makes sense,
since what you are trying to do is to minimize the percentage of time
that a transition can take place that will cause the first FF not to come
to a stable output voltage before the setup time of the second FF
has started.
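
The calculation described above can be sketched as follows. This uses the commonly quoted exponential model, MTBF = exp(t_resolve / tau) / (window * f_clock * f_data), which is an assumption here and not necessarily the exact form used in any vendor paper. The function name and every device value below are hypothetical, chosen only for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchronizer_mtbf(t_resolve, tau, window, f_clock, f_data):
    """Mean time between synchronization failures, in seconds.

    t_resolve: time between clocking the first FF and the setup time
               of the second FF (seconds)
    tau:       resolution time constant of the first FF (seconds)
    window:    effective size of the sampling window (seconds)
    f_clock:   synchronizer clock rate (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (window * f_clock * f_data)


# Hypothetical device and system parameters (not from any data sheet).
TAU = 1.5e-9      # seconds
WINDOW = 0.5e-9   # seconds
F_CLOCK = 10e6    # Hz
F_DATA = 1e6      # Hz

for t_resolve in (20e-9, 50e-9, 100e-9):
    mtbf = synchronizer_mtbf(t_resolve, TAU, WINDOW, F_CLOCK, F_DATA)
    print(f"t_resolve = {t_resolve * 1e9:5.0f} ns -> "
          f"MTBF = {mtbf:.3g} s ({mtbf / SECONDS_PER_YEAR:.3g} years)")
```

With these made-up numbers the MTBF is a matter of minutes at 20 ns and many centuries at 50 ns, which shows how strongly the result depends on the settling time and on the part's time constant.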

Any attempt to synchronize an asynchronous signal runs into
the same problem.  If there is a special problem with the VME bus,
it could be related to poor design of the bus controllers' internal
synchronizers, or an especially high-speed clock which does not allow
the first half of the synchronizer time to resolve the signal.
As external devices, 74AS74s or 74F74s seem to be the devices of choice
to perform this function.  A few years back, Fairchild even put out a
paper showing how much better the 74F74 is than the 74S74 at resolving
asynchronous signals.
-- 
...and I'm sure it wouldn't interest anybody outside of a small circle
of friends...

Ken Shoemaker, Microprocessor Design for a large, Silicon Valley firm

{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of the
	employer of its submitter.

freed@aum.UUCP (Erik Freed) (10/07/85)

> Theoretically, any finite bound is not good enough.  Perhaps the
> probability of metastability extending past 50ns should be calculated
> *and stated*.  Of course, maybe the journal article did (I don't have
> access to it), but even the net article should qualify these bald
> numbers. The danger at 50ns might well be acceptably unlikely (the
> probability exponentially decreases with time) but it depends very much
> on the circuit technology and design -- not too nice for an interface
> standard.  As a software-type, I like things to be right or wrong,
> but I understand engineers live in another universe (perhaps the real
> one).
> 
> Hugh Redelmeier (416) 922-1937
> {utzoo, ihnp4, decvax}!hcr!hugh

This metastability problem is being grossly overdone. Fairchild, when they
first brought out the F series TTL, did a little tutorial booklet on
this and included equations to figure out MTBF for different sampling
times. It turns out that the particular logic's "window of decision"
makes a huge difference as well as the logic's recovery time and recovery
behavior when a transition occurs in that "window of decision". Of course
the reason that Fairchild came out with this is that F parts have (supposedly)
very small windows of decision and quick recovery times. The times we usually
designed with had MTBFs that involved continuous cycles for hundreds of years
before any chance of a glitch could happen and even then you can sometimes
design so that the glitch doesn't necessarily destroy the cycle. The real
solution and one that is very much overlooked is the use of asynchronous
state machines wherever possible. Most engineers do not realize the speed
and elegance of these little beasties. Of course there are problems associated
with them but if you spend the time to do them right, in my experience, they
are a *big* win. If you keep most of the VME interface logic asynchronous,
you pretty much avoid the "glitch" problem altogether. Most engineers brought
up on prom based sequenced state machines are blind to the obvious advantages
of the new fast PALs and delay lines. The metastability problem has achieved
a *TTL-VOODOO* status. Let's demystify this right away.
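
To illustrate how much the "window of decision" and recovery time matter, here is a minimal sketch that turns the question around and asks how much settling time a target MTBF requires. It assumes the usual exponential model, MTBF = exp(t / tau) / (window * f_clock * f_data). That model, the function name, and both sets of device figures are hypothetical and are not taken from the Fairchild booklet.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def required_resolve_time(target_mtbf_s, tau, window, f_clock, f_data):
    """Settling time (seconds) needed to reach target_mtbf_s under the
    model MTBF = exp(t / tau) / (window * f_clock * f_data)."""
    events = target_mtbf_s * window * f_clock * f_data
    # If events <= 1 the target is already met with no settling time.
    return max(0.0, tau * math.log(events))


# Hypothetical logic families: (tau, window), both in seconds.
families = {
    "slower family": (5.0e-9, 2.0e-9),
    "faster family": (1.0e-9, 0.2e-9),
}

target = 100 * SECONDS_PER_YEAR
for name, (tau, window) in families.items():
    t = required_resolve_time(target, tau, window, f_clock=10e6, f_data=1e6)
    print(f"{name}: about {t * 1e9:.1f} ns of settling time "
          f"for a 100-year MTBF")
```

Because the required time scales linearly with tau but only logarithmically with the window size and the event rates, a part with a faster recovery needs far less settling time for the same MTBF.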

jack@boring.UUCP (10/09/85)

Can anyone point me to articles on this matter?

I'll summarize, of course.	
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.