[comp.arch] Metastability

josh@klaatu.rutgers.edu (J Storrs Hall) (04/27/89)

One commonly reads in articles about arbiter circuits that
it has been proven that the problem of metastability cannot
be completely avoided.  Is there an actual proof anywhere?

--JoSH

rpw3@amdcad.AMD.COM (Rob Warnock) (04/27/89)

josh@klaatu.rutgers.edu (J Storrs Hall) writes:
+---------------
| One commonly reads in articles about arbiter circuits that
| it has been proven that the problem of metastability cannot
| be completely avoided.  Is there an actual proof anywhere?
| --JoSH
+---------------

The way I heard it, just as momentum & position are "conjugate" quantities
subject to Heisenberg Uncertainty limits if you try to measure both at the
same time, so are energy & time. A synchronizer tries to measure with absolute
precision whether an energy (the "AND" of data and clock, typically) is above
or below a threshold, and tries to do the measurement in a finite time. You
can't do both. So that's the impossibility, at some very fundamental level.
But most real synchronizers have failure rates far worse than the Heisenberg
limit...

Most of the circuits I've seen that claimed to "solve" the synchronizer
problem either (1) simply pushed the energy-threshold decision around
to a part of the circuit where you normally wouldn't think to look for
it ("solution" by sweeping under the rug -- but the dirt's still there),
or (2) "hide" the fact that they can delay making a decision for a while
in some cases.

The real trick to making a good (not perfect) synchronizer is getting
a latch stage with a very high gain-bandwidth product "around the loop".
This shows up as a small "rho" parameter in the MTBF equation. There are
published papers (especially the one by Tom Cheney, at U Wash.) which
give measured parameters of "delta" (a.k.a. "t0") and "rho" for various
commercial parts. (The Fairchild 74F74 & 74F374 and AMD Am298xx parts
are pretty good. I generally don't use anything else.)
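
For readers who want to plug numbers into the MTBF equation, here is a
minimal sketch of the commonly quoted textbook form. It assumes that "rho"
plays the role of the latch's resolution time constant and that "delta"
("t0") is the effective aperture; the function name and every numeric
value below are illustrative assumptions, not measured data for any part.

```python
import math

def synchronizer_mtbf(t_resolve, rho, t0, f_clock, f_data):
    """Mean time between synchronizer failures, in seconds.

    Common textbook form (assumed here, not taken from the papers cited):
        MTBF = exp(t_resolve / rho) / (t0 * f_clock * f_data)
    """
    return math.exp(t_resolve / rho) / (t0 * f_clock * f_data)

# Made-up example values: 20 ns allowed for settling, 20 MHz clock,
# 1 MHz average rate of asynchronous data transitions.
fast_latch = synchronizer_mtbf(t_resolve=20e-9, rho=1e-9, t0=1e-9,
                               f_clock=20e6, f_data=1e6)
slow_latch = synchronizer_mtbf(t_resolve=20e-9, rho=2e-9, t0=1e-9,
                               f_clock=20e6, f_data=1e6)

print(f"rho = 1 ns: MTBF = {fast_latch:.3g} s")
print(f"rho = 2 ns: MTBF = {slow_latch:.3g} s")
```

Because rho sits inside the exponent, halving it buys far more than
halving t0 does, which is why the gain-bandwidth product of the loop
matters so much.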

You can also make a two-stage synchronizer, where the second stage can
fail only if the first stage comes out of metastable just as the second
stage clocks. [You have to bias the output of the first stage so a first-
stage metastable looks like a clean one or zero to the second stage.]
Depending on your environment, this is sometimes better than clocking
a single-stage synchronizer at 1/2 the rate.


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

afscian@violet.waterloo.edu (Anthony Scian) (04/27/89)

In article <Apr.26.15.07.22.1989.3661@klaatu.rutgers.edu> josh@klaatu.rutgers.edu (J Storrs Hall) writes:
>One commonly reads in articles about arbiter circuits that
>it has been proven that the problem of metastability cannot
>be completely avoided.  Is there an actual proof anywhere?

Yes, but I'm not sure where it was proven; try these references:

T.J.Chaney, A Comprehensive Bibliography on Synchronizers and Arbiters
Technical Memorandum # 306C, Institute for Biomedical Computing,
Washington University, St.Louis

T.J.Chaney, C.E.Molnar, Anomalous Behaviour of Synchronizer and Arbiter Circuits
IEEE Transactions on Computers, Vol. C-22 (1973), pp. 421-422

L.R.Marino, General Theory of Metastable Operation
IEEE Transactions on Computers, Vol. C-30, 2 (1981), pp. 107-115

Most of the work is being done at Washington University, you should
contact either Chaney or Molnar for further information.

Anthony
//// Anthony Scian afscian@violet.uwaterloo.ca afscian@violet.waterloo.edu ////
"I can't believe the news today, I can't close my eyes and make it go away" -U2

segall@caip.rutgers.edu (Ed Segall) (04/28/89)

rpw3@amdcad.UUCP writes:
> A synchronizer tries to measure with absolute
> precision whether an energy (the "AND" of data and clock, typically) is above
> or below a threshold, and tries to do the measurement in a finite time. You
> can't do both. So that's the impossibility, at some very fundamental level.
> But most real synchronizers have failure rates far worse than the Heisenberg
> limit...

From your description, this doesn't seem to prove that metastability
is necessary.  If the state of a line is 0, and it asynchronously
changes to 1, a carefully designed synchronizer wouldn't mind if the
transition isn't noticed on the first succeeding clock edge.  Rather,
it would want either a clean 0 or a clean 1.  If the line stays 1, it
would definitely want to see a clean 1 by the next edge.  Notice that
a consequence of this is that asynchronous pulses shorter than one
clock period are not guaranteed to be noticed.  Of course, if you want
to catch short pulses, you would put a pulse catcher in front of the
synchronizer.
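
The point about short pulses can be illustrated with an idealized
sampling model. This is only a sketch: it assumes every sample comes out
as a clean 0 or 1 (metastability itself is not modeled), and the helper
names and time units are made up for the example.

```python
def sample_on_edges(level_at, period, n_edges):
    """Return the level seen at each clock edge (edges at 0, period, 2*period, ...)."""
    return [level_at(k * period) for k in range(n_edges)]

def pulse(start, width):
    """Return a function giving 1 during [start, start + width) and 0 elsewhere."""
    return lambda t: 1 if start <= t < start + width else 0

PERIOD = 10  # clock period in arbitrary integer time units

# A pulse shorter than one period can fall entirely between two edges.
short_pulse_samples = sample_on_edges(pulse(start=11, width=4), PERIOD, 4)

# A pulse at least one period long must straddle some edge.
long_pulse_samples = sample_on_edges(pulse(start=11, width=10), PERIOD, 4)

print(short_pulse_samples)  # [0, 0, 0, 0]
print(long_pulse_samples)   # [0, 0, 1, 0]
```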

What your 'uncertainty' explanation seems to imply is that it is
impossible for a synchronizer to always give the right answer - e.g.
that it might say zero when it should say 1.  This would seem to be a
fatal flaw unless you confine the errors to be on transitions only (as
I explained above).  I think most systems can be designed to handle a
one-cycle delay in noticing _valid_ transitions.  What they can't
handle is invalid logic levels, which may result from metastable
states.

Of course, since I haven't read the paper you referenced, I can't
tell if it covers this situation.  Would you post more detailed
references?  Is the Cheney paper you refer to actually:
Chaney, T. J., Littlefield, W.M., and Ornstein, S.M. "Beware the
Synchronizer," in Digest of Papers of the Sixth Annual IEEE Computer
Society International Conference, San Francisco, CA, Sept 1972?  It's
referenced in Fletcher.

If anyone out there would like a simple explanation of why
metastable states occur when flip-flops are used as synchronizers,
see:

Hodges and Jackson, "Analysis and Design of Digital Integrated
Circuits," McGraw-Hill 1983

--Ed
-- 

uucp:   {...}!rutgers!caip.rutgers.edu!segall
arpa:   segall@caip.rutgers.edu

henry@amdcad.AMD.COM (Henry Choy) (04/28/89)

In article <25423@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>josh@klaatu.rutgers.edu (J Storrs Hall) writes:
>+---------------
>| One commonly reads in articles about arbiter circuits that
>| it has been proven that the problem of metastability cannot
>| be completely avoided.  Is there an actual proof anywhere?
>| --JoSH
>+---------------
>
>The real trick to making a good (not perfect) synchronizer is getting
>a latch stage with a very high gain-bandwidth product "around the loop".

The GBP also depends heavily on the capacitive loading at the output of
the loop.  If the synchronizer is designed on chip, I'd try to minimize
the caps (diffusion, interconnects, fanout) by using a small output
driver and not letting anything else touch the cross-coupled nodes.

>You can also make a two-stage synchronizer, where the second stage can
>fail only if the first stage comes out of metastable just as the second
>stage clocks. [You have to bias the output of the first stage so a first-
>stage metastable looks like a clean one or zero to the second stage.]
>Depending on your environment, this is sometimes better than clocking
>a single-stage synchronizer at 1/2 the rate.
>
My feeling is that the propagation delay of the second stage
significantly reduces the time available for synchronization.  So
instead of having a probability of failure (with respect to the
available time) Pr(T - Tprop), the two-stage synchronizer has a
probability of approximately Pr(T/2 - Tprop)**2.  But Pr(t) is
exponential:

   Pr(t) = K * exp(-t/tau)            (tau = sqrt(1/GB))

   Pr(T/2 - Tprop) = Pr(T - Tprop) * exp(+T/(2*tau))

So,

   Pr(T/2 - Tprop)**2 / Pr(T - Tprop) = K * exp(Tprop/tau)

Typically, Tprop >> tau and K is O(1), so this ratio is much greater
than 1 and the two-stage synchronizer comes out worse.
Just my opinion, Rob.
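
As a numerical check of the ratio Pr(T/2 - Tprop)**2 / Pr(T - Tprop),
here is a minimal sketch. The function names and the values (50 ns
period, 5 ns propagation delay, tau = 1 ns, K = 1) are illustrative
assumptions only.

```python
import math

def failure_probability(t, tau, k=1.0):
    """Pr(t) = K * exp(-t / tau): chance the latch is still unresolved after time t."""
    return k * math.exp(-t / tau)

def two_stage_over_one_stage(period, t_prop, tau, k=1.0):
    """Ratio Pr(T/2 - Tprop)**2 / Pr(T - Tprop) from the argument above."""
    two_stage = failure_probability(period / 2 - t_prop, tau, k) ** 2
    one_stage = failure_probability(period - t_prop, tau, k)
    return two_stage / one_stage

# Made-up numbers for illustration.
ratio = two_stage_over_one_stage(period=50e-9, t_prop=5e-9, tau=1e-9)
closed_form = 1.0 * math.exp(5e-9 / 1e-9)  # K * exp(Tprop / tau)

print(f"computed ratio: {ratio:.2f}")
print(f"closed form:    {closed_form:.2f}")
```

With these numbers the two-stage arrangement is roughly exp(5) times
more likely to fail, and the penalty grows exponentially with Tprop/tau.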

Papers by L. Marino (IEEE Trans. on Comp., Feb. 81) and by
Lindsay Kleeman & A. Cantoni (same, Jan. 87) have proofs on
metastability problems.

Several papers on metastability (all published in JSSC) deserve credit:

Veendrick, 4'80,
Flannagan, 8'85,
Sakurai,   8'88

Henry Choy
Advanced Micro Devices, Inc.
henry@amdcad.AMD.com

jjb@sequent.UUCP (Jeff Berkowitz) (04/28/89)

In article <25423@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>
>You can also make a two-stage synchronizer, where the second stage can
>fail only if the first stage comes out of metastable just as the second
>stage clocks. [You have to bias the output of the first stage so a first-
>stage metastable looks like a clean one or zero to the second stage.]
>Depending on your environment, this is sometimes better than clocking
>a single-stage synchronizer at 1/2 the rate.
>

I believe that Digital used to take all 74S74 parts (back when
they were an important part of real CPUs) and run a very precise
test on (Tsetup + Thold) - the actual length of the "window" surrounding
the positive-going clock edge during which data had to be stable
in order to get a clean "1" on Q in a guaranteed (short) time.

Real parts were quite variable with respect to this parameter.
(The spec for the 74S74 is 3ns setup, 2ns hold from my TTL data book -
supposedly they found that some parts were three orders of magnitude
better, in the area of a few picoseconds).  The ones that were
particularly good were given a special part number and were then
used as the second stage of the circuit described above.  This
served as a practical engineering approach to minimizing the
failure rate.
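
To see why a narrower window helps, here is a rough sketch. It assumes
data transitions are uniformly distributed relative to the clock, so each
one lands inside the window with probability window * f_clock. The
function name and the clock and data rates are made up, and landing in
the window is not the same thing as a failure; it only bounds how often
the part is put at risk.

```python
def window_hit_rate(window, f_clock, f_data):
    """Expected number of data transitions per second that land in the window.

    Assumes transitions are uniformly distributed relative to the clock,
    so each one hits the window with probability window * f_clock.
    """
    return window * f_clock * f_data

F_CLOCK = 10e6  # assumed clock rate, Hz
F_DATA = 1e6    # assumed average rate of asynchronous transitions, Hz

spec_part = window_hit_rate(5e-9, F_CLOCK, F_DATA)       # 3 ns setup + 2 ns hold
selected_part = window_hit_rate(5e-12, F_CLOCK, F_DATA)  # "a few picoseconds"

print(f"data-book window: {spec_part:.3g} hits/s")
print(f"selected part:    {selected_part:.3g} hits/s")
print(f"improvement:      {spec_part / selected_part:.0f}x")
```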

What do current VLSI designers do to minimize the likelihood of
metastable failure?

-- 
Jeff Berkowitz N6QOM			uunet!sequent!jjb
Sequent Computer Systems		Custom Systems Group

henry@amdcad.AMD.COM (Henry Choy) (04/29/89)

In article <Apr.27.15.59.55.1989.4457@caip.rutgers.edu> segall@caip.rutgers.edu (Ed Segall) writes:
>
>From your description, this doesn't seem to prove that metastability
>is necessary.  If the state of a line is 0, and it asynchronously
>changes to 1, a carefully designed synchronizer wouldn't mind if the
>transition isn't noticed on the first succeeding clock edge.  Rather,
>it would want either a clean 0 or a clean 1.  If the line stays 1, it
>would definitely want to see a clean 1 by the next edge.  Notice that
But even a carefully designed synchronizer would have a FINITE
probability of failure, even if it is 10e-10.  Transitions that
occur just as the synchronizer samples the line can ALWAYS happen.

>fatal flaw unless you confine the errors to be on transitions only (as
>I explained above).  I think most systems can be designed to handle a
>one-cycle delay in noticing _valid_ transitions.  What they can't
Then again, not all systems can handle a one-cycle delay.  And even if
you can tolerate a one-cycle delay, cycle times keep shrinking over the
years (20MHz -> 33M -> 45M -> ???) and will continue to shrink.  Sooner
or later a cycle is not going to be worth much with a relatively slow
synchronizer.
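
To put rough numbers on the shrinking-cycle concern, here is a sketch
that inverts the commonly used model MTBF = exp(t/tau) / (T0 * f_clock *
f_data) to find the settling time needed for a target MTBF. The model is
an assumption brought in for illustration, and tau, T0, the data rate and
the target are all made-up values.

```python
import math

def required_resolve_time(target_mtbf, tau, t0, f_clock, f_data):
    """Settling time t such that exp(t / tau) / (t0 * f_clock * f_data) == target_mtbf."""
    return tau * math.log(target_mtbf * t0 * f_clock * f_data)

TAU = 1e-9        # assumed latch time constant, s
T0 = 1e-9         # assumed aperture parameter, s
F_DATA = 1e6      # assumed asynchronous transition rate, Hz
TARGET = 3.15e9   # target MTBF, roughly 100 years in seconds

fractions = []    # required settling time as a fraction of one clock cycle
for f_clock in (20e6, 33e6, 45e6):
    t_need = required_resolve_time(TARGET, TAU, T0, f_clock, F_DATA)
    fractions.append(t_need * f_clock)
    print(f"{f_clock / 1e6:.0f} MHz: need {t_need * 1e9:.1f} ns "
          f"of a {1e9 / f_clock:.1f} ns cycle")
```

The required settling time barely moves with clock rate (it grows only
logarithmically), while the cycle itself shrinks linearly, so the
synchronizer eats a growing share of each cycle.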

Henry Choy
Advanced Micro Devices, Inc.

Disclaimer: I do not represent the company on the net.