[comp.arch] 2 good reasons for doing self-timed logic

eric@snark.UUCP (Eric S. Raymond) (02/26/88)

   When I first read about the idea of asynchronous (aka self-timed) logic a
couple years back, I felt a powerful *rightness* about the idea -- nothing I
could pin down, but my design sense definitely stood up and salivated.
Recent discussion of the idea motivated me to sit down and think about my
initial reaction. As a result, I want to propose two powerful reasons that
I think development of a strong design tradition for async logic should start
now.

1. Silicon compilation

   The Holy Grail of silicon compilation is a tool suite that would permit
provably correct specification of an entire system in a high-level formalism,
then generate schematics and test vectors for the whole gazoo down to board
and chip-mask level.

   Whole classes of the hardest standard problems in system design reduce to
problems at the boundary between event-drivenness and clock-drivenness. Now,
suppose we could recast the system design problem as "construct a hierarchy of
event-driven state machines with specified semantics"?

   But this is exactly what async logic should allow us to do! Therefore:

   PROPOSED: Async logic and silicon compilation are made for each other,
not only naturally synergistic but *necessary* to each other's full
development.
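   (To make "construct a hierarchy of event-driven state machines" concrete,
here is a toy sketch in Python; it is only an illustration of the flavor of the
thing, not anybody's real silicon-compilation formalism, and all the names in
it are made up. Each block is a little machine that reacts to incoming events
and hands events on to whatever blocks are wired downstream; there is no global
clock anywhere.)

	# Toy sketch only: hierarchical event-driven state machines.
	class Machine:
	    def __init__(self, name, transitions, start):
	        # transitions: {(state, event): (next_state, [events to emit])}
	        self.name = name
	        self.transitions = transitions
	        self.state = start
	        self.listeners = []

	    def connect(self, other):
	        self.listeners.append(other)

	    def event(self, ev):
	        if (self.state, ev) in self.transitions:
	            self.state, emitted = self.transitions[(self.state, ev)]
	            for out in emitted:
	                for m in self.listeners:
	                    m.event(out)        # purely event-driven hand-off

	# Two machines completing a request/acknowledge handshake, no clock.
	stage1 = Machine("stage1", {("idle", "req"): ("busy", ["req"]),
	                            ("busy", "ack"): ("idle", [])}, "idle")
	stage2 = Machine("stage2", {("idle", "req"): ("idle", ["ack"])}, "idle")
	stage1.connect(stage2)
	stage2.connect(stage1)
	stage1.event("req")
	print(stage1.state, stage2.state)       # both back to 'idle': handshake done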

2. Lightspeed limits

    Raymond's Law (which I just made up): the diameter of the largest
usable synchronous logic block is inversely proportional to process speed;
specifically, if S is clock speed and c is the velocity of light (with 
everything in compatible cgs units, natch)

	maxD = c/S.

We'll call maxD/2 the "lightspeed radius" corresponding to a given clock speed.

    Why? Because of clock skew due to lightspeed delays, of course. As
process speed goes up, the effects of non-simultaneity get harder to sweep
under the rug. When the maximum point-to-point delay exceeds your cycle time,
you're scrod.

    As a corollary, the maximum usable chip area is bounded by

	maxA = pi * (c / 2S)**2

and the number of usable synchronous gates by

	maxG = pi * (c / (2 * S * G))**2

where G is the linear width of a gate, in the same cm units (this is without
allowing for connection costs, which already dominate chip area at present
densities; if we're really lucky, maybe we get 30% of maxG).
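   For anyone who wants to plug in their own numbers, the three bounds above
boil down to a few lines of Python. This is a back-of-the-envelope toy with the
same idealizations (straight-line propagation at c, no allowance for wiring),
and the function names are just for illustration:

	import math

	C = 2.99776e10                  # velocity of light, cm/sec

	def max_diameter(S):
	    """Largest usable synchronous-block diameter (cm) at clock speed S (Hz)."""
	    return C / S

	def lightspeed_radius(S):
	    return max_diameter(S) / 2.0

	def max_area(S):
	    """Upper bound on synchronous chip area, in cm**2."""
	    return math.pi * (C / (2.0 * S)) ** 2

	def max_gates(S, gate_width_cm):
	    """Upper bound on gate count, ignoring the wiring that really dominates."""
	    return math.pi * (C / (2.0 * S * gate_width_cm)) ** 2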

   Assuming, for example, a 40MHz clock (and using c = 2.99776 x 10**10 cm/sec
from my trusty CRC handbook), my calculator tells me the lightspeed radius works
out to about 375 cm, roughly the scale I'd have guessed. Nearly four meters;
not bad.
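   The little sketch above agrees with the calculator:

	>>> lightspeed_radius(40e6)     # 40 MHz clock; answer in cm
	374.72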

   But double that clock speed and the radius halves. Increase it by an order
of magnitude or three, as the GaAs people have already done for some mil-spec
stuff and the ballistic-transistor boys are talking about doing for everybody
in silicon land, and you're looking at trouble.

   The critical threshold points seem to me to be 1GHz, lightspeed radius of
about 15cm (comparable to standard wafer sizes), and 64GHz, lightspeed radius
of about 2.3mm (approaching standard die sizes).
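   Checking those two thresholds with the same sketch:

	>>> round(lightspeed_radius(1e9), 1)        # 1 GHz clock; cm
	15.0
	>>> round(lightspeed_radius(64e9) * 10, 2)  # 64 GHz clock; mm
	2.34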

   Barring some breakthrough in quantum semiconductor devices, we're beginning
to see hard limits in areal density due to quantum effects. Maybe, if we're
lucky, we can get another factor four increase from CMOS (down to maybe a .5
micron feature size).

   Sooner or later (probably sooner) we're going to be designing circuits
with so many flippin' gates that they won't fit in pi*(R**2). Best guess as to
when? Let's see... assuming Gordon Moore's

	bits/in**2 ~ 2**(year - 1962)

continues to be predictive, CMOS feature size will bottom out at the latest
in 1990-1991. After that, as clock speeds are pushed up, the amount of stuff
that can be fit inside the lightspeed radius has nowhere to go but down. I
don't know of any good predictor for merchant processor speeds but my guess
is that they double every three years or so (anybody have harder data?).

If this is true (taking the 40MHz figure above as today's baseline), we can
expect to hit the wafer-size limit around 2002 and the die-size limit around
2020.
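   The extrapolation itself, for anyone who wants to fiddle with the
assumptions (the 40MHz-in-1988 baseline and the three-year doubling are both
guesses, remember):

	import math

	def year_reached(target_hz, base_hz=40e6,
	                 base_year=1988, yrs_per_doubling=3):
	    """Year a clock-speed target is hit, at a steady doubling rate."""
	    return base_year + yrs_per_doubling * math.log(target_hz / base_hz, 2)

	print(round(year_reached(1e9)))         # ~2002: wafer-scale trouble
	print(round(year_reached(64e9)))        # ~2020: die-scale trouble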

And remember that these are process-independent figures. And that, if other
trend-curves are any guide, my estimate of processor speedup is probably far
too conservative.

Of course, we may have the aforementioned QSDs before then, and 3D VLSI
assembled by nanorobots provides another way out (you can pack a lot more
in a sphere, see various discussions of the "hairy smoking billiard ball"). But
lightspeed skew could still prove the nastiest stumbling block to conventional-
process wafer- and larger-scale integration, unless we develop design methods
that can systematically partition the system design into blocks (well, disks
actually) of radius less than maxD/2 without losing performance. Therefore:

   PROPOSED: a mature async-logic design tradition will become a medium-term
necessity in order to beat the lightspeed limit.

BTW, need I point out that we probably ain't gonna be designing such mega-
circuits by hand? and that the test-vector problem gets *really* hairy, so
we'd best have silicon compilation to provably-correct designs before we try...

[DISCLAIMER: I'm just an ex-mathematician software hacker, and therefore
	only-an-egg in the rude crude material world of hardware design. If
	there are glaring errors in the above or I'm just restating the obvious
	somebody please tell me...gently]

-- 
      Eric S. Raymond
      UUCP:  {{seismo,ihnp4,rutgers}!cbmvax,sdcrdcf!burdvax,vu-vlsi}!snark!eric
      Post:  22 South Warren Avenue, Malvern, PA 19355    Phone: (215)-296-5718