[comp.arch] Totally asynchronous computers

zik@bruce.cs.monash.OZ.AU (Michael Saleeba) (01/02/91)

When I first started learning about computer architecture one thing struck
me as an incredibly obvious improvement - asynchronism. At that time I
was messing around with 6800s and such things where one clock cycle was
the same as one bus cycle. This seemed pretty silly since some operations
didn't even use the bus, yet they still had to hang around for an entire
microsecond. It seemed that a sensible architecture would only wait for as
long as was warranted, not some arbitrary time.

When the 68k came along I was pleased to see that it had provision for
asynchronous bus timing. Essentially this was due to the processor clock being
so much faster than the usual bus cycle. Even so, most designs used
synchronous circuits to simplify design. And these days asynchronism has
basically gone by the wayside in favour of things like burst modes.

Still, the basic concept applies. Why not design a processor without
any sort of clock at all? A processor which is based on the thought that
you shouldn't have to wait any longer than absolutely necessary for _anything_.
So your ALU would have x inputs, y outputs, and also timing inputs and outputs
which wait until all operands are available, and then cause a delay of only
the minimum length of time necessary to complete that particular operation.
An entire CPU based on this system would be pretty complex, but surely
today's >1 million transistor pipelined, cached, etc. devices already exceed
this complexity greatly.
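
To make that concrete, here is a toy C model of one such self-timed stage
(entirely illustrative: the operation set and the delay figures are made up).
The point is that each operation advertises its own completion delay, so a
downstream consumer waits on a per-operation "done" rather than on a
worst-case clock period.

/* Toy model of a self-timed ALU stage -- illustrative only, not a
 * real design.  Each operation reports its own completion delay
 * instead of fitting into a fixed clock period; delays are made-up
 * numbers. */
#include <stdio.h>

enum op { OP_ADD, OP_MUL };

struct result {
    int value;
    int delay_ns;       /* time until the "done" strobe would fire */
};

/* The stage "fires" only once both operands have arrived; the caller
 * then waits result.delay_ns, not a clock edge. */
static struct result alu_fire(enum op op, int a, int b)
{
    struct result r;
    switch (op) {
    case OP_ADD:
        r.value = a + b;
        r.delay_ns = 10;        /* adds are quick         */
        break;
    case OP_MUL:
        r.value = a * b;
        r.delay_ns = 40;        /* multiplies take longer */
        break;
    }
    return r;
}

int main(void)
{
    struct result r1 = alu_fire(OP_ADD, 3, 4);
    struct result r2 = alu_fire(OP_MUL, 3, 4);
    printf("add: %d after %d ns\n", r1.value, r1.delay_ns);
    printf("mul: %d after %d ns\n", r2.value, r2.delay_ns);
    return 0;
}

In real silicon the delay would come from a matched delay line or from
completion detection, of course, not from a stored constant.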

Take this a step further and you find yourself in weird territory. What if you
wait only as long as it takes for outputs to settle, rather than waiting for
the rated delay of the device? In this way you could have chips rated
on their individual ability, rather than lumped into x-MHz categories. If
your 100ns RAM responded in an average of 85ns, you'd reap the benefit! And
your machine wouldn't crash on the odd occasion when things took longer. Of
course this idea has quite a few problems... but it'd be exciting!

It'd be really nice to be able to accelerate your machine by just popping in
a faster processor or faster RAM and watching things just happen faster without
any extra twiddles.

Now I'm aware of quite a few reasons why totally asynchronous machines haven't
been made much, but I can think of work-arounds to nearly all of them. Would 
anyone like to offer a concrete reason why this system is so little used? Or
mention some machines that have used similar systems?

------------------------------------------------------------------------------
 Michael Saleeba - sortof postgraduate student - Monash University, Australia
			zik@bruce.cs.monash.edu.au

rauletta@gmuvax2.gmu.edu (R. J. Auletta) (01/03/91)

In article <3523@bruce.cs.monash.OZ.AU> zik@bruce.cs.monash.OZ.AU (Michael Saleeba) writes:
>When I first started learning about computer architecture one thing struck
>me as an incredibly obvious improvement - asynchronism. At that time I

 You may wish to look at the paper "The Design of an Asynchronous
Microprocessor" by A. Martin, S. Burns, et al. in Advanced Research
in VLSI: Proceedings of the Decennial Caltech Conference on VLSI,
March 1989; or "The Design of a Delay-Insensitive uP: An Example of
Circuit Synthesis by Program Transformation" by A. Martin in Hardware
Specification, Verification and Synthesis: Mathematical Aspects,
Lecture Notes in Computer Science Vol. 408.

--R J Auletta

kyriazis@iear.arts.rpi.edu (George Kyriazis) (01/03/91)

In article <3523@bruce.cs.monash.OZ.AU> zik@bruce.cs.monash.OZ.AU (Michael Saleeba) writes:
>Still, the basic concept applies. Why not design a processor without
>any sort of clock at all? 
>
This has already been done at Caltech.  There is a team (I don't remember names,
but I can give the reference later) that produced a microprocessor made
out of asynchronous logic.  It worked quite fast, and when you cooled it
down, it worked even faster.  It was built from the top down, using
communicating processes to model the different design blocks and the
message exchange between them.  They claim it was a successful design.

>Now I'm aware of quite a few reasons why totally asynchronous machines haven't
>been made much, but I can think of work-arounds to nearly all of them. Would 
>anyone like to offer a concrete reason why this system is so little used? Or
>mention some machines that have used similar systems?
>
Such systems are a pain to design and eat up a lot of wire area.  You not
only have to convey the value of a signal, but also whether the signal has
arrived or not.  This is usually done with 2-rail logic and/or some additional
2-cycle or 4-cycle handshaking protocols.  There is a chapter in Mead &
Conway's book "Introduction to VLSI Systems" that is devoted to asynchronous
VLSI circuits.  You might want to take a look.
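
For the curious, the 2-rail (dual-rail) idea is easy to sketch in software.
Here's a toy C model (mine, not code from Mead & Conway): each logical bit
travels on two wires, the all-zero pair means "no data yet", and completion
detection is just checking that every pair has left the empty state. The
doubling of wires is exactly where the extra area goes.

/* Toy sketch of dual-rail (2-rail) encoding, illustrative only.
 * Each bit is carried on two wires: (t,f) = (0,0) means "no data
 * yet", (1,0) means logic 1, (0,1) means logic 0, and (1,1) is
 * illegal.  Completion detection: every bit pair is non-(0,0). */
#include <stdio.h>

struct dual_rail { int t, f; };     /* one bit = two wires */

static struct dual_rail encode(int bit)
{
    struct dual_rail d = { bit ? 1 : 0, bit ? 0 : 1 };
    return d;
}

static int has_arrived(const struct dual_rail *w, int n)
{
    for (int i = 0; i < n; i++)
        if (w[i].t == 0 && w[i].f == 0)
            return 0;               /* some bit is still empty */
    return 1;
}

int main(void)
{
    struct dual_rail word[4] = { {0,0}, {0,0}, {0,0}, {0,0} };
    printf("arrived? %d\n", has_arrived(word, 4));   /* prints 0 */
    for (int i = 0; i < 4; i++)
        word[i] = encode(i & 1);
    printf("arrived? %d\n", has_arrived(word, 4));   /* prints 1 */
    return 0;
}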

Disclaimer:  I don't know too much about asynchronous circuits, but I know
the above was true a while back (1 to 2 years ago).  Things might have changed
by now.


----------------------------------------------------------------------
  George Kyriazis                 kyriazis@rdrc.rpi.edu
 				  kyriazis@iear.arts.rpi.edu

petera@chook.adelaide.edu.au (Peter Ashenden) (01/03/91)

Another interesting reference on the topic is Ivan Sutherland's
Turing Award Lecture: "Micropipelines", Communications of the ACM,
Vol 32 No 6 (June 1989) pp 720-738.
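
The basic building block in that paper is the Muller C-element, whose
behaviour is simple enough to model in a few lines of C (my sketch, not code
from the lecture): the output follows the inputs when they agree and holds
its old value when they disagree, which is what lets it merge request and
acknowledge events.

/* Toy model of a Muller C-element, the basic component of
 * Sutherland's micropipelines (my sketch, not from the paper). */
#include <stdio.h>

static int c_element(int a, int b, int prev)
{
    if (a == b)
        return a;       /* inputs agree: output follows them   */
    return prev;        /* inputs disagree: hold previous value */
}

int main(void)
{
    int out = 0;
    int seq[][2] = { {0,0}, {1,0}, {1,1}, {0,1}, {0,0} };
    for (int i = 0; i < 5; i++) {
        out = c_element(seq[i][0], seq[i][1], out);
        printf("a=%d b=%d -> c=%d\n", seq[i][0], seq[i][1], out);
    }
    return 0;
}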

Peter A

naumann@autarch.acsu.buffalo.edu (Dirk Naumann) (01/04/91)

There are a number of people working on this subject right now.
There is a paper from the University of Utah at Salt Lake City which
gives a pretty good survey of available techniques, including A. Martin's
approach to self-timed circuits. If you are interested I can send you
the complete references.



-- 
Dirk Naumann
naumann@eng.buffalo.edu, ECE Department, SUNY at Buffalo

naumann@autarch.acsu.buffalo.edu (Dirk Naumann) (01/04/91)

Although it is right to say that totally asynchronous systems are
hard to design, this cannot be said about self-timed circuits.
In conventional asynchronous design you have to deal with races and
hazards, which make designing quite an experience.
This is different in self-timed designs like the ones designed
by Martin et al. Races and hazards are eliminated by design, since
due to the four-phase handshaking two signals never change their value
at the same time.
Another advantage that comes to mind is that this method of
designing self-timed circuits can be automated.
Ada is also based on message passing to communicate between parallel
processes. Wouldn't it be possible to describe self-timed circuits
in Ada? Something like this has been done by Sutherland and ???.
They were using INMOS occam to describe an asynchronous (self-timed?)
circuit.
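
To see why no two signals ever race, it helps to walk through the four
phases; here's a trivial C trace of one transfer (illustrative only, the
phase descriptions are mine). Exactly one of the two wires changes at each
step, so the order of events is never ambiguous.

/* Walk-through of a four-phase (return-to-zero) handshake,
 * illustrative only.  One wire changes per step. */
#include <stdio.h>

int main(void)
{
    int req = 0, ack = 0;
    const char *phase[] = {
        "sender raises req (data valid)",
        "receiver raises ack (data taken)",
        "sender drops req",
        "receiver drops ack (ready for next datum)",
    };
    for (int i = 0; i < 4; i++) {
        switch (i) {             /* exactly one wire changes here */
        case 0: req = 1; break;
        case 1: ack = 1; break;
        case 2: req = 0; break;
        case 3: ack = 0; break;
        }
        printf("req=%d ack=%d  %s\n", req, ack, phase[i]);
    }
    return 0;
}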


-- 
Dirk Naumann
naumann@eng.buffalo.edu, ECE Department, SUNY at Buffalo

daveh@cbmvax.commodore.com (Dave Haynie) (01/08/91)

In article <3523@bruce.cs.monash.OZ.AU> zik@bruce.cs.monash.OZ.AU (Michael Saleeba) writes:
>When I first started learning about computer architecture one thing struck
>me as an incredibly obvious improvement - asynchronism. At that time I
>was messing around with 6800s and such things where one clock cycle was
>the same as one bus cycle. This seemed pretty silly since some operations
>didn't even use the bus, yet they still had to hang around for an entire
>microsecond. It seemed that a sensible architecture would only wait for as
>long as was warranted, not some arbitrary time.

Well, in lots of systems, these kinds of chips were driven in a somewhat
asynchronous manner.  In several 6502 family systems I worked on (same bus
interface as a 6800 for the most part), we played games with the CPU clock.
Basically, it would run ordinary cycles at the fastest clock speeds around,
stretching part of the system as necessary to deal with slower devices.  On
such systems, real wait states were both hard to add and inefficient, so
this was always the best approach.  Modern CPUs are generally too dynamic
to deal with clock stretching, and it's a bad idea anyway, since with big
pipelines, you may have six things happening at once, only one of which is an
external bus cycle.
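
A back-of-the-envelope model of the stretching trick, with made-up timings
and a made-up address map (the real decision was of course made by hardware,
not software): the clock runs short cycles by default and lengthens only the
cycle that addresses a slow device, so nothing gets rounded up to a whole
extra clock the way a wait state would.

/* Back-of-the-envelope model of clock stretching (hypothetical
 * timings and address map, just to show the arithmetic).  Only the
 * cycle that hits a slow device is lengthened. */
#include <stdio.h>

#define FAST_NS 250     /* hypothetical fast bus cycle  */
#define SLOW_NS 400     /* hypothetical stretched cycle */

static int cycle_ns(unsigned addr)
{
    /* pretend everything at 0x8000 and above is slow I/O */
    return (addr >= 0x8000) ? SLOW_NS : FAST_NS;
}

int main(void)
{
    unsigned trace[] = { 0x0100, 0x0102, 0x9000, 0x0104 };
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += cycle_ns(trace[i]);
    printf("4 accesses in %d ns (vs %d ns if every cycle were slow)\n",
           total, 4 * SLOW_NS);
    return 0;
}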

>When the 68k came along I was pleased to see that it had provision for
>asynchronous bus timing. Essentially this was due to the processor clock being
>so much faster than the usual bus cycle. Even so, most designs used
>synchronous circuits to simplify design. And these days asynchronism has
>basically gone by the wayside in favour of things like burst modes.

>Still, the basic concept applies. Why not design a processor without
>any sort of clock at all? A processor which is based on the thought that
>you shouldn't have to wait any longer than absolutely necessary for _anything_.

It might be kind of difficult for a processor to work this way.  The hardest
part of an asynchronous system is generating the timings -- delay lines are
easy to buy for a systems design, but did you ever try to build a really tight
one on a chip?  Whereas with a clock, it's easy to build timers, state
machines, etc.

I think asynchronous systems certainly have their place, though.  In fact, the
Amiga 3000 expansion bus protocol (called the "Zorro III" bus) I designed is
completely asynchronous.  A bus master and bus slave negotiate various phases
of a bus cycle with different strobe lines.  So a bus cycle need only take as
long as the fastest bus master/slave combination, but if slower devices are on
the bus, the cycles are naturally extended.  Of course, reality sets in when
you have to put something synchronous, like a 68030, on as a bus master.  The
theoretical maximum speed of the bus is 50 MB/s (excluding burst cycles); the
maximum speed the current 68030 bus-master implementation can hit is around
20 MB/s.  Many I/O and memory devices, though, work very naturally as
asynchronous slave devices.
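
As a toy model of the idea (these are not the real Zorro III signal names or
timings, just an illustration), the key property of a strobed cycle is that
it completes when the slower of the two parties is done, with no clock
rounding the time up:

/* Toy model of an asynchronous, strobe-negotiated bus cycle.
 * Illustrative only: NOT the real Zorro III signals or timings. */
#include <stdio.h>

static int bus_cycle_ns(int master_ns, int slave_ns)
{
    /* the cycle completes when both sides have finished their
     * half of the handshake */
    return master_ns > slave_ns ? master_ns : slave_ns;
}

int main(void)
{
    printf("fast master, fast slave: %d ns\n", bus_cycle_ns(70, 60));
    printf("fast master, slow slave: %d ns\n", bus_cycle_ns(70, 300));
    return 0;
}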

>It'd be really nice to be able to accelerate your machine by just popping in
>a faster processor or faster RAM and watching things just happen faster without
>any extra twiddles.

In theory, that's what happens on a Zorro III bus.  If you add a faster bus
master, any memory boards capable of going faster do so.  Then you add faster
memory, and jump up again.  Of course, if the bus master is a 68030 or similar
synchronous device, things get quantized into 68030 cycles -- you'll run in
40ns quanta on the current 25MHz system, though that's just an implementation
detail, not part of the bus specification, and a different bus master might
work much closer to the asynchronous ideal.
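
The arithmetic is easy to check: at 25 MHz one clock is 40ns, and if you
assume for illustration that a minimal 32-bit transfer takes two clocks
(that two-clock figure is my own reconstruction, not part of the spec), the
numbers come out to exactly the 50 MB/s ceiling quoted above:

/* Sanity-check of the quanta arithmetic.  The two-clock minimal
 * cycle is an assumption made for illustration. */
#include <stdio.h>

int main(void)
{
    double clock_hz = 25e6;
    double quantum_ns = 1e9 / clock_hz;     /* 40 ns per clock      */
    double bytes_per_xfer = 4.0;            /* 32-bit bus           */
    double cycle_ns = 2 * quantum_ns;       /* assumed 2-clock cycle */
    double mb_per_s = bytes_per_xfer / (cycle_ns * 1e-9) / 1e6;

    printf("quantum: %.0f ns\n", quantum_ns);
    printf("4 bytes every %.0f ns = %.0f MB/s\n", cycle_ns, mb_per_s);
    return 0;
}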

> Michael Saleeba - sortof postgraduate student - Monash University, Australia
>			zik@bruce.cs.monash.edu.au


-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"Don't worry, 'bout a thing. 'Cause every little thing, 
	 gonna be alright"		-Bob Marley

henry@zoo.toronto.edu (Henry Spencer) (01/09/91)

In article <17214@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>... In several 6502 family systems I worked on (same bus
>interface as a 6800 for the most part), we played games with the CPU clock.
>Basically, it would run ordinary cycles at the fastest clock speeds around,
>stretching part of the system as necessary...

Not unknown even in (somewhat) more modern machines.  The early Sun 3's
did 68020 memory accesses with 1.5, rather than 2, wait states by a similar
trick.  (Folks who were there have said, roughly, "we could have gotten it
down to 1 if we'd really tried, but after years of building machines that
were on the ragged edge of timing specs, there was considerable interest
in a more robust design that would be easier to build in quantity".)
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry