david@daisy.UUCP (David Schachter) (02/16/88)
Here is a dumb question.  Say I have a CPU where 99 percent of the instructions take, say, one clock.  The remaining instructions need just a little longer-- one clock plus a few nanoseconds.  Why not stretch the clock a bit when executing those instructions, instead of wasting most of a second clock period?

-- David Schachter
Opinions herein are an artifact of pulse dialing, not a carbon-based life form.
wesommer@athena.mit.edu (William E. Sommerfeld) (02/16/88)
In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>Say I have a CPU where 99 percent of the instructions take, say, one clock.
>The remaining instructions need just a little longer-- one clock plus a few
>nanoseconds.  Why not stretch the clock a bit when executing those
>instructions, instead of wasting most of a second clock period?

Rumour has it that the original Lisp Machines (the `CADR's) did just this; there were two clocks, and one bit of the microcode selected which clock would be used.  The story goes that both clocks were hand-adjustable, and that special microcode diagnostics were used to tune each one (speed it up until it crashes, and then back off a 1/4 turn..) so each machine would run as fast as possible..

-- Bill Sommerfeld
   wesommer@athena.mit.edu
mac3n@babbage.acc.virginia.edu (Alex Colvin) (02/16/88)
I believe that the 64-bit microcode of the Honeywell Level 6 (= DPS 6) also had a cycle-width bit.  Anyone from Billerica care to confirm this?
baum@apple.UUCP (Allen J. Baum) (02/17/88)
--------
[]
In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>Here is a dumb question.  Say I have a CPU where 99% of the instructions
>take, say, one clock.  The remaining instructions need just a little longer--
>one clock plus a few nanoseconds.  Why not stretch the clock a bit when
>executing those instructions, instead of wasting most of a second clock period?

Not a dumb question.  Lots of older microcoded minis did exactly this in their microcode.  They had a control field to slow down the clock (from 150ns to 180ns, for instance) when something slow came up, like a branch.  I believe the first Prime was a machine that did this.  This does complicate the world, especially synchronizing to the outside world.  It's easier to just take a full cycle in the 1% of cases.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum  (408)973-3385
philip@amdcad.AMD.COM (Philip Freidin) (02/17/88)
In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>Here is a dumb question.  Say I have a CPU where 99 percent of the
>instructions take, say, one clock.  The remaining instructions need just a
>little longer-- one clock plus a few nanoseconds.  Why not stretch the clock
>a bit when executing those instructions, instead of wasting most of a second
>clock period?
>
> -- David Schachter

This is not a dumb question, because I have the answer!  The technique you describe is a common one, and has been dealt with many times in various ways in different implementations of assorted architectures.  The ratio of quick versus not-so-quick instructions, though, is not 99:1, but more in the 50:50 region (give or take 20%).  The quick instructions are the logical ops, because there is no inter-bit communication.  The not so quick are the primitive arithmetic ops such as add, inc, sub, dec, etc., because the inter-bit communication slows things down (the carry chain).  Longer instructions use multiple cycles (and in some systems, a mix of long- and short-cycle instructions).

SALES MODE ON :-)
The implementation of this variable-period clock can be done with the AMD Am2925 clock generator chip.  It is specifically designed to do exactly what you asked about.  In a microprogrammed system, each microcycle's execution duration is a function of what the critical path will be for the specified operation.  This is known at assembly time of the microcode.  An extra field is added to the microcode, which controls the Am2925, and thus sets the duration of the clock for that microcycle.  The chip has a 3-bit control field, so it can generate 8 different clock periods.  With wait states, this can be extended.
SALES MODE OFF :-) :-)  (double smiley because I prefer to be out of sales mode)

Philip Freidin @ AMD SUNNYVALE on {favorite path}!amdcad!philip
Section Manager of Product Planning for Microprogrammable Processors
(you know.... all that 2900 stuff...)
"We Plan Products; not lunches" (a quote from a group that has been standing around for an hour trying to decide where to go for lunch)
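For the curious, the microcode clock-field scheme Philip describes can be sketched as a toy simulation.  Everything here -- the period table, the field values, the four-step microprogram -- is invented for illustration and is not taken from the Am2925 data sheet:

```python
# Toy model of a microprogrammed clock generator with a 3-bit
# cycle-length field, in the spirit of the Am2925.  The period
# table and microcode below are made-up illustrative values.

# Eight selectable cycle durations (ns), indexed by the 3-bit field.
PERIODS_NS = [100, 120, 140, 160, 180, 200, 250, 300]

# Each microinstruction carries (operation, clock_field).
microcode = [
    ("fetch",  1),   # 120 ns
    ("and",    0),   # logical op: no carry chain, shortest cycle
    ("add",    3),   # arithmetic op: carry propagation needs 160 ns
    ("branch", 4),   # slow path: 180 ns
]

def run(ucode):
    """Sum per-microcycle durations, as the variable clock would."""
    return sum(PERIODS_NS[field] for _, field in ucode)

def run_fixed(ucode):
    """Same program with a fixed worst-case clock, for comparison."""
    worst = max(PERIODS_NS[field] for _, field in ucode)
    return worst * len(ucode)

print(run(microcode), "ns with a variable clock")        # 560 ns
print(run_fixed(microcode), "ns with a worst-case clock") # 720 ns
```

The saving is exactly the sum of the slack that a fixed clock would waste on the fast microcycles.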
crabb@cadsys.dec.com (Charlie, SEG/CAD, HLO2-2/G13, (dtn 225)(617)568-5739) (02/17/88)
>>>>...The remaining instructions need just a little longer--
>>>>one clock plus a few nanoseconds.  Why not stretch the clock a bit when
>>>>executing those instructions, instead of wasting most of a second clock period?

>>>Rumour has it that the original Lisp Machines (the `CADR's) did just
>>>this; there were two clocks, and one bit of the microcode selected
>>>which clock would be used.

>>Not a dumb question.  Lots of older microcoded minis did exactly this in their
>>microcode.  They had a control field to slow down the clock (from 150ns to
>>180ns, for instance) when something slow came up, like a branch.  I believe
>>the first Prime was a machine that did this.

>The implementation of this variable period clock can be done with the
>AMD AM2925 clock generator chip.

I third the motion for non-dumbness.  The PR1ME (note old logo :-) ) did indeed have a clock field in the microcode word to adjust the clock on a per-instruction granularity.  Time was spent tuning the machine for various (macro) instruction times in the 4xx-7xx series (pre-pipeline era).

/Charlie Crabb   !decwrl!cadsys.dec.com!crabb
jesup@pawl20.pawl.rpi.edu (Randell E. Jesup) (02/17/88)
In article <20409@amdcad.AMD.COM> philip@amdcad.UUCP (Philip Freidin) writes:
>The implementation of this variable period clock can be done with the
>AMD AM2925 clock generator chip.
>It is specifically designed to do exactly what you asked about.  In a
>microprogrammed system, each micro cycle execution duration is a function of
>what the critical path will be for the specified operation.  This is
>known at assembly time of the microcode.  An extra field is added to the
>microcode, which controls the Am2925, and thus sets the duration of the
>clock for that microcycle.  The chip has a 3 bit control field, so it can
>generate 8 different clock periods.  With wait states, this can be extended.

This will probably only work at relatively slow speeds.  At higher clock rates, you will find that the inter-chip latency is orders of magnitude greater than the amount you want to adjust the clock by.  If, in some microprogrammed CISC (yuch! :-), you want to stretch a cycle by 20%, you could just use a 4 or 5 times faster clock and take another cycle.  In a RISC, unless it's awfully slow, you might as well take the extra cycle, because chip-edge capacitance slows things down so much.  That is the real next hurdle that state-of-the-art stuff has to beat.

//	Randell Jesup			Lunge Software Development
//	Dedicated Amiga Programmer	13 Frear Ave, Troy, NY 12180
\\//	beowulf!lunge!jesup@steinmetz.UUCP	(518) 272-2942
 \/	(uunet!steinmetz!beowulf!lunge!jesup)	BIX: rjesup
lackey@Alliant.COM (Stan Lackey) (02/18/88)
>In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>>Say I have a CPU where 99 percent of the instructions take, say, one clock.
>>The remaining instructions need just a little longer-- one clock plus a few
>>nanoseconds.  Why not stretch the clock a bit when executing those
>>instructions, instead of wasting most of a second clock period?

Actually, I once heard a proposal to make a microprocessor totally asynchronous, with logic added to determine when each stage of logic was complete, and use that to start the next stage.  It would take advantage of the fact that an ALU might be done sooner when adding small numbers, and lots of times the numbers added are small (compared to the total size of the data path).  "Self-timed" is what it was called.

An interesting idea, but likely wouldn't work too well in a pipeline, and would be difficult to interface to.

-Stan
vandys@hpindda.HP.COM (Andy Valencia) (02/18/88)
If you're going to do this, why not take it all the way and make your computer "event driven" instead of clocked?  Then your computation can continue at the highest speed available from your components (gee, and you could even replace slow components with faster ones....)  So an "add register to memory" would go like:

	Events				Sequencer
					<Request register #x>
					<Request memory location #N>
	<Register x>
	<Mem N>				... pipeline in:
					<Request next instr location>
					<Request ADD>
	<ADD done>
					<Request write memory location #N>
	...
	<Next instr>			... start next instruction
	<Write done>

Bunches of asynchronously executing components...  I wonder what it would be like to diagnose the microcode :-<.

			Andy Valencia
			vandys%hpindda.UUCP@hplabs.hp.com
przemek@gondor.cs.psu.edu (Przemyslaw Klosowski) (02/18/88)
In article <8802162251.AA20090@decwrl.dec.com> crabb@cadsys.dec.com (Charlie, SEG/CAD, HLO2-2/G13, (dtn 225)(617)568-5739) writes:
>>>Not a dumb question.  Lots of older microcoded minis did exactly this in
>>>their microcode.  They had a control field to slow down the clock (from
>>>150ns to 180ns, for instance) when something slow came up, like a branch.
>>>I believe the first Prime was a machine that did this.
>	/Charlie Crabb   !decwrl!cadsys.dec.com!crabb

Hey, I saw an old PDP (was it an 8?) with a knob on the front panel regulating the clock frequency!  You are pressed for time?  Turn it clockwise!  (Probably at the expense of the error rate.)  I personally would rather implement it as a foot-operated lever under the operator console... :^)

		przemek@psuvaxg.bitnet
		psuvax1!gondor!przemek
henry@utzoo.uucp (Henry Spencer) (02/19/88)
> ...  The remaining instructions need just a little longer--
> one clock plus a few nanoseconds.  Why not stretch the clock a bit when
> executing those instructions, instead of wasting most of a second clock period?

Having several different clock periods used to be fairly routine in the days when minis were built from TTL.  Some of the PDP-11 family, for example, had three different clock periods selectable on a microcycle-by-microcycle basis.  It's gotten less popular nowadays because everything tends to be in one chip that's invariably short of pins, and it's not as easy to just tap one or two bits of the microword to control the clock.  It still can be done -- the Sun 3/100 series really does have 1.5 wait states for memory accesses, even though the 68020 has no notion of fractional wait states, because the clock generator knows about memory accesses and lengthens the cycle.

--
Those who do not understand Unix are      | Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.         | {allegra,ihnp4,decvax,utai}!utzoo!henry
bjj@psueclb.BITNET (02/19/88)
> Hey, I saw an old PDP (was it 8?) with a knob on the front panel, regulating
> the clock frequency!

That was a PDP-10, KA10 processor.  Sorry, that knob only changes the speed of the front-panel repeat function -- like when you turn on repeat, hit "deposit next", and have the CPU fill all of memory.  Handy for memory tests when the CPU won't run anything.

> If you're going to do this, why not take it all the way and
> make your computer "event driven" instead of clocked?

The KA10 is asynchronous.  There really is no clock.  All timing is determined by delay lines (and wire length).  It has subroutines (like the memory-cycle subroutine) which accept parameters (read and write).  All done by sending the pulse off in various directions and gating the returning pulse with a flag to indicate who's waiting for the subroutine to finish.  Very nice for doing things in parallel, as you can have separate streams executing independently and wait for the last to finish.

> Bunches of asynchronously executing components...  I wonder
> what it would be like to diagnose the microcode :-<.

Who knows?  The microcode works; why diagnose it?  Fixing it is easy, just takes a scope.  Occasionally our KA10 will have a pulse amplifier go bad, resulting in a lost pulse.  You just check the state of the machine (there are lights on nearly every register) to get a good idea where the pulse was lost.
jk3k+@andrew.cmu.edu (Joseph G. Keane) (02/19/88)
I've been thinking about making an asynchronous processor for a while.  You need a lot of extra timing circuitry (I'd guess about double), but it mostly runs in parallel.  I think eventually this idea will win out.  You don't need any safety margin (`turn till it dies, then back off a quarter turn'); the thing will always run as fast as possible.  But can you imagine trying to benchmark the thing?!

A couple of weeks ago there was a talk here by someone who had apparently done just this.  I'm kicking myself because I missed it.  I suppose I could get a reference if anyone wants it.

--Joe
grunwald@uiucdcsm.cs.uiuc.edu (02/19/88)
There is a recent tech report from CalTech discussing synthesis of self-timed circuits.  CalTech has historically promoted the use of self-timed circuitry for reliability.  Heretofore, the main problem has been design complexity.

As an example, the AMRD (Async. Message Routing Device) of the Ametek 2010 machine uses a self-timed network.  When I saw the machine, they had half the parts running at 8MHz (I think) and the other half at 12MHz.  They wanted to get to 20MHz eventually.  However, the key point is that when you communicated between the 12MHz parts, you ran at 12MHz.  The 12MHz parts only ran at 8MHz when there was an 8MHz part in the chain.

As long as the complexity is manageable, it certainly appears to be a good design method.
baum@apple.UUCP (Allen J. Baum) (02/20/88)
--------
[]
In article <1232@alliant.Alliant.COM> lackey@alliant.UUCP (Stan Lackey) writes:
>Actually, I once heard a proposal to make a microprocessor totally
>asynchronous, with logic added to determine when each stage of logic was
>complete, and use that to start the next stage.  It would take advantage of
>the fact that an ALU might be done sooner when adding small numbers, and lots
>of times the numbers added are small (compared to the total size of the
>data path).  "Self-timed" is what it was called.
>
>An interesting idea, but likely wouldn't work too well in a pipeline, and
>would be difficult to interface to.  -Stan

Machines like this have been built (e.g. the Illiac II), but none recently.  Although people have talked about self-timed processors, I'm not aware of any that have been built.  Pieces of microprocessors are sometimes self-timed, like register files and caches, but that's the extent of it.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum  (408)973-3385
mitch@Stride.COM (Thomas Mitchell) (02/20/88)
In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>Here is a dumb question.  Say I have a CPU where 99 percent of
           ^^^^ Not dumb.
>the instructions take, say, one clock.  The remaining instructions need just
>a little longer-- one clock plus a few nanoseconds.  Why not stretch the
>clock a bit when executing those instructions, instead of wasting most of a
>second clock period?

David:  That is the same question we asked when we were selecting a clock rate for our 400-series custom MMU.  We could have clocked that processor (MC68010) at 12 MHz, but that would have required an extra state on each memory cycle.  At 10 MHz that extra state was not required.  The result was the same throughput at 10 MHz as at 12 MHz in most cases.

The hard part is finding the knee in the curve.  If the extra-state instructions are used rarely, then adding the state to all operations is a loss.  If few but commonly used, then it is a gain.

Well, thanks for the soap box,
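Thomas's 10 MHz versus 12 MHz trade-off is easy to check with a little arithmetic.  The 4-state memory cycle below is an assumed figure, chosen only to make the comparison concrete:

```python
# Back-of-the-envelope version of the trade-off described above:
# a faster clock that forces an extra wait state on every memory
# cycle can lose to a slower clock that needs none.  The 4-state
# base memory cycle is an illustrative assumption.

def mem_cycle_ns(clock_mhz, states):
    """Duration of a memory cycle of `states` clock states, in ns."""
    return states * (1000.0 / clock_mhz)

fast = mem_cycle_ns(12.0, states=5)  # 12 MHz: 4 states + 1 wait state
slow = mem_cycle_ns(10.0, states=4)  # 10 MHz: no wait state needed

print(f"12 MHz + wait state: {fast:.1f} ns per memory cycle")  # 416.7 ns
print(f"10 MHz, no wait:     {slow:.1f} ns per memory cycle")  # 400.0 ns
```

With these assumed numbers the "slower" 10 MHz machine actually finishes each memory cycle sooner, which is exactly the knee-in-the-curve effect the post describes.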
kyriazis@pawl11.pawl.rpi.edu (George Kyriazis) (02/20/88)
In article <1232@alliant.Alliant.COM> lackey@alliant.UUCP (Stan Lackey) writes:
>>In article <844@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>>>Say I have a CPU where 99 percent of the instructions take, say, one clock.
>>>The remaining instructions need just a little longer-- one clock plus a few
>>>nanoseconds...
>
>Actually, I once heard a proposal to make a microprocessor totally
>asynchronous, with logic added to determine when each stage of logic was
>complete, and use that to start the next stage.  It would take advantage of
>the fact that an ALU might be done sooner when adding small numbers, and lots
>of times the numbers added are small (compared to the total size of the
>data path).  "Self-timed" is what it was called.

Yes.  There is a chapter in Mead and Conway's book 'Introduction to VLSI Systems' describing self-timed circuits.  The concept is pretty interesting, since (for example) a circuit can be built using self-timed circuits and the interface can be built to communicate with normal clocked circuitry.  It starts being interesting when you realize that if you want to make the chip (or the CPU) faster, you can just lower the temperature...  No adjustable clocks, no nothing.

>An interesting idea, but likely wouldn't work too well in a pipeline, and
>would be difficult to interface to.  -Stan

Yes, interfacing is more difficult, but there are standard ways to overcome the difficulty.  The problem is elsewhere.  Since there cannot be a BUS ENABLE signal for internal buses (you would have to wait for the last signal to change state before you toggle ENABLE), the only solution is to have 2 wires for every bit: one to say 'OK, I have a 1' and another one saying 'OK, I have a 0'.  That doubles the amount of wires required for every datapath, which can easily lower the transistor density of the chip.
Another interesting thing about self-timed circuits is that they seem to have a lot in common with the principles used in dataflow computers, like 'this operation won't be executed unless I get results from "there" and "there"'.

*******************************************************
*George C. Kyriazis                                   *  Gravity is a myth
*userfe0e@mts.rpi.edu or userfe0e@rpitsmts.bitnet     *       \ /
*Electrical and Computer Systems Engineering Dept.    *       \ /
*Rensselaer Polytechnic Institute, Troy, NY 12180     *       ||
*******************************************************
Earth sucks.
aglew@ccvaxa.UUCP (02/22/88)
..> Variable clock rates

(1) There's a company that sells a box for the VAX (I think the 750, but not sure) that varies the clock rate according to processor activity.  They say that they can get an extra 15% out of your old tired VAX.  Not sure of details -- just crossed this in some DEC magazine.

(2) I've always liked the idea of self-timed circuitry (note that this is an order of scale different from varying clock rates), but have a question that someone with more experience with self-timed techniques can answer.

Am I correct in saying that a trivial way of obtaining a self-timed circuit is to take a "normal" circuit, say an adder, and put a timing circuit beside it that will produce a pulse when the adder is finished?  And that there are "transformations" that will more closely intertwine the timing circuit with the function, so that they share gates?  Doesn't this require extremely accurate parametrization of the device's performance, more than is required for non-self-timed systems?
rick@svedberg.bcm.tmc.edu (Richard H. Miller) (02/23/88)
In article <3297@psuvax1.psu.edu>, przemek@gondor.cs.psu.edu (Przemyslaw Klosowski) writes:
> Hey, I saw an old PDP (was it 8?) with a knob on the front panel, regulating
> the clock frequency!  you are pressed for time? turn it clockwise! (probably
> at the expense of the error rate).  I personally would rather implement it as
> a foot operated lever under the operator console... :^)

We have a clock-speed switch (two, actually: a coarse-speed and a fine-speed pot) on the console of our KI-10 (PDP-10).  The documentation indicates that the speed control is used only for maintenance; it is always kept in the fastest position during production.  From reading the schematics of the processor, this switch is usually used to diagnose clock problems or timing problems in the processor.

Richard H. Miller                  Email: rick@svedberg.bcm.tmc.edu
Head, System Support               Voice: (713)799-4511
Baylor College of Medicine         US Mail: One Baylor Plaza, 302H
                                            Houston, Texas 77030
cantrell@Alliant.COM (Paul Cantrell) (02/24/88)
In article <3297@psuvax1.psu.edu> przemek@gondor.cs.psu.edu (Przemyslaw Klosowski) writes:
>Hey, I saw an old PDP (was it 8?) with a knob on the front panel, regulating
>the clock frequency!  you are pressed for time? turn it clockwise! (probably
>at the expense of the error rate).  I personally would rather implement it as
>a foot operated lever under the operator console... :^)
>	przemek@psuvaxg.bitnet
>	psuvax1!gondor!przemek

This was almost certainly a KA-10 processor, part of a DECsystem-10 computer system.  The knob is used during debugging (of hardware or software) to control the speed of single-step operation, not the speed during normal operation.  This was actually pretty nice if you were debugging a problem with the operating system: you could place the system in single-step mode, crank the knob around to get the desired single-step speed, and watch the lights on the processor and memory until you saw the condition you were looking for.

The KA-10 was indeed an asynchronous machine (I always heard it referred to as a 'race' machine).  The machine would run at different rates depending on environmental conditions, which memory things were being accessed from, etc.  Different machines would run at different rates from each other.  It made it difficult to do good benchmarks...

					PC
bobc%wings@Sun.COM (Bob Clark) (02/24/88)
In article <28200107@ccvaxa> aglew@ccvaxa.UUCP writes:
> Am I correct in saying that a trivial way of obtaining a self timed
> circuit is to take a "normal" circuit, say an adder, and put a timing
> circuit beside it that will produce a pulse when the adder is finished?
> And that there are "transformations" that will more closely intertwine
> the timing circuit with the function, so that they share gates?
> Doesn't this require extremely accurate parametrization of the device's
> performance, more than is required for non-self-timed systems?

You have defined two approaches:

1) Take a functional module, characterize the worst-case delay, and add a delay line in parallel with the function.  Use the output of the delay line to determine that the function is complete.  This is the trivial approach, and buys you nothing over standard synchronous design.  It is a way of modifying a synchronous circuit to work within an otherwise self-timed system.

2) Design an entirely new circuit to implement the function, whose state changes are controlled in such a way that the final state change indicates completion of the function.  This is the truly self-timed approach, and requires careful definition of the state changes.  One approach is to design an asynchronous state machine whose states are carefully chosen so that only a single bit of the state code can change at a time.  This requires no characterization of the circuit speed, and is referred to in the literature as the "one-hot" approach.  An alternative is to assemble your macro self-timed circuit out of micro self-timed modules, such as C-elements.

As others have mentioned, Ivan Sutherland and others have been working in this area recently, and I would guess that some work has gone on sporadically since the early days of computing.  It is possible to design circuits whose functional completion does not require parametrization of the device's performance.

Bob Clark
Sun Microsystems
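The C-elements Bob mentions are simple enough to model directly: a Muller C-element drives its output to the common value of its inputs when they agree, and holds its previous value when they disagree, which is what lets self-timed handshakes rendezvous.  A minimal software sketch:

```python
# Behavioral model of a two-input Muller C-element, the basic
# building block for assembling self-timed handshake circuits.
# Output follows the inputs when they agree; otherwise it holds.

class CElement:
    def __init__(self):
        self.out = 0                 # state element: last agreed value

    def step(self, a, b):
        if a == b:                   # inputs agree: output follows them
            self.out = a
        return self.out              # inputs disagree: output holds

c = CElement()
print(c.step(1, 0))  # 0  (inputs disagree: holds initial 0)
print(c.step(1, 1))  # 1  (both high: output rises)
print(c.step(0, 1))  # 1  (disagree again: holds 1)
print(c.step(0, 0))  # 0  (both low: output falls)
```

In a real circuit the "hold" behavior comes from feedback in the gate, not from an explicit register; the class state stands in for that feedback here.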
blarson@skat.usc.edu (Bob Larson) (02/24/88)
In article <42976@sun.uucp> bobc@sun.UUCP (Bob Clark) writes:
[In reply to <28200107@ccvaxa> aglew@ccvaxa.UUCP]
>1) Take a functional module, characterize the worst case delay, and
>add a delay line in parallel with the function.  Use the output of
>the delay line to determine that the function is complete.
or
>2) Design an entirely new circuit to implement the function, whose
>state changes are controlled in such a way that the final state change
>indicates completion of the function.

Why not something intermediate?  Rather than having a fixed delay, have one that is a function of the inputs.  An adder could be designed with an extra output indicating that the maximum carry propagation will be N bits (probably sharing some gates with the look-ahead carry generator).  The output may be stable earlier than predicted (therefore wasting time), but it is still better than always waiting the worst case for any input, and it possibly uses fewer gates than the fully deterministic "I tell you exactly when I'm ready" circuits.

--
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list: info-prime-request%fns1@ecla.usc.edu
		    oberon!fns1!info-prime-request
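Bob's input-dependent delay idea can be sketched in software: compute the longest carry chain a given pair of operands actually triggers, and derive a completion time from it.  The chain-length logic below is a standard ripple-carry model; the nanosecond figures are arbitrary assumptions, not measurements of any real adder:

```python
# Sketch of an input-dependent completion estimate for a ripple
# adder: the delay is a function of the longest run of bit
# positions a carry actually ripples through for these operands.
# The setup/per-bit nanosecond figures are invented for illustration.

def longest_carry_chain(a, b, width):
    """Longest run of bits through which a live carry propagates."""
    carry, longest, run = 0, 0, 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        propagate = x ^ y
        generate = x & y
        if propagate and carry:      # carry ripples through this bit
            run += 1
            longest = max(longest, run)
        else:                        # carry generated or absorbed here
            run = 0
        carry = generate | (propagate & carry)
    return longest

def add_with_timing(a, b, width=16, setup_ns=5.0, per_bit_ns=1.5):
    """Return (sum, estimated completion time in ns) for a + b."""
    chain = longest_carry_chain(a, b, width)
    return (a + b) & ((1 << width) - 1), setup_ns + per_bit_ns * chain

print(add_with_timing(0x00FF, 0x0001))  # (256, 15.5)  long ripple: slow
print(add_with_timing(0x1234, 0x0001))  # (4661, 5.0)  no ripple: fast
```

As the post notes, this "earlier than worst case, later than truly necessary" estimate wastes a little time compared with a fully self-timed adder, but needs far less machinery.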
przemek@gondor.cs.psu.edu (Przemyslaw Klosowski) (02/27/88)
In article <1268@alliant.Alliant.COM> cantrell@alliant.UUCP (Paul Cantrell) writes:
>In article <3297@psuvax1.psu.edu> przemek@gondor.cs.psu.edu (Przemyslaw Klosowski) writes:
>>Hey, I saw an old PDP (was it 8?) with a knob on the front panel, regulating
>>the clock frequency!  you are pressed for time? turn it clockwise! (probably
>
>This was almost certainly a KA-10 processor, part of a DECSystem-10
>	PC

I went to our dungeon of defunct equipment and found out that it was a PDP-15.

Another machine that used this was one built by the Polish pioneer of minicomputers, Karpinski, around 1965 (?); it was a contract for the physics department of the University of Warsaw.  At that time they couldn't afford anything commercial, so they hired Karpinski.  This machine still works, maintained by a dedicated engineer, even though they are getting some IBM PCs that are comparable to it in computational power (there was a front-page article in the Wall Street Journal about the PC revolution in Poland).

		przemek@psuvaxg.bitnet
		psuvax1!gondor!przemek