aglew@urbsdc.Urbana.Gould.COM (11/22/88)
>But what about doing microcode prefetch? I haven't seen anyone account
>for that possibility in this particular question (RISC-vs.-CISC), though
>I seem to recall it's been done on some machines (S-1?).
>
>	Norm

Sounds like someone's got a research topic. Tell us when you publish.
dkirk@k.gp.cs.cmu.edu (Dave Kirk) (11/22/88)
Distribution: usa
References: <3290@ucdavis.ucdavis.edu> <7746@aw.sei.cmu.edu> <3634@pt.cs.cmu.edu> <sXW7FJy00VsEI1xUcS@andrew.cmu.edu>
Organization: Carnegie-Mellon University, CS/RI

In article <sXW7FJy00VsEI1xUcS@andrew.cmu.edu> bader+@andrew.cmu.edu (Miles Bader) writes:
>> >
>> >Now pass that microcode through a peephole optimiser, and trim it down.
>> >If one operand is already in a microcode register, don't move it there;
>>                              ^^^^^^^^^^^^^^^^^^^
>> Any reasonable microcoder would have performed this optimization in his
>> original code. If you are referring to your "new" expanded code, then
>> please don't include it in your reduction costs over the original code.
>But the two cases aren't the same; conventional microcode can't be optimized
>with knowledge of the microcode due to previous (macro) instructions (because
>they aren't known until run-time). If you were fetching inlined microcode
>from RAM, you could do this, and it probably would be an improvement
>(how much? I don't know...) over CISC microcode in ROM.

Let me better explain my comment, and our microarchitecture. Microcode
registers look no different to the microcoder than macrocode registers.
If there are 16 macro registers and 256 micro registers, the only
difference in accessing the two is the actual address. Hence, the only
time we ever "move anything to a register" is when the instruction says
go to memory and get this operand. There is never a case where you move
an operand from a macro register to a micro register and THEN use it.
As a result, advance knowledge that a value may have been in a micro
register during the previous macro instruction does not save any time
or code.

-Dave
--
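[A sketch of the scheme Dave describes may help. This is a minimal illustrative model, not the actual CMU hardware: macro and micro registers share one address space, so a micro-op names either kind the same way and there is never a separate "move it into a micro register" step. All names and the 16/256 split are taken from his numbers; everything else is mine.]

```python
# Hypothetical sketch of a unified macro/micro register file: 16 macro
# (ISA-visible) registers and 256 micro-only registers live in one
# address space, so the only difference in accessing them is the address.
# Names and layout are illustrative, not the actual CMU machine.

MACRO_BASE, NUM_MACRO = 0, 16     # addresses 0..15: macro registers
MICRO_BASE, NUM_MICRO = 16, 256   # addresses 16..271: micro registers

regfile = [0] * (NUM_MACRO + NUM_MICRO)

def macro_reg(n):
    """Address of macro register n -- what a macroinstruction names."""
    return MACRO_BASE + n

def micro_reg(n):
    """Address of micro register n -- same access path, different address."""
    return MICRO_BASE + n

# A micro-op reads either kind of register identically; there is no
# separate "move the macro operand into a micro register" step to optimize.
def uop_add(dst, src_a, src_b):
    regfile[dst] = regfile[src_a] + regfile[src_b]

regfile[macro_reg(3)] = 40        # value left by a previous macro instruction
regfile[micro_reg(7)] = 2         # microcode temporary
uop_add(micro_reg(0), macro_reg(3), micro_reg(7))
```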
aglew@urbsdc.Urbana.Gould.COM (11/24/88)
>There's a definite appeal to having one-cycle instructions, but i think it's
>mostly illusory. If an in-place complement takes less time than a three-operand
>add-with-shift, they shouldn't be forced to take the same amount of time. In
>other words, if most of your instructions take one cycle, your cycles are too
>long.
>
>So what do y'all think about this? Are one-cycle instructions a good thing?
>
>--Joe

Cycles are a bad thing! The universe is not discrete.
All instructions should be self-timed, to precisely the length of
time required to do the operation.

:-)
pausv@smidefix.liu.se (Paul Svensson) (11/25/88)
In article <28200241@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>
>Cycles are a bad thing! The universe is not discrete.
>All instructions should be self-timed, to precisely the length of
>time required to do the operation.
>
>:-)

Hear, hear!

I couldn't resist following up on this one, since we actually have
such a beast down in the basement. The FCPU (Flexible CPU), built by
DataSAAB in the early seventies, is completely asynchronous. The
control unit delivers instructions to various computation modules,
including main memory, which then run until ready. Communication
between modules is through "validated registers" (queues of length
one), because the control unit does not await completion of an
instruction before starting the next one.

It's a truly amazing machine. At the moment we're running a FORTH in
the control unit only, with the rest of the machine, including main
memory, powered off. But just wait 'til next week, when we've had
cooling installed! :-)

DataSAAB couldn't sell more than about half a dozen of them, I guess
partly because they never used it to full capacity. They only used it
to emulate their previous design, a conventional mid-sixties mainframe. :-(

---
Paul Svensson
psv@ida.liu.se
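[A "validated register" -- a queue of length one with a valid bit -- can be sketched in software. The following is a toy model under my own assumptions, not FCPU documentation: Python threads stand in for the asynchronous hardware modules, and all names are mine. It shows how the control unit can issue the next instruction before the previous one completes, blocking only when the single slot is still full.]

```python
# Toy model of a "validated register": a one-slot queue between two
# asynchronous modules. The producer (control unit) may run one result
# ahead of the consumer; it blocks only if the slot is still valid.
# Threads stand in for self-timed hardware; names are illustrative.
import threading

class ValidatedRegister:
    def __init__(self):
        self._value = None
        self._valid = False                 # the "valid" bit
        self._cv = threading.Condition()

    def write(self, value):
        """Block until the register is empty, then fill it."""
        with self._cv:
            while self._valid:
                self._cv.wait()
            self._value, self._valid = value, True
            self._cv.notify_all()

    def read(self):
        """Block until the register is full, then drain it."""
        with self._cv:
            while not self._valid:
                self._cv.wait()
            value, self._valid = self._value, False
            self._cv.notify_all()
            return value

# Control unit issues three operations without awaiting completion.
reg = ValidatedRegister()
results = []
consumer = threading.Thread(
    target=lambda: results.extend(reg.read() for _ in range(3)))
consumer.start()
for operand in (1, 2, 3):
    reg.write(operand * 10)   # blocks only while the previous result is unread
consumer.join()
print(results)                # [10, 20, 30]
```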
colwell@mfci.UUCP (Robert Colwell) (11/29/88)
In article <28200241@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>>There's a definite appeal to having one-cycle instructions, but i think it's
>>mostly illusory. If an in-place complement takes less time than a three-operand
>>add-with-shift, they shouldn't be forced to take the same amount of time. In
>>other words, if most of your instructions take one cycle, your cycles are too
>>long.
>>So what do y'all think about this? Are one-cycle instructions a good thing?
>
>Cycles are a bad thing! The universe is not discrete.
>All instructions should be self-timed, to precisely the length of
>time required to do the operation.
>
>:-)

I left in your smiley, Andy, but it's not all that nutty. There is a
certain appeal to making a machine where the clock cycle doesn't have
to be "one-size-fits-all". Apart from the added complexity, the problem
is that when you're juggling multiple CPU pipes, you either have to
control each pipe individually in this fashion, or you control them all
together (allowing enough time in the current clock cycle for the
slowest pending pipe stage to complete). That's a mess, and I doubt it
would ever pay.

Bob Colwell            ..!uunet!mfci!colwell
Multiflow Computer  or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090
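[Bob's trade-off can be made concrete with a toy model -- the latencies below are made-up illustrative numbers, not Multiflow data. Under a shared variable-length clock, every tick must stretch to the slowest stage pending in *any* pipe, so fast stages pay for slow ones; fully self-timed pipes each finish in the sum of their own stage times.]

```python
# Toy model: several pipes advancing in lockstep under one variable
# clock. Each tick lasts as long as the slowest pending stage, so the
# quick stages are held up. Latencies (ns) are made-up numbers.

pipes = [
    [1, 1, 1, 1],     # pipe of quick stages (e.g. in-place complement)
    [1, 3, 1, 1],     # pipe with one slow stage
    [2, 1, 4, 1],     # pipe with two slow stages (e.g. add-with-shift)
]

# Shared variable clock: tick length = slowest stage pending that tick.
common_clock_time = sum(max(stage) for stage in zip(*pipes))

# Fully self-timed: each pipe takes only the sum of its own stages.
self_timed = [sum(p) for p in pipes]

print(common_clock_time)      # 10  (ticks of 2 + 3 + 4 + 1)
print(self_timed)             # [4, 6, 8]
```

Even in this tiny example the fast pipe takes 10 ns under the common clock versus 4 ns self-timed -- which is the appeal; the mess Bob describes is the per-pipe control needed to collect the benefit.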
rpw3@amdcad.AMD.COM (Rob Warnock) (12/02/88)
+---------------
| Cycles are a bad thing! The universe is not discrete.
| All instructions should be self-timed, to precisely the length of
| time required to do the operation.
+---------------

Uh, ever look at a DEC PDP-10? (The original KA-10 CPU, circa 1967,
based on the earlier DEC PDP-6 [1965?].) The internal implementation
was exactly what you seem to be asking for:

Each "time-state" was a pulse regenerator which strobed the results of
the previous time-state into its target register, conditionally set or
cleared various bits to choose what to do next (generally enabling
operands onto the input busses of the ALU, or enabling result registers
to accept the output of the ALU), and conditionally fed its output
pulse into one of several delay lines. The specific delay line was
chosen so that when the pulse came out the other end (and got
regenerated as the next "time-state" pulse), the operation was done.

There were hardware "subroutines", for example, "memory read cycle".
For every potential "caller", there was one flip-flop. The caller pulse
set that "return" flop, and also went into an inclusive-OR gate with
all of the other callers' pulses. At the bottom of the "subroutine",
the last time pulse was fanned out to a bunch of AND gates, one per
caller, whose other input was the "return-PC" ;-} ...that is, the
flip-flop that had been set when the subroutine was called. The output
wire of the selected AND gate fed back to the continuation point of
the caller.

There was no centralized "clock", nor were the delay lines bunched up
in some central place and shared. There was exactly one delay line for
each event in the CPU. In other words, the "PC" of the microengine was
expressed by which delay line the pulse was hiding in at any given
time. (Think of the micro-PC as being in unary, rather than binary!)
The "clock ticks" were those instants when the pulse could be seen
between delay lines, as it got regenerated, when it could "do things"
and get routed around before hopping into another delay line. In fact,
the micro-PC could travel between cabinets. The memories, you see, were
in external boxes (a whole 16K words each!), and during a memory-cycle
subroutine the timing pulse travelled out to the selected memory module
and ran the timing routines of the memory itself out there, and then
travelled back to the CPU in the form of the "memory done" pulse, only
to be routed to whichever part of the CPU had called the memory-cycle
subroutine.

It was simply *lovely* to 'scope! The flow-charts of the instruction
interpreter were virtually one-to-one with the delay lines and pulse
regenerators of the hardware. And since the micro-PC was unary, it was
simplicity itself to trigger an oscilloscope on any desired micro-step.
(Ah! Nostalgia...)

p.s. A later version of this technique -- called "Chartware" by DEC,
for the ease with which you could go from the flowchart to the wiring
diagram -- was used in the PDP-14 industrial controller modules (a sort
of Tinker Toy build-your-own-CPU family -- there was never a
general-purpose computer built out of it that I know of). It did use a
centralized clock, but still had a unary micro-PC, and still left the
timing of the operations to the various functional units. It used a
scheme similar to the HP-IB (a.k.a. IEEE-488) bus: there was a common
wired-OR "ready" line, and as the clock ticked the selected functional
units pulled down on "ready" (made it false) until their operation was
complete. The last one to let go allowed the clock to tick again, thus
strobing the results into the destination, and at the same time
clocking the "PC" from its previous location to the next. The "PC" in
this case was flip-flops instead of delay lines, and only one "control"
flip-flop in the system should be set at a time. (A unary PC, again.)
It might be better to say that the "PC" was a huge shift register, with
loops and branches. John Alderman (founder of Dig. Comm. Assoc.) and I
developed a still simpler version we called "synchronous chartware"
(though it owed as much to the PDP-10 style as to the Chartware style),
which was just a shift register (with loops and branches) driven from a
single system clock, wherein the operations were timed by how many
shift-register stages (flip-flops) lay between the one that started the
operation and the one that used the result. Still, operations could be
of different lengths, and even of variable lengths. (Long
variable-length delays were implemented with a "while loop" which
waited for the completion signal from the functional unit.)

We found this design technique to be of great utility for things like
magtape and disk controllers. (Cheap fast ROMs weren't yet available
[circa 1971], nor was the now-common bit-slice microcode controller,
e.g. the Am2911.) The technique, though for most uses hopelessly
low-density by today's standards, still comes in handy for
very-high-speed state machines with a lot of multi-way transition edges.

Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA 94403
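[The unary-PC / "synchronous chartware" idea above can be sketched in software. This is a minimal illustrative model under my own assumptions, not the Alderman/Warnock controller: exactly one control state is "set" at a time, one state starts a functional unit, a "while loop" state idles until the unit signals done, and a later state consumes the result. The state names and the 3-tick unit latency are invented for the example.]

```python
# Sketch of a chartware-style controller: a one-hot state machine
# clocked one transition per tick. An operation's duration is the
# number of stages between the state that starts it and the state that
# uses its result; the WAIT state is the "while loop" that holds until
# the functional unit raises its completion signal. All names and the
# default 3-tick latency are illustrative.

def run_chartware(unit_busy_ticks=3):
    state = "START"             # exactly one control "flop" set at a time
    remaining = unit_busy_ticks
    trace = []                  # which state was active on each tick
    while state != "HALT":
        trace.append(state)
        if state == "START":    # this stage kicks off the functional unit
            state = "WAIT"
        elif state == "WAIT":   # "while loop": hold until the unit is done
            remaining -= 1
            if remaining == 0:
                state = "USE"
        elif state == "USE":    # the stage that consumes the unit's result
            state = "HALT"
    return trace

print(run_chartware())      # ['START', 'WAIT', 'WAIT', 'WAIT', 'USE']
print(run_chartware(1))     # ['START', 'WAIT', 'USE']
```

Note how a variable-latency unit changes only how long the pulse circulates in WAIT, not the structure of the controller -- which is the property that made the scheme handy for magtape and disk timing.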