[comp.arch] Windowed register speeds: critical path?

mark@mips.UUCP (Mark G. Johnson) (11/10/87)
A question from jpdres10@usl-pc.UUCP (Green Eric Lee):
  > ... how much does the adder in the register path  slow things down
  > in the AMD29000? ...  And, of course, Plain Old Registers have no
  > problem at all with something out there in the register addressing
  > path, except, of course, the decode tree ...

and a response from bcase@apple.UUCP (Brian Case)
  > Ok, so to address future speed advantages, yes there might be some
  > speed advantages for those with simple register files.  However, for
  > the Am29000, the critical paths were quite balanced ... with,
  > I believe, the TLB and/or instruction cache being the limiting factor.
  > Next came the ALU, and then the register file.  Unless you want to do
  > things like spread the ALU cost over two pipestages (possible to do),
  > I don't think the register file is going to be the limiting factor.
  > ... what do other people have to say?

At least in MOS implementations, I'd agree with Brian that register
file access will not be the speed-limiting path in future RISC chips,
for both "windowed" and "flat" register file architectures.  {I dunno
about Bipolar or MESFET implementations}.

A major reason: fast floating-point coprocessors.  If RISCs stick to their
current preference for synchronous instruction-stream co-interpreters,
then the list of potential critical paths now includes all paths on the
CPU *and* coprocessor(s), plus the generation/reception of the coprocessor
handshake signals.  For example, the double-precision fp ADD operation
(52 bit mantissa, 11 bit exponent) is required to complete in two
cycles in the MIPS fp chip.  Doing all of the normalization shifting,
exponent adjusting, mantissa addition, exception detection, and
the *%#$_@ IEEE rounding operations is "intuitively" :-) :-) more than
twice as bad as register file access, windows or not.
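
To make the list of steps above concrete, here is a rough software sketch of a double-precision add broken into align, add, normalize, round, and overflow-check stages. It is only an illustration of how much serial work is involved, not a description of how the MIPS fp chip (or any hardware) does it. The function names are made up for this example, and it assumes both operands are positive normal numbers, so signs, denormals, NaNs, and infinities on input are deliberately left out.

```python
import struct

MANT_BITS = 52                      # fraction field width (from the post)
EXP_BITS = 11                       # exponent field width (from the post)
EXP_MASK = (1 << EXP_BITS) - 1
MANT_MASK = (1 << MANT_BITS) - 1


def unpack_double(x):
    """Split a Python float into (sign, exponent field, fraction field)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> MANT_BITS) & EXP_MASK, bits & MANT_MASK


def pack_double(sign, exp, mant):
    """Reassemble the three fields into a Python float."""
    bits = (sign << 63) | (exp << MANT_BITS) | mant
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fp_add_positive_normal(a, b):
    """Hypothetical helper: add two positive normal doubles step by step."""
    sa, ea, ma = unpack_double(a)
    sb, eb, mb = unpack_double(b)
    if sa or sb or not (0 < ea < EXP_MASK) or not (0 < eb < EXP_MASK):
        raise ValueError("sketch handles positive normal doubles only")
    if ea < eb:                      # keep the larger exponent in (ea, ma)
        ea, ma, eb, mb = eb, mb, ea, ma

    # Restore the implicit leading 1 to get 53-bit significands.
    sig_a = ma | (1 << MANT_BITS)
    sig_b = mb | (1 << MANT_BITS)

    # Alignment shift plus mantissa addition, in units of the smaller exponent.
    total = (sig_a << (ea - eb)) + sig_b
    exp = eb

    # Normalization shift with exponent adjust, then round to nearest even.
    shift = total.bit_length() - (MANT_BITS + 1)
    if shift > 0:
        remainder = total & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        total >>= shift
        exp += shift
        if remainder > half or (remainder == half and (total & 1)):
            total += 1
            if total >> (MANT_BITS + 1):   # rounding carried out of bit 52
                total >>= 1
                exp += 1

    # Exception detection: only overflow is modeled here.
    if exp >= EXP_MASK:
        return float("inf")
    return pack_double(0, exp, total & MANT_MASK)
```

For the inputs it accepts, the result should agree with the machine's own addition, e.g. `fp_add_positive_normal(0.1, 0.2) == 0.1 + 0.2`. Every stage depends on the one before it, which is the point: in hardware these stages are a long serial chain of logic.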

Ignoring coprocessors for a moment, I think it's likely that on-chip
TLB's will continue to be slower than register files.  Usually the TLB
contains many more bits than the register file, so its memory-array time
constants are longer.  TLB accesses also include some logical operations
not found in the register file: hit/miss detection, plus output
selection {for set-associative or fully-associative TLB's}.
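
The extra logical steps can be seen in a toy model of a set-associative TLB lookup. This is a sketch under assumptions of my own choosing: the class name, the 4-set by 2-way geometry, and the modulo indexing are all hypothetical and are not taken from any particular chip. A register file read stops after the first step (the array read); the TLB still has the tag compare and the way select to go.

```python
class SetAssociativeTLB:
    """Toy TLB model; sizes and indexing scheme are illustrative only."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each entry is None (invalid) or a (tag, physical_frame) pair.
        self.sets = [[None] * ways for _ in range(num_sets)]

    def insert(self, vpn, pfn, way):
        """Install a translation for virtual page `vpn` in the given way."""
        index = vpn % self.num_sets
        tag = vpn // self.num_sets
        self.sets[index][way] = (tag, pfn)

    def lookup(self, vpn):
        """Return (hit, pfn); pfn is None on a miss."""
        index = vpn % self.num_sets
        tag = vpn // self.num_sets

        # Step 1: memory-array read (the only step a register file needs).
        entries = self.sets[index]

        # Step 2: hit/miss detection, one tag comparison per way.
        matches = [e is not None and e[0] == tag for e in entries]
        hit = any(matches)

        # Step 3: output selection among the ways.
        pfn = entries[matches.index(True)][1] if hit else None
        return hit, pfn
```

For example, after `tlb = SetAssociativeTLB()` and `tlb.insert(5, 100, 0)`, calling `tlb.lookup(5)` gives a hit with frame 100, while `tlb.lookup(13)` indexes the same set but fails the tag compare and misses.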

Finally, the "circular definition" argument: If it's suspected that
register file access will be THE critical path, then the RISC design
team will begin by designing and optimizing (and re-optimizing) the
register file until it's as fast as that team of engineers can
possibly make it.  Now they know THE lower bound on the cycle time.
So they use this cycle time definition in designing the rest of the
chip, taking advantage of every last nanosecond wherever possible.
In some places, they'll make good use of these extra nanoseconds,
thus creating new "critical paths".  Meaning that the register file
is no longer THE critical path.
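
The argument can be put in a few lines: the cycle time is the maximum over all path delays, and once the other paths are allowed to grow into the slack under that maximum, the register file stops being the unique worst case. The path names and nanosecond figures below are invented purely for illustration and do not describe any real chip.

```python
def cycle_time(path_delays_ns):
    """Return (cycle time, sorted names of the paths that set it)."""
    t = max(path_delays_ns.values())
    critical = sorted(name for name, d in path_delays_ns.items() if d == t)
    return t, critical


def spend_slack(path_delays_ns, grown_paths):
    """Model designers letting selected paths grow to the full cycle time."""
    t, _ = cycle_time(path_delays_ns)
    after = dict(path_delays_ns)
    for name in grown_paths:
        after[name] = t          # every spare nanosecond gets used
    return after


# Hypothetical delays after the register file has been optimized first.
before = {"register_file": 40, "alu": 30, "tlb": 35}
after = spend_slack(before, ["alu", "tlb"])
```

Here `cycle_time(before)` names only the register file as critical, while `cycle_time(after)` reports the same cycle time with the ALU and TLB paths tied with it.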

Regards,