[comp.arch] Multi-Ported Register File

marc@oahu.cs.ucla.edu (Marc Tremblay) (04/13/90)

	Recent chip designs have taken advantage of wide instruction
paths to fetch, decode and issue more than one instruction per cycle.
These chips need a multi-ported register file to sustain
the bandwidth necessary to provide several operands per cycle.
For example, the Intel 80960CA is advertised as having a 6-port
register file (though some ports are probably time-multiplexed).

	I have previously designed a standard dual-port static register
file (two simultaneous reads, one write per cycle), and I was wondering
whether the same kinds of circuits normally used for a simple register file
are also used for a multi-ported register file.
For example, is a design with cross-coupled inverters driving
the data lines through access transistors still a valid choice?
Another interesting aspect is the design of the few decoders required
to access the ports. Some of these can be time-multiplexed but
that still requires a lot of area for the remaining decoders.
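
	The port and decoder bookkeeping described above can be sketched
behaviorally (this says nothing about the circuit itself). Everything
here is a hypothetical illustration: the names RegisterFile, cycle and
decoders_needed are made up, and the read-before-write ordering within a
cycle is an assumption, since real designs may order the two differently.

```python
class RegisterFile:
    """Behavioral model of a multi-ported register file (no circuit detail)."""

    def __init__(self, num_regs=32, read_ports=2, write_ports=1):
        self.regs = [0] * num_regs
        self.read_ports = read_ports
        self.write_ports = write_ports

    def cycle(self, reads, writes):
        """Run one cycle.

        reads:  list of register indices, one per read port in use
        writes: list of (index, value) pairs, one per write port in use
        """
        if len(reads) > self.read_ports:
            raise ValueError("not enough read ports")
        if len(writes) > self.write_ports:
            raise ValueError("not enough write ports")
        targets = [idx for idx, _ in writes]
        if len(set(targets)) != len(targets):
            raise ValueError("two write ports target the same register")
        # ASSUMPTION: reads sample the old contents; writes land afterwards.
        values = [self.regs[i] for i in reads]
        for idx, value in writes:
            self.regs[idx] = value
        return values


def decoders_needed(ports, slots_per_cycle=1):
    """Address decoders required if each decoder serves `slots_per_cycle` ports."""
    return -(-ports // slots_per_cycle)  # ceiling division
```

With these assumptions, a 6-port file whose decoders are each shared by two
time slots would need decoders_needed(6, 2) decoders, versus
decoders_needed(6) with no multiplexing.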

	There are other important factors such as the forwarding unit
and the load interlock circuitry associated with a multi-ported
register file. In short, there is a lot to discuss and, besides,
it is not a RISC vs. CISC topic!

					Marc Tremblay
					marc@CS.UCLA.EDU

upton@badger.cs.washington.edu (Michael Upton) (04/13/90)

In regard to the design of multiported register files:
About two read ports is as many as a standard cross-coupled
inverter RAM cell can support.
Simultaneous reads on three or more ports to the same address
corrupt the data. The standard fix for this is to add another inverter
to the read side of the cross-coupled inverters, thus decoupling the read
from the internal nodes of the RAM cell.

Mike Upton

bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (04/15/90)

In article <11426@june.cs.washington.edu>, upton@badger.cs.washington.edu (Michael Upton) writes:
> 
> In regard to the design of multiported register files:

One thing I've wondered .. how much extra chip area does it take to
build a multi-port register file?  The late lamented MultiFlow VLIW
machine, and the new crop of "super-scalar" chips that issue several
instructions per clock must be able to read and write large numbers of
registers simultaneously (something on the order of 10 reads and 5 writes
per clock).  How much extra hardware is needed to do this?  How many more
levels of logic are required over the "2 read 1 write" case?

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.

mark@mips.COM (Mark G. Johnson) (04/15/90)

In article <56847@sgi.sgi.com> bron@bronze.wpd.sgi.com (Bron Campbell Nelson) writes:
    >> In regard to the design of multiported register files:
    >
    >One thing I've wondered .. how much extra chip area does it take to
    >build a multi-port register file?  The late lamented MultiFlow VLIW
    >machine, and the new crop of "super-scalar" chips that issue several
    >instructions per clock must be able to read and write large numbers of
    >registers simultaneously (something on the order of 10 reads and 5
    >writes per clock).  How much extra hardware is needed to do this?

Consider, for a moment, the _hypothesis_ that superscalar CPUs require
many-many-ported register files, *and* physical implementation of
these additionally-ported files requires more hardware than the (2R,1W)
register files of olden (nonsuperscalar) days.  Just a hypothesis; it
may or may not be true in real life.  Wouldn't it be unpleasant if
you had to add this extra hardware to a Large register file, like
for example, one that had 7 or 8 windows of 16 regs per window?
A penalty multiplied by a penalty, it might seem.
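
To put rough numbers on "a penalty multiplied by a penalty", here is a
back-of-the-envelope sketch. It rests on a common first-order assumption,
not on anything measured in this thread: each extra port adds one word line
and one bit line to every cell, so array area grows roughly as the number of
registers times the square of the port count. The 32-register (2R,1W)
baseline and the name relative_area are hypothetical choices for
illustration; the 8 x 16 windows and the 10R + 5W figure come from the posts
above.

```python
def relative_area(num_regs, ports):
    """First-order relative array area: registers x ports squared.

    ASSUMPTION: each port adds one word line (cell height) and one bit line
    (cell width), so cell area grows roughly with the square of the port
    count once wiring dominates. Bits per register cancel out of any ratio.
    """
    return num_regs * ports ** 2


# Hypothetical flat file with the classic (2R,1W) port count.
baseline = relative_area(num_regs=32, ports=3)
# 8 windows of 16 registers, with 10 read + 5 write ports.
windowed = relative_area(num_regs=8 * 16, ports=15)

ratio = windowed / baseline
print(ratio)
```

Under this model the register count contributes a factor of 4 and the port
count a factor of 25. A real layout would differ (decoders, sense amps and
time-multiplexed ports are all ignored), so treat the result only as an
indication of how the two penalties compound.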


:-) :-) of course, gate arrays ARE getting denser all the time ... :-) :-)
-- 
 -- Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	(408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}