[comp.arch] Tomasulo

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (01/29/91)

In article <1991Jan23.154727.26972@mozart.amd.com> 
	tim@amd.com (Tim Olson) writes:
>Register file access is not on the most critical path in the Am29000
>(and we have 192 3-ported registers!).  Usually cache lookups and TLB
>matching tend to be more critical, because they involve an immediate
>comparison of tags after the access, and the arrays are usually larger
>than register file arrays.

Does this mean that a machine with Tomasulo-style tag matching would
have no cycle time penalty?  How sensitive would that be to the tag
width, when there are, say, 50 or 80 destinations?

[I am referring to a machine where tagged values are broadcast over
an internal bus, and destinations select themselves via tag matching.
The 360/91 FPU used this in place of traditional register address
decoders.]
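[For concreteness, a toy software model of that broadcast-and-self-select
step -- all names and structure are hypothetical, not from the 360/91:]

```python
# Hypothetical sketch of Tomasulo-style tag broadcast: each waiting
# destination (a reservation station source operand) compares the
# broadcast tag against the tag it is waiting on, and captures the
# value on a match.  In hardware all comparisons happen in parallel;
# the loop here only models that.

class Operand:
    def __init__(self, tag=None, value=None):
        self.tag = tag      # tag of the producing unit, or None if ready
        self.value = value  # valid only once tag is None

def broadcast(stations, result_tag, result_value):
    """Model one result broadcast on the common data bus."""
    for station in stations:
        for op in station:
            if op.tag == result_tag:
                op.value = result_value
                op.tag = None   # operand is now ready

# Two reservation stations, each with two source operands.
stations = [
    [Operand(tag=3), Operand(value=7)],   # waiting on tag 3
    [Operand(tag=3), Operand(tag=5)],     # waiting on tags 3 and 5
]
broadcast(stations, result_tag=3, result_value=42)
```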
-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics

tim@proton.amd.com (Tim Olson) (01/31/91)

In article <11705@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
| 
| In article <1991Jan23.154727.26972@mozart.amd.com> 
| 	tim@amd.com (I) write:
| >Register file access is not on the most critical path in the Am29000
| >(and we have 192 3-ported registers!).  Usually cache lookups and TLB
| >matching tend to be more critical, because they involve an immediate
| >comparison of tags after the access, and the arrays are usually larger
| >than register file arrays.
| 
| Does this mean that a machine with Tomasulo-style tag matching would
| have no cycle time penalty?  How sensitive would that be to the tag
| width, when there are, say, 50 or 80 destinations?

Probably.  Instruction and data caches are typically 1, 2, or 4-way
set-associative, so they still involve a significant access time for
each "way" before the tag compare (a 2-way set-associative, 64KB cache
with a 4-word line size has to look up one of 2048 entries in each
way).  Tomasulo-style result-tagging, on the other hand, is
fully-associative: all of the reservation station entries are compared
against the broadcast result tag(s) in parallel, so it doesn't have
the large access-time component that the caches have.  There may be
some time penalty involved in distributing the result tags to all
reservation station entries, however, which could make that one of
the critical paths.
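A rough software sketch of the contrast (the cache parameters are from
the example above; word size of 4 bytes and everything else here is an
assumption for illustration):

```python
from collections import namedtuple

Line = namedtuple("Line", "tag data")

CACHE_BYTES = 64 * 1024
WAYS = 2
LINE_BYTES = 4 * 4            # 4 words x 4 assumed bytes per word

# Entries per way: the size of the RAM array indexed before the
# tag compare -- 2048 for the example in the post.
ENTRIES_PER_WAY = CACHE_BYTES // WAYS // LINE_BYTES

def cache_lookup(ways, index, tag):
    """Set-associative: an indexed array access per way (the slow,
    array-size-dependent part), then one tag compare per way."""
    for way in ways:
        line = way[index]
        if line is not None and line.tag == tag:
            return line.data
    return None

def cam_match(stations, tag):
    """Fully-associative: no indexed access at all; every entry
    compares the broadcast tag in parallel (a loop in software)."""
    return [data for (t, data) in stations if t == tag]
```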


--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)