[comp.arch] RS/6000 renaming

davec@nucleus.amd.com (Dave Christie) (07/26/90)

In article <37269@shemp.CS.UCLA.EDU> marc@oahu.cs.ucla.edu (Marc Tremblay) writes:
>
>A full cycle seems to be allocated for register renaming. That's plenty
>of time to access the map table and manage the tag lists. I suspect that
>they may even do it twice per cycle to reduce the number of ports
>of the map table. Indeed if two instructions can be renamed per cycle,
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>and if both are FMA (Fused multiply and add) which require 3 source
>tags and one destination tag, that's 8 ports/cycle for the mapping table.

Are you implying that two FP instructions can be issued in one cycle?!
I don't believe this is the case.
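Tremblay's figure of 8 map-table ports is simple arithmetic. Two instructions are renamed per cycle, and each FMA carries three source tags and one destination tag. The sketch below reproduces that count. The helper names are hypothetical. It treats every tag as exactly one map-table access, and it assumes that accessing the table twice per cycle (his suggestion) halves the physical port count. Neither assumption is taken from IBM documentation.

```python
def map_table_accesses(insts_per_cycle: int, src_tags: int, dst_tags: int) -> int:
    """Map-table accesses per cycle if every tag needs its own lookup/update
    (a simplifying assumption; real rename logic may need extra reads)."""
    return insts_per_cycle * (src_tags + dst_tags)


def physical_ports(accesses_per_cycle: int, uses_per_port_per_cycle: int) -> int:
    """Ports needed if each port can be time-multiplexed several times per cycle.
    Uses ceiling division so leftover accesses still get a port."""
    return -(-accesses_per_cycle // uses_per_port_per_cycle)


# Two FMAs renamed per cycle: 3 source tags + 1 destination tag each.
accesses = map_table_accesses(insts_per_cycle=2, src_tags=3, dst_tags=1)
print(accesses)                      # 8
print(physical_ports(accesses, 1))   # 8: one access per port per cycle
print(physical_ports(accesses, 2))   # 4: table accessed twice per cycle
```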

----------------------------------
Dave Christie          My opinions only.

mark@hubcap.clemson.edu (Mark Smotherman) (07/27/90)

From davec@nucleus.amd.com (Dave Christie):
> 
> Are you implying that two FP instructions can be issued in one cycle?!
> I don't believe this is the case.

Yes, the instruction dispatch on RS/6000 does not require a matched
pair of an integer instruction and a flt.pt. instruction in order to
dispatch (issue?) multiple instructions per cycle (as does the i860
in DIM and the i960CA).

From H. Bakoglu and T. Whiteside, "RISC System/6000 Hardware Overview,"
in IBM RISC System/6000 Technology, 1990, order no. SA23-2619, p. 11:

	Four instructions per cycle can be fetched from the I-cache
	arrays to the instruction buffers and dispatch unit, which
	can dispatch up to four instructions per cycle.  Two of these
	are internal dispatches to the ICU (branches and condition
	register instructions) and two are external dispatches to
-->	the FXU and FPU.  There is no restriction on the combination
-->	of instructions that are dispatched to the FXU and FPU.  They
-->	can be a fixed- and a floating-point instruction, or two
-->	fixed-point instructions, or two floating-point instructions.
	Because the fixed- and floating-point instructions are not mated
	together, instruction dispatch bandwidth or code space is not
	wasted.  [The FXU and FPU both have instruction buffers with
	12 entries each to even out the dispatching patterns.]

Very nice.
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

billms@caen.engin.umich.edu (Bill Mangione-Smith) (07/27/90)

In article <9871@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
 >From davec@nucleus.amd.com (Dave Christie):
 >> 
 >> Are you implying that two FP instructions can be issued in one cycle?!
 >> I don't believe this is the case.
 >
 >Yes, the instruction dispatch on RS/6000 does not require a matched
 >pair of an integer instruction and a flt.pt. instruction in order to
 >dispatch (issue?) 

Nope, this is not quite true.  One FP instruction can be issued by the 
FPU each clock, though it can be a mult-add.  Two can be sent (i.e. dispatched)
to the FPU each clock, but at least one of them sits there in a queue.  This
removes one more worry from the compiler about matching up instructions
for dispatching from the I cache unit.
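Bill distinguishes dispatch (into the FPU's instruction buffer) from issue (out of it). A toy cycle-by-cycle model makes the difference concrete. In this model, up to two FP instructions are dispatched per cycle into a 12-entry buffer, which is the figure from the IBM overview quoted earlier, and one instruction issues per cycle. This is a hypothetical sketch. It ignores dependences, latencies, and the FXU side entirely, and the function name and parameters are illustrative.

```python
def simulate_fpu_buffer(dispatch_attempts, capacity=12, issue_width=1, dispatch_width=2):
    """Toy model of an FPU instruction buffer.

    dispatch_attempts[c] is how many FP instructions the dispatcher wants to
    send in cycle c. Each cycle, up to `issue_width` instructions leave the
    buffer first. Then up to `dispatch_width` new ones enter if there is room.
    Anything that does not fit waits (a dispatch stall).
    Returns (occupancy after each cycle, total cycles to drain everything).
    """
    waiting = 0      # instructions not yet dispatched to the FPU
    occupancy = 0    # instructions sitting in the FPU buffer
    history = []
    cycle = 0
    while cycle < len(dispatch_attempts) or waiting or occupancy:
        if cycle < len(dispatch_attempts):
            waiting += dispatch_attempts[cycle]
        # Issue: at most `issue_width` FP instructions leave per cycle.
        occupancy -= min(occupancy, issue_width)
        # Dispatch into whatever buffer space is free.
        sent = min(waiting, dispatch_width, capacity - occupancy)
        waiting -= sent
        occupancy += sent
        history.append(occupancy)
        cycle += 1
    return history, cycle


# A burst of six cycles, each dispatching two FP instructions (12 total).
history, cycles = simulate_fpu_buffer([2] * 6)
print(max(history))   # 7: buffer never fills, so dispatch never stalls
print(cycles)         # 13: total time is bounded by one issue per cycle
```

The buffer absorbs the burst, but it does not raise throughput: twelve instructions still take twelve issue cycles. Shrinking `capacity` to 3 in this model leaves the drain time unchanged and only forces dispatch stalls.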

>Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634

Bill Mangione-Smith
billms@eecs.umich.edu

marc@oahu.cs.ucla.edu (Marc Tremblay) (07/27/90)

>billms@caen.engin.umich.edu (Bill Mangione-Smith) writes:
>> mark@hubcap.clemson.edu (Mark Smotherman) writes:
>>From davec@nucleus.amd.com (Dave Christie):
>>> 
>>> Are you implying that two FP instructions can be issued in one cycle?!
>>> I don't believe this is the case.
>>
>>Yes, the instruction dispatch on RS/6000 does not require a matched
>>pair of an integer instruction and a flt.pt. instruction in order to
>>dispatch (issue?) 
>
>Nope, this is not quite true.  One FP instruction can be issued by the 
>FPU each clock, though it can be a mult-add.  Two can be sent (i.e. dispatched)
>to the FPU each clock, but at least one of them sits there in a queue.  This
>removes one more worry from the compiler about matching up instructions
>for dispatching from the I cache unit.

The original discussion was on register renaming.
Yes, the RS/6000 can *rename* two instructions per cycle.
Yes, the RS/6000 can execute some combinations of two FP instructions in 1 cycle.
For example a floating-point load and a floating-point mult-add can be executed
in parallel (the fixed-point unit does most of the work anyway!).
If we talk about floating-point arithmetic instructions, no, the RS/6000
cannot execute two of them per cycle (most of them can be pipelined though).

Since the FPU can rename more instructions than it can execute, a buffer
must be inserted between the rename logic and the execution logic.
That's accomplished by the decode buffer.

_________________________________________________
Marc Tremblay
internet: marc@CS.UCLA.EDU
UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc

davec@nucleus.amd.com (Dave Christie) (07/27/90)

In <9871@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
>From davec@nucleus.amd.com (Dave Christie):
>> 
>> Are you implying that two FP instructions can be issued in one cycle?!
>> I don't believe this is the case.
>
>Yes, the instruction dispatch on RS/6000 does not require a matched
>pair of an integer instruction and a flt.pt. instruction in order to
>dispatch (issue?) multiple instructions per cycle (as does the i860
>in DIM and the i960CA).
>
>From H. Bakoglu and T. Whiteside, "RISC System/6000 Hardware Overview,"
>in IBM RISC System/6000 Technology, 1990, order no. SA23-2619, p. 11:
>
>	Four instructions per cycle can be fetched from the I-cache
>	arrays to the instruction buffers and dispatch unit, which
>	can dispatch up to four instructions per cycle.  Two of these
>	are internal dispatches to the ICU (branches and condition
>	register instructions) and two are external dispatches to
>-->	the FXU and FPU.  There is no restriction on the combination
>-->	of instructions that are dispatched to the FXU and FPU.  They
>-->	can be a fixed- and a floating-point instruction, or two
>-->	fixed-point instructions, or two floating-point instructions.
>	Because the fixed- and floating-point instructions are not mated
>	together, instruction dispatch bandwidth or code space is not
>	wasted.  [The FXU and FPU both have instruction buffers with
>	12 entries each to even out the dispatching patterns.]

I had assumed that there were just 32-bit paths from the ICU to the
FXU and FPU - looks like it's 64.  In any case, these feed the instruction
queues; all you've told me so far is that two instructions can be placed
in the FPU queue at once.  This relieves the compiler of having to do 
load levelling for dispatch (don't know what it has to do with code space
though...).  What I asked about was issue - can two be issued from the FPU
queue at once?  This would not just require double the mapping file ports,
but double the register file ports as well.  

One more question, assuming the answer to the previous one is "no":  is
the mapping file referenced when instructions are placed in the queue,
or upon issue?  I don't see much point in the former, considering the 
extra ports required.

Again assuming "no", I wonder if one could take their spending a 
non-trivial number of pins (64) as an indication of how well they
thought the compilers could do load balancing, or an acknowledgement
that a lot of interesting codes just aren't amenable to load balancing
(I don't imagine it works in harmony with other optimizations).
Then again, maybe they were there for the taking anyway.

------------------------
Dave Christie             My opinions only.

usenet@mozart.amd.com (Usenet News) (07/27/90)
From: davec@nucleus.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas

In <1990Jul27.054936.18973@mozart.amd.com> davec@nucleus.amd.com I write:
>
>One more question, assuming the answer to the previous one is "no":  is
>the mapping file referenced when instructions are placed in the queue,
>or upon issue?  I don't see much point in the former, considering the 
>extra ports required.

Brain damage alert: I've realized the renaming must be done at dispatch
in the ICU to coordinate the renaming of an FPU register by an FXU
instruction.  Yes, the mapping file would obviously have double the 
ports.  Never mind.  
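The conclusion that renaming has to happen at dispatch follows from program order. An FP load executed by the FXU and an FP arithmetic instruction sent to the FPU can touch the same architectural FP register, so a single map table must see them in the order they appear. Below is a minimal sketch of such a map table with a free list. Several details are assumptions for illustration and do not describe the actual RS/6000 logic: the 32 architectural / 40 physical split, the class and method names, and the choice to rename every destination.

```python
class RenameMap:
    """Hypothetical FP rename map: architectural register -> physical register."""

    def __init__(self, n_arch=32, n_phys=40):
        self.map = list(range(n_arch))            # f0..f31 start in p0..p31
        self.free = list(range(n_arch, n_phys))   # spare physical registers

    def rename(self, dst, srcs):
        """Rename one instruction in program order.

        Sources are looked up before the destination is remapped, so
        f1 = f1*f2 + f3 reads the old copy of f1. Returns
        (physical_dst, physical_srcs, old_physical_dst). Freeing the old
        mapping at completion is not modelled here.
        """
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: dispatch must stall")
        new = self.free.pop(0)
        old = self.map[dst]
        self.map[dst] = new
        return new, phys_srcs, old

    def rename_group(self, group):
        """Rename a dispatch group (e.g. two instructions) strictly in order."""
        return [self.rename(dst, srcs) for dst, srcs in group]


rm = RenameMap()
# One dispatch cycle: an FP load into f1 (executed by the FXU),
# followed by an FMA that reads f1 (executed by the FPU).
results = rm.rename_group([
    (1, []),           # load f1          -> f1 gets a new physical register
    (4, [1, 2, 3]),    # f4 = f1*f2 + f3  -> must see the load's new f1
])
print(results)  # [(32, [], 1), (33, [32, 2, 3], 4)]
```

The sketch renames the pair one after the other. Hardware renaming two instructions in the same cycle would instead need to forward the first instruction's new destination into the second's source lookup, which is one reason the map table ends up with more ports.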
------------------------
Dave Christie             My misconceptions only.