[comp.arch] 64bits: Area Overhead and Opportunity Costs

rtrauben@cortex.Eng.Sun.COM (Richard Trauben) (02/14/91)

In article <1991Feb12.225634.13757@m.cs.uiuc.edu> (Don Gillies) writes:
>I read an article today in "The Microprocessor Report" that said the
>increase from 32 to 64 bits added approximately 10-15% to the size of
>the chip.  This is quite surprising.  You'd think that going from 32
>to 64 bits would double the size of the ALU and all the data paths and
>registers.  Does this mean that the datapath and ALU and all the
>registers accounted for only 10-15% of the microprocessor to begin
>with?  What is the baseline design for this 10-15% increase in area?

At first blush, the projected 15% area overhead for increasing the datapath 
width from 32bits to 64bits DOES sound very low. However, consider the 
following 'typical' area budget characteristics for a highly integrated 
single chip processor:

Let 1/3 of the area budget be on-chip cache, 1/3 floating point unit and
1/3 be integer unit. A move to 64bits will have virtually no impact on
the fp engine. While the area for tags doubles, caches with large data
blocks (e.g. 32-64 bytes/tag) make this increase in tag RAM area negligible.

Finally assume the integer unit is 1/2 datapath and 1/2 control. The
exact percentage will depend on your favorite processor architecture
and implementation.  A first order approximation of the area impact
is that the control section area stays constant while the datapath 
area doubles.

uP            Fraction of Total       Effect of 64 bits
Sub-Block     32-bit Area Budget      on 32-bit Area
-----------   ------------------      -----------------
FPU           .33                     1.0x =>  .33
$ (cache)     .33                     1.1x =>  .36
IU datapath   .33 * .5 = .16          2.0x =>  .32
IU control    .33 * .5 = .16          1.0x =>  .16
              ------------------      -----------------
              1.00 A                           1.17 A

This back-of-the-envelope calculation says area would increase by about 17%
by pasting a 64-bit IU core onto a generic design. Obviously the exact
mileage will vary somewhat.
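
For concreteness, here is a minimal C sketch of the same arithmetic; the
fractions and scale factors are just the rough assumptions from the table
above, not measurements:

    #include <stdio.h>

    int main(void)
    {
        /* Area fractions of the 32-bit die, straight from the table. */
        double fpu   = 0.33;    /* floating point unit                    */
        double cache = 0.33;    /* on-chip cache (tags + data)            */
        double dpath = 0.16;    /* integer unit datapath (half of the IU) */
        double cntl  = 0.16;    /* integer unit control  (half of the IU) */

        /* Assumed scaling for a 64-bit machine: FP and control unchanged,
         * tags grow the cache ~10%, datapath width doubles. */
        double area64 = fpu * 1.0 + cache * 1.1 + dpath * 2.0 + cntl * 1.0;

        printf("relative 64-bit die area: %.2f\n", area64);   /* ~1.17 */
        return 0;
    }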

** OPINION **
It also suggests, for a fixed manufacturable die size with all other things 
being equal, that the quantitative cost to the customer of going to 64-bit 
addressing in advance of truly needing it is analogous to giving up about half 
the on-chip cache capacity they might have had otherwise (the ~17% overhead is 
roughly half of the 1/3 of the die budgeted for cache). A common rule of thumb 
states that doubling the cache size will roughly halve the overall cache 
miss rate.

Similar arguments could be made about the opportunity cost of an
integrated FP engine on a highly integrated processor.

There are opportunity costs in each engineering tradeoff made. You only get 
what you pay for... sometimes less :-). 

Regards,
-Richard Trauben

carters@ajpo.sei.cmu.edu (Scott Carter) (02/20/91)

In article <7967@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>In article <1991Feb12.225634.13757@m.cs.uiuc.edu> (Don Gillies) writes:
>>[ 15% chip size increase for the R4000 to fully support 64-bit integers ]
>[reasonable assuming only 1/6 of the chip is integer datapath, and that
>integer datapath roughly doubles in size]

Yeah, it's not so much the size [hmm, I'd still kill for an additional 10% size
budget], but I worry a bit about cycle time effects:

ALU carry path only increases by one gate, but in a cache-superpipelined(tm)
design  like the R4000 I fear this may often be "the" critical path.

Datapath mux width doubles, which increases the fanout on all the controls
which drive said muxes.  With MIPSCo's super designers, I don't doubt that
they'll be able to put mega-drivers on all the appropriate points, but for
architectures which need to be implementable in a less optimized design
environment (e.g. our military/space designs which cannot afford full
custom design on the random logic) I worry about this.  Probably the right
answer is design tools which can better handle the highly variable fanout
problem at a higher level.
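
To make the two concerns concrete, a rough C sketch (a first-order model
I'm assuming here, not a timing analysis of any real adder): carry depth
grows only logarithmically with operand width, while mux control fanout
grows linearly with it:

    #include <stdio.h>
    #include <math.h>

    /* Crude first-order model:
     *  - carry levels: an idealized lookahead/tree carry structure needs
     *    about log2(width) levels, so 32 -> 64 bits adds only one level;
     *  - mux control fanout: each control line drives one mux cell per
     *    bit, so it doubles along with the datapath width.
     */
    static void model(int width)
    {
        int carry_levels = (int)ceil(log2((double)width));
        int mux_fanout   = width;   /* bit cells per datapath control line */
        printf("%2d-bit: ~%d carry levels, control fanout %d\n",
               width, carry_levels, mux_fanout);
    }

    int main(void)
    {
        model(32);    /* ~5 levels, fanout 32 */
        model(64);    /* ~6 levels, fanout 64 */
        return 0;
    }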

Also ...
>Let 1/3 of the area budget be on-chip cache, 1/3 floating point unit and
>1/3 be integer unit. A move to 64bits will have virtually no impact on
>the fp engine. While the area for tags double, large data block caches 
>(i.e. 32-64bytes/tag) make this increase in tag ram area neglectable.
Tag size only doubles if you have virtual-address caches; otherwise tag size
depends on the physical address size (I doubt the R4000 has a 64-bit physical
address :).
Still, I worry about this for design points where you want/need a virtual
address cache and would otherwise choose a shorter line, so the tag becomes a
larger fraction of each entry.  A branch target cache would be an excellent
example, if someone starts wanting 64-bit instruction addresses for some
reason (e.g. shared library tricks); small on-chip caches in lower-density
processes like GaAs constitute a less contrived example.
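
A quick sketch of the tag arithmetic, with made-up cache parameters
(direct-mapped, 8KB, 32-byte lines, 36-bit physical address) purely for
illustration:

    #include <stdio.h>

    /* Tag width for a direct-mapped cache:
     *     tag bits = address bits - index bits - line-offset bits
     * The 8KB/32-byte-line cache and the 36-bit physical address are
     * made-up illustrative numbers, not R4000 parameters.
     */
    static int log2i(unsigned x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

    static int tag_bits(int addr_bits, unsigned cache_bytes, unsigned line_bytes)
    {
        return addr_bits - log2i(cache_bytes / line_bytes) - log2i(line_bytes);
    }

    int main(void)
    {
        unsigned cache = 8 * 1024, line = 32;
        printf("physical tag, 36-bit paddr: %2d bits/line\n", tag_bits(36, cache, line)); /* 23 */
        printf("virtual tag,  32-bit vaddr: %2d bits/line\n", tag_bits(32, cache, line)); /* 19 */
        printf("virtual tag,  64-bit vaddr: %2d bits/line\n", tag_bits(64, cache, line)); /* 51 */
        return 0;
    }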

>-Richard Trauben

Scott Carter - McDonnell Douglas Electronic Systems Company
carter%csvax.decnet@mdcgwy.mdc.com (preferred and faster) - or -
carters@ajpo.sei.cmu.edu		 (714)-896-3097
The opinions expressed herein are solely those of the author, and are not
necessarily those of McDonnell Douglas.

mash@mips.COM (John Mashey) (02/21/91)

In article <7967@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>In article <1991Feb12.225634.13757@m.cs.uiuc.edu> (Don Gillies) writes:
>>I read an article today in "The Microprocessor Report" that said the
>>increase from 32 to 64 bits added approximately 10-15% to the size of
>>the chip.  This is quite surprising.  You'd think that going from 32

>At first blush, the projected 15% area overhead for increasing the datapath 
>width from 32bits to 64bits DOES sound very low. However, consider the 
...
>This back of the envelope calculation says area would increase by 17%
>by pasting a 64-bit IU core onto a generic design. Obviously the exact
>mileage will vary somewhat. 
>It also suggests, for a fixed manufacturable die size with all other things 
>being equal, that the quantitative cost to the customer for going to 64bit 
>addressing in advance of truly needing it is analogous to not providing 1/2 
>the capacity of on-chip cache that they might have had otherwise. A common
>rule of thumb states that doubling the cache size will half the overall cache
>miss penalty.

This is a pretty good estimate.  I have some slightly better numbers,
although gathered informally (using a ruler on a chip plot, and
rounding everything).  I also split it up a little differently,
and my earlier estimates (which is where some of the published numbers
come from, i.e., some offhand comments) were actually a little high.
Take all of the following within +/- 2%.

The integer data path (+ some of its control; I wasn't too fussy)
is about 14%, and the MMU/CP0 chunk is about 5% (it has to be wider,
also).

***
Altogether, I expect this all means that the die space cost was around
7-8% to get 64-bit integer data path and addressing.
***

Now, the caches together are about 14% also, so maybe we could have
doubled one of the caches, which would have been nice.  On the other
hand, I think there would have been some awkward layout issues, i.e.,
it might have been possible, but would have been hard, especially
looking forward to a design in which one can rapidly expand the cache sizes
without re-laying-out everything in sight.  (Note that caches like
certain shapes more than others, and are nontrivial to squeeze into
weird-shaped holes :-)
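
Reading those figures literally (and they are only ruler-and-rounding
numbers), the tradeoff is roughly a wash; a trivial C sketch of the
comparison:

    #include <stdio.h>

    int main(void)
    {
        /* Ruler-measured fractions of the die, all +/- 2% per the post. */
        double caches_total = 0.14;     /* both primary caches together      */
        double cost_64bit   = 0.075;    /* net cost of 64-bit datapath + MMU */

        /* The alternative mentioned above: double ONE of the two caches. */
        double one_cache_doubled = caches_total / 2.0;

        printf("64-bit support:     ~%.1f%% of the die\n", 100 * cost_64bit);
        printf("doubling one cache: ~%.1f%% of the die\n", 100 * one_cache_doubled);
        return 0;
    }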
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

cprice@mips.COM (Charlie Price) (02/24/91)

In article <757@ajpo.sei.cmu.edu> carter%csvax.decnet@mdcgwy.mdc.com writes:
>In article <7967@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>Also ...
>>Let 1/3 of the area budget be on-chip cache, 1/3 floating point unit and
>>1/3 be integer unit. A move to 64bits will have virtually no impact on
>>the fp engine. While the area for tags double, large data block caches 
>>(i.e. 32-64bytes/tag) make this increase in tag ram area neglectable.

>Tag size only doubles if you have virtual-address caches, otherwise tag size
>depends on physical address size (I doubt the R4000 has a 64-bit physical
>address :) 
>Still, I worry about this for those design points where you want/need a
>virtual address cache, where the design point would otherwise be a shorter
>line (a branch target cache would be an excellent example, if someone starts
>wanting 64-bit instruction addresses for some reason (e.g. shared library 
>tricks) - small on-chip caches in lower density processes like GaAs 
>constitue a less contrived example).

What you say is right.

Just to keep the confusion factor down here,
the on-chip primary caches in the R4000 are

 virtual index		(access is based on the virtual address)

but

 physical tag		(deciding on a hit/miss is based on the phys addr)

You overlap the virtual-to-physical translation with the cache access.
The secondary cache is physical/physical.
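
For readers who haven't seen the trick, here is a toy C sketch of a
virtually-indexed, physically-tagged lookup; the sizes and the identity
translation are assumptions for illustration, not the actual R4000
organization:

    #include <stdio.h>
    #include <stdint.h>

    /* Toy virtually-indexed, physically-tagged (VIPT) lookup. */
    #define LINE_BYTES   32u
    #define NUM_SETS     256u
    #define OFFSET_BITS  5           /* log2(LINE_BYTES) */
    #define INDEX_BITS   8           /* log2(NUM_SETS)   */

    struct line { uint64_t ptag; int valid; };
    static struct line cache[NUM_SETS];

    /* Stand-in for the TLB; a real one maps virtual frames to physical. */
    static uint64_t translate(uint64_t vaddr) { return vaddr; }

    static int lookup(uint64_t vaddr)
    {
        /* The index comes from the virtual address, so the cache RAM
         * access can start in parallel with the TLB lookup... */
        unsigned set = (unsigned)(vaddr >> OFFSET_BITS) & (NUM_SETS - 1);

        /* ...while the hit/miss decision compares physical tags. */
        uint64_t ptag = translate(vaddr) >> (OFFSET_BITS + INDEX_BITS);
        return cache[set].valid && cache[set].ptag == ptag;
    }

    int main(void)
    {
        uint64_t va = 0x12345040u;
        unsigned set = (unsigned)(va >> OFFSET_BITS) & (NUM_SETS - 1);
        cache[set].ptag  = translate(va) >> (OFFSET_BITS + INDEX_BITS);
        cache[set].valid = 1;
        printf("hit on 0x%llx: %d\n", (unsigned long long)va, lookup(va));
        return 0;
    }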
-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA   94086-23650