[comp.arch] 64 bit sparc chip sets

raob@mullian.ee.mu.OZ.AU (richard oxbrow) (10/18/90)

	Does somebody have some detailed information about the 
	64bit sparc chips that they might like to post.

	More specifically the

	Matsushita  MN10501 (?) 64 bit sparc cpu with fpu,mmu and cache.	

	[The blurb that i am currently looking at says that it has an
	8k cache on board (6k instruction and 2k data).]


	richard ..
richard oxbrow			   |internet    raob@mullian.ee.mu.OZ.AU
ee eng,  uni of melbourne          |uunet       ..!uunet!munnari!mullian!raob
parkville 3052           	   |fax         +[613] 344 6678   	   
australia               	   |phone       +[613] 344 6782

mslater@cup.portal.com (Michael Z Slater) (10/18/90)

>	Does somebody have some detailed information about the 
>	64bit sparc chips that they might like to post.
>
>	More specifically the
>
>	Matsushita  MN10501 (?) 64 bit sparc cpu with fpu,mmu and cache.	

The Matsushita chip is used in the recently announced Solbourne workstation,
and is not available to other vendors.  There are some indications it might
be put on the market as a chip; anyone have any knowledge of Solbourne/
Matsushita's plans in this regard?

This is the only SPARC chip now in production that has on-chip cache, mmu,
or floating-point -- and it has all three.

Note that this is 64-bit only in that it has 64-bit buses; it is NOT a
64-bit extension to the architecture.  That won't come for a year or more.
(Probably more -- maybe several years.)

Michael Slater, Microprocessor Report  mslater@cup.portal.com

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/18/90)

In article <5791@munnari.oz.au> raob@mullian.ee.mu.OZ.AU (richard oxbrow) writes:

| 	[The blurb that i am currently looking at says that it has an
| 	8k cache on board (6k instruction and 2k data).]

  Okay, someone enlighten me: why that mix of cache? I would assume that
cache would be better spent on data than code, since (most) code is
pipelined and, except for jumps, not hurt by cache misses. However, when a
data fetch results in a cache miss, the CPU is essentially blocked
(modulo any parallelism provided).

  I presume that there's a good reason for this which I'm missing, but
I'm fairly sure of my assumptions, having done a bit of scope and
counter work with an 8086 with slow memory and seen that the jumps were
a lot less frequent than the data fetches. Of course, on an uncached
system there were lots of blocks due to data writes, but that was
identified, and writes are a lot less frequent than reads, anyway.
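The argument above can be sketched as a back-of-envelope CPI model (all
numbers here are illustrative assumptions, not measurements from any real
SPARC implementation):

```python
# Toy model: how I-cache and D-cache misses each add stall cycles
# to the average cycles-per-instruction (CPI).

def effective_cpi(base_cpi, i_miss_rate, d_refs_per_instr,
                  d_miss_rate, miss_penalty):
    """Average CPI with separate I-cache and D-cache stall terms."""
    i_stall = i_miss_rate * miss_penalty               # one fetch per instruction
    d_stall = d_refs_per_instr * d_miss_rate * miss_penalty
    return base_cpi + i_stall + d_stall

# Assume 1 data reference every 4 instructions and a 10-cycle miss penalty.
cpi = effective_cpi(base_cpi=1.0, i_miss_rate=0.02,
                    d_refs_per_instr=0.25, d_miss_rate=0.10,
                    miss_penalty=10)
print(round(cpi, 2))
```

With these made-up rates the D-cache term (0.25 cycles/instruction) slightly
outweighs the I-cache term (0.2), which is the intuition being argued.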
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

mash@mips.COM (John Mashey) (10/19/90)

In article <5791@munnari.oz.au> raob@mullian.ee.mu.OZ.AU (richard oxbrow) writes:
>
>
>	Does somebody have some detailed information about the 
>	64bit sparc chips that they might like to post.
>
>	More specifically the
>
>	Matsushita  MN10501 (?) 64 bit sparc cpu with fpu,mmu and cache.	
>
>	[The blurb that i am currently looking at says that it has an
>	8k cache on board (6k instruction and 2k data).]

NOTE: to avoid confusion, this is a 32-bit architecture with 64-bit
busses, i.e., like the i860.  As Hennessy suggests, calling these
64-bit architectures just confuses everybody, and is completely
inconsistent with past practice.

I saw a SPECmark of 12 quoted for the workstation, and 1.7 MFLOPS.
Can anybody confirm that and, for SPEC, give the 10 numbers?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

turner@sp1.csrd.uiuc.edu (Steve Turner) (10/24/90)

(Wm E Davidsen Jr) writes:
>
> (richard oxbrow) writes:
> 
> | 	[The blurb that i am currently looking at says that it has an
> | 	8k cache on board (6k instruction and 2k data).]
> 
>   Okay, someone enlighten me, why that mix of cache.

I'm not sure about the claim that performance is hurt less by I-cache
misses than by D-cache misses.  I see the point, but jumps that
stall the pipeline are still a big performance loss.  In any event I
think this is highly debatable for the general case.

I'll stick my neck out and speculate as to a possible rationale for
the sizes.  This is a completely intuitive argument, based on several
years wasted, er... spent, talking and reading about this stuff.

I-cache for typical ISA/load combinations can get very good hit rates
(95%+) with a 512 word (2k) cache.  This size typically captures most
loops, but not whole threads.  To get a serious increase in
performance beyond this, the I-cache must be increased to capture a
significant piece of multiple loops, or most of a thread.  In other
words, past a certain point, you don't get much "bang-for-the-buck"
out of increasing I-cache size.

D-cache on the other hand can see almost linear increase in hit rate
with increasing size (you can almost feel the breeze from my waving
hands at this point) up to quite large sizes.

SO:  for small (i.e, on-chip) cache systems, you're better off
spending most of the real-estate on D- rather than I-cache.
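The hand-waving above can be made concrete with toy hit-rate curves (the
shapes and coefficients are entirely hypothetical, chosen only to illustrate
"saturating I$ vs. near-linear D$" over small on-chip sizes):

```python
# Hypothetical hit-rate curves for small split caches.

def i_hit_rate(size_kb):
    """I-cache saturates quickly: most loops already fit in ~2 KB."""
    return min(0.99, 0.95 + 0.01 * (size_kb / 2))

def d_hit_rate(size_kb):
    """D-cache improves roughly linearly over small on-chip sizes."""
    return min(0.99, 0.70 + 0.03 * size_kb)

for kb in (2, 4, 6, 8):
    print(kb, round(i_hit_rate(kb), 3), round(d_hit_rate(kb), 3))
```

Going from 2k to 8k buys only a few points of I-cache hit rate under these
assumptions, but a much larger D-cache improvement, which is the
"bang-for-the-buck" argument in sketch form.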


I'm surprised no one else has jumped on this already.  Perhaps they
have, and the post has not made it to our machine yet...
--
    Steve Turner (on the Si prairie  - UIUC CSRD)

    ARPANET:  turner@csrd.uiuc.edu
    Phone:    (217) 244-7293 or (217) 367-0882

    I went walking in the wasted city / Started thinking about entropy
    Smelled the wind from the ruined river / Went home to watch TV
                                                 -- Warren Zevon

hp@vmars.tuwien.ac.at (Peter Holzer) (10/25/90)

turner@sp1.csrd.uiuc.edu (Steve Turner) writes:

>> (richard oxbrow) writes:
>> 
>> | 	[The blurb that i am currently looking at says that it has an
>> | 	8k cache on board (6k instruction and 2k data).]

>I'll stick my neck out and speculate as to a possible rationale for
>the sizes. 

I assume you mean the sizes above.

[stuff deleted]

>In other
>words, past a certain point, you don't get much "bang-for-the-buck"
>out of increasing I-cache size.

>D-cache on the other hand can see almost linear increase in hit rate
>with increasing size (you can almost feel the breeze from my waving
>hands at this point) up to quite large sizes.

>SO:  for small (i.e, on-chip) cache systems, you're better off
>spending most of the real-estate on D- rather than I-cache.

But on the chip in question the I-cache is larger than the D-cache.
You just explained why it should be the other way.

--
|    _  | Peter J. Holzer                       | Think of it   |
| |_|_) | Technical University Vienna           | as evolution  |
| | |   | Dept. for Real-Time Systems           | in action!    |
| __/   | hp@vmars.tuwien.ac.at                 |     Tony Rand |

rtrauben@cortex.Eng.Sun.COM (Richard Trauben) (10/25/90)

In article <TURNER.90Oct23173718@sp1.csrd.uiuc.edu> (Steve Turner) writes:
>> | 	[The blurb that i am currently looking at says that it has an
>> | 	8k cache on board (6k instruction and 2k data).]
>> 
>>   Okay, someone enlighten me, why that mix of cache.
>
>I'm not sure about the comment about how performance is not hurt by
>I-cache misses as much as D-cache.  I see the point, but jumps that
>stall the pipeline are still a big performance loss.  In any event I
>think this is highly debatable for the general case.
>
>I'll stick my neck out and speculate ...  I-cache ..  captures most
>loops, but not whole threads. ...

.... about the relative sizes of the I$ to D$ on the Solbourne/Matsushita SPARC
processor: I don't work for Solbourne/Matsushita (or play one on TV), but
I have another theory:
        Yes, it is true that D$ miss rates are much higher than I$ miss
rates for split caches of uniform size.
        There are a whole lot of "typical" applications with the following
statistics: 1 data memory reference for every 4 instructions.
        This means that data memory references contribute only 20% of the
memory bandwidth bottleneck; instruction fetches are 4 times as frequent,
so at equal miss rates instruction references cost 4 times as much
memory bandwidth, in aggregate, as data references.
        Equalizing the miss rates for I$ and D$ would therefore contribute
to an unbalanced machine.  Your mileage may vary.
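The traffic arithmetic above can be checked in a few lines (the 1-in-4 ratio
comes from the post; the miss rate is an illustrative assumption, not a
Solbourne measurement):

```python
# With 4 instruction fetches per data reference, what fraction of
# cache-miss traffic comes from each side of a split cache?

def miss_traffic_share(i_refs, d_refs, i_miss_rate, d_miss_rate):
    """Return (instruction, data) fractions of total miss traffic."""
    i_miss = i_refs * i_miss_rate
    d_miss = d_refs * d_miss_rate
    total = i_miss + d_miss
    return i_miss / total, d_miss / total

# Equal miss rates, 4:1 reference ratio: the I-side dominates 80/20,
# so it pays to drive the I$ miss rate down with a bigger I$.
i_share, d_share = miss_traffic_share(4, 1, 0.05, 0.05)
print(i_share, d_share)
```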

-Richard Trauben
Sun Microsystems

turner@sp1.csrd.uiuc.edu (Steve Turner) (10/26/90)

In article <1923@tuvie> hp@vmars.tuwien.ac.at (Peter Holzer) writes:
> 
> turner@sp1.csrd.uiuc.edu (Steve Turner) writes:
			...  Was this really me??  Damn.

> [stuff deleted]
> 
>>SO:  for small (i.e, on-chip) cache systems, you're better off
>>spending most of the real-estate on D- rather than I-cache.
> 
> But on the chip in question the I-cache is larger than the D-cache.
> You just explained why it should be the other way.

Right.  A typo at the end (I _meant_ "I- rather than D-", of course),
and I blow my whole argument.  Sheesh...

--
    Steve Turner (on the Si prairie  - UIUC CSRD)

    ARPANET:  turner@csrd.uiuc.edu
    Phone:    (217) 244-7293 or (217) 367-0882

    I went walking in the wasted city / Started thinking about entropy
    Smelled the wind from the ruined river / Went home to watch TV
                                                 -- Warren Zevon