[comp.arch] HP PA-RISC Cache Question

lupin@jetsun.weitek.COM (Edward Lupin) (06/14/91)

I am having a hard time understanding how the HP PA "Snakes" cache can
operate at 66 MHz using only asynchronous SRAMs external to the CPU.
Referring to the timing diagram below, phi1 and phi2 are the two-phase
66 MHz clocks.  Latch clock is 90 degrees out of phase with phi1,
easily achievable with the 132 MHz input clock.

Assuming that latch clock to address delay is no less than 5 ns, and
data set up is no less than 2 ns, only 8 ns remains for the SRAM access
time, less, if there is any attempt to provide some margin.  Eight ns
access is much faster than any CMOS "off-the-shelf" SRAMs with which I
am familiar.
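
The arithmetic behind that 8 ns figure can be sketched as below. This is
only a back-of-the-envelope check using the numbers assumed in the post
(66 MHz clock, 5 ns latch-clock-to-address delay, 2 ns data setup); the
helper name sram_access_budget_ns is hypothetical, and the assumed delays
are themselves disputed later in the thread.

```python
def sram_access_budget_ns(clock_mhz, clk_to_addr_ns, data_setup_ns, cycles=1.0):
    """Time left for the SRAM access after the fixed overheads, in ns.

    All inputs are assumptions taken from the post, not measured values.
    """
    cycle_ns = 1000.0 / clock_mhz  # 66 MHz gives a cycle of about 15.15 ns
    return cycles * cycle_ns - clk_to_addr_ns - data_setup_ns


# One full cycle minus 5 ns address delay and 2 ns setup: roughly 8 ns.
budget = sram_access_budget_ns(66.0, 5.0, 2.0)
print(f"SRAM access budget: {budget:.1f} ns")
```

Any margin would be subtracted from this result, which is why the budget
looks too tight for off-the-shelf parts under these assumptions.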

I suppose the address could be output on the positive edge of phi1,
yielding an extra 7 ns for SRAM access time, but then the address would
change for the next cycle before the data is latched into the CPU.  How
could HP guarantee the SRAM's minimum address to data invalid delay?

Have I made a mistake somewhere?  How does HP do it?


      __________            __________            __________
phi2            \__________/          \__________/          \__________
                 __________            __________            __________
phi1  __________/          \__________/          \__________/

      ____            __________            __________            _____
latch     \__________/          \__________/          \__________/
clock
      _   ___________________   ___________________   _________________
Adr   _XXX___________________XXX___________________XXX_________________
      _____________   ___________________   ___________________   _____
Data  _____________XXX___________________XXX___________________XXX_____


Ed
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"All right, Doctor, you built this thing!  How do you propose to turn it
off?"                                                   lupin@weitek.com
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

maf@hpfcso.FC.HP.COM (Mark Forsyth) (06/14/91)

>From: lupin@jetsun.weitek.COM (Edward Lupin)
>
>I am having a hard time understanding how the HP PA "Snakes" cache can
>operate at 66 MHz using only asynchronous SRAMs external to the CPU.
>Referring to the timing diagram below, phi1 and phi2 are the two-phase
>66 MHz clocks.  Latch clock is 90 degrees out of phase with phi1,
                                ^^^^^^^^^^^^^^^^^^^^^^^
Not correct. 

>easily achievable with the 132 MHz input clock.
>
>Assuming that latch clock to address delay is no less than 5 ns, and
                                               ^^^^^^^^^^^^^^^^^^
Not correct.

>data set up is no less than 2 ns, only 8 ns remains for the SRAM access
                ^^^^^^^^^^^^^^^^^^      ^^^^
Not correct. Not correct.

>time, less, if there is any attempt to provide some margin.  Eight ns
>access is much faster than any CMOS "off-the-shelf" SRAMs with which I
>am familiar.

The design uses 12ns standard asynchronous SRAMs for 66 MHz, as described
in published technical papers. Lower frequency operation can use slower
SRAM devices. And yes, the caches are accessed once every clock cycle.
BTW, there are announced 8ns devices in 64kbit density and 10ns devices
in 256kbit density.  

>I suppose the address could be output on the positive edge of phi1,
>yielding an extra 7 ns for SRAM access time, but then the address would
>change for the next cycle before the data is latched into the CPU.  How
>could HP guarantee the SRAM's minimum address to data invalid delay?
>
>Have I made a mistake somewhere?  How does HP do it?
>
I don't think that anyone at HP is willing to describe any unpublished
technical details, especially in an area considered to be a competitive
advantage. Unfortunately in this respect, the RISC processor market is
still a competitive business.  The only thing that can be said is that no 
laws of physics are being violated, that there is plenty of timing margin
to ensure reliability and manufacturability, and that the viability of
large primary caches (off-chip) for high-performance RISC was generally
written off prematurely by some of the industry (or at least according to
technical articles published in the last couple of years).  

- Mark Forsyth (maf@hpesmaf.fc.hp.com)
disclaimer: my opinions, not HP's

crisp@mips.com (Richard Crisp) (06/15/91)

In article <8840036@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: lupin@jetsun.weitek.COM (Edward Lupin)
>>
>>
>I don't think that anyone at HP is willing to describe any unpublished
>technical details, especially in an area considered to be a competitive
>advantage. Unfortunately in this respect, the RISC processor market is
>still a competitive business.  The only thing that can be said is that no 
>laws of physics are being violated, that there is plenty of timing margin
>to ensure reliability and manufacturability, and that the viability of
>large primary caches (off-chip) for high-performance RISC was generally
>written off prematurely by some of the industry (or at least according to
>technical articles published in the last couple of years).  
>

What Mark doesn't want to say is that HP figured out that they could calculate
an address in half of a pipeline stage. They simply allow something like
1 and 1/2 pipe stages for a cache access. 
-- 
		    Richard Crisp              crisp@mips.com
		MIPS Computer Systems        !decwrl!mips!crisp
		 928 Arques MS 5-07            (408) 524-7250
		 Sunnyvale, Ca 94086                           
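
Taking the "1 and 1/2 pipe stages" figure at face value, a rough
calculation shows how much such a window would relax the constraint. This
is a sketch only: the 5 ns and 2 ns overheads are the guesses from the
original question (which the HP reply called incorrect), the 12 ns access
time is the figure quoted in that reply, and access_window_ns is a
hypothetical helper, not anything HP has described.

```python
CLOCK_MHZ = 66.0
CYCLE_NS = 1000.0 / CLOCK_MHZ   # about 15.15 ns per pipeline stage

OVERHEAD_NS = 5.0 + 2.0         # assumed address delay + data setup
SRAM_ACCESS_NS = 12.0           # SRAM speed quoted for 66 MHz operation


def access_window_ns(stages, overhead_ns):
    """Time available to the SRAM when the access spans `stages` stages."""
    return stages * CYCLE_NS - overhead_ns


for stages in (1.0, 1.5):
    window = access_window_ns(stages, OVERHEAD_NS)
    slack = window - SRAM_ACCESS_NS
    print(f"{stages} stages: window {window:.2f} ns, "
          f"slack against a 12 ns SRAM {slack:+.2f} ns")
```

Under these assumed overheads a single stage falls short of a 12 ns part,
while one and a half stages leaves a few nanoseconds to spare.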

ram@shukra.Eng.Sun.COM (Renu Raman) (06/18/91)

In article <4755@spim.mips.COM> crisp@mips.com (Richard Crisp) writes:
>>large primary caches (off-chip) for high-performance RISC was generally
>>written off prematurely by some of the industry (or at least according to
>>technical articles published in the last couple of years).  
>>
>
>What Mark doesn't want to say is that HP figured out that they could calculate
>an address in half of a pipeline stage. They simply allow something like
>1 and 1/2 pipe stages for a cache access. 
>-- 
>		    Richard Crisp              crisp@mips.com
>		MIPS Computer Systems        !decwrl!mips!crisp

A similar scheme (although the implementation is probably different) can
be found in the BIT ECL SPARC processors. One could design large caches
with ~10 ns access time, as the processor cycles at 80 MHz.

There was a paper published in IEEE MICRO and other places about it.

I don't have the information handy.

renu raman
--
--------------------------------
   Renukanthan Raman				ARPA:ram@sun.com
   M/S 16-11, 2500 Garcia Avenue,               TEL :415-336-1813
   Sun Microsystems, Mt. View,  CA 94043

maf@hpfcso.FC.HP.COM (Mark Forsyth) (06/18/91)

>From: ram@shukra.Eng.Sun.COM (Renu Raman)
>be found if you plan to use the BIT ECL SPARC processors. One could design
>large caches with ~10ns access time as the processor cycles at 80MhZ.  
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is very impressive. A cycle time overhead of 2.5 ns above the SRAM
access time is hard to beat! Does this use conventional PGAs or multi-chip
module packaging? Are the caches direct-mapped or set-associative? Do
the required SRAMs use a synchronous or asynchronous timing protocol? Are
the I/O levels TTL or ECL?

There have been several published articles stating that static RAM speeds
would not keep pace with processor speeds and that on-chip primary caches
would be the only alternative for increasing frequency. This example seems
to show that cycle times can be very close to SRAM access times. With 8 ns
SRAMs (TTL I/O) available and 6 ns devices undoubtedly just around the
corner, and with the emergence of advanced packaging technologies, it seems
that discrete SRAMs will remain viable for primary (one-cycle access) cache
memories for a few more generations.
Of course, on-chip caches can have other advantages besides cycle time. 
It would be interesting to see a discussion of performance tradeoffs between
on-chip caches, two-level (1 on-chip, 1 off-chip), and large external
primary caches.
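
The 2.5 ns overhead follows directly from the quoted numbers: an 80 MHz
clock has a 12.5 ns cycle, which leaves 2.5 ns beyond a 10 ns access. The
sketch below reproduces that subtraction and then, purely as an
illustration, asks what clock rate 8 ns and 6 ns SRAMs would allow if the
same overhead held. That carry-over is an assumption, and both helper
names are hypothetical.

```python
def cycle_overhead_ns(clock_mhz, sram_access_ns):
    """Cycle time left over once the SRAM access time is subtracted."""
    return 1000.0 / clock_mhz - sram_access_ns


def max_clock_mhz(sram_access_ns, overhead_ns):
    """Highest clock rate whose cycle still fits access time plus overhead."""
    return 1000.0 / (sram_access_ns + overhead_ns)


overhead = cycle_overhead_ns(80.0, 10.0)
print(f"overhead at 80 MHz with 10 ns SRAM: {overhead:.1f} ns")

# Assumes the overhead stays constant as SRAM speed improves.
for access_ns in (10.0, 8.0, 6.0):
    print(f"{access_ns:.0f} ns SRAM -> {max_clock_mhz(access_ns, overhead):.1f} MHz")
```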

