lupin@jetsun.weitek.COM (Edward Lupin) (06/14/91)
I am having a hard time understanding how the HP PA "Snakes" cache can
operate at 66 MHz using only asynchronous SRAMs external to the CPU.
Referring to the timing diagram below, phi1 and phi2 are the two-phase
66 MHz clocks. Latch clock is 90 degrees out of phase with phi1,
easily achievable with the 132 MHz input clock.

Assuming that latch clock to address delay is no less than 5 ns, and
data set up is no less than 2 ns, only 8 ns remains for the SRAM access
time, less, if there is any attempt to provide some margin. Eight ns
access is much faster than any CMOS "off-the-shelf" SRAMs with which I
am familiar.

I suppose the address could be output on the positive edge of phi1,
yielding an extra 7 ns for SRAM access time, but then the address would
change for the next cycle before the data is latched into the CPU. How
could HP guarantee the SRAM's minimum address to data invalid delay?

Have I made a mistake somewhere? How does HP do it?

       __________            __________            __________
phi2             \__________/          \__________/          \__________
                  __________            __________            __________
phi1   __________/          \__________/          \__________/
       ____            __________            __________            _____
latch      \__________/          \__________/          \__________/
clock
       _   ___________________   ___________________   _________________
Adr    _XXX___________________XXX___________________XXX_________________
       _____________   ___________________   ___________________   _____
Data   _____________XXX___________________XXX___________________XXX_____

Ed

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"All right, Doctor, you built this thing! How do you propose to turn it off?"
lupin@weitek.com
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
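The budget in the post above is simple subtraction, and it can be checked with a short script. The following is a minimal sketch in Python; the function names are made up for illustration, and the 5 ns and 2 ns figures are the poster's own assumptions, not HP data.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def sram_access_budget_ns(freq_mhz, clk_to_addr_ns, data_setup_ns):
    """Time left for the SRAM when one access must fit in one clock period.

    Assumption (from the post, not from HP): the address leaves the CPU
    clk_to_addr_ns after a latch clock edge, and the data must be stable
    data_setup_ns before the next latch clock edge.
    """
    return cycle_time_ns(freq_mhz) - clk_to_addr_ns - data_setup_ns


if __name__ == "__main__":
    budget = sram_access_budget_ns(66.0, clk_to_addr_ns=5.0, data_setup_ns=2.0)
    print(f"cycle time : {cycle_time_ns(66.0):.2f} ns")
    print(f"SRAM budget: {budget:.2f} ns")
```

At 66 MHz the period is about 15.15 ns, so the budget comes out at about 8.15 ns, in line with the 8 ns quoted in the post.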
maf@hpfcso.FC.HP.COM (Mark Forsyth) (06/14/91)
>From: lupin@jetsun.weitek.COM (Edward Lupin)
>
>I am having a hard time understanding how the HP PA "Snakes" cache can
>operate at 66 MHz using only asynchronous SRAMs external to the CPU.
>Referring to the timing diagram below, phi1 and phi2 are the two-phase
>66 MHz clocks. Latch clock is 90 degrees out of phase with phi1,
                               ^^^^^^^^^^^^^^^^^^^^^^^
                               Not correct.
>easily achievable with the 132 MHz input clock.
>
>Assuming that latch clock to address delay is no less than 5 ns, and
                                               ^^^^^^^^^^^^^^^^^^
                                               Not correct.
>data set up is no less than 2 ns, only 8 ns remains for the SRAM access
                ^^^^^^^^^^^^^^^^^^      ^^^^
                Not correct.            Not Correct.
>time, less, if there is any attempt to provide some margin. Eight ns
>access is much faster than any CMOS "off-the-shelf" SRAMs with which I
>am familiar.

The design uses 12ns standard asynchronous SRAMs for 66 MHz, as
described in published technical papers. Lower frequency operation can
use slower SRAM devices. And yes, the caches are accessed once every
clock cycle. BTW, there are announced 8ns devices in 64kbit density and
10ns devices in 256kbit density.

>I suppose the address could be output on the positive edge of phi1,
>yielding an extra 7 ns for SRAM access time, but then the address would
>change for the next cycle before the data is latched into the CPU. How
>could HP guarantee the SRAM's minimum address to data invalid delay?
>
>Have I made a mistake somewhere? How does HP do it?
>

I don't think that anyone at HP is willing to describe any unpublished
technical details, especially in an area considered to be a competitive
advantage. Unfortunately in this respect, the RISC processor market is
still a competitive business. The only thing that can be said is that no
laws of physics are being violated, that there is plenty of timing margin
to insure reliability and manufacturability, and that the viability of
large primary caches (off-chip) for high-performance RISC was generally
written off prematurely by some of the industry (or at least according to
technical articles published in the last couple of years).

- Mark Forsyth (maf@hpesmaf.fc.hp.com)
disclaimer: my opinions, not HP's
crisp@mips.com (Richard Crisp) (06/15/91)
In article <8840036@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: lupin@jetsun.weitek.COM (Edward Lupin)
>>
>>
>I don't think that anyone at HP is willing to describe any unpublished
>technical details, especially in an area considered to be a competitive
>advantage. Unfortunately in this respect, the RISC processor market is
>still a competitive business. The only thing that can be said is that no
>laws of physics are being violated, that there is plenty of timing margin
>to insure reliability and manufacturability, and that the viability of
>large primary caches (off-chip) for high-performance RISC was generally
>written off prematurely by some of the industry (or at least according to
>technical articles published in the last couple of years).
>

What Mark doesn't want to say is that HP figured out that they could
calculate an address in half of a pipeline stage. They simply allow
something like 1 and 1/2 pipe stages for a cache access.

--
Richard Crisp                 crisp@mips.com
MIPS Computer Systems         !decwrl!mips!crisp
928 Arques MS 5-07            (408) 524-7250
Sunnyvale, Ca 94086
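To see how much a longer access window helps, here is a rough sketch in Python. It only compares the length of the window against the 12 ns SRAM access time quoted earlier in the thread. The 1.5-stage figure is the "something like" estimate from the post above, one pipe stage is assumed to equal one 66 MHz clock period, and the function name is hypothetical. It says nothing about how HP actually drives the address or latches the data.

```python
def slack_ns(freq_mhz, sram_access_ns, pipe_stages):
    """Time left over after the SRAM access inside the allowed window.

    Assumption for illustration: one pipe stage lasts one clock period,
    and the cache access may occupy pipe_stages of them.
    """
    period_ns = 1000.0 / freq_mhz
    return pipe_stages * period_ns - sram_access_ns


if __name__ == "__main__":
    for stages in (1.0, 1.5):
        left = slack_ns(66.0, sram_access_ns=12.0, pipe_stages=stages)
        print(f"{stages} stage(s): {left:.2f} ns left for everything else")
```

With a single stage the slack is about 3.15 ns; with one and a half stages it is about 10.73 ns. That remainder has to cover everything other than the SRAM itself, such as address output delay and data setup.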
ram@shukra.Eng.Sun.COM (Renu Raman) (06/18/91)
In article <4755@spim.mips.COM> crisp@mips.com (Richard Crisp) writes:
>>large primary caches (off-chip) for high-performance RISC was generally
>>written off prematurely by some of the industry (or at least according to
>>technical articles published in the last couple of years).
>>
>
>What Mark doesn't want to say is that HP figured out that they could calculate
>an address in half of a pipeline stage. They simply allow something like
>1 and 1/2 pipe stages for a cache access.
>--
> Richard Crisp                 crisp@mips.com
> MIPS Computer Systems         !decwrl!mips!crisp

A similar scheme (although the implementation is probably different)
could be found if you plan to use the BIT ECL SPARC processors. One
could design large caches with ~10ns access time as the processor
cycles at 80 MHz. There was a paper published in IEEE MICRO and other
places about it. I don't have the information handy.

renu raman

--
--------------------------------
   Renukanthan Raman                    ARPA:ram@sun.com
   M/S 16-11, 2500 Garcia Avenue,       TEL :415-336-1813
   Sun Microsystems, Mt. View, CA 94043
maf@hpfcso.FC.HP.COM (Mark Forsyth) (06/18/91)
>From: ram@shukra.Eng.Sun.COM (Renu Raman)

>be found if you plan to use the BIT ECL SPARC processors. One could design
>large caches with ~10ns access time as the processor cycles at 80MhZ.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is very impressive. A cycle time overhead of 2.5ns above the SRAM
access time is hard to beat! Does this use conventional PGAs or
multi-chip module packaging? Are the caches direct-mapped or set
associative? Do the required SRAMs use a synchronous or asynchronous
timing protocol? Are the I/O levels TTL or ECL?

There have been several published articles stating that static RAM
speeds would not keep pace with processor speeds and that on-chip
primary caches would be the only alternative for increasing frequency.
This example seems to show that cycle times can be very close to SRAM
access times. With 8ns SRAMs (TTL I/O) available and 6ns devices
undoubtedly just around the corner, and the emergence of advanced
packaging technologies, it seems that discrete SRAMs will remain viable
for primary (one cycle access) cache memories for a few more
generations.

Of course, on-chip caches can have other advantages besides cycle time.
It would be interesting to see a discussion of performance tradeoffs
between on-chip caches, two-level (1 on-chip, 1 off-chip), and large
external primary caches.

> Renukanthan Raman ARPA:ram@sun.com
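The 2.5 ns overhead quoted above is the 80 MHz cycle time minus the 10 ns access time. The sketch below (Python, hypothetical function names) reproduces it and then asks what clock rate the 8 ns and 6 ns parts mentioned in the post would allow if, purely as an assumption for illustration, the overhead stayed at 2.5 ns.

```python
def overhead_ns(freq_mhz, sram_access_ns):
    """Cycle time minus SRAM access time, in nanoseconds."""
    return 1000.0 / freq_mhz - sram_access_ns


def max_freq_mhz(sram_access_ns, fixed_overhead_ns):
    """Clock rate at which one cycle equals access time plus overhead.

    Assumption: the overhead does not change with the SRAM speed grade.
    """
    return 1000.0 / (sram_access_ns + fixed_overhead_ns)


if __name__ == "__main__":
    ovh = overhead_ns(80.0, sram_access_ns=10.0)
    print(f"overhead at 80 MHz with 10 ns SRAM: {ovh:.1f} ns")
    for access in (8.0, 6.0):
        print(f"{access:.0f} ns SRAM: about {max_freq_mhz(access, ovh):.0f} MHz")
```

Under that assumption the result is roughly 95 MHz for 8 ns SRAMs and 118 MHz for 6 ns SRAMs.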