lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (02/20/88)
The TF-1 people at IBM intend to use an interesting trick to simplify
their CPU.

DRAMs can be purchased that have "page mode" - that is, you can access the
next-address value much more quickly than a randomly addressed value. This
is because each random access can leave a large number of bits in a long
register (say, 1024 bits, in the case of a 1Mb RAM). A page-mode access just
shifts the register.

So, the TF-1 CPU chip will expect another 32 bits of instruction every 20ns.
As long as the PC just upcounts, they claim that page-mode RAMs will be fast
enough.

When the CPU decides to branch, of course, there's trouble. They solve this
by keeping a cache of the instruction streams at 32 recent branch targets.
If the target PC hits, then they fetch instructions from the cached stream,
until the RAMs have done their random access and are ready to page-mode
again.

I haven't studied the recent RAM offerings well enough to count the cycles
and critique the speed expectations. I guess it sounds fine, and it does
sound simple. But there's a major catch: it's a Harvard architecture. The
memory is code-only, so that grubby data won't spoil the code's pipelined
perfection.

I know that some recent RAM chips are dual-ported, supposedly so that a
processor can write image data through the random port while a graphics
screen is being refreshed through the page-mode port. Would these chips
allow the TF-1 trick to work in non-Harvard designs?
-- 
Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science
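The scheme above can be sketched as a toy timing model. The 20ns page-mode cycle and the 32-entry branch-target cache are from the post; the 120ns random-access latency and the LRU replacement policy are assumed purely for illustration, not TF-1 specifics:

```python
# Toy timing model of the TF-1 fetch trick described above. PAGE_NS and
# CACHE_LINES come from the post; RANDOM_NS and the LRU policy are
# illustrative assumptions.
from collections import OrderedDict

PAGE_NS = 20          # one sequential (page-mode) 32-bit fetch
RANDOM_NS = 120       # assumed full random-access (new row) latency
CACHE_LINES = 32      # recent branch targets whose streams are cached

def fetch_time(trace):
    """trace: list of word-address PCs in execution order; returns total ns."""
    cache = OrderedDict()                 # LRU set of cached branch targets
    total, prev = 0, None
    for pc in trace:
        if prev is not None and pc == prev + 1:
            total += PAGE_NS              # PC just upcounts: page mode
        elif pc in cache:
            cache.move_to_end(pc)
            total += PAGE_NS              # cached stream hides the row access
        else:
            total += RANDOM_NS            # miss: stall for the random access
            cache[pc] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)
        prev = pc
    return total

# A 4-instruction loop run 3 times: the first pass misses on the branch
# target, the next two passes hit the cached stream.
print(fetch_time([0, 1, 2, 3] * 3))   # 340
```

In this model the cache turns every repeated branch into an ordinary 20ns fetch; only the first encounter with a target pays the full row-access penalty.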
oconnor@sunset.steinmetz (Dennis M. O'Connor) (02/21/88)
An article by lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) says:
] The TF-1 people at IBM intend to use an interesting trick to simplify
] their CPU.
] DRAMs can be purchased that have "page mode" - that is, you can access the
] next-address value much more quickly than a randomly addressed value. This
] is because each random access can leave a large number of bits in a long
] register (say, 1024 bits, in the case of a 1Mb RAM). A page-mode access just
] shifts the register.
] So, the TF-1 CPU chip will expect another 32 bits of instruction every 20ns.
] As long as the PC just upcounts, they claim that page-mode RAMs will be fast
] enough.
] When the CPU decides to branch, of course, there's trouble. They solve this
] by keeping a cache of the instruction streams at 32 recent branch targets.
] If the target PC hits, then they fetch instructions from the cached stream,
] until the RAMs have done their random access, and are ready to page-mode
] again.

Well, it may be interesting, but it's not original. GE's own RPM40 already
does this ( but better (IMHO) than you describe ), and I believe the
AMD29000 gives you the CHOICE of doing something like this.

That memory system is not going to be simple, by the way: branches are not
your only problem. You need to handle crossing page boundaries in your RAM
as well. But that's doable. As described, it's also not going to be
Rad-Hard. Dynamic never is.

] I haven't studied the recent RAM offerings well enough to count the cycles,
] and critique the speed expectations. I guess it sounds fine, and it does
] sound simple. But, there's a major catch: it's a Harvard architecture. The
] memory is code-only, so that grubby data won't spoil the code's pipelined
] perfection.

(Humor mode on) That's not a catch, that's a FEATURE! (HM off).
Seriously folks, at 200 MBytes/sec of JUST instruction fetch, you weren't
thinking of sharing that nice, simple, unidirectional instruction bus with
messy old bi-directional data, were you?
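The page-boundary point is easy to quantify. Using the figures quoted in the thread (a 1024-bit row holding 32 words of 32 bits, both illustrative rather than any particular part's spec), even straight-line code forces a new row access whenever the PC steps off the end of the row register:

```python
# Illustration of the page-boundary hazard: sequential fetches still need
# a full random access every time the PC crosses a DRAM row boundary.
# Sizes are illustrative: a 1024-bit row = 32 words of 32 bits.
WORDS_PER_PAGE = 1024 // 32   # 32 instruction words per row

def crosses_page(pc):
    """True if the sequential fetch after word address pc needs a new row."""
    return (pc + 1) % WORDS_PER_PAGE == 0

# 256 straight-line instructions still pay 8 random accesses:
print(sum(crosses_page(pc) for pc in range(256)))   # 8
```

So a fetch unit built this way has to hide (or at least tolerate) a row-access stall every 32 instructions, branch or no branch.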
] I know that some recent RAM chips are dual-ported, supposedly so that a
] processor can write image data through the random port, while a graphics
] screen is being refreshed through the page-mode port. Would these chips
] allow the TF-1 trick to work in non-Harvard designs?

No. The "TF-1 trick" (which was the "RPM40 trick" and the "29000 trick"
FIRST, BTW) needs a Harvard architecture, to provide sufficient bandwidth
and, more importantly, to separate nice regular simple instruction-stream
behavior from complex semi-random data access.

] --
] Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science
--
  Dennis O'Connor	oconnor@sunset.steinmetz.UUCP ??
			ARPA: OCONNORDM@ge-crd.arpa
  "Nuclear War is NOT the worst thing people can do to this planet."
tim@amdcad.AMD.COM (Tim Olson) (02/21/88)
In article <910@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
| The TF-1 people at IBM intend to use an interesting trick to simplify their
| CPU.
|
| DRAMs can be purchased that have "page mode" - that is, you can access the
| next-address value much more quickly than a randomly addressed value. This
| is because each random access can leave a large number of bits in a long
| register (say, 1024 bits, in the case of a 1Mb RAM). A page-mode access just
| shifts the register.

This sounds more like Video-DRAM (VRAM) to me. VRAMs have a serial shifter
that can shift out the next sequential bit every cycle without any
subsequent addresses, whereas page-mode or static-column mode must be
supplied a partial address for every access.

| When the CPU decides to branch, of course, there's trouble. They solve this
| by keeping a cache of the instruction streams at 32 recent branch targets.
| If the target PC hits, then they fetch instructions from the cached stream,
| until the RAMs have done their random access, and are ready to page-mode
| again.

Wow! Either there is serendipity involved here, or the TF-1 architects
closely studied the Am29000 Manual -- this is the exact method we use to
keep the pipeline fed during branches -- even the number of entries is the
same!

| I haven't studied the recent RAM offerings well enough to count the cycles,
| and critique the speed expectations. I guess it sounds fine, and it does
| sound simple. But, there's a major catch: it's a Harvard architecture. The
| memory is code-only, so that grubby data won't spoil the code's pipelined
| perfection.
|
| I know that some recent RAM chips are dual-ported, supposedly so that a
| processor can write image data through the random port, while a graphics
| screen is being refreshed through the page-mode port. Would these chips
| allow the TF-1 trick to work in non-Harvard designs?

That's exactly what a VRAM does.
It has effectively two ports: the random-access port (for loads/stores and
branch addresses), and the serial port (for sequential instruction
fetches). This allows a Harvard-architecture machine to have separate buses
for performance, while maintaining a shared instruction/data memory.
-- 
	Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)
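The two-port behavior can be sketched with a toy model: a random-access port for data and branch targets, plus a serial port fed by a shift register that a "read transfer" loads with an entire row. The `ToyVRAM` class, its 32-word row size, and its method names are all illustrative, not any real part's interface:

```python
# Toy model of dual-port VRAM behavior: the random port and the serial
# port operate independently once a row has been transferred. The class
# and its 32-word row size are illustrative assumptions.
class ToyVRAM:
    ROW_WORDS = 32

    def __init__(self, words):
        self.mem = list(words)
        self.shift = []                   # serial-port shift register

    def random_read(self, addr):          # random port: data, branch targets
        return self.mem[addr]

    def random_write(self, addr, value):  # writes via the random port do not
        self.mem[addr] = value            # disturb an already-transferred row

    def transfer(self, row):              # copy one whole row into the shifter
        base = row * self.ROW_WORDS
        self.shift = self.mem[base:base + self.ROW_WORDS][::-1]

    def serial_read(self):                # serial port: next sequential word
        return self.shift.pop()

vram = ToyVRAM(range(64))
vram.transfer(1)                  # branch target lands in row 1
print(vram.serial_read())         # 32: first word of row 1
vram.random_write(33, 999)        # a data store through the random port...
print(vram.serial_read())         # 33: serial stream still holds the old row
```

This is why a VRAM-backed design can keep the unidirectional instruction bus on the serial port while data traffic shares the random port: the instruction stream only sees the random port at transfer (branch) time.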
bcase@apple.UUCP (Brian Case) (02/22/88)
In article <20482@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>In article <910@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>| When the CPU decides to branch, of course, there's trouble. They solve this
>| by keeping a cache of the instruction streams at 32 recent branch targets.
>| If the target PC hits, then they fetch instructions from the cached stream,
>| until the RAMs have done their random access, and are ready to page-mode
>| again.
>
>Wow! Either there is serendipity involved here, or the TF-1 architects
>closely studied the Am29000 Manual -- this is the exact method we use to
>keep the pipeline fed during branches -- even the number of entries is
>the same!

And by the way, AMD has a patent pending on this (Phil Frieden's name leads
the list; thanks Phil! (hope I spelled your last name right!)).