root@nat-3.UUCP (nat-3 System Administrator) (10/05/90)
Hello --

   I have a 25 MHz 386 motherboard with no cache, but with plenty of
70 ns DRAM that provides zero wait state performance.  Is there any
reason that a cache would boost performance on my machine?  My (very
limited, probably incorrect, software-oriented) reasoning is NO:

   - My 386 places a memory request on the bus in one CPU cycle.
   - Since I have zero wait state memory, my DRAMs will satisfy the
     request in the next CPU cycle.
   - You can't do better than that, so ...
   - A cache will not boost performance.

Would someone please tell me where I am right and where I am wrong?
Please respond by e-mail since I do not subscribe to this newsgroup.

Thank you very much,
John
--
John R. Meyer                  Path:   ...!uunet!nat-3!root
10208-C Ashbrooke Ct.          Domain: root@nat-3.UUCP
Oakton, VA 22124               Phone:  (703) 281-5157 (H)
USA                                    (703) 802-1872 (O)
wsinpdb@svin02.info.win.tue.nl (Paul de Bra) (10/08/90)
In article <188@nat-3.UUCP> root@nat-3.UUCP (nat-3 System Administrator) writes:
>Hello --
>
>   I have a 25 MHz 386 motherboard with no cache, but with
>plenty of 70 ns DRAM that provides zero wait state performance.
>Is there any reason that a cache would boost performance on my
>machine? My (very limited, probably incorrect, software-oriented)
>reasoning is NO:

At 25 MHz, 70 ns DRAM will not provide zero wait state performance,
for two reasons:

1) 70 ns is too slow.  You probably have some kind of interleaved
memory which starts fetching a memory word as soon as the previous
one is requested.  You have approximately 35 ns for zero wait state,
so if you start the memory read ahead of time you can get zero wait
states; otherwise you don't.  As most programs switch back and forth
between fetching instructions and data, you will not experience zero
wait state behaviour.

2) The DRAM needs to be refreshed, which may occasionally cause extra
wait states if you try to access the part of memory that is being
refreshed.

A cache system provides zero wait states on a cache hit, but one or
two wait states on a cache miss.  Given sufficient cache memory (64k
or more) the hit rate is fairly close to 100%, and the performance
increase is substantial.

Paul.
(debra@research.att.com)
rick@wucs1.wustl.edu (Rick Bubenik) (10/09/90)
In article <188@nat-3.UUCP> root@nat-3.UUCP (nat-3 System Administrator) writes:
>
>Hello --
>
>   I have a 25 MHz 386 motherboard with no cache, but with
>plenty of 70 ns DRAM that provides zero wait state performance.
>Is there any reason that a cache would boost performance on my
>machine? My (very limited, probably incorrect, software-oriented)
>reasoning is NO:

Your analysis is understandable, but incorrect.  It turns out that
when a computer is advertised as 0 wait state, what they really mean
is 0 wait states when pipelined memory modules are used.  Also, only
reads operate with 0 wait states; writes take 1 wait state.

Here's how it works:  Without pipelining, memory accesses take from
2 to N cycles.  In the first cycle, the CPU places the address on the
bus.  In the second cycle, the device either responds (if it is fast
enough) or it inserts a wait state.  This repeats until the device is
able to respond.  With pipelining, the CPU puts the address for the
next access on the bus during the last cycle of the previous access.
This gives the device an extra cycle within which to respond.
However, writes take one more cycle than reads, for reasons that I
don't quite understand (and that are not explained in my 386 data
book).

Even when using pipelining, not all memory accesses can be pipelined.
Your DRAM modules must be interleaved to achieve 0 wait state
performance.  If two back-to-back accesses to the same bank occur, no
pipelining can be done since the DRAMs require a precharge time (you
can't precharge a bank while that bank is being accessed).  Also, the
CPU only pipelines when back-to-back accesses are occurring.  If the
bus goes idle for any reason (such as to execute a "long"
instruction), no pipelining will be done.

For your 25 MHz system, the cycle time is 40 ns, so clearly the only
way it could achieve 0 wait state performance is by using pipelining.
Assuming the cache is static RAM and (approximately) 40 ns or faster,
it will operate with true 0 wait state performance.
Also, SRAMs don't need precharging or refresh, so this also speeds
access.  Of course, caches are only effective on cache hits, so the
cache needs to be large enough to guarantee close to a 100% hit ratio
to be most effective.

In spite of all that was just said, I don't think that a cache will
improve the performance of your system much.  Most applications do
many more reads than writes, and most of the reads are probably going
to be pipelined (due, largely, to instruction prefetch).  Also, other
factors, such as disk transfer and access rates, have a large impact
on many applications.

rick

Rick Bubenik                          rick@cs.wustl.edu
Research Associate
Department of Computer Science
Washington University
Campus Box 1045
One Brookings Drive
St. Louis, Missouri 63130-4899
(314) 726-7530
brucee@runxtsa.runx.oz.au (Bruce Evans) (10/09/90)
In article <1466@svin02.info.win.tue.nl> wsinpdb@svin02.info.win.tue.nl (Paul de Bra) writes:
>A cache system provides zero wait states on a cache hit, but one or two
>wait states on a cache miss. Given sufficient cache memory (64k or more)
>the hit rate is fairly close to 100%, and the performance increase is
>substantial.

For cheap 386 cache systems, claiming one or two wait states for
cache misses may be stretching the truth almost as much as claiming
zero wait states.  I have a 33 MHz system that claims 2 wait states
for a cache miss with a page hit and 4 wait states for a cache miss
without a page hit.  A benchmark that copies the n'th megabyte of
physical memory to the m'th megabyte using "rep movsd" gives times
between 0.09 sec (12 cycles per dword = 8 cycles overhead) and 0.24
sec (32 cycles per dword = 28 cycles overhead) for different values
of m and n in the range 2 to 6.

The times are very sensitive to m and n.  I suspect that the memory
off the motherboard has more wait states (though it is claimed to be
32-bit) and that the cache is thrashing due to the non-random pattern
of accesses (reading ahead and discarding the extra data).  With
normal use, it is hard to detect any difference between the speed of
the different areas of memory.
--
Bruce Evans (evans@syd.dit.csiro.au)