[comp.sys.intel] Zero wait state and caches

root@nat-3.UUCP (nat-3 System Administrator) (10/05/90)

Hello --

	I have a 25 MHz 386 motherboard with no cache, but with
plenty of 70 ns DRAM that provides zero wait state performance.
Is there any reason that a cache would boost performance on my
machine?  My (very limited, probably incorrect, software-oriented)
reasoning is NO:

	-	My 386 places a memory request on the bus in one
		CPU cycle.

	-	Since I have zero wait state memory, my DRAMS
		will satisfy the request in the next CPU cycle.

	-	You can't do better than that, so ...

	-	A cache will not boost performance.

Would someone please tell me where I am right and where I am wrong?

Please respond by e-mail since I do not subscribe to this newsgroup.

				Thank you very much,

					John
-- 
John R. Meyer			Path:   ...!uunet!nat-3!root
10208-C Ashbrooke Ct.		Domain: root@nat-3.UUCP
Oakton, VA  22124		Phone:  (703) 281-5157 (H)
USA					(703) 802-1872 (O)

wsinpdb@svin02.info.win.tue.nl (Paul de Bra) (10/08/90)

In article <188@nat-3.UUCP> root@nat-3.UUCP (nat-3 System Administrator) writes:
>Hello --
>
>	I have a 25 MHz 386 motherboard with no cache, but with
>plenty of 70 ns DRAM that provides zero wait state performance.
>Is there any reason that a cache would boost performance on my
>machine?  My (very limited, probably incorrect, software-oriented)
>reasoning is NO:

At 25 MHz, 70 ns DRAM will not provide zero wait state performance, for
two reasons:
1) 70 ns is too slow. You probably have some kind of interleaved memory
   which starts fetching a memory word as soon as the previous one is
   requested. You have roughly 35 ns available for a zero wait state
   access, so you only get zero wait states when the memory read can be
   started ahead of time (see the timing sketch after this list). As
   most programs switch back and forth between fetching instructions
   and data, you will not experience zero wait state behaviour.
2) The DRAM needs to be refreshed, which may occasionally cause extra
   wait states if you try to access the part of memory that is being
   refreshed.
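
To put some numbers on reason 1, here is a back-of-the-envelope check in
C. The two-clock minimum bus cycle is standard 386 behaviour; the 45 ns
of address-decode, buffer and data-setup overhead is an assumed figure,
chosen so the usable window comes out near the ~35 ns mentioned above.

/* Can 70 ns DRAM meet a zero wait state 386 bus cycle at 25 MHz?
 * The overhead figure below is an illustrative assumption.
 */
#include <stdio.h>

int main(void)
{
    double clock_ns     = 1000.0 / 25.0;   /* 40 ns per CPU clock at 25 MHz  */
    double bus_cycle_ns = 2.0 * clock_ns;  /* non-pipelined 386 bus cycle is
                                              two clocks (80 ns)             */
    double overhead_ns  = 45.0;            /* assumed decode, buffer and
                                              data-setup delays              */
    double window_ns    = bus_cycle_ns - overhead_ns;  /* ~35 ns for the DRAM */
    double dram_ns      = 70.0;            /* rated DRAM access time         */

    printf("usable window %.0f ns, DRAM needs %.0f ns -> %s\n",
           window_ns, dram_ns,
           dram_ns <= window_ns ? "zero wait states"
                                : "wait states unless the read starts early");
    return 0;
}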

A cache system provides zero wait states on a cache hit, but one or two
wait states on a cache miss. Given sufficient cache memory (64k or more)
the hit rate is fairly close to 100%, and the performance increase is
substantial.
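
To put a rough number on that, the usual averaging argument looks like
this in C; the hit ratios and the 2-wait-state miss penalty are
illustrative values, not measurements of any particular board.

/* Average wait states per access: zero on a cache hit, a fixed penalty
 * on a miss. All figures are illustrative assumptions.
 */
#include <stdio.h>

int main(void)
{
    double miss_wait_states = 2.0;              /* penalty on a cache miss */
    double hit_rates[] = { 0.90, 0.95, 0.98 };  /* assumed hit ratios      */
    int i;

    for (i = 0; i < 3; i++) {
        double avg = (1.0 - hit_rates[i]) * miss_wait_states;
        printf("hit rate %.0f%% -> %.2f wait states per access on average\n",
               hit_rates[i] * 100.0, avg);
    }
    return 0;
}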

Paul.
(debra@research.att.com)

rick@wucs1.wustl.edu (Rick Bubenik) (10/09/90)

In article <188@nat-3.UUCP> root@nat-3.UUCP (nat-3 System Administrator) writes:
>
>Hello --
>
>	I have a 25 MHz 386 motherboard with no cache, but with
>plenty of 70 ns DRAM that provides zero wait state performance.
>Is there any reason that a cache would boost performance on my
>machine?  My (very limited, probably incorrect, software-oriented)
>reasoning is NO:

Your analysis is understandable, but incorrect.  It turns out that
when a computer is advertised as 0 wait state, what is really meant
is 0 wait state when pipelined memory accesses are used.  Also, only
reads operate with 0 wait states; writes take 1 wait state.

Here's how it works:  Without pipelining, memory accesses take from 2
to N cycles.  In the first cycle, the CPU places the address on the
bus.  In the second cycle, the device either responds (if it is fast
enough) or it inserts a wait state.  This repeats until the device is
able to respond.  With pipelining, the CPU puts the address for the
next access on the bus during the last cycle of the previous bus cycle.
This gives the device an extra cycle within which to respond.  However,
writes take one more cycle than reads, for reasons that I don't quite
understand (and that are not explained in my 386 data book).
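
Those cycle counts can be summarised in a small C sketch, under the
assumptions just described (two-clock minimum bus cycle, writes one wait
state slower than reads, and a missed pipelining opportunity modelled
here as one extra wait state); the figures are illustrative, not from a
data book.

/* Clocks per 386 bus cycle: at least two, plus one per wait state.
 * Wait state counts below are illustrative assumptions.
 */
#include <stdio.h>

static int clocks(int wait_states) { return 2 + wait_states; }

int main(void)
{
    double clock_ns = 40.0;            /* 25 MHz                          */
    int pipelined_read_waits  = 0;     /* the advertised "0 wait state"   */
    int pipelined_write_waits = 1;     /* writes take one more            */
    int nonpipelined_waits    = 1;     /* assumed: pipelining missed      */

    printf("pipelined read:  %d clocks = %3.0f ns\n",
           clocks(pipelined_read_waits),  clocks(pipelined_read_waits)  * clock_ns);
    printf("pipelined write: %d clocks = %3.0f ns\n",
           clocks(pipelined_write_waits), clocks(pipelined_write_waits) * clock_ns);
    printf("non-pipelined:   %d clocks = %3.0f ns\n",
           clocks(nonpipelined_waits),    clocks(nonpipelined_waits)    * clock_ns);
    return 0;
}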

Even when using pipelining, not all memory accesses can be pipelined.
Your DRAM modules must be interleaved to achieve 0 wait state
performance.  If two back-to-back accesses to the same bank occur,
no pipelining can be done since the DRAMs require a precharge time
(you can't precharge a bank while that bank is being accessed).  Also,
the CPU only pipelines when back-to-back accesses are occurring.  If the
bus goes idle for any reason (such as to execute a "long" instruction),
no pipelining will be done.
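
As a toy model of the interleaving argument: with the bank selected by
one low address bit, sequential dword accesses alternate banks, so one
bank can precharge while the other is read, while a second access to the
bank just used has to wait out the precharge. The bank-select bit and
the address trace below are made up for illustration.

/* Two-way interleaving toy model: bank chosen by the low dword address
 * bit; back-to-back accesses to the same bank imply a precharge stall.
 */
#include <stdio.h>

#define BANK_OF(addr)  (((addr) >> 2) & 1)   /* bit 2: even or odd dword */

int main(void)
{
    unsigned long trace[] = { 0x1000, 0x1004, 0x1008, 0x100C,  /* sequential */
                              0x2000, 0x2008 };                /* same bank  */
    int i, prev_bank = -1;

    for (i = 0; i < 6; i++) {
        int bank = BANK_OF(trace[i]);
        printf("access %08lx -> bank %d%s\n", trace[i], bank,
               bank == prev_bank ? "  (same bank: precharge stall)" : "");
        prev_bank = bank;
    }
    return 0;
}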

For your 25 MHz system, the cycle time is 40 ns, so clearly the only way
it could achieve 0 wait state performance is by using pipelining.

Assuming the cache is static RAM rated at (approximately) 40 ns or
faster, it will operate with true 0 wait state performance.  SRAMs also
need neither precharge nor refresh, which further speeds access.  Of
course, a cache only helps on cache hits, so it needs to be large enough
to guarantee a hit ratio close to 100% to be most effective.

In spite of all that was just said, I don't think that a cache will improve
the performance of your system much.  Most applications do many more
reads than writes and most of the reads are probably going to be
pipelined (due, largely, to instruction prefetch).  Also, other factors,
such as disk transfer and access rates, have a large impact on many
applications.

	rick

	Rick Bubenik	 		rick@cs.wustl.edu
	Research Associate
	Department of Computer Science
	Washington University
	Campus Box 1045
	One Brookings Drive
	St. Louis, Missouri 63130-4899
	(314) 726-7530

brucee@runxtsa.runx.oz.au (Bruce Evans) (10/09/90)

In article <1466@svin02.info.win.tue.nl> wsinpdb@svin02.info.win.tue.nl (Paul de Bra) writes:
>A cache system provides zero wait states on a cache hit, but one or two
>wait states on a cache miss. Given sufficient cache memory (64k or more)
>the hit rate is fairly close to 100%, and the performance increase is
>substantial.

For cheap 386 cache systems, claiming one or two wait states for cache misses
may be stretching the truth almost as much as claiming zero wait states. I
have a 33 MHz system that claims 2 wait states for a cache miss with a page
hit and 4 wait states for a cache miss without a page hit. A benchmark that
copies the n'th megabyte of physical memory to the m'th megabyte using
"rep movsd" gives times between 0.09 sec (12 cycles per dword = 8 cycles
overhead) and 0.24 sec (32 cycles per dword = 28 cycles overhead) for
different values of m and n in the range 2 to 6. The times are very
sensitive to m and n.
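
For what it's worth, the cycles-per-dword arithmetic can be cross-checked
as follows (assuming a 33.33 MHz clock); it lands near the quoted 12 and
32 cycles, with the remaining difference down to rounding in the quoted
times.

/* One megabyte is 262144 dwords, so
 * cycles per dword = elapsed_seconds * clock_Hz / 262144.
 * The 33.33 MHz clock is an assumption for the "33 MHz" system.
 */
#include <stdio.h>

int main(void)
{
    double clock_hz = 33.33e6;                 /* assumed CPU clock        */
    double dwords   = 1024.0 * 1024.0 / 4.0;   /* dwords in one megabyte   */
    double times[]  = { 0.09, 0.24 };          /* best and worst measured  */
    int i;

    for (i = 0; i < 2; i++) {
        double cycles = times[i] * clock_hz / dwords;
        printf("%.2f s for 1 MB -> %.1f cycles per dword\n", times[i], cycles);
    }
    return 0;
}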

I suspect that the memory off the motherboard has more wait states (though
it is claimed to be 32-bit) and that the cache is thrashing due to the
non-random pattern of accesses (reading extra data and discarding it).  In
normal use, it is hard to detect any difference between the speed of the
different areas of memory.
-- 
Bruce Evans  (evans@syd.dit.csiro.au)