[comp.unix.i386] Cache performance on 386 boards running Unix

angel@umigw.MIAMI.EDU (angel li) (10/24/89)

Does anyone know the performance difference of 386 boards with a cache
against boards without a cache, both running Unix?  I would like to
know whether paying for the cache is worth the money.
-- 
Angel Li
University of Miami/RSMAS

Internet: angel@flipper.miami.edu			UUCP: ncar!umigw!angel

pb@idca.tds.PHILIPS.nl (Peter Brouwer) (10/25/89)

In article <919@umigw.MIAMI.EDU> angel@flipper.miami.edu (angel li) writes:
>Does anyone know the performance difference of 386 boards with a cache
>against boards without a cache, both running Unix?  I would like to
>know whether paying for the cache is worth the money.
This depends on the size/working set of the applications you use.
Most caches are 64k = 16 pages. So if you have large applications with
a large working set (the number of pages used during execution), the cache
isn't a great help.

-- 
Peter Brouwer,                # Philips Telecommunications and Data Systems,
NET  : pb@idca.tds.philips.nl # Department SSP-P9000 Building V2,
UUCP : ....!mcvax!philapd!pb  # P.O.Box 245, 7300AE Apeldoorn, The Netherlands.
PHONE:ext [+31] [0]55 432523, # Never underestimate the power of human stupidity

akcs.larry@nstar.UUCP (Larry Snyder) (10/25/89)

>Does anyone know the performance difference of 386 boards with a cache
>against boards without a cache, both running Unix?  I would like to
>know whether paying for the cache is worth the money.

I know that boards with a cache in several installations have had problems

dhinds@portia.Stanford.EDU (David Hinds) (10/26/89)

In article <416@ssp2.idca.tds.philips.nl>, pb@idca.tds.PHILIPS.nl (Peter Brouwer) writes:
> This depends on the size/working set of your applications you use.
> Most caches are 64k = 16 pages. So if you have large applications with
> a working set ( number of pages it used during execution ) the cache isn't
> a great help. 
> 
This is not really true.  A RAM cache constantly turns over to reflect
the current memory usage of whatever is running on a system.  It stores
recently used instructions and data.  It is most effective at speeding
up fairly small loops (a few K), or code which accesses the same data
repeatedly.  Fortunately, this represents the local activity of almost
all programs, almost all of the time.  I don't have any concrete data,
but I've seen hit rates of 90-95% quoted for 80386 systems with 64K
caches running typical real applications.  Context switching should not
degrade performance much, because the time scale over which the cache
works is much, much shorter than a task's time slice.  The cache turns
over fast enough that it recovers almost immediately from a context
switch.  RAM caches have been standard equipment on mainframes for
decades.  In fact, 64K is quite large as caches go; the 80486 chip has
a cache of something like 256 bytes, but I wouldn't be surprised if
this had hit rates of 80-90%, even with very large programs.
                             -David Hinds
                              dhinds@portia.stanford.edu
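The context-switch argument above can be checked with a toy model. What follows is a minimal sketch, not a model of any real cache controller: it assumes a direct-mapped cache with 4-byte lines and two hypothetical tasks whose 4K loops sit exactly one cache size apart, so that each task evicts the other's lines on every switch. The function name and all parameters are made up for illustration.

```python
def hit_rate_with_switches(passes_per_slice, slices=4, loop_bytes=4 * 1024,
                           cache_bytes=64 * 1024, line_bytes=4):
    """Hit rate for two tasks that alternate time slices.

    Each task repeatedly runs over its own loop of `loop_bytes`.
    Assumption: direct-mapped cache, one stored tag per line.
    """
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines
    # The two loops are one cache size apart, so they map to the same
    # cache lines (worst case: every switch wipes out the other task).
    bases = (0x100000, 0x100000 + cache_bytes)
    hits = 0
    total = 0
    for s in range(slices):
        base = bases[s % 2]
        for _ in range(passes_per_slice):
            for addr in range(base, base + loop_bytes, line_bytes):
                block = addr // line_bytes
                index = block % n_lines   # line this block must use
                tag = block // n_lines    # which block is in the line
                if tags[index] == tag:
                    hits += 1
                else:
                    tags[index] = tag     # miss: refill the line
                total += 1
    return hits / total


for passes in (1, 10, 100):
    print(passes, hit_rate_with_switches(passes))
```

Under these assumptions only the first pass after each switch misses, so the hit rate is (passes - 1) / passes: 0.99 at 100 passes per slice, 0 at one pass per slice. The longer the slice is relative to the loop, the less the switch matters.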

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/26/89)

In article <416@ssp2.idca.tds.philips.nl>, pb@idca.tds.PHILIPS.nl (Peter Brouwer) writes:
|  This depends on the size/working set of your applications you use.
|  Most caches are 64k = 16 pages. So if you have large applications with
|  a working set ( number of pages it used during execution ) the cache isn't
|  a great help. 

  Micro caches don't work in 4k pages, so what has this to do with
anything? I suspect you're thinking of mainframe cache which may work
in larger chunks. 4, 16, and 32 byte cache chunks are mentioned by
manufacturers, I think someone used 64 bytes, but I haven't got the
name.

  If anyone has an Intel cache controller spec sheet handy, please
contribute. 
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

dhinds@portia.Stanford.EDU (David Hinds) (10/26/89)

In article <1480@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>   If anyone has an Intel cache controller spec sheet handy, please
> contribute. 

I don't have the specs, but as I understand it, the Intel chip is a
"2-way set-associative" cache controller.  Each entry in the cache
has two parts: a physical-memory address to which it corresponds, and
a copy of the data at that address (32 bits, I think).  To speed up
the check to see if something is in the cache, there are only two
entries in the cache where any given physical address might reside.
A 'set' of physical addresses is 'associated' with each pair of
cache entries.  When the 80386 sends out a physical address, the
cache controller figures out which two entries might contain the data
(probably using the low-order bits of the address as an index into the cache),
checks those two entries to see if either address matches the request,
and either returns the data from the cache or polls main memory.
A 64K cache is built from 8 8K-by-8bit static RAM chips, to be 64 bits
wide, so that each cache entry has room for a 32-bit address and 32 bits
of data.  So it actually has 32K of data.
                               -David Hinds
                                dhinds@portia.stanford.edu
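The lookup described above can be sketched in a few lines. This is an illustration only, not the Intel controller's actual design: the class name, the 4-byte line, the choice of the low-order block-address bits as the set index, and the least-recently-used replacement are all assumptions of the sketch. The 32K data size follows the figure given in the post.

```python
class TwoWayCache:
    """Toy 2-way set-associative cache that tracks tags only (no data)."""

    def __init__(self, data_bytes=32 * 1024, line_bytes=4):
        self.line_bytes = line_bytes
        # Two entries ("ways") per set.
        self.n_sets = data_bytes // (2 * line_bytes)
        # Each set is a list of at most two tags, most recently used last.
        self.sets = [[] for _ in range(self.n_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (and fill the entry)."""
        block = addr // self.line_bytes
        index = block % self.n_sets    # low-order bits pick the set
        tag = block // self.n_sets     # remaining bits identify the block
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)           # move to most-recently-used position
            ways.append(tag)
            return True
        if len(ways) == 2:
            ways.pop(0)                # evict the least recently used entry
        ways.append(tag)
        return False


cache = TwoWayCache()
stride = cache.n_sets * cache.line_bytes   # addresses this far apart share a set
a, b, c = 0x1000, 0x1000 + stride, 0x1000 + 2 * stride
print([cache.access(x) for x in (a, b, a, b, c, a)])
```

Two addresses that share a set can both stay resident; a third one forces one of them out, which is the behaviour the "only two entries where any given physical address might reside" description implies.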

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/26/89)

In article <6101@portia.Stanford.EDU>, dhinds@portia.Stanford.EDU (David Hinds) writes:
|            In fact, 64K is quite large as caches go; the 80486 chip has
|  a cache of something like 256 bytes, but I wouldn't be surprised if
|  this had hit rates of 80-90%, even with very large programs.

  The 80486 that Intel sells has 8k.

rcd@ico.isc.com (Dick Dunn) (10/27/89)

pb@idca.tds.PHILIPS.nl (Peter Brouwer) writes:
> ... angel@flipper.miami.edu (angel li) writes:
> >Does anyone know the performance difference of 386 boards with a cache
> >against boards without a cache, both running Unix?...
> This depends on the size/working set of your applications you use.

It depends on the working set--data and code both, although separate I and
D caches help a lot even with large amounts of data because the code is
still likely to have good locality even if the data defeats the caching.

> Most caches are 64k = 16 pages. So if you have large applications with
> a working set ( number of pages it used during execution ) the cache isn't
> a great help. 

No.  (I've never seen a page-oriented cache--refill would be nasty!)  The
normal organization caches a small amount of memory (~ 4 - 8 bytes) per
cell, so the relevant question is not at page granularity.  It is rare for
a computation-intensive program to have a large amount of active code at any
given time--in fact, somewhat the opposite, because "computation intensive"
often means a few small loops.

Also, remember that the 386 presents physical addresses, so cache flushes
don't have to happen very often.  (Don't confuse a cache flush with a TLB
flush.)

Some informal experiments we've done suggest that a decent cache does a lot.
For example, a cached 25-MHz machine is easily twice as fast as an uncached
16-MHz even though the processor is only about 50% faster.

Keep in mind that this is CPU speed.  Look at your processing mix; if
you're I/O bound, there are still secondary reasons that a cache can help
but it's not such a big deal.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...No DOS.  UNIX.

dave@mobile.UUCP (David C. Rein) (10/27/89)

In article <919@umigw.MIAMI.EDU>, angel@umigw.MIAMI.EDU (angel li) writes:
> Does anyone know the performance difference of 386 boards with a cache
> against boards without a cache, both running Unix?  I would like to
> [stuff deleted]
> Angel Li

Well, I have no performance specs on hand, but having the 386 get its
stuff from 35 ns static RAM instead of 60-80 ns dynamic RAM tells you 
something right there.  Also, on the IBM PS/2 Model 70 25 MHz machines,
IBM found it important enough to give the top 128k of RAM to the BIOS, so
that it could be cached instead of being fetched from ROM.

                                                               Dave Rein
UUCP: ..!kodak!gizzmo!lazlo!\      \/   "It just goes to show what you can do 
            mobile!dave            /\         if you're a total psychotic"
Domain: dcr0801@ultb.isc.rit.edu  /  \           -- Woody Allen
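The SRAM-versus-DRAM point can be put into numbers with the usual weighted average. This is a rough sketch under stated assumptions: a hit costs the SRAM access time, a miss costs the DRAM access time, and controller and bus overhead are ignored. The 35 ns and 60-80 ns figures are the ones quoted above; the 90-95% hit rates are the ones quoted earlier in the thread. The function name is made up.

```python
def effective_access_ns(hit_rate, cache_ns, dram_ns):
    """Average access time when a fraction `hit_rate` of accesses hit the cache.

    Simplification: a miss costs dram_ns only (no extra lookup penalty).
    """
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


for hit_rate in (0.90, 0.95):
    for dram_ns in (60, 80):
        avg = effective_access_ns(hit_rate, 35, dram_ns)
        print(f"hit rate {hit_rate:.2f}, DRAM {dram_ns} ns: {avg:.2f} ns average")
```

With these inputs the average stays within a few nanoseconds of the SRAM figure. Chip access times are not the same thing as full memory cycle times, so this only indicates the trend.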

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/28/89)

In article <1989Oct27.031800.4938@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:

|  Some informal experiments we've done suggest that a decent cache does a lot.
|  For example, a cached 25-MHz machine is easily twice as fast as an uncached
|  16-MHz even though the processor is only about 50% faster.

  It really does depend on the machine, not just the speed. For instance,
a machine with 2- or 4-way interleave would benefit less from a cache
than one with no interleave, and one with wait states benefits more than
one without. 16MHz is the point at which it is still possible to do 0w/s
with more or less standard memory parts.

  I have done some measurements on normal, interleaved, and 16 bit
memory, and conclude that cache is a huge win as your memory gets
slower, and that 64k will mask the effects of slow memory for many
applications.

  Note: I'm not disagreeing, just adding some clarifying information. I
would not buy a 25/33 MHz machine w/o cache, because it adds so little
to the price of the machine as a whole ($200-300).

rick@pcrat.uucp (Rick Richardson) (10/30/89)

>In article <919@umigw.MIAMI.EDU>, angel@umigw.MIAMI.EDU (angel li) writes:
> Does anyone know the performance difference of 386 boards with a cache
> against boards without a cache, both running Unix?  I would like to

Back in the bad old days of $30+ DRAMs, we bought a 16 MHz Mylex motherboard.
This mother has a 64K cache and 120ns main memory.  Running the Dhrystones
(under 386/ix 1.0.4) on it gave 4950 'stones with the cache turned on (~0ws),
and 3652 'stones with the cache turned off (1ws).

Your results may vary...
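For what it's worth, the two Dhrystone figures above work out as follows. This is back-of-envelope arithmetic, not a measurement. It assumes that a 386 bus cycle takes 2 clocks at 0 wait states and 3 clocks at 1 wait state, and that everything other than bus cycles runs at the same speed in both cases. The function names are made up for illustration.

```python
def speedup(fast_score, slow_score):
    """Ratio of two benchmark scores (higher score = faster)."""
    return fast_score / slow_score


def bus_bound_fraction(observed_speedup, clocks_fast=2, clocks_slow=3):
    """Fraction f of the fast run's time spent in bus cycles.

    Crude model: slow_time / fast_time = (1 - f) + f * (clocks_slow / clocks_fast).
    Solving for f gives the expression below.
    """
    bus_ratio = clocks_slow / clocks_fast
    return (observed_speedup - 1.0) / (bus_ratio - 1.0)


s = speedup(4950, 3652)
print(f"cache on vs. off: {s:.3f}x")
print(f"implied bus-bound fraction: {bus_bound_fraction(s):.2f}")
```

The ratio comes to about 1.36, against a ceiling of 1.5 if the processor did nothing but wait on memory. Under this crude model roughly 70% of the cached run's time is bus cycles.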

-- 
Rick Richardson |       Looking for FAX software for UNIX/386 ??????     mention
PC Research,Inc.|                  WE'RE SHIPPING			 your
uunet!pcrat!rick|    Ask about FaxiX - UNIX Facsimile System (tm)        FAX #
(201) 389-8963  | Or JetRoff - troff postprocessor for the HP {Laser,Desk}Jet

pb@idca.tds.PHILIPS.nl (Peter Brouwer) (10/30/89)

In article <1480@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
>In article <416@ssp2.idca.tds.philips.nl>, pb@idca.tds.PHILIPS.nl (Peter Brouwer) writes:
>|  This depends on the size/working set of your applications you use.
>|  Most caches are 64k = 16 pages. So if you have large applications with
>|  a working set ( number of pages it used during execution ) the cache isn't
>|  a great help. 
>
>  Micro caches don't work in 4k pages, so what has this to do with
>anything? I suspect you're thinking of mainframe cache which may work
>in larger chunks. 4, 16, and 32 byte cache chunks are mentioned by
>manufacturers, I think someone used 64 bytes, but I haven't got the
>name.
>
I must admit I did not know this. From the discussion up till now I understand
that each cache entry contains a 32-bit address and 32 bits of data.
Does this mean that it can contain 1K memory references?
If this is the case, an application which uses, for instance, a memory
array > 1K and does lots of accesses in it will get a low cache hit ratio.
Is this a correct assumption?
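One way to explore the question is a small simulation. This is a sketch under assumptions, not a description of any particular board: a direct-mapped 64K cache, an array walked sequentially one 32-bit word at a time, and a line size taken from the 4- and 16-byte figures mentioned in the thread. The function name is made up. If the layout described earlier in the thread is right (8 bytes of SRAM per entry), 64K of SRAM would give 8192 entries rather than 1K.

```python
def sweep_hit_ratio(array_bytes, passes, cache_bytes=64 * 1024,
                    line_bytes=16, word_bytes=4):
    """Hit ratio for `passes` sequential walks over an array.

    Assumption: direct-mapped cache, whole line filled on a miss.
    """
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines
    hits = 0
    total = 0
    for _ in range(passes):
        for addr in range(0, array_bytes, word_bytes):
            block = addr // line_bytes
            index = block % n_lines
            tag = block // n_lines
            if tags[index] == tag:
                hits += 1
            else:
                tags[index] = tag   # miss: the whole line is brought in
            total += 1
    return hits / total


print("entries at 8 bytes each:", (64 * 1024) // 8)
print("32K array, 16-byte lines: ", sweep_hit_ratio(32 * 1024, 10))
print("256K array, 16-byte lines:", sweep_hit_ratio(256 * 1024, 4))
print("256K array, 4-byte lines: ", sweep_hit_ratio(256 * 1024, 4, line_bytes=4))
```

In this model an array that fits in the cache hits almost every time after the first pass. An array four times the cache size still hits 75% of the time with 16-byte lines, because each miss brings in the next three words, but never hits with 4-byte lines. So the hit ratio depends on the access pattern and line size, not just on the array being larger than the cache.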
