[comp.arch] Half-Cache

daveb@geac.UUCP (11/13/87)

/* Written  7:49 am  Oct 26, 1987 by daveb@geac.UUCP in uicsrd:comp.arch */
||  Well, ICL (in Britain) was working with stack-top caches on the
|| large 2900's some years ago (10!).  If anyone from Blighty would
|| care to comment on what happened subsequently, I'll refrain from
|| blurting out what was then the hot technique in the internal
|| rumor-mill.  (Can you say "half-cache"?)

/* Written  Nov  2 22:55:00 1987, by fu@uicsrd.csrd.uiuc.edu, in <43700030@uicsrd>:-)) */
| I assume you are talking about the ICL 2980, which was announced
| over 10 years ago as the top-of-the-line model. I didn't work on
| that particular machine, but on its follow-on; I seem to remember
| that the cache or slave store (in ICL-ese) held only stack
| elements, owing to its small size. As the ICL 2900 architecture
| was stack-based, this seemed a reasonable trade-off. There was no
| specific architectural support for this slave store, whereas, from
| what I can recall of the AT&T stack cache, its operation is
| supported in the instruction set.
| 
| In the follow-on machine we had slightly larger chips to play with
| (1K ECL RAMs!) and decided to cache all primary operands.
| Fortunately the machine was canceled, owing to the lack of small
| gyms that would have been needed to house the systems had we
| actually finished it (over 200 PCBs, with over 100 packages per
| board). The last I heard, ICL no longer builds large systems, but
| buys them ready to sell from the Japanese.
| 
| John Fu
| 
| University of Illinois,
| Center for Supercomputer Research and Development.
| ARPANET: fu%uicsrd@a.cs.uiuc.edu
| 	 or sometimes fu%tallis@decwrl.dec.com
| 

  Well, now that I'm reasonably sure that I'm not letting any cats
out of the bag ((:-)), I'll comment on the interaction of the ICL
half-cache and the actual architecture.  (I'll describe why it's
only half a cache in the next article.)

  The big ICLs were designed with the problems of efficiently
executing compiled HLLs in mind.  One of their requirements was the
efficient evaluation of expressions, and this led to a design that
provides direct hardware support for some stereotyped access
patterns... To quote Buckle:

   Any high-level-language programmer knows that if he uses the
   scalar a, there is a high probability that he will use it again
   soon, whereas if he uses the array element b[i] there is a much
   lower probability of re-use, but a high probability of access
   soon.  However a general-purpose cache-store designer has no
   real way of distinguishing between and exploiting these uses.
	J. K. Buckle, "The History of the ICL 2900", 
	London (Macmillan Ltd), 1977.
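
  To put the quoted access patterns in concrete terms, here is a
small C fragment of my own (the function and names are illustrative,
not from Buckle): the scalars are touched on every trip around the
loop, while each array element is read exactly once.

    /* sum_array: the scalar "sum" and the index "i" are re-used on
       every iteration, while each element b[i] is read only once,
       though b[i+1] will be wanted very soon afterwards. */
    double sum_array(const double *b, int n)
    {
        double sum = 0.0;
        int i;

        for (i = 0; i < n; i++)
            sum += b[i];
        return sum;
    }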

  In the first release, this support was mostly absent, save in the
limited sense that the cache (a "normal" one) held only stack elements.
This tended to give quite a good hit-rate, since the stack was
stereotypically used for locals and procedure-return information,
very much in the C/Algol/Pascal tradition.
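
  A toy illustration of that arrangement (my own C sketch, with
invented segment bounds and function names; the real machine did
this in hardware): the slave store simply refuses to hold anything
outside the stack segment, so locals and procedure-return linkage
never compete with global or array traffic for its few entries.

    #include <stdio.h>

    /* Hypothetical stack-segment bounds. */
    #define STACK_BASE  0x100000UL
    #define STACK_LIMIT 0x110000UL

    /* A word is eligible for the slave store only if its address
       lies inside the stack segment. */
    static int slave_store_eligible(unsigned long addr)
    {
        return addr >= STACK_BASE && addr < STACK_LIMIT;
    }

    int main(void)
    {
        unsigned long a_local  = STACK_BASE + 0x40;  /* a stack local     */
        unsigned long a_global = 0x200000UL;         /* a global or array */

        printf("local  eligible? %d\n", slave_store_eligible(a_local));
        printf("global eligible? %d\n", slave_store_eligible(a_global));
        return 0;
    }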

  The intended support was to have a small, fast store as a
"stack-top cache", a similar store as an "execution cache", and a
real cache as an "extra name cache".  Please note that what was
intended for the stack-top and execution caches was not a
content-addressable store (a "real" cache), but just fast memory.
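
  The difference shows up clearly in a toy C model (mine, with
invented sizes and names): the stack-top store is just directly
indexed RAM, because an offset from the stack-front register *is*
the index, whereas a "real" cache must match a stored tag before it
knows whether it holds the word at all.

    #include <stdint.h>

    #define TOP_WORDS  32   /* assumed size of the stack-top store */
    #define CACHE_SETS 64   /* assumed size of the real cache      */

    /* Stack-top store: plain fast RAM.  A reference at SF/LNB+off
       is satisfied by indexing; there is no tag to check, provided
       the frame fits (off < TOP_WORDS). */
    static uint32_t top_store[TOP_WORDS];

    uint32_t read_local(unsigned off)
    {
        return top_store[off];            /* no tag comparison at all */
    }

    /* Content-addressable ("real") cache: the stored tag must match
       the address before the data can be used; otherwise it is a
       miss and main store must be consulted. */
    static struct { uint32_t tag; uint32_t data; int valid; } cache[CACHE_SETS];

    int read_global(uint32_t addr, uint32_t *out)
    {
        unsigned set = addr % CACHE_SETS;

        if (cache[set].valid && cache[set].tag == addr) {
            *out = cache[set].data;       /* hit  */
            return 1;
        }
        return 0;                         /* miss */
    }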

  The architecture was organized so that locals, temporaries and
parameters were addressed via the "stack front" or "local name base"
registers (ICL-ese for the stack pointer and stack-frame pointer).
Other variables were addressed via the "extra name base" register.
The machine also used a subtype of its normal descriptors to point
to arrays, so that array base-addresses were "simple variables".

  Voila: references to the current stack frame, containing simple
variables and array- and struct-base descriptors, were via the
"stack cache".  References to global simple variables and array
bases were via the extra name base, and therefore via a
content-addressable cache.  Indirections through descriptors were
typically not cached, but could be put in the "extra name cache" if
desired, or in a "descriptor target cache" if one were designed (it
was considered).  And of course, code in the current function was
accessed via the "execution cache".

  Result: references to code and the stack frame were through fast
(and relatively cheap) memory.  References to globals were through a
fast but expensive real cache.  References to array elements were
via cached descriptors, but the elements themselves were normally in
main store.


  Interesting, wot?

  --dave (further revelations to follow re multiprocessing) c-b
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.