daveb@geac.UUCP (11/13/87)
/* Written 7:49 am Oct 26, 1987 by daveb@geac.UUCP in uicsrd:comp.arch */
|| Well, ICL (in Britain) was working with stack-top caches on the
|| large 2900's some years ago (10!).  If anyone from Blighty would
|| care to comment on what happened subsequently, I'll refrain from
|| blurting out what was then the hot technique in the internal
|| rumor-mill.  (Can you say "half-cache"?)

/* Written Nov 2 22:55:00 1987, by fu@uicsrd.csrd.uiuc.edu, in <43700030@uicsrd> :-)) */
| I assume you are talking about the ICL 2980, which was announced
| over 10 years ago as the top-of-the-line model.  I didn't work on this
| particular machine but on the follow-on, but I seem to remember that
| the cache or slave store (in ICL-ese) only held stack elements due to
| its small size.  As the ICL 2900 architecture was stack-based, this
| seemed a reasonable trade-off.  There was no specific architectural
| support for this slave store, while, from what I can recall of the
| AT&T stack cache, its operation is supported in the instruction set.
|
| In the follow-on machine we had slightly larger chips to play with
| (1K ECL RAMs!) and decided to cache all primary operands.  Fortunately
| the machine was canceled, due to the lack of the small gyms necessary
| to house the systems had we actually finished it (over 200 PCBs with
| over 100 packages per board).  The last I heard, ICL no longer builds
| large systems, but buys them ready to sell from the Japanese.
|
| John Fu
|
| University of Illinois,
| Center for Supercomputer Research and Development.
| ARPANET: fu%uicsrd@a.cs.uiuc.edu
|       or sometimes fu%tallis@decwrl.dec.com

Well, now that I'm reasonably sure that I'm not letting any cats out
of the bag ((:-)), I'll comment on the interaction of the ICL
half-cache and the actual architecture.  (I'll describe why it's only
half a cache next article.)

The big ICLs were designed with the problems of efficiently executing
compiled HLLs in mind.
One of their requirements was the efficient evaluation of expressions,
and this led to a design providing direct hardware support for some
stereotyped access patterns...  To quote Buckle:

	Any high-level-language programmer knows that if he uses the
	scalar a, there is a high probability that he will use it
	again soon, whereas if he uses the array element b[i] there
	is a much lower probability of re-use, but a high probability
	of access soon.  However a general-purpose cache-store
	designer has no real way of distinguishing between and
	exploiting these uses.

	J. K. Buckle, "The History of the ICL 2900",
	London (Macmillan Ltd), 1977.

In the first release, this support was mostly absent, save in the
limited sense that the cache (a "normal" one) held only stack
elements.  This tended to give quite a good hit rate, since the stack
was stereotypically used for locals and procedure-return information,
very much in the C/Algol/Pascal tradition.

The intended support was to have a small, fast store as a "stack-top
cache", a similar store as an "execution cache", and a real cache as
an "extra name cache".  Please note that what was intended for the
stack-top and execution caches was not a content-addressable store (a
"real" cache), but just fast memory.

The architecture was organized so that locals, temporaries and
parameters were addressed via the "stack front" or "local name base"
registers (ICLese for the stack pointer and stack-frame pointer).
Other variables were addressed via the "extra name base" register.
The machine also used a subtype of their normal descriptors to point
to arrays, so that array base-addresses were "simple variables".

Voila: references to the current stack frame, containing simple
variables and array- and struct-base descriptors, went via the "stack
cache".  References to global simple variables and array bases went
via the extra name base, and therefore via a content-addressable
cache.
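The addressing split above can be sketched as a toy model.  The
register mnemonics (SF, LNB, XNB) and store names are the ICL terms
used in this article; the mapping function itself is just my
illustrative reading of the scheme, not ICL documentation:

```python
# Toy model of the 2900-style addressing split described above.
# Register mnemonics and store names follow the article; the mapping
# is an illustrative assumption, not an ICL specification.

def store_for(base_register):
    """Return which store services a reference, given the base
    register the reference is addressed through."""
    if base_register in ("SF", "LNB"):
        # Stack front / local name base: locals, temporaries,
        # parameters, and array/struct-base descriptors in the
        # current frame -- plain fast RAM, no address tags needed.
        return "stack-top cache"
    if base_register == "XNB":
        # Extra name base: global simple variables and array bases
        # -- a real, content-addressable cache.
        return "extra name cache"
    # Anything reached by indirecting through a descriptor
    # (e.g. the array element b[i]) normally goes to main store.
    return "main store"

for reg, what in [("LNB", "local scalar a"),
                  ("XNB", "global scalar"),
                  ("descriptor", "array element b[i]")]:
    print(f"{what:20s} -> {store_for(reg)}")
```

The point the model makes is Buckle's: the hardware can distinguish
the high-reuse scalar references from the low-reuse array-element
references purely by how they are addressed, with no tag comparison
at all on the common stack-frame path.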
Indirections through descriptors were typically not cached, but could
be put in the "extra name cache" if desired, or in a "descriptor
target cache" if one was designed (it was considered).  And of course,
code in the current function was accessed via the "execution cache".

Result: references to code and the stack frame went through fast (and
relatively cheap) memory.  References to globals went through fast but
expensive real cache.  References to array elements went via cached
descriptors, but the elements themselves were normally in main store.

Interesting, wot?

 --dave (further revelations to follow re multiprocessing) c-b
-- 
 David Collier-Brown.                  {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,    | Computer Science loses its
 350 Steelcase Road, Markham, Ontario, | memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279  | every 6 months.