[comp.arch] Parallel cache and TLB lookup

zs01+@andrew.cmu.edu (Zalman Stern) (04/07/90)

[Discussion of address aliasing problems on the IBM Risc System/6000]

In the Mach port to the RIOS, we get around this problem by almost
always running in virtual mode. The wired kernel memory (text segment
and unpageable data mapped at boot time) is mapped virtual=real. The
only time the machine goes into real mode is on interrupts. (System
calls and traps do not have to go into real mode on this machine.)
Memory accessed in an interrupt handler has to be wired anyway (and is
usually statically allocated as well). Besides, device drivers already
have to deal with some cache flushing since there is no hardware to
provide consistency between the cache and IO space.

Since the RIOS has an inverted page table, aliases between virtual
addresses require taking page faults to move the correct virtual
address into the IPT. (That is, if one alias is in the IPT, accessing
that memory through a different alias will take a page fault.) When this
happens, the fault handler flushes that page from the cache as well.
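
As a rough illustration of the alias behaviour described above, here is a
toy model in Python. It is only a sketch under stated assumptions, not the
Mach or RIOS code: the class name, the method names, and the idea that the
caller supplies the real frame for each virtual page are all hypothetical.
The one property it takes from the description is that the IPT holds a
single virtual page per real frame, so switching aliases faults and
flushes.

```python
class InvertedPageTable:
    """Toy IPT: one virtual page per real frame (hypothetical model)."""

    def __init__(self):
        self.frame_to_vpage = {}  # real frame -> virtual page now installed
        self.flushed = []         # frames flushed from the cache, in order

    def access(self, vpage, frame):
        """Touch `frame` through `vpage`; return True if it page-faulted."""
        if self.frame_to_vpage.get(frame) == vpage:
            return False          # this alias is already in the IPT
        # Page fault: the frame is unmapped or owned by a different alias.
        if frame in self.frame_to_vpage:
            # Assumed handler behaviour: flush the page from the cache
            # before handing the frame to the new alias.
            self.flushed.append(frame)
        self.frame_to_vpage[frame] = vpage
        return True


ipt = InvertedPageTable()
first_touch = ipt.access(0x10, 7)   # installs alias 0x10 for frame 7
same_alias = ipt.access(0x10, 7)    # no fault, mapping is present
other_alias = ipt.access(0x20, 7)   # faults, flushes, installs 0x20
```

In this model, alternating between two aliases of the same frame faults on
every switch, which is why the cost only matters if aliases are used often.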

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu     Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890

JOSH@IBM.COM ("Josh Knight") (04/11/90)

In <1830@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:
 > This brought to mind a question that has been niggling me for some
 > years: how is the trick worked when the software is *not* so
 > constrained? The IBM 308x and 3090 mainframes have (mostly) 64K caches
 > (per processor) which are 4-way set associative; and again only the
 > bottom 12 bits of the address are invariant under the virtual-to-real
 > mapping. However, the software is allowed to (and IBM operating systems
 > in fact do) reference a page of storage at different times by both
 > virtual and real addresses, whose low-order 14 bits will not, usually,
 > be equal.
 >
 > How is it done? I have never found an answer in the review articles in
 > the IBM journals (R&D, Systems). Is it, perhaps, a trade secret? In
 > earlier models, such as the IBM 3033, as the cache increased in size
 > so did the multiplicity of the associative lookup.
 >

The answer for the 3090 is in the article cited below in refer format.
The following is quoted from page 10 of that article:

    An interesting complexity in cache design that has been given special
    treatment in the 3090 cache has to do with synonyms.  Virtual storage
    in System/370-XA architecture allows relocation of 4K-byte pages.  This
    means that the low-order 12 address bits that address a byte within a
    page are the same for both a virtual and a real address.  Architecture,
    however, allows different virtual addresses to map to the same real
    address. Thus the cache is managed by real addresses, despite the fact
    that it is accessed by virtual address.  Since it takes 16 bits to
    address a 64K-byte cache and there are only 12 real bits available, we
    lack four bits.  There are thus 16 places in the cache where an operand
    might reside.  Four of these locations are read out of the cache
    simultaneously on the initial cache read operation.  The directory,
    however, is built to read out all 16 entries simultaneously.  Thus, if
    there is a miss on all of the primary four locations but a hit on one of
    the other 12, the cache can be read correctly with a minimum delay.
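
The arithmetic in that passage can be checked with a few lines of Python.
This is a sketch, not a description of the 3090 hardware: the split of the
four missing bits into two way-select bits and two translated index bits
is my own reading, based on the 4-way set associativity mentioned in the
question, and the function name candidate_offsets is hypothetical.

```python
CACHE_BYTES = 64 * 1024   # cache size, from the quoted article
PAGE_BYTES = 4 * 1024     # System/370-XA page size, from the quoted article
WAYS = 4                  # assumption: 4-way set associative, per the question


def log2_exact(n):
    """Return log2 of n, which must be a power of two."""
    assert n > 0 and n & (n - 1) == 0
    return n.bit_length() - 1


cache_bits = log2_exact(CACHE_BYTES)        # bits to address the cache
page_bits = log2_exact(PAGE_BYTES)          # bits unchanged by translation
missing_bits = cache_bits - page_bits       # bits not known before the TLB
candidate_places = 1 << missing_bits        # places an operand might reside

# Assumed split: each way covers CACHE_BYTES / WAYS bytes, so the offset
# within a way needs more bits than the page offset provides.
way_offset_bits = log2_exact(CACHE_BYTES // WAYS)
translated_index_bits = way_offset_bits - page_bits
synonym_sets = 1 << translated_index_bits
secondary_places = candidate_places - WAYS  # places checked only via directory


def candidate_offsets(vaddr):
    """Within-way offsets that could hold vaddr's data, primary first."""
    low = vaddr & (PAGE_BYTES - 1)
    primary = vaddr & ((1 << way_offset_bits) - 1)
    others = [(hi << page_bits) | low for hi in range(synonym_sets)]
    return [primary] + [off for off in others if off != primary]


offsets = candidate_offsets(0x12345)
```

Under that assumed split, the primary offset times four ways gives the
four locations read first, and the remaining three offsets times four ways
give the other 12 that only the directory examines.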

%T The IBM 3090 System:  An Overview
%A S.G. Tucker
%J IBM Systems Journal
%V 25
%N 1
%P 4-19
%D 1986


Josh Knight
josh@ibm.com