[comp.arch] Reverse map MMU with remappable segments

pcg@cs.aber.ac.uk (Piercarlo Grandi) (01/20/91)

It seems that if something is posted here, this may well be considered
prior art and make the subject matter unpatentable. I am in a daring
mood these days, and I have decided to post another major revelation
:->, contained in an unpublished paper of mine (please don't ask for it,
enough detail is provided here), for which I should have sent back
revisions years ago :-(.

The "research" described herein is eight years old; as such, and in any
case, it has nothing to do with the University College of Wales, its
research activities and reputation, ...; I thank them for their generous
permission to use their News connection, for which use I assume all
responsibility anyhow.

The subject is a reverse map MMU that allows mapping of segments into
different address spaces. The draft paper describing it was submitted
to a refereed journal years ago (revisions have been pending since
then), and it was circulated without restrictions well before that.

I will assume familiarity with the concept of reverse map MMUs. I will
also use the Manchester terminology for MMU design, as they invented the
whole subject (OOPS, sorry -- they sneakily copied it from IBM, the real
inventors, and unfairly at that, because they did it several years
before IBM :->).

My original design was for a 24 bit address CPU (the 68010); I will
repeat the relevant numbers here, but they can be scaled.

A virtual address is made up of an 8 bit address space number and a 24
bit byte offset into the address space; the 24 bits are structured as a
6 bit segment number, a 7 bit page number, and an 11 bit byte offset
within the page. The high order half of each address space is shared by
all of them. Physical memory is not larger than 2MB, so a 10 bit
physical page number will suffice (ah, how dated this part of my old
design is! :->).

To support high order memory sharing, we reinterpret the 8 bit address
space number as a region number, and the 6 bit segment number as a 5
bit segment-in-region number from here onwards; if the high order bit
of the 6, the dropped bit, is one, the 8 bit region number is set to
all ones. In other words, there are 256 regions, each made up of 32
segments; each address space is made up of two regions: its own,
numbered 0-254, and the shared 255th. A global segment number is the
13 bit concatenation of region and segment number.
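
For concreteness, here is how the decomposition might look in C; all
names are mine and purely illustrative, not part of the design:

    #include <stdint.h>

    /* A virtual address: 8 bit address space number plus a 24 bit
       offset (6 bit segment, 7 bit page, 11 bit byte in page). */
    typedef struct {
        uint8_t  region;   /* 8 bit region number (0-255)         */
        uint8_t  segment;  /* 5 bit segment-in-region number      */
        uint8_t  page;     /* 7 bit page-in-segment number        */
        uint16_t byte;     /* 11 bit byte-in-page offset          */
    } vaddr_fields;

    vaddr_fields decompose(uint8_t asn, uint32_t off24)
    {
        vaddr_fields f;
        f.byte    = off24 & 0x7FF;         /* low 11 bits          */
        f.page    = (off24 >> 11) & 0x7F;  /* next 7 bits          */
        f.segment = (off24 >> 18) & 0x1F;  /* low 5 of the 6 bits  */
        /* if the dropped high order segment bit is one, the
           address lies in the shared half: region is all ones */
        f.region  = ((off24 >> 23) & 1) ? 0xFF : asn;
        return f;
    }

    /* The 13 bit global segment number: region ++ segment. */
    uint16_t global_segment(vaddr_fields f)
    {
        return ((uint16_t)f.region << 5) | f.segment;
    }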

A reverse map MMU will have PARs and CPRs; each Page Address Register
will be associated with a physical frame and contain the virtual address
of the page contained in it; each Current Page Register will contain the
physical address of a frequently used page. The PARs are indexed by
physical address, the CPRs are a cache on the PARs, indexed by virtual
address. When a CPR miss occurs all PARs are searched for the one that
contains the right virtual address, and a CPR is rewritten with the
contents of that PAR.
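
In (purely illustrative) C, the refill path on a CPR miss might look
like this; in hardware the PAR search could of course be associative
rather than a sequential loop:

    #include <stdint.h>

    /* One PAR per physical frame: the virtual page it holds. */
    typedef struct { uint16_t gseg; uint8_t page; int valid; } par_t;

    /* On a CPR miss, search the PARs for the faulting virtual page;
       the result is the frame number with which to refill a CPR,
       or -1 for a page fault. */
    int par_search(const par_t pars[], int nframes,
                   uint16_t gseg, uint8_t page)
    {
        for (int frame = 0; frame < nframes; frame++)
            if (pars[frame].valid &&
                pars[frame].gseg == gseg && pars[frame].page == page)
                return frame;
        return -1;
    }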

To support segment sharing we need to add a CSR table; each Current
Segment Register describes a segment that is in use.

There will be 256 CSRs; each CSR will contain:

    13	bits for the global segment number for this CSR

    8	bits as the segment identifier

    4	bits for access permissions (RWX, supervisor)

The CSR table is indexed by using bit extraction on the 13 bit global
segment number we want to translate. Once the correct entry has been
selected, a check is made that a collision has not occurred. If no
collision has occurred, the 8 bit segment identifier is extracted and
permissions are checked.
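
A sketch of the CSR lookup, again with invented names; "bit
extraction" here is simply taking the low 8 of the 13 bits, one
possible choice among several:

    #include <stdint.h>

    typedef struct {
        uint16_t gseg;   /* 13 bit global segment number mapped  */
        uint8_t  segid;  /* 8 bit segment identifier             */
        uint8_t  perms;  /* 4 bits: R, W, X, supervisor          */
        int      valid;
    } csr_t;

    /* Returns the segment id, -1 on a collision or miss (the OS
       then reloads the CSR), or -2 on a permission fault. */
    int csr_lookup(csr_t csrs[256], uint16_t gseg, uint8_t needed)
    {
        csr_t *c = &csrs[gseg & 0xFF];  /* extract the low 8 bits */
        if (!c->valid || c->gseg != gseg)
            return -1;                  /* collision: trap to OS  */
        if ((c->perms & needed) != needed)
            return -2;                  /* access violation       */
        return c->segid;
    }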

The 8 bit segment identifier is concatenated with the 7 bit page in
segment number, and the result is hashed to get a 10 bit index into
the 1024 entry CPR table. Hashing is performed by splitting the 15
bits in two parts and using each part to index a ROM of random
numbers, which are then XORed (a technique used by Prof. Ian Watson in
the matching unit of the Manchester Dataflow Machine), or by any other
suitable means, like bit extraction.

There are 1024 CPR entries simply to minimize the chance of collisions,
and even 1024 CPRs add up to just a few KBytes of fast memory.
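
A software model of the hashing might look like this; in hardware the
two ROMs would hold fixed random numbers, for which rand() merely
stands in here:

    #include <stdint.h>
    #include <stdlib.h>

    static uint16_t rom_a[256];   /* indexed by the segment id   */
    static uint16_t rom_b[128];   /* indexed by the page number  */

    void init_roms(void)          /* stands in for the ROM masks */
    {
        for (int i = 0; i < 256; i++) rom_a[i] = rand() & 0x3FF;
        for (int i = 0; i < 128; i++) rom_b[i] = rand() & 0x3FF;
    }

    /* Hash the 15 bits (8 bit segment id ++ 7 bit page number)
       down to a 10 bit CPR index by XORing the two ROM outputs. */
    uint16_t cpr_index(uint8_t segid, uint8_t page)
    {
        return rom_a[segid] ^ rom_b[page & 0x7F];
    }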

Each CPR will contain

    13	bits global segment number

    7	bits page number

    10	bits real page number

    2	bits for access and modification logging (the access bit
	could instead be a clock sized field, for WS or PFF style
	replacement, but that is probably too costly...)

The format of the PARs and of the PAR table is left to the OS; it
might well be a hash table with a number of entries slightly larger
than the number of physical pages.
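
For instance, a minimal open-addressed PAR table, assuming slightly
more slots than frames; everything here is just one possible OS
choice, not part of the MMU proper:

    #include <stdint.h>

    enum { NFRAMES = 1024, NSLOTS = 1280 };  /* ~25% spare slots */

    typedef struct {
        uint16_t gseg; uint8_t page; uint16_t frame; int used;
    } par_slot;

    /* Look up (global segment, page); linear probing keeps the
       chains short given the spare slots. -1 means page fault. */
    int par_find(par_slot tab[NSLOTS], uint16_t gseg, uint8_t page)
    {
        unsigned h = ((unsigned)gseg * 31 + page) % NSLOTS;
        for (int probes = 0; probes < NSLOTS;
             probes++, h = (h + 1) % NSLOTS) {
            if (!tab[h].used) return -1;
            if (tab[h].gseg == gseg && tab[h].page == page)
                return tab[h].frame;
        }
        return -1;
    }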

CSR and CPR reload is done by the OS. How it proceeds is fairly
obvious, except for the crucial trick that makes the CSR work: the use
of the segment id.

The problem is that if a segment is remapped, that is, if it changes
its virtual address, it will in general have to be mapped by a
different CSR entry, because the index of its CSR is computed from its
virtual address. So the contents of the CSR corresponding to the old
address must be copied to the new CSR. This would seem to also require
moving all the CPR entries for that segment, except that the CPR table
is indexed with the segment id (the global segment number in those
entries will have to be changed, though; a possibility would be to use
the segment identifier there as well instead, but there are reasons
not to do so).

If instead of overwriting the segment identifier field in the new CSR
we *exchange* it with the one in the old CSR, then even if a segment
changes address its segment id will remain the same, thus avoiding the
need to flush the relevant CPRs, and no two segments will ever have
the same segment id, thus avoiding a lot of problems. The segment
identifier field of each CSR will initially be set to the number of
that CSR.
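
Expressed as a sketch (hypothetical names again, reusing the CSR
layout from the lookup sketch above), the core of the remap handler is
just a swap:

    #include <stdint.h>

    typedef struct {
        uint16_t gseg; uint8_t segid; uint8_t perms; int valid;
    } csr_t;

    /* Remap a segment from old_gseg to new_gseg. The mapping moves
       to the CSR slot selected by the new address, but the segment
       ids of the two slots are *exchanged* rather than overwritten:
       the segment keeps its id (so its CPR entries stay valid) and
       the ids stay unique, one parked in every slot. Initially
       csrs[i].segid == i. */
    void csr_remap(csr_t csrs[256],
                   uint16_t old_gseg, uint16_t new_gseg)
    {
        csr_t *o = &csrs[old_gseg & 0xFF];
        csr_t *n = &csrs[new_gseg & 0xFF];

        uint8_t parked = n->segid;  /* id parked in the new slot  */
        n->segid = o->segid;        /* segment keeps its own id   */
        o->segid = parked;          /* old slot inherits the other */

        n->gseg  = new_gseg;
        n->perms = o->perms;
        n->valid = 1;
        o->valid = 0;               /* old address no longer maps */
    }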

PLEASE NOTE: I have omitted a lot of details, and possible alternatives,
they are fairly obvious.  (This is not a patent, quite the opposite! :->)

This MMU very efficiently supports the case where segments are sent as
messages between address spaces (as in MUSS). Full segment sharing is
made equivalent to pretending that the shared segment is sent back and
forth as a message every time a reference to it is made from an
address space other than the last one that used it. Suppose the same
segment is mapped in regions 1, 2, 3 as segment 4, 5, 6 respectively.
It was last referenced with global segment number (1,4); if region 3
is scheduled and it references segment 6, this is the same as
remapping global segment (1,4) as (3,6). Not a big deal. Caching,
hinting, and lazy CSR and CPR remapping can all be used to good effect
here.
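
In terms of the csr_remap sketch above, the (1,4) to (3,6) case is a
single exchange, the global segment number being formed as
region << 5 | segment:

    /* Hypothetical OS response to a CSR miss on (3,6), once the
       faulting segment is found to be currently mapped at (1,4);
       csr_t and csr_remap are from the exchange sketch above. */
    void on_csr_miss_example(csr_t csrs[256])
    {
        uint16_t old_gseg = (1 << 5) | 4;  /* region 1, segment 4 */
        uint16_t new_gseg = (3 << 5) | 6;  /* region 3, segment 6 */
        csr_remap(csrs, old_gseg, new_gseg);
    }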

Also, having two tightly coupled threads in two distinct address
spaces (tightly coupled in the sense that they interleave very closely
on the same shared segment) seems to me not something that should be
encouraged at all.

Some have thought that it is better to use 32 bit unique segment ids,
and do away with remapping entirely, as then multiple CSRs may contain
the same segment id, but I think this is not entirely desirable; it
requires extra hashing/compression, and in any case most sharing will
occur thanks to the global shared region. Having 256 CSRs seems quite
adequate for capturing the locality of a small system.

Finally, sharing-by-remapping does not require a lot of address
aliasing, unlike sharing-by-multiple-mappings; the latter would
require quite a few extra lists in the PARs, and extra difficulties. I
tend to dislike the very idea of shared memory; supporting it is one
thing, making it too easy is another.

If you want to know more about reverse map MMUs, there are a few
interesting reports/theses/articles from the Department of Computer
Science, The Victoria University of Manchester, Oxford Road, Manchester,
United Kingdom. There is also some material in IBM papers on their RISC
architectures (probably they patented reverse map MMUs as well; the
original Manchester MMU for the MU4 "Atlas" was itself reverse map, and
that was the one copied sneakily a few years in advance from the IBM
invention :->).


Resource analysis: this architecture requires a few KBytes of SRAM
(and ROM), and essentially two SRAM (plus one ROM) delays in the
address translation path. One SRAM delay could be avoided if the CSRs
were abolished, along with any segment sharing.

I am currently quite ill, and this may explain a higher than usual
error rate in the above. Apologies in advance.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk