pcg@cs.aber.ac.uk (Piercarlo Grandi) (01/20/91)
It seems that if something is posted here, it may well be considered prior art and make the subject matter unpatentable. I am in a daring mood these days, and I have decided to post another major revelation :->, contained in an unpublished paper of mine (please don't ask for it, enough detail is provided here), for which I should have sent back revisions years ago :-(. The "research" described herein is eight years old; as such, and in any case, it has nothing to do with the University College of Wales, its research activities and reputation, ...; I thank them for their generous permission to use their News connection, for which use I assume all responsibility anyhow.

The subject is a reverse map MMU that allows mapping of segments in different address spaces. The draft paper describing it was submitted to a refereed journal years ago (revisions have been pending since then), and was circulated without restrictions well before that.

I will assume familiarity with the concept of reverse map MMUs. I will also use the Manchester terminology for MMU design, as they invented the whole subject (OOPS, sorry -- they sneakily copied it from IBM, the real inventors, and unfairly at that, because they did it several years before IBM :->).

My original design was for a 24 bit address CPU (the 68010); I will repeat the relevant numbers here, but they can be scaled. A virtual address is made up of an 8 bit address space number and a 24 bit byte offset into the address space; the 24 bits are structured as a 6 bit segment number, a 7 bit page number, and an 11 bit byte offset within the page. The high order half of each address space is shared by all of them. Physical memory is not larger than 2MB, so that a 10 bit physical page number will suffice (ah, how dated is this part of my old design! :->).
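For concreteness, the address layout above can be sketched in C; the struct and function names here are mine, only the bit widths come from the design:

```c
#include <stdint.h>

/* 8-bit address space number plus a 24-bit offset, the offset split
 * as 6-bit segment, 7-bit page, 11-bit byte-in-page (2KB pages).
 * Names are illustrative; only the widths are from the design. */
typedef struct {
    uint8_t  asn;     /* address space number (8 bits) */
    uint8_t  seg;     /* segment number       (6 bits) */
    uint8_t  page;    /* page number          (7 bits) */
    uint16_t offset;  /* byte within page    (11 bits) */
} vaddr;

static vaddr split_vaddr(uint8_t asn, uint32_t a24)
{
    vaddr v;
    v.asn    = asn;
    v.seg    = (a24 >> 18) & 0x3F;   /* bits 23..18 */
    v.page   = (a24 >> 11) & 0x7F;   /* bits 17..11 */
    v.offset =  a24        & 0x7FF;  /* bits 10..0  */
    return v;
}
```

Note that the widths are self-consistent: 2MB of physical memory divided into 2KB pages gives exactly the 1024 frames that a 10 bit physical page number can index.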
To support high order memory sharing, we reinterpret the 8 bit address space number as a region number, and the 6 bit segment number as a 5 bit segment-in-region number from here onwards; if the high order bit of the 6 (the dropped bit) is one, the 8 bit region number is set to all ones. In other words, there are 256 regions, each made up of 32 segments; each address space is made up of two regions, one numbered from 0-254, and region 255, the shared one. A global segment number is the 13 bit concatenation of region and segment number.

A reverse map MMU will have PARs and CPRs; each Page Address Register will be associated with a physical frame and contain the virtual address of the page contained in it; each Current Page Register will contain the physical address of a frequently used page. The PARs are indexed by physical address; the CPRs are a cache on the PARs, indexed by virtual address. When a CPR miss occurs, all PARs are searched for the one that contains the right virtual address, and a CPR is rewritten with the contents of that PAR.

To support segment sharing we need to add a CSR table; each Current Segment Register describes a segment that is in use. There will be 256 CSRs; each CSR will contain:

	13 bits	for the global segment number for this CSR
	 8 bits	as the segment identifier
	 4 bits	for access permissions (RWX, supervisor)

The CSR table is indexed by using bit extraction on the 13 bit global segment number we want to translate. Once the correct entry has been selected, a check is made that a collision has not occurred. If no collision has occurred, the 8 bit segment identifier is extracted and permissions are checked. The 8 bit segment identifier is concatenated to the 7 bit page-in-segment number, and the result is hashed to get a 10 bit index into the 1024 entry CPR table. Hashing is performed by splitting the 15 bits in two parts and using each part to index a ROM of random numbers, which are XORed (technique used by Prof.
Ian Watson in the matching unit of the Manchester Dataflow Machine), or any other suitable means, like bit extraction. There are 1024 CPR entries simply to minimize the chance of collisions, and even 1024 CPRs add up to just a few KBytes of fast memory. Each CPR will contain:

	13 bits	global segment number
	 7 bits	page number
	10 bits	real page number
	 2 bits	for access and modification logging (access could be a clock sized field, for WS or PFF, but probably it is too costly...)

The format of the PARs and of the PAR table is left to the OS; it might well be a hash table with a number of entries slightly larger than that of physical pages. CSR and CPR reload is done by the OS. How it proceeds is fairly obvious, except for the crucial trick that makes the CSR work: the use of the segment id.

The problem is that if a segment is remapped, that is, it changes its virtual address, it will in general have to be mapped by a different CSR entry, because the number of its CSR is computed from its virtual address. So the contents of the CSR corresponding to the old address must be copied to the new CSR. This would seem to require also moving all the CPR entries for that segment, except that the CPR table is indexed with the segment id (the global segment number in them will have to be changed, though; a possibility would be to use the segment identifier there as well instead, but there are reasons not to do so).

If instead of overwriting the segment identifier field in the new CSR we *exchange* it with the one in the old CSR, even if a segment changes address its segment id will remain the same, thus avoiding the need to flush the relevant CPRs, and no two segments will ever have the same segment id, thus avoiding a lot of problems. The segment identifier field of all CSRs will initially be set to the number of that CSR.

PLEASE NOTE: I have omitted a lot of details, and possible alternatives; they are fairly obvious. (This is not a patent, quite the opposite!
:->)

This MMU supports very efficiently the case where segments are sent as messages between address spaces (MUSS). Full segment sharing is made equivalent to pretending that the shared segment is sent back and forth as a message every time a reference to it is made from an address space other than the last one that used it. Suppose the same segment is mapped in regions 1, 2, 3 as segment 4, 5, 6 respectively. It was last referenced with global segment number (1,4); if region 3 is scheduled, and it references segment 6, this is the same as remapping global segment (1,4) as (3,6). Not a big deal. Caching, hinting, and lazy CSR and CPR remapping can all be used to good effect here. Also, having two tightly coupled threads in two distinct address spaces, in the sense that they interleave very closely on the same shared segment, seems to me not something that should be encouraged at all.

Some have thought that it is better to use 32 bit unique segment ids, and do away with remapping entirely, as then multiple CSRs may contain the same segment id, but I think this is not entirely desirable; it requires extra hashing/compression, and in any case most sharing will occur thanks to the global shared region. Having 256 CSRs seems quite adequate for capturing the locality of a small system. Finally, sharing-by-remapping does not require a lot of address aliasing like sharing-by-multiple-mappings; the latter would require quite a few extra lists in the PARs, and extra difficulties. I tend to dislike the very idea of shared memory; supporting it is one thing, making it too easy is another.

If you want to know more about reverse map MMUs, there are a few interesting reports/theses/articles from the Department of Computer Science, The Victoria University of Manchester, Oxford Road, Manchester, United Kingdom.
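The translation path and the segment id exchange trick above can be sketched in C. This is a minimal sketch under stated assumptions: the bit-extraction CSR index (low 8 bits of the gsn), the `rom_hi`/`rom_lo` tables, and all structure and function names are my inventions; only the field widths, the 15-bit split-and-XOR hash, the exchange on remap, and the initial segid = CSR-number rule come from the text:

```c
#include <stdint.h>

#define NCSR 256
#define NCPR 1024

/* Field widths as in the text; names are hypothetical. */
typedef struct {
    uint16_t gsn;    /* 13-bit global segment number (region<<5 | seg) */
    uint8_t  segid;  /* 8-bit segment identifier                       */
    uint8_t  perms;  /* 4 bits: RWX + supervisor                       */
    uint8_t  valid;
} csr_t;

typedef struct {
    uint16_t gsn;    /* 13-bit global segment number */
    uint8_t  page;   /* 7-bit page-in-segment        */
    uint16_t frame;  /* 10-bit real page number      */
    uint8_t  used;   /* access/modify logging bits   */
} cpr_t;

static csr_t    csr[NCSR];
static cpr_t    cpr[NCPR];
static uint16_t rom_hi[128], rom_lo[256]; /* ROMs of 10-bit random numbers */

/* Initially each CSR's segment id is its own number, so ids form a
 * permutation of 0..255 and stay one under exchanges. */
static void csr_init(void)
{
    for (unsigned i = 0; i < NCSR; i++)
        csr[i].segid = (uint8_t)i;
}

/* CSR index by bit extraction from the 13-bit gsn (low 8 bits here;
 * the post leaves the exact extraction open). */
static unsigned csr_index(uint16_t gsn) { return gsn & 0xFF; }

/* 15-bit key = segment id (8) || page (7); split it and XOR two
 * ROM-resident random values (the Manchester matching-unit trick). */
static unsigned cpr_hash(uint8_t segid, uint8_t page)
{
    unsigned key = ((unsigned)segid << 7) | (page & 0x7F);
    return (rom_hi[key >> 8] ^ rom_lo[key & 0xFF]) & (NCPR - 1);
}

/* Translate (gsn, page): physical frame, or -1 on any miss/fault,
 * in which case the OS reloads the CSR/CPR from the PAR table. */
static int translate(uint16_t gsn, uint8_t page, uint8_t need)
{
    csr_t *s = &csr[csr_index(gsn)];
    if (!s->valid || s->gsn != gsn)  return -1;  /* CSR collision/miss */
    if ((s->perms & need) != need)   return -1;  /* permission fault   */

    cpr_t *p = &cpr[cpr_hash(s->segid, page)];
    if (p->gsn != gsn || p->page != page) return -1;  /* CPR miss */
    p->used |= 1;
    return p->frame;
}

/* Remap old_gsn -> new_gsn, EXCHANGING the two segment ids so the
 * segment keeps its id (CPRs hashed on it need no flush) and no two
 * CSRs ever share an id.  The gsn fields of the segment's CPRs would
 * still need updating, as noted in the text. */
static void csr_remap(uint16_t old_gsn, uint16_t new_gsn)
{
    csr_t *o = &csr[csr_index(old_gsn)];
    csr_t *n = &csr[csr_index(new_gsn)];
    uint8_t tmp = n->segid;

    n->gsn   = new_gsn;
    n->segid = o->segid;   /* segment carries its id along */
    n->perms = o->perms;
    n->valid = 1;
    o->segid = tmp;        /* exchange, never overwrite */
    o->valid = 0;
}
```

In the (1,4) -> (3,6) example above, the global segment numbers are (1<<5)|4 = 0x24 and (3<<5)|6 = 0x66, so scheduling region 3 amounts to `csr_remap(0x24, 0x66)`; the segment's CPR entries stay where the hash put them, because the id travelled with it.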
There is also some material in IBM papers on their RISC architectures (probably they patented reverse map MMUs as well; the original Manchester MMU for the MU4 "Atlas" was itself reverse map, and that was the one copied sneakily a few years in advance from the IBM invention :->).

Resource analysis: this architecture requires a few KBytes of SRAM (and ROM), and essentially two SRAM (and one ROM) delays in the address translation path. One SRAM delay could be saved if the CSRs were abolished along with any segment sharing.

I am currently quite a bit ill, and this may explain a higher than usual error rate in the above. Apologies in advance.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk