jml@ivory.SanDiego.NCR.COM (Michael Lodman) (12/03/88)
According to Motorola, the 88200 CMMU does not cache the page and segment descriptors it fetches during a table walk. This would seem to me to have a negative impact on performance. Is this standard practise for table walks, and if not why did Motorola do it that way?

John Mashey, perhaps you could tell me if MIPS does this.

Michael Lodman (619) 485-3335
Advanced Development NCR Corporation E&M San Diego
mike.lodman@ivory.SanDiego.NCR.COM
{sdcsvax,cbatt,dcdwest,nosc.ARPA}!ncr-sd!ivory!jml
When you die, if you've been very, very good, you'll go to ... Montana.
tom@nud.UUCP (Tom Armistead) (12/03/88)
In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>According to Motorola, the 88200 CMMU does not cache the page and
>segment descriptors it fetches during a table walk. This would seem

Wrong! The 88200 does cache page descriptors. Up to 56 page descriptors (each descriptor maps 4K of virtual space) can be cached in each CMMU. The page descriptor cache is managed by the CMMU.

In addition, per CMMU, software can map up to eight 512K chunks of memory via the block address translation cache and avoid table walks on contiguous areas of memory (e.g. kernel text and data, i/o areas, etc.).
--
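The reach figures quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the numbers (56 page descriptors, 4K pages, eight 512K blocks) are taken from the post, and the variable names are made up for illustration.

```python
# All figures come from the post above, not from a data sheet.
PAGE_SIZE = 4 * 1024        # each page descriptor maps 4K of virtual space
PATC_ENTRIES = 56           # page descriptors cached per CMMU
BLOCK_SIZE = 512 * 1024     # each block translation entry maps 512K
BATC_ENTRIES = 8            # block entries per CMMU

# "Reach" = how much address space can be translated without a table walk.
patc_reach = PATC_ENTRIES * PAGE_SIZE
batc_reach = BATC_ENTRIES * BLOCK_SIZE

print(patc_reach // 1024, "KB reachable through cached page descriptors")
print(batc_reach // 1024, "KB reachable through block translations")
```

Under these figures the block entries cover far more address space than the page descriptors, which fits the suggestion of using them for large contiguous regions such as kernel text and data.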
mash@mips.COM (John Mashey) (12/03/88)
In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>According to Motorola, the 88200 CMMU does not cache the page and
>segment descriptors it fetches during a table walk. This would seem
>to me to have a negative impact on performance. Is this standard
>practise for table walks, and if not why did Motorola do it that way?
>John Mashey, perhaps you could tell me if MIPS does this.

Yes. If it's in the cache, it's in the cache.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
tim@crackle.amd.com (Tim Olson) (12/03/88)
In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
| In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
| >According to Motorola, the 88200 CMMU does not cache the page and
| >segment descriptors it fetches during a table walk. This would seem
|
| Wrong! The 88200 does cache page descriptors. Up to 56 page descriptors
| (each descriptor maps 4K of virtual space) can be cached in each CMMU. The
| page descriptor cache is managed by the CMMU.

Yes, the translation is cached in the TLB entries, but I think the question was do the memory accesses that are performed during the table walk also get cached in the data cache? Or must they always go to main memory?
--
Tim Olson
Advanced Micro Devices
(tim@crackle.amd.com)
paul@taniwha.UUCP (Paul Campbell) (12/05/88)
In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>>According to Motorola, the 88200 CMMU does not cache the page and
>>segment descriptors it fetches during a table walk. This would seem
>
> Wrong! The 88200 does cache page descriptors. Up to 56 page descriptors
>(each descriptor maps 4K of virtual space) can be cached in each CMMU. The

I think what he is really asking is 'does the 88200 cache the segment table entries used to read page table entries with?'.

I've often thought about whether this is worthwhile or not; it probably is for 'medium' to 'large' size programs ('small' ones only need a few entries and fit in the TLB cache, 'large' to 'huge' ones may thrash). I think that simulation is probably the only way to find out, and of course your mileage may vary for different architectures and different job mixes.

Of course, systems that handle TLB misses in software, like the 29K and the MIPS chips, take advantage of any data cache the system has, since they treat both segment table entry accesses and page table entry accesses like any other access. (This helps you particularly for VERY large programs that are thrashing in the TLB cache, because the data cache speeds up TLB refills.)

Paul
--
Paul Campbell ..!{unisoft|mtxinu}!taniwha!paul (415)420-8179
Taniwha Systems Design, Oakland CA
"Read my lips .... no GNU taxes"
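The point about software TLB refills going through the data cache can be illustrated with a toy model. This is a minimal sketch, not a description of the 29K, the MIPS chips, or the 88200: the 10/10/12 address split, the table addresses, and the names `CountingMemory` and `software_refill` are all assumptions made for the example, and the "cache" simply remembers every word it has seen.

```python
PAGE_SHIFT = 12                       # 4K pages, as in the thread
INDEX_BITS = 10                       # assumed width of segment and page indexes
INDEX_MASK = (1 << INDEX_BITS) - 1

class CountingMemory:
    """Word-addressed memory with a toy cache that records hits and misses."""
    def __init__(self):
        self.words = {}
        self.cached = set()
        self.hits = 0
        self.misses = 0

    def load(self, addr):
        if addr in self.cached:
            self.hits += 1
        else:
            self.misses += 1
            self.cached.add(addr)     # toy cache: never evicts
        return self.words[addr]

def software_refill(mem, segment_table_base, vaddr):
    """Walk a two-level table using ordinary loads, as a software handler would."""
    seg_index = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    page_index = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    page_table_base = mem.load(segment_table_base + seg_index)  # first fetch
    return mem.load(page_table_base + page_index)               # second fetch

mem = CountingMemory()
SEG_BASE, PT_BASE = 0x1000, 0x2000    # hypothetical table locations
mem.words[SEG_BASE + 0] = PT_BASE     # segment 0 points at one page table
mem.words[PT_BASE + 0] = 0x40         # virtual page 0 maps to frame 0x40
mem.words[PT_BASE + 1] = 0x41         # virtual page 1 maps to frame 0x41

first = software_refill(mem, SEG_BASE, 0x00000123)   # page 0 of segment 0
second = software_refill(mem, SEG_BASE, 0x00001123)  # page 1 of segment 0
print(first, second, mem.hits, mem.misses)
```

In this model the second refill finds the segment table entry already cached and only misses on the new page table entry, which is the effect being described for neighbouring pages in the same segment.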
jml@ivory.SanDiego.NCR.COM (Michael Lodman) (12/06/88)
In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>>According to Motorola, the 88200 CMMU does not cache the page and
>>segment descriptors it fetches during a table walk.
>
> Wrong! The 88200 does cache page descriptors.

Mr. Armistead of Motorola, please re-read the question and try to answer it this time. I'm not asking you if the translated real addresses are maintained in the PATC, I'm asking if the data from the two fetches done during the table walk are placed in the d/i cache. If not, why not?

Michael Lodman (619) 485-3335
Advanced Development NCR Corporation E&M San Diego
mike.lodman@ivory.SanDiego.NCR.COM
{sdcsvax,cbatt,dcdwest,nosc.ARPA}!ncr-sd!ivory!jml
When you die, if you've been very, very good, you'll go to ... Montana.
dpm@k.gp.cs.cmu.edu (David Maynard) (12/07/88)
> I'm asking if the data from the two fetches done during the table walk are
> placed in the d/i cache. If not, why not?

I was just thinking about asking this question on a final exam in a computer architecture course. (Then I decided it was too hard.) I don't know why Motorola did one thing or another, but here are my 2 cents' worth.

The way I read the initial data sheet, it doesn't look like the 88200 caches address translation tables if you access them during a table walk. However, if the PATC/BATC are big enough, you might just be polluting the cache if you loaded the table entries. Presumably, the only way a "normal" user program will access the tables is via a table walk. If you cache the results of the walk in the PATC, then you supposedly won't need to access the table locations again for some time. Also, with 4K pages and a 16K cache, you can cycle a lot of data through the cache without having to do many more address translations. It is possible that the table entries would usually be flushed from the cache before they were used again.

It seems that there might be more of an argument for caching the 1st-level tables ("segment" tables), since you might be touching several pages in the same segment and might benefit from being able to speed up the address translation for those nearby pages.

The OS will actually be working with the tables as data. I assume that data accesses would load the table entries into the D-cache. There might be a case (page faults maybe) where having a particular table entry already cached from a walk might speed up an OS function. I'll have to think about that one....

---
David P. Maynard (dpm@cs.cmu.edu)
Dept. of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
---
Any opinions expressed are mine only. I haven't asked the ECE department or CMU what they think.
---
--
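The remark about 4K pages and a 16K cache can be made concrete by counting how many distinct pages a cache-sized sequential sweep touches. This is a rough sketch under the figures in the post; the helper `pages_touched` and the 4-byte word size are assumptions for illustration.

```python
PAGE_SIZE = 4 * 1024      # 4K pages (from the post)
CACHE_SIZE = 16 * 1024    # 16K cache (from the post)
WORD_SIZE = 4             # assumed 32-bit word accesses

def pages_touched(start, length, page_size=PAGE_SIZE):
    """Number of distinct pages covered by the byte range [start, start + length)."""
    if length <= 0:
        return 0
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return last_page - first_page + 1

# A sweep that fills the whole cache needs only a handful of translations.
aligned = pages_touched(0, CACHE_SIZE)       # sweep starting on a page boundary
unaligned = pages_touched(100, CACHE_SIZE)   # sweep starting mid-page
words_per_translation = PAGE_SIZE // WORD_SIZE

print(aligned, unaligned, words_per_translation)
```

For sequential access, then, one translation serves on the order of a thousand word accesses, so a table entry loaded into the data cache during the walk has plenty of time to be displaced before it is needed again.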
paul@taniwha.UUCP (Paul Campbell) (12/08/88)
In article <3798@pt.cs.cmu.edu> dpm@k.gp.cs.cmu.edu (David Maynard) writes:
>The OS will actually be working with the tables as data. I assume that data
>accesses would load the table entries into the D-cache. There might be a
>case (page faults maybe) where already having a particular table entry
>already cached from a walk might speed up an OS function. I'll have to

Of course, on a system where the TLB miss is handled in software, page tables are simply a figment of the OS's imagination ... the hardware doesn't care or know if they are out there (in this case design becomes a compromise between how easily your PTEs can be bent into the format to be poked into the chip's TLB entries vs a design that maps what your OS needs [for example, BSD has different requirements from SV]).

On the other hand, one of the most expensive parts of an operating system's manipulation of page tables is the need to flush the TLB when changes are made to the tables (i.e. to the mappings which they represent) ... the more you cache (segment table entries, for example) the more the code has to be aware of the hardware .... I guess this is an argument for putting them in the data cache.

Paul
--
Paul Campbell ..!{unisoft|mtxinu}!taniwha!paul (415)420-8179
Taniwha Systems Design, Oakland CA
"Read my lips .... no GNU taxes"
tom@nud.UUCP (Tom Armistead) (12/10/88)
In article <432@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>> Wrong! The 88200 does cache page descriptors.
>Mr. Armistead of Motorola, please re-read the question and try to answer
>it this time. I'm not asking you if the translated real addresses are
>maintained in the PATC, I'm asking if the data from the two fetches done
>during the table walk are placed in the d/i cache. If not, why not?

If you want to post my emailed response, you may.
--
andrew@frip.gwd.tek.com (Andrew Klossner) (12/13/88)
[]

"The way I read the initial data sheet it doesn't look like the 88200 caches address translation tables if you access them during a table walk."

That's right.

"The OS will actually be working with the tables as data. I assume that data accesses would load the table entries into the D-cache."

The hardware design requires that segment and page tables be in cache-inhibited pages. Maybe it would be safe not to cache-inhibit those pages and be sure to do cache flushes before the MMU does a table walk, but the documentation doesn't say so.

My guess is that segment and page tables are not cached simply because it would make the hardware design very hard. The MMU and cache are already working in parallel, because the cache begins its search when the virtual address first shows up, at the same time that the MMU begins the virtual to physical translation. (The cache starts with the low 12 address bits, which are the same for virtual and physical, then uses the other physical address bits when the MMU has them available.) Giving the MMU the ability to break into the cache cycle on a BATC/PATC miss, then restarting the cache cycle, seems awfully complicated.

There are 56 PATCs, for a total of (56*4k)=224k bytes of addressability. The biggest instruction or data cache is 4 88200s for a total of 64k bytes. Therefore, a gross analysis suggests that any PTEs will evaporate from the data cache before they drop out of the PATC cache. (I like the idea of caching the segment table, though.)

-=- Andrew Klossner
(uunet!tektronix!hammer!frip!andrew) [UUCP]
(andrew%frip.gwd.tek.com@relay.cs.net) [ARPA]
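The observation that the low 12 address bits are the same for virtual and physical addresses is what lets the cache start its lookup before translation finishes. A small sketch of that property, assuming 4K pages and a flat dictionary standing in for the MMU (the mapping and the name `translate` are hypothetical, not the 88200's actual mechanism):

```python
PAGE_SHIFT = 12                          # 4K pages: low 12 bits are the page offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

def translate(vaddr, page_map):
    """Replace the virtual page number with a frame number; keep the offset."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & OFFSET_MASK
    frame = page_map[vpn]                # stands in for a PATC hit or table walk
    return (frame << PAGE_SHIFT) | offset

page_map = {0x12345: 0x00ABC}            # hypothetical mapping for one page
vaddr = 0x12345678
paddr = translate(vaddr, page_map)

# Only the upper bits change, so a lookup keyed on the low 12 bits can begin
# with the virtual address and be completed once the frame number is known.
same_offset = (vaddr & OFFSET_MASK) == (paddr & OFFSET_MASK)
print(hex(paddr), same_offset)
```

Because the two steps already overlap in this way, routing a table-walk fetch back through the same cache in mid-lookup would mean interrupting and restarting the access, which is the complication suggested above.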