[comp.arch] 88K table walk

jml@ivory.SanDiego.NCR.COM (Michael Lodman) (12/03/88)

According to Motorola, the 88200 CMMU does not cache the page and
segment descriptors it fetches during a table walk. This would seem
to me to have a negative impact on performance. Is this standard
practise for table walks, and if not why did Motorola do it that way?
John Mashey, perhaps you could tell me if MIPS does this.

Michael Lodman  (619) 485-3335
Advanced Development NCR Corporation E&M San Diego
mike.lodman@ivory.SanDiego.NCR.COM 
{sdcsvax,cbatt,dcdwest,nosc.ARPA}!ncr-sd!ivory!jml

When you die, if you've been very, very good, you'll go to ... Montana.

tom@nud.UUCP (Tom Armistead) (12/03/88)

In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>According to Motorola, the 88200 CMMU does not cache the page and
>segment descriptors it fetches during a table walk. This would seem

    Wrong! The 88200 does cache page descriptors.  Up to 56 page descriptors
(each descriptor maps 4K of virtual space) can be cached in each CMMU.  The
page descriptor cache is managed by the CMMU.

    In addition, per CMMU, software can map up to eight 512K chunks of
memory via the block address translation cache and avoid table walks on
contiguous areas of memory (e.g. kernel text and data, i/o areas, etc.).
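
As a rough illustration of the figures above, the following Python sketch models a CMMU that consults its block (BATC) entries first, then its page (PATC) entries, and walks the tables only on a miss. The class name `Cmmu`, the flat `page_table` dictionary, the lookup order, and the FIFO replacement policy are all assumptions made for illustration; none of them is taken from the 88200 documentation. Only the entry counts and sizes come from the post.

```python
PAGE_SIZE = 4 * 1024       # each page descriptor maps 4K (from the post)
BLOCK_SIZE = 512 * 1024    # each BATC entry maps 512K (from the post)
PATC_ENTRIES = 56          # from the post
BATC_ENTRIES = 8           # from the post


class Cmmu:
    """Toy translation model; not a description of the real hardware."""

    def __init__(self, page_table):
        self.batc = {}                # virtual block number -> physical block base
        self.patc = {}                # virtual page number -> physical page base
        self.page_table = page_table  # hypothetical stand-in for the in-memory tables
        self.table_walks = 0

    def map_block(self, vaddr, paddr):
        """Software-managed block mapping (at most eight per CMMU)."""
        if len(self.batc) >= BATC_ENTRIES:
            raise ValueError("all eight BATC entries are in use")
        self.batc[vaddr // BLOCK_SIZE] = paddr - paddr % BLOCK_SIZE

    def translate(self, vaddr):
        block = vaddr // BLOCK_SIZE
        if block in self.batc:        # block hit: no table walk needed
            return self.batc[block] + vaddr % BLOCK_SIZE
        page = vaddr // PAGE_SIZE
        if page not in self.patc:     # PATC miss: walk the tables
            self.table_walks += 1
            if len(self.patc) >= PATC_ENTRIES:
                # Assumed FIFO eviction: drop the oldest inserted entry.
                self.patc.pop(next(iter(self.patc)))
            self.patc[page] = self.page_table[page]
        return self.patc[page] + vaddr % PAGE_SIZE
```

In this model, a region mapped with `map_block` never increments `table_walks`, which is the point made about kernel text, data, and I/O areas.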

mash@mips.COM (John Mashey) (12/03/88)

In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>According to Motorola, the 88200 CMMU does not cache the page and
>segment descriptors it fetches during a table walk. This would seem
>to me to have a negative impact on performance. Is this standard
>practise for table walks, and if not why did Motorola do it that way?
>John Mashey, perhaps you could tell me if MIPS does this.
Yes.  If it's in the cache, it's in the cache.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

tim@crackle.amd.com (Tim Olson) (12/03/88)

In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
| In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
| >According to Motorola, the 88200 CMMU does not cache the page and
| >segment descriptors it fetches during a table walk. This would seem
| 
|     Wrong! The 88200 does cache page descriptors.  Up to 56 page descriptors
| (each descriptor maps 4K of virtual space) can be cached in each CMMU.  The
| page descriptor cache is managed by the CMMU.

Yes, the translation is cached in the TLB entries, but I think the
question was whether the memory accesses performed during the table
walk also get cached in the data cache, or whether they must always go
to main memory.


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

paul@taniwha.UUCP (Paul Campbell) (12/05/88)

In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>>According to Motorola, the 88200 CMMU does not cache the page and
>>segment descriptors it fetches during a table walk. This would seem
>
>    Wrong! The 88200 does cache page descriptors.  Up to 56 page descriptors
>(each descriptor maps 4K of virtual space) can be cached in each CMMU.  The

I think what he is really asking is 'does the 88200 cache the segment table
entries used to read page table entries with?'. 

I've often wondered whether this is worthwhile. It probably is for
'medium' to 'large' size programs ('small' ones need only a few entries and
fit in the TLB cache, while 'large' to 'huge' ones may thrash). I think that
simulation is probably the only way to find out, and of course your mileage
may vary for different architectures and different job mixes.

Of course, systems that handle TLB misses in software, like the 29K and the
MIPS chips, take advantage of any data cache the system has, since they treat
both segment table entry accesses and page table entry accesses like any other
access. (This helps particularly for VERY large programs that are thrashing in
the TLB cache, because the data cache speeds up TLB refills.)
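
To make the point about software refills concrete, here is a toy Python model in which the miss handler reads its table entries with ordinary loads, so they pass through the data cache like any other data. This is not actual 29K or MIPS handler code. The two-level layout, the 16-byte line size, the 4-byte entry size, the table addresses, and the names `DataCache` and `software_tlb_refill` are all hypothetical.

```python
LINE_SIZE = 16            # bytes per cache line (assumed)
ENTRY_SIZE = 4            # bytes per table entry (assumed)
ENTRIES_PER_TABLE = 1024  # page entries per segment (assumed)


class DataCache:
    """Unbounded cache that only counts line hits and misses."""

    def __init__(self):
        self.lines = set()
        self.hits = 0
        self.misses = 0

    def load(self, addr):
        line = addr // LINE_SIZE
        if line in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(line)


def software_tlb_refill(cache, segment_table, page_tables, vpn):
    """Model a two-level refill done entirely with ordinary loads."""
    segment, index = divmod(vpn, ENTRIES_PER_TABLE)
    # First load: the segment table entry.
    cache.load(segment_table + segment * ENTRY_SIZE)
    # Second load: the page table entry inside that segment's table.
    page_table = page_tables + segment * ENTRIES_PER_TABLE * ENTRY_SIZE
    cache.load(page_table + index * ENTRY_SIZE)


cache = DataCache()
for vpn in range(4):      # four TLB misses on neighbouring pages
    software_tlb_refill(cache, 0x10000, 0x20000, vpn)
print(cache.misses, cache.hits)
```

Under these assumptions only the first refill goes to memory; the next three find both the segment entry and the neighbouring page table entries already sitting in the same cache lines.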

	Paul

-- 
Paul Campbell			..!{unisoft|mtxinu}!taniwha!paul (415)420-8179
Taniwha Systems Design, Oakland CA

 	"Read my lips .... no GNU taxes"

jml@ivory.SanDiego.NCR.COM (Michael Lodman) (12/06/88)

In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>In article <415@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>>According to Motorola, the 88200 CMMU does not cache the page and
>>segment descriptors it fetches during a table walk.
>
>    Wrong! The 88200 does cache page descriptors.

Mr. Armistead of Motorola, please re-read the question and try to answer
it this time. I'm not asking you if the translated real addresses are
maintained in the PATC, I'm asking if the data from the two fetches done
during the table walk are placed in the d/i cache. If not, why not?


dpm@k.gp.cs.cmu.edu (David Maynard) (12/07/88)

> I'm asking if the data from the two fetches done during the table walk are
> placed in the d/i cache.  If not, why not?

I was just thinking about asking this question on a final exam in a computer
architecture course.  (Then I decided it was too hard.)  I don't know why
Motorola did one thing or another, but here are my 2 cents worth.

The way I read the initial data sheet it doesn't look like the 88200 caches
address translation tables if you access them during a table walk.  However,
if the PATC/BATC are big enough, you might just be polluting the cache if
you loaded the table entries.  Presumably, the only way a "normal" user
program will access the tables is via a table walk.  If you cache the
results of the walk in the PATC then you supposedly won't need to access the
table locations again for some time.  Also, with 4K pages and a 16K cache,
you can cycle a lot of data through the cache without having to do many more
address translations.  It is possible that the table entries would usually
be flushed from the cache before they were used again.  It seems that there
might be more of an argument for caching the 1st-level tables ("segment"
tables) since you might be touching several pages in the same segment and
might benefit by being able to speed up the address translation for those
nearby pages.
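
The argument for caching only the first-level tables can be sketched by counting memory fetches. The Python function below assumes a two-level walk (one segment descriptor fetch, then one page descriptor fetch), that every page in the list misses in the PATC, and that a segment covers 1024 pages; `walk_fetches` and its parameters are hypothetical names, not anything from the data sheet.

```python
def walk_fetches(pages, pages_per_segment, cache_segment_descriptors):
    """Count memory fetches for one table walk per listed page."""
    cached_segments = set()
    fetches = 0
    for vpn in pages:
        segment = vpn // pages_per_segment
        if not (cache_segment_descriptors and segment in cached_segments):
            fetches += 1          # segment descriptor comes from memory
            cached_segments.add(segment)
        fetches += 1              # page descriptor always comes from memory
    return fetches


# Eight nearby pages in the same segment:
print(walk_fetches(range(8), 1024, False))
print(walk_fetches(range(8), 1024, True))
```

With these assumptions the uncached case costs two fetches per walk, while caching the segment descriptor brings the cost close to one fetch per walk for pages in the same segment.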

The OS will actually be working with the tables as data.  I assume that data
accesses would load the table entries into the D-cache.  There might be a
case (page faults maybe) where having a particular table entry
already cached from a walk might speed up an OS function.  I'll have to
think about that one....

 ---
 David P. Maynard (dpm@cs.cmu.edu)
 Dept. of Electrical and Computer Engineering
 Carnegie Mellon University
 Pittsburgh, PA  15213
 ---
 Any opinions expressed are mine only.  I haven't asked the ECE department
 or CMU what they think.
 ---

paul@taniwha.UUCP (Paul Campbell) (12/08/88)

In article <3798@pt.cs.cmu.edu> dpm@k.gp.cs.cmu.edu (David Maynard) writes:
>The OS will actually be working with the tables as data.  I assume that data
>accesses would load the table entries into the D-cache.  There might be a
>case (page faults maybe) where already having a particular table entry
>already cached from a walk might speed up an OS function.  I'll have to

Of course, on a system where TLB misses are handled in software, page tables
are simply a figment of the OS's imagination ... the hardware doesn't care
or know if they are out there (in this case the design becomes a compromise
between how easily your PTEs can be bent into the format to be poked into the
chip's TLB entries and a layout that matches what your OS needs [for example,
BSD has different requirements from SV]).

On the other hand, one of the most expensive parts of an operating system's
manipulation of page tables is the need to flush the TLB when changes are
made to the tables (i.e. to the mappings they represent) ... the more
you cache (segment table entries, for example) the more the code has to be
aware of the hardware .... I guess this is an argument for putting them
in the data cache.

		Paul

tom@nud.UUCP (Tom Armistead) (12/10/88)

In article <432@ncr-sd.SanDiego.NCR.COM> jml@ivory.SanDiego.NCR.COM (Michael Lodman) writes:
>In article <1583@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>>    Wrong! The 88200 does cache page descriptors.
>Mr. Armistead of Motorola, please re-read the question and try to answer
>it this time. I'm not asking you if the translated real addresses are
>maintained in the PATC, I'm asking if the data from the two fetches done
>during the table walk are placed in the d/i cache. If not, why not?

    If you want to post my emailed response, you may.  

andrew@frip.gwd.tek.com (Andrew Klossner) (12/13/88)

	"The way I read the initial data sheet it doesn't look like the
	88200 caches address translation tables if you access them
	during a table walk."

That's right.

	"The OS will actually be working with the tables as data.  I
	assume that data accesses would load the table entries into the
	D-cache."

The hardware design requires that segment and page tables be in
cache-inhibited pages.  Maybe it would be safe not to cache-inhibit
those pages and be sure to do cache flushes before the MMU does a table
walk, but the documentation doesn't say so.

My guess is that segment and page tables are not cached simply because
it would make the hardware design very hard.  The MMU and cache are
already working in parallel, because the cache begins its search when
the virtual address first shows up, at the same time that the MMU
begins the virtual to physical translation.  (The cache starts with the
low 12 address bits, which are the same for virtual and physical, then
uses the other physical address bits when the MMU has them available.)
Giving the MMU the ability to break into the cache cycle on a BATC/PATC
miss, then restarting the cache cycle, seems awfully complicated.
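
The parenthetical about the low 12 address bits can be checked with a small Python sketch. With 4K pages the page offset is untouched by translation, so a cache whose set index is drawn only from those bits can start its lookup before the physical address is known. The 16-byte line and 256 sets are assumptions chosen so that line size times set count equals the 4K page size; the function names and the example mapping are hypothetical.

```python
PAGE_SHIFT = 12                      # 4K pages: low 12 bits are the offset
PAGE_MASK = (1 << PAGE_SHIFT) - 1
LINE_SIZE = 16                       # assumed line size in bytes
NUM_SETS = 256                       # assumed; 16 * 256 = 4096 = page size


def translate(vaddr, page_map):
    """Replace the virtual page number, keep the page offset."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    return (page_map[vpn] << PAGE_SHIFT) | offset


def cache_index(addr):
    """Set index taken only from bits inside the page offset."""
    return (addr // LINE_SIZE) % NUM_SETS


vaddr = 0x12345678
paddr = translate(vaddr, {0x12345: 0x00ABC})   # made-up mapping
print(hex(paddr), cache_index(vaddr), cache_index(paddr))
```

The set index is the same for the virtual and the physical address, so only the tag comparison has to wait for the MMU.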

There are 56 PATC entries, for a total of 56 * 4K = 224K bytes of
addressability.  The biggest instruction or data cache is four 88200s, for a
total of 64K bytes.  Therefore, a gross analysis suggests that any PTEs will
evaporate from the data cache before they drop out of the PATC.
(I like the idea of caching the segment table, though.)
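
The arithmetic behind this gross analysis, written out as a short Python sketch. The figures are those quoted in the thread (56 entries, 4K pages, 16K of cache per 88200, at most four 88200s per cache); the variable names are made up, and like the post, the comparison uses the PATC of a single CMMU.

```python
KB = 1024
PAGE_SIZE = 4 * KB          # bytes mapped by one PATC entry
PATC_ENTRIES = 56           # entries per CMMU
CACHE_PER_CMMU = 16 * KB    # cache bytes per 88200
MAX_CMMUS = 4               # largest instruction or data cache configuration

patc_reach = PATC_ENTRIES * PAGE_SIZE        # memory reachable without a walk
largest_cache = MAX_CMMUS * CACHE_PER_CMMU   # biggest cache on one side

print(patc_reach // KB, largest_cache // KB, patc_reach / largest_cache)
```

The PATC covers 3.5 times as much memory as the largest cache can hold, which is why a cached PTE would likely be displaced by ordinary data before the PATC entry it produced is evicted.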

  -=- Andrew Klossner   (uunet!tektronix!hammer!frip!andrew)    [UUCP]
                        (andrew%frip.gwd.tek.com@relay.cs.net)  [ARPA]