[comp.arch] Anyone for memory management on the AM29000?

tve@ethz.UUCP (Th. von Eicken) (04/21/87)

Disclaimer:
We have the advance information data sheet and not (yet) the user's
manual so maybe our information is out-dated...

When reading the data sheet I noticed that the TLB entries
do not have any "page used" flag nor any "page modified"
flag. Does that mean that the AM29000 memory management is even
more crippled than on a VAX (which doesn't have a "page used" flag)???

On TLB misses, as far as I understand, a software trap is generated.
Are there any figures on typical interrupt routine times for handling
the misses? What is the performance penalty, compared to miss
handling in hardware?

	Thorsten von Eicken	tve@ethz.uucp
	ETH Zuerich		...!seismo!mcvax!cernvax!ethz!tve
	Switzerland
---

henry@utzoo.UUCP (Henry Spencer) (04/23/87)

> When reading the data sheet I noticed that the TLB entries
> do not have any "page used" flag nor any "page modified"
> flag. Does that mean that the AM29000 memory management is even
> more crippled than on a VAX (which doesn't have a "page used" flag)???

Sounds reasonable.  MIPSCo did this too.  The result isn't "crippled",
because it takes a fair bit of hardware to add those flags and they don't
buy you very much over a software simulation.  For real programs, on a
given page each of those flags is either constantly on or rarely on,
meaning that the software-simulation overhead is quite low.  Notice that
not implementing the flags means that TLB entries never have to be written
back to main memory when they are replaced in the TLB, which makes TLB
management simpler and quicker.  On the whole, it's a win.  I looked at
TLB management a little bit a while ago, and formed a strong suspicion
that software simulation was actually the way to go even when the hardware
*does* support the flags!  (Software simulation is more flexible.)

> On TLB misses, as far as I understand, a software trap is generated.
> Are there any figures on typical interrupt routine times for handling
> the misses? What is the performance penalty, compared to miss
> handling in hardware?

Dunno about the AMD machine, but MIPSCo cited typical TLB-miss-handling
times in software (they do that too) substantially *shorter* than the
VAX 780's hardware times.  Given careful design, there need be little
penalty.
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

bcase@amdcad.AMD.COM (Brian Case) (04/23/87)

In article <67@bernina.UUCP> tve@ethz.UUCP (Th. von Eicken) writes:
>When reading the data sheet I noticed that the TLB entries
>do not have any "page used" flag nor any "page modified"
>flag. Does that mean that the AM29000 memory management is even
>more crippled than on a VAX (which doesn't have a "page used" flag)???
>
>On TLB misses, as far as I understand, a software trap is generated.
>Are there any figures on typical interrupt routine times for handling
>the misses? What is the performance penalty, compared to miss
>handling in hardware?

Yeah, questions about the "missing" page referenced and modified bits
in the TLB are always among the first to be asked when people are
presented with the Am29000.  The deal is:  these bits don't belong in
TLB entries, they belong either in the page tables themselves or in
the physical page map (note that for inverted page tables, these
structures are (or can be) the same thing).  The VAX is brain-damaged
because the TLB reload is done by hardware (well, microcode) and it
forgets to take note of some of the information that OS guys would
like to have.  Since the Am29000 TLB reload is done by a software
routine, you not only can decide what the page tables look like, but
you can also decide whether or not to gather referenced and modified
information.
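
As an illustration of that freedom, here is a minimal sketch (a Python
model, not real handler code) of a software reload routine that walks a
two-level table and also records a referenced bit.  The class names,
the 4 KB page size, the 10-bit index split, and the replacement policy
are all assumptions made for the example; none of them come from the
Am29000 data sheet.

```python
# Illustrative model only: field names, table shapes, and sizes are
# assumptions, not the Am29000's actual TLB or page-table format.
PAGE_SHIFT = 12          # assumed 4 KB pages
L2_BITS = 10             # assumed 10 index bits for the second level


class PTE:
    def __init__(self, frame, writable=True):
        self.frame = frame
        self.writable = writable
        self.referenced = False   # kept in the page table, not in the TLB
        self.modified = False


class SoftTLB:
    def __init__(self, root, capacity=64):
        self.root = root          # dict: L1 index -> dict: L2 index -> PTE
        self.capacity = capacity
        self.entries = {}         # vpn -> physical frame number
        self.misses = 0

    def miss_handler(self, vpn):
        """Software refill: walk the two-level table, load one entry."""
        self.misses += 1
        l1, l2 = vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)
        pte = self.root[l1][l2]   # a KeyError here stands in for a page fault
        pte.referenced = True     # the OS chooses to record this
        if len(self.entries) >= self.capacity:
            # Drop the oldest entry; nothing has to be written back.
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = pte.frame

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.entries:
            self.miss_handler(vpn)
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

Because the handler is ordinary software, swapping in a different
table layout or dropping the referenced-bit update changes only
miss_handler().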

Note that referenced information is available degenerately by the
very fact that that TLB entry is present at all (the fact that the
TLB entry was fetched from the page table means that the page has
been referenced).  Page modified can be gathered in software too, if
you are willing to take the performance hit:  put the TLB entry for
the page into the TLB but set the write-protection bit(s) (one for
supervisor, one for user); then, when a write to the page is attempted,
a protection violation trap will be taken; at this point, look in the
page table to see whether the page is really supposed to be read-only; if
not, then change the TLB entry to allow writing and count a page
modification in the page table (or physical page map).
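
The trap sequence just described can be modelled roughly as follows.
This is a sketch under stated assumptions: every name in it (Page, TLB,
protection_trap, and so on) is hypothetical, and it models only the
control flow, not the real TLB entry format.

```python
# Hypothetical names throughout; this models the trap sequence only.
class ProtectionViolation(Exception):
    pass


class Page:
    def __init__(self, writable):
        self.writable = writable   # what the page table says is allowed
        self.modified = False      # software-maintained "modified" flag


class TLB:
    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> Page
        self.write_ok = {}             # vpn -> write permission held in TLB
        self.traps = 0

    def load(self, vpn):
        # Always install entries write-protected so the first store traps.
        self.write_ok[vpn] = False

    def store(self, vpn):
        if vpn not in self.write_ok:
            self.load(vpn)
        if not self.write_ok[vpn]:
            self.protection_trap(vpn)
        # ...the store itself would proceed here...

    def protection_trap(self, vpn):
        self.traps += 1
        page = self.page_table[vpn]
        if not page.writable:
            raise ProtectionViolation(vpn)   # genuinely read-only page
        page.modified = True                 # count the modification
        self.write_ok[vpn] = True            # later stores do not trap
```

In this model a writable page costs one extra trap each time its entry
is reloaded and then written; subsequent stores run without traps.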

But this is not the right way to do it anyway.  The right way is to
have a small RAM-based table in the memory controller keep track of
page modification:  there is very little overhead and the information
is maintained on a per-physical-page basis, just as it should be.
Also, it is probably the best way for multiprocessor systems.
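
A minimal model of such a table, assuming a fixed 4 KB physical page
size and a hypothetical MemoryController interface (neither is taken
from any real controller):

```python
PAGE_SHIFT = 12   # assumed fixed 4 KB physical page size


class MemoryController:
    def __init__(self, num_frames):
        # One flag per physical page frame, standing in for the small RAM.
        self.dirty = bytearray(num_frames)

    def write(self, paddr):
        # Set as a side effect of every write; the CPU is not involved.
        self.dirty[paddr >> PAGE_SHIFT] = 1

    def read_and_clear(self, frame):
        """What the pager would call before deciding to write a page out."""
        was_dirty = bool(self.dirty[frame])
        self.dirty[frame] = 0
        return was_dirty
```

The flags are indexed by physical frame, so writes from any processor
that goes through the same controller land in the same table.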

I have written a paper about TLB reload for the Am29000, complete
with page table structures and code examples for two-level and inverted-
page tables.  There is also a discussion of TLB miss processing overhead
for a few of our benchmark programs (nroff, our assembler, puzzle,
etc.).  The overhead, in added cycles per instruction, is typically
less than 0.01 with the max (for the examples given) at 0.27 for the
"rm" command (this atrociously high number is due to the fact that
rm is a very short program so the cold-start penalty is a high
percentage of the total time).  The TLB miss ratios go from 1.50%
(yeech) for rm to < 0.01% for puzzle.  Four of the six programs have
TLB miss ratios < 0.05%, the next highest is 0.12% (nroff), and then
comes rm at 1.50%.  Note that the only instructions in the Am29000 set
that can cause a TLB miss are jumps, calls, loads, and stores (well,
there can also be a TLB miss "caused" by the other instructions
when a page boundary is crossed, but the frequency of this event is
extremely low).  For the routines I wrote, the two-level TLB miss
handler takes 42 cycles while the inverted-page miss handler takes
63 cycles on average (both include all overhead and assume single-
cycle burst, two-cycle first access memories).
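
For a rough feel of how these numbers combine, added cycles per
instruction is misses per instruction times handler cycles.  The
figures above do not say which handler was measured or what the miss
ratios are counted against, so the sketch below only applies that
relation; using the 42-cycle handler for the back-computation is an
assumption.

```python
def overhead_cpi(misses_per_instruction, handler_cycles):
    """Added cycles per instruction due to TLB miss handling."""
    return misses_per_instruction * handler_cycles


def implied_misses_per_instruction(cpi_overhead, handler_cycles):
    """Invert the relation: miss rate that would produce a given overhead."""
    return cpi_overhead / handler_cycles


# Assumption: the 0.27 figure for rm was measured with the 42-cycle handler.
rm_rate = implied_misses_per_instruction(0.27, 42)
print(f"{rm_rate:.4f} misses per instruction")

# One miss per 10,000 instructions with the same handler:
print(f"{overhead_cpi(1e-4, 42):.4f} added cycles per instruction")
```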

I can send copies of the paper to those interested.... (It's text
and graphics, so I can't just post.)

    bcase

lamaster@pioneer.UUCP (04/23/87)

In article <16336@amdcad.AMD.COM> bcase@amdcad.UUCP (Brian Case) writes:
>In article <67@bernina.UUCP> tve@ethz.UUCP (Th. von Eicken) writes:
>>When reading the data sheet I noticed that the TLB entries
>>do not have any "page used" flag nor any "page modified"
>>flag.

:

>Yeah, questions about the "missing" page referenced and modified bits
>in the TLB are always among the first to be asked when people are
>presented with the Am29000.  The deal is:  these bits don't belong in
>TLB entries, they belong either in the page tables themselves or in
>the physical page map (note that for inverted page tables, these
>structures are (or can be) the same thing).  The VAX is brain-damaged

:

>
>But this is not the right way to do it anyway.  The right way is to
>have a small RAM-based table in the memory controller keep track of
>page modification:  there is very little overhead and the information
>is maintained on a per-physical-page basis, just as it should be.
>Also, it is probably the best way for multiprocessor systems.
>
:

This may not be the best way to handle it if the architecture supports
multiple page sizes (and even multiple simultaneous page sizes), because the
page size in effect is not (easily) "known" by the memory controller.  
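
To make the mismatch concrete: a controller that cannot know the page
size can only track writes at some fixed granule, and the OS then has
to combine granule flags for whatever page size is in effect.  The
sketch below assumes a 1 KB granule and a hypothetical page_is_dirty()
helper; it illustrates the extra bookkeeping and is not a description
of any real controller.

```python
GRANULE_SHIFT = 10   # assumed smallest supported page size: 1 KB


def page_is_dirty(dirty_granules, paddr, page_shift):
    """OR together the controller's per-granule flags covering one page.

    page_shift is the log2 of the page size in effect; it must be known
    to the OS, since the controller itself only sees granules.
    """
    per_page = 1 << (page_shift - GRANULE_SHIFT)
    first = (paddr >> page_shift) * per_page
    return any(dirty_granules[first:first + per_page])
```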

Even with an architecture-fixed page size, you have decreased the complexity
of the processor/TLB at the expense of the memory controller, which is OK for
a multi-chip/board CPU, but probably not the best for a microprocessor
(consider the history of MC68xxx memory management).


  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")