mark@hubcap.UUCP (Mark Smotherman) (04/21/88)
In reviewing hardware support for page replacement policies, I see that IBM mainframes allocate the reference (accessed) bit and the change (dirty/modified) bit in the storage key associated with the physical page frame. On the other hand, modern microprocessors (e.g. 80386, NS32082 MMU) allocate these bits in the page table entry. The tradeoff seems to be extra opcodes on the IBM versus slightly larger page table entries and some increase in memory traffic (and invalidate signals) for the microprocessors.

On the IBM (XA), three instructions operate on the reference and change bits: RRBE - reset reference bit extended, SSKE - set storage key extended, and ISKE - insert storage key extended. RRBE does the obvious, SSKE can be used to reset the change bit, and ISKE places a copy of the storage key and reference and change bits in a specified register. The reference and change bits are also set by any channel operations.

I don't have any hardware manuals available for the 386 or 32082 that give full descriptions, but I assume they work in the following way.

1. Given the absence of the page table entry from the TLB:
   * Upon a reference, the page table entry is brought into the TLB and the reference bit is inspected. If the reference bit is zero, then it is set to one. This update of the entry must be done in a store-through manner. That is, not only should the TLB copy of the page table entry be updated, but the copy of the entry in the cache should also be updated. (Of course, a designer could eliminate the inspection of the reference bit during a critical path by performing the store-through each time.) An additional store-through to main memory and cache invalidate signal would be needed in a multiprocessor. There would not be a need for a TLB invalidate signal.
   * Upon a change, the page table entry is processed as above, only with both bits set to one. The condition causing memory traffic is if either bit is zero (00, 01, or 10).

2. Given that the entry is currently in the TLB:
   * A reference has no effect, since to be in the TLB the reference bit must already be set.
   * Upon a change, the update of the page table entry must be done in a store-through manner only if the change bit in the TLB is not already set to one. (Eliminating this inspection would seem to be relatively expensive in terms of the additional memory traffic. Aren't writes about 30% of memory operations, as a rule of thumb?)

3. Immediately after an instruction changes a page table entry (e.g. resets the reference or change bits), the TLB must be purged. For multiprocessors the cache must also be purged (or the change must have been a store-through) and invalidate signals sent to the other processors to purge their TLBs and caches.

For those who know, is this truly how things work? Do you have any idea (or better yet, any measurements) of the amount of memory traffic involved in the setting of the reference and change bits? Can I/O processors (DMA or whatever) on these micros affect these bits?

P.S. The IBM XA Principles of Operation manual gives an unusual disclaimer on the reference bit: "The reference bit may be set to one by fetching data or instructions that are neither designated nor used by the program, and, under certain conditions, a reference may be made without the reference bit being set to one. Under certain unusual circumstances, a reference bit may be set to zero by other than explicit program action." [p. 3-11, March 1983 edition of SA22-7085-0] The first case would appear to be caused by prefetching that crosses page boundaries (e.g. branch target buffers). The other two cases elude me.
--
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark
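
The model assumed in points 1 and 2 above can be sketched as a small Python simulation. This only illustrates the poster's hypothesis and does not describe how the 80386 or NS32082 actually behave; the names `PTE`, `Mmu`, `access`, and `pte_writes` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    referenced: bool = False
    changed: bool = False


class Mmu:
    """Toy MMU (hypothetical): an in-memory page table plus a TLB of PTE copies."""

    def __init__(self, num_pages):
        self.page_table = [PTE() for _ in range(num_pages)]
        self.tlb = {}        # page number -> private copy of the PTE
        self.pte_writes = 0  # store-throughs to the in-memory page table

    def access(self, page, write=False):
        entry = self.tlb.get(page)
        if entry is None:
            # TLB miss: bring a copy of the page table entry into the TLB.
            mem = self.page_table[page]
            entry = PTE(mem.referenced, mem.changed)
            self.tlb[page] = entry
        need_ref = not entry.referenced
        need_chg = write and not entry.changed
        if need_ref or need_chg:
            # Update the TLB copy and store the new bits through to memory.
            entry.referenced = True
            entry.changed = entry.changed or write
            mem = self.page_table[page]
            mem.referenced, mem.changed = entry.referenced, entry.changed
            self.pte_writes += 1


mmu = Mmu(num_pages=4)
for _ in range(1000):
    mmu.access(0)              # only the first read stores the reference bit
for _ in range(1000):
    mmu.access(0, write=True)  # only the first write stores the change bit
print(mmu.pte_writes)
```

Under these assumptions, 2000 accesses to one page cause two page-table updates: one when the reference bit is first set and one when the change bit is first set.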
jamesa%betelgeuse@Sun.COM (James D. Allen) (04/21/88)
In article <1458@hubcap.UUCP>, mark@hubcap.UUCP (Mark Smotherman) writes:
> P.S. The IBM XA Principles of Operation manual gives an unusual disclaimer
> on the reference bit: "The reference bit may be set to one by fetching data
> or instructions that are neither designated nor used by the program, and,
> under certain conditions, a reference may be made without the reference bit
> being set to one. Under certain unusual circumstances, a reference bit may
> be set to zero by other than explicit program action." [p. 3-11, March 1983
> edition of SA22-7085-0] The first case would appear to be caused by
> prefetching that crosses page boundaries (e.g. branch target buffers). The
> other two cases elude me.

370 Models 165-II, 168, 3032, 3033 (and 308x, 309x?) do not set the reference bit on cache hits.

I don't know about "setting the reference bit to zero by other than explicit program action" but the 168 family stored three copies of each reference bit and used "majority rule" instead of parity-checking so the clause may have been intended as a loophole for hardware errors.
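
To illustrate the "majority rule" idea, here is a minimal Python sketch of a two-out-of-three vote over stored copies of one bit. It is not a description of the 168's circuitry; the function name `majority` and the sample values are assumptions made for the example.

```python
def majority(copies):
    """Return the value held by at least two of three stored copies of a bit."""
    a, b, c = copies
    return (a & b) | (a & c) | (b & c)


# One corrupted copy is outvoted, so the bit still reads as set.
one_fault = majority([1, 1, 0])

# Two corrupted copies outvote the good one: a bit that was set now reads
# as zero without any program having reset it.
two_faults = majority([1, 0, 0])

print(one_fault, two_faults)
```

Unlike a parity check, a vote of this kind never reports an error. A double fault simply yields the wrong value, which fits the suggestion that the manual's clause covers hardware errors.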
jeff@Alliant.COM (Jeff Collins) (04/23/88)
In article <1458@hubcap.UUCP> mark@hubcap.UUCP (Mark Smotherman) writes:
> Removed a discussion about IBM reference and dirty bits.
>
>I don't have any hardware manuals available for the 386 or 32082 that give
>full descriptions, but I assume they work in the following way.
>
>1. Given the absence of the page table entry from the TLB:
>   * Upon a reference, the page table entry is brought into the TLB and
>     the reference bit is inspected. If the reference bit is zero, then
>     it is set to one. This update of the entry must be done in a
>     store-through manner. That is, not only should the TLB copy of the page
>     table entry be updated, but the copy of the entry in the cache should
>     also be updated. (Of course, a designer could eliminate the inspection
>     of the reference bit during a critical path by performing the
>     store-through each time.) An additional store-through to main memory and
>     cache invalidate signal would be needed in a multiprocessor. There
>     would not be a need for a TLB invalidate signal.
>   * Upon a change, the page table entry is processed as above, only with
>     both bits set to one. The condition causing memory traffic is if
>     either bit is zero (00, 01, or 10).

On a multiprocessor, whether the PTE is written back to main memory is determined by the cache protocol. If the cache is write-through, then yes, the PTE must be written back to memory. If the cache is write-back, then it may not be written back to memory.

When the hardware sets a reference and/or modified bit in the TLB, the operating system does not know that the bit is being set; it is automatic. Given that the software does not know that the bit is set, there is no way to tell the other processors to perform an invalidate. Instead there are two ways to solve this race condition. One is to not share PTEs. This means that each process has private copies of the hardware page tables.
When an update is made by hardware to the TLB, no one else cares because no one else could have the PTE cached in the TLB (this assumes the TLB is flushed on context switch).

If the operating system allows shared PTEs (this would be done to allow multiple processes to share memory), then the problem can be effectively ignored. With reference bits it is not very important if they become inconsistent. It only means that you lose a little accuracy on your working set calculations. With modified bits it is very important to keep them consistent, or to not care what they are. This can be done by always assuming that shared data is dirty, or never releasing it - either solution works.

>
>2. Given that the entry is currently in the TLB:

Eliminated this text as I had nothing to add.

>
>3. Immediately after an instruction changes a page table entry (e.g. reset
>the reference or change bits), the TLB must be purged. For multiprocessors
>the cache must also be purged (or the change must have been a store-through)
>and invalidate signals sent to the other processors to purge their TLBs and
>caches.

This is close. If the operating system clears a referenced or modified bit on a shared PTE, then it must purge its TLB and cause all of the other processors that could have the PTE cached in the TLB to purge. Again note this is only a problem with shared PTEs.

The cache does not need to be purged. When the operating system writes the PTE entry, it writes to the cache/memory system. The cache will contain the correct version after the reset, the TLB contains an old version - which is why the TLB entry must be purged. (By the way, most of the MMUs allow a single entry to be purged, instead of the whole TLB.)

>
>For those who know, is this truly how things work? Do you have any idea
>(or better yet, any measurements) of the amount of memory traffic involved
>in the setting of the reference and change bits? Can I/O processors (DMA
>or whatever) on these micros affect these bits?
>

The setting/clearing of the referenced and modified bits is not a big deal (i.e. it doesn't cause a lot of bus traffic). This is because a bit only causes traffic the first time it is changed, and that is a very small percentage of the overall number of processor reads and writes.

To re-emphasize the multiprocessor issues here - the only trouble is with shared user level PTEs. Note that shared pages do not necessarily imply shared PTEs. It is possible to build virtual memory systems that have shared pages and private PTEs - this is what Mach and Encore (Umax 4.2) do. This saves the invalidates and the consistency problems.

I/O processors do not use these bits (they make physical accesses).
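
The stale-TLB problem with shared PTEs can be sketched in Python as follows. This is a toy model under assumed names (`Cpu`, `touch`, `purge`, `clear_referenced`), not the behavior of any particular MMU: two processors cache the same PTE, and the operating system clears the referenced bit with and without asking the processors to purge.

```python
class Cpu:
    """Toy processor (hypothetical) with a private TLB over a shared page table."""

    def __init__(self, page_table):
        self.page_table = page_table  # shared: page -> {"ref": bool, "mod": bool}
        self.tlb = {}                 # private copies of PTEs

    def touch(self, page, write=False):
        entry = self.tlb.get(page)
        if entry is None:
            entry = dict(self.page_table[page])  # TLB miss: copy the PTE
            self.tlb[page] = entry
        if not entry["ref"] or (write and not entry["mod"]):
            entry["ref"] = True
            entry["mod"] = entry["mod"] or write
            self.page_table[page].update(entry)  # write the bits back

    def purge(self, page):
        self.tlb.pop(page, None)  # single-entry purge


def clear_referenced(page_table, page, cpus, shootdown):
    """OS clears the referenced bit in memory, optionally purging every TLB."""
    page_table[page]["ref"] = False
    if shootdown:
        for cpu in cpus:
            cpu.purge(page)


page_table = {0: {"ref": False, "mod": False}}
cpus = [Cpu(page_table), Cpu(page_table)]
for cpu in cpus:
    cpu.touch(0)

clear_referenced(page_table, 0, cpus, shootdown=False)
cpus[1].touch(0)  # stale TLB copy still says "referenced", so nothing is written
missed = not page_table[0]["ref"]

clear_referenced(page_table, 0, cpus, shootdown=True)
cpus[1].touch(0)  # TLB miss reloads the PTE and records the reference
recorded = page_table[0]["ref"]
print(missed, recorded)
```

Without the purge, the later reference is lost. That only costs working-set accuracy for the referenced bit, but the same staleness on the modified bit would lose a dirty page.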
mash@mips.COM (John Mashey) (04/26/88)
In article <1647@alliant.Alliant.COM> jeff@alliant.UUCP (Jeff Collins) writes:
....
> When the hardware sets a reference and/or modified bit in the TLB, the
> operating system does not know that the bit is being set, it is
> automatic. Given that the software does not know that the bit is set,
> there is no way to tell the other processors to perform an invalidate.
....
>
> The setting/clearing of the referenced and modified bits are not a big
> deal (ie. they don't cause a lot of bus traffic). This is because it
> will only cause traffic the first time it is changed, and that is a
> very small percentage of the overall number of processor reads and
> writes.

At least some current machines, especially several of the RISC systems (HP Precision, MIPS, AMD 29K) use TLBs that do software refill, and trap on transitions [such as attempts to set the modify bit].

Current UNIXes often forbid the hardware from writing modify bits directly, in order to do copy-on-write processing. In other words, they go around a hardware feature that often adds substantial complexity to a design, in order to do what they really want. Current UNIXes almost always want to trap the first write to a page, unless the first reference to a page is a write, not a read, so that the kernel knows that the page should be allocated as dirty in the first place.

Frequencies of transition between clean-but-writable and dirty vary according to the UNIX variant. The cases are as follows:

a) 1st reference to a data page is a read, so that a copy of the data is brought into memory. Later a write occurs.
b) 1st reference to a BSS page is a read. Create a page of zeroes. Later, a write occurs.
c) Fork with copy-on-write is used. Copy the page tables, mark everything read-only, then copy the pages when written.
d) Use copy-on-write for mapped files, for buffer cache, etc.
e) User attempts to write to a truly nonwritable data page.

If you look at these cases, you find either that the frequency is low (on the order of disk event rates), or that there is substantial overhead (like zeroing a page), or that you're about to kill the process anyway.

Although this has nothing particular to do with RISCs, a number of them (such as the HP Precision, MIPS R2000, and AMD 29K, at least) do TLB handling in software, and generally trap modifies rather than letting the hardware do it.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
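
The trap-on-first-write approach can be sketched in Python as below. It is a simplified illustration, not the code of any UNIX kernel or of the MIPS, HP, or AMD TLB handlers; `Page`, `store`, `may_write`, and `hw_writable` are invented names, and the copy-on-write copy itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    may_write: bool            # what policy allows the process to do
    hw_writable: bool = False  # what the TLB entry currently permits
    dirty: bool = False        # software-maintained modify bit


def store(page, stats):
    """Simulate a store; the 'hardware' traps when the entry is not writable."""
    if not page.hw_writable:
        stats["traps"] += 1
        if not page.may_write:
            # Case (e): a truly nonwritable page; the process would be killed.
            raise PermissionError("write to a nonwritable page")
        # First write to a clean page: the kernel records that the page is
        # dirty (a copy-on-write kernel would copy the page here), then
        # enables writes so that later stores do not trap.
        page.dirty = True
        page.hw_writable = True


stats = {"traps": 0}
data_page = Page(may_write=True)
for _ in range(1000):
    store(data_page, stats)
print(stats["traps"], data_page.dirty)
```

In this model only the first of the 1000 stores traps, which matches the point that clean-to-dirty transitions are rare compared with ordinary writes.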
kds@mipos2.UUCP (05/12/88)
Using software to change modify/use bits, or even to do TLB refills, is not usually an option on CISC-type machines because the time to switch into the service call is usually quite a bit longer than on something like the MIPS. In fact, the MIPS dedicates some of the user-visible general purpose registers just so they don't have to save them when they want to service, for example, a TLB refill.

Which isn't to say either method is bad, but just that the same solution isn't necessarily applicable to the same problem in two different environments.

You don't have to break many eggs to hate omlets -- Ian Shoales

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds