lm@cottage.WISC.EDU (Larry McVoy) (07/16/87)
Here's a question.  Why do people build their caches to respond to physical
addresses instead of virtual addresses?  Another way to state the question
is: why not put the VM -> PM translation logic next to (in parallel with)
the data cache, rather than before it?

If you cache virtual addresses you can present the address to the cache
as soon as it is generated, with no delay for translation.  At the same
time you are doing the cache lookup you can be doing the translation in
case there is a miss.

Am I missing something or is this the wave of the future?

Thank you fer yer support,

Larry McVoy 	    	lm@cottage.wisc.edu  or  uwvax!mcvoy
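A minimal Python sketch of the proposal (all names here - VCache, translate, tlb - are invented for illustration, not any real machine's design): index and tag the cache with virtual addresses, so a hit never waits for translation, and consume the TLB result only on a miss.

```python
LINE = 16          # bytes per cache line
NLINES = 256       # direct-mapped: 256 lines * 16 bytes = 4 KB

class VCache:
    def __init__(self):
        self.tags = [None] * NLINES    # virtual tags
        self.data = [None] * NLINES

    def lookup(self, vaddr):
        """Hit path: uses only the virtual address, no translation."""
        index = (vaddr // LINE) % NLINES
        tag = vaddr // (LINE * NLINES)
        if self.tags[index] == tag:
            return self.data[index]
        return None                    # miss: now the TLB result is needed

    def fill(self, vaddr, data):
        index = (vaddr // LINE) % NLINES
        self.tags[index] = vaddr // (LINE * NLINES)
        self.data[index] = data

def translate(tlb, vaddr, page=4096):
    """Miss path: VM -> PM translation, which the hardware can have
    started in parallel with lookup()."""
    return tlb[vaddr // page] * page + vaddr % page
```

In this model translate() is pure overhead on a hit, which is the speed argument the question makes.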
bader+@andrew.cmu.edu (Miles Bader) (07/16/87)
> Here's a question.  Why do people build their caches to respond to physical
> addresses instead of virtual addresses?  Another way to state the question
> is: why not put the VM -> PM translation logic next to (in parallel with)
> the data cache, rather than before it?

If different processes have different parts of their virtual address space
mapped to the same physical memory, a physical cache allows them to share
the same cache entries.  Also, each cache entry in a virtual cache has to
have a field describing which address map it's from if you don't want to
have to flush the cache upon context switch, etc.

	-Miles
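A minimal sketch of that last point (Python, invented names): without an address-space identifier (ASID) in the key, two processes' identical virtual addresses collide in a virtual cache; with one, they stay distinct.

```python
LINE, NLINES = 16, 256

def entry_key(vaddr, asid=None):
    """Key under which a virtual cache stores a line.  asid=None models
    a cache with no address-map field, which must be flushed on every
    context switch to avoid cross-process aliasing."""
    index = (vaddr // LINE) % NLINES
    tag = vaddr // (LINE * NLINES)
    return (index, tag) if asid is None else (index, asid, tag)
```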
petolino%joe@Sun.COM (Joe Petolino) (07/16/87)
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?
 [ . . . ]
>If you cache virtual addresses you can present the address to the cache
>as soon as it is generated, with no delay for translation.  At the same
>time you are doing the cache lookup you can be doing the translation in
>case there is a miss.
>
>Am I missing something or is this the wave of the future?

Not only is it the wave of the future, but of the past and present as well.
Designers of high-performance machines (IBM, Amdahl, and Sun, to name only
a few) have been using virtual-addressed caches for years, mainly for the
speed advantage noted above.

The reason that this is not done universally has to do with cache
consistency problems.  If a word's location in the cache is a function of
the virtual address by which it was last accessed, it can be difficult to
find that word again if another processor (e.g. a DMA controller), or even
another process running on the same processor, tries to access it by a
different virtual address.  Two processes that share a word but don't agree
on where it is in the cache are bound to confuse each other.  There are a
number of tricks used to solve this problem so that virtual addresses can
be used to access a cache - I think this topic has been discussed here
before.

-Joe
tim@amdcad.AMD.COM (Tim Olson) (07/16/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
+-----
| Here's a question.  Why do people build their caches to respond to physical
| addresses instead of virtual addresses?  Another way to state the question
| is: why not put the VM -> PM translation logic next to (in parallel with)
| the data cache, rather than before it?
+-----
The potential benefit of this (assuming an external MMU) is a decrease in
the latency from virtual address valid to cache access.  However, there
are also problems:

1) Cache tags must include a process-id field (more RAM for the tags,
   larger tag comparators) or the cache must be flushed on every context
   switch (very expensive for large caches).

2) It is very hard to provide for cache consistency in a multiprocessor
   (or even uniprocessor + I/O, but less so) environment; it basically
   requires a reverse mapping from physical address -> virtual address.

All in all, if you can hide the address translation time in a pipeline
stage, you are probably better off using physical caches.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)
ps@celerity.UUCP (Pat Shanahan) (07/16/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?  Another way to state the question
>is: why not put the VM -> PM translation logic next to (in parallel with)
>the data cache, rather than before it?
> ...
>
>Larry McVoy 	    	lm@cottage.wisc.edu  or  uwvax!mcvoy

Virtual-address caches are limited in their applications because of the
difficulty of maintaining consistency.  It is possible for a single item
of data to have several different addresses.  For example, an area of
System V style shared memory can be attached at different virtual
addresses by different processes.  If the system either has multiple
caches, or does not purge the cache on context switch, the same shared
data can be in the cache system under several different virtual addresses
in different address spaces.

Suppose one of the processes modifies the data.  The system has to ensure
that all cache copies of that data are either deleted or updated.  If the
cache is real-addressed this is relatively easy.  If it is
virtual-addressed, the system has the problem of determining all the
addresses the data might have.

There is also a minor difficulty with using virtual addresses for caches
that are not purged on context switch.  The virtual address has to be
extended by appending some form of context identifier so that equal
virtual addresses in different address spaces will not be confused.

Virtual-addressed caches can work very well, for example as instruction
caches.  Instruction modification is a much rarer event than data
modification and can be handled by doing general purges rather than
purging only the specific item.
-- 
	ps
	(Pat Shanahan)
	uucp : {decvax!ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ps
	arpa : sdcsvax!celerity!ps@nosc
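A small Python illustration of the aliasing hazard just described (sizes and addresses invented): with a direct-mapped virtual cache larger than a page, the same shared page attached at two different virtual addresses can land in two different cache lines, so a write through one alias leaves the copy under the other alias stale.

```python
LINE = 16
NLINES = 1024        # 16 KB cache: the index uses bits above the
                     # 4 KB page offset, so it depends on the alias

def cache_index(vaddr):
    return (vaddr // LINE) % NLINES

va_in_proc_a = 0x10000   # shared segment attached here in process A
va_in_proc_b = 0x2A000   # ... and attached here in process B
# These aliases index different lines, so the cache can hold two
# independent (and eventually inconsistent) copies of the same word.
```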
amos@nsta.UUCP (Amos Shapir) (07/16/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?

Well, not all do: CCI's 6/32 (also sold by Harris and Sperry) has virtual
caches; the trouble is, in Unix all user processes share the same virtual
space - they all start at their own virtual 0.  Having a virtual cache
requires the kernel either to purge the whole cache at context switch, or
to manage a complicated bookkeeping of who has what in which cache (CCI
do the latter).

>If you cache virtual addresses you can present the address to the cache
>as soon as it is generated, with no delay for translation.

In machines with a physical cache (such as the NS32532), this is
accomplished by an auxiliary Translation Look-aside Buffer (TLB); it
should be big enough to be useful, yet small enough to be purged on every
context switch without a significant reduction of performance.
-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. (972)52-522261
amos%nsta@nsc.com @{hplabs,pyramid,sun,decwrl} 34 48 E / 32 10 N
rrs@amdahl.amdahl.com (Bob Snead) (07/17/87)
In article <3904@spool.WISC.EDU>, lm@cottage.WISC.EDU (Larry McVoy) writes:
> ........ Why do people build their caches to respond to physical
> addresses instead of virtual addresses? ...
>
> Am I missing something or is this the wave of the future?

In fact, it's the wave of the present.  Amdahl 580s have virtually
addressed caches.

Claimer:
	"There is no way of exchanging information
	 that does not demand an act of judgment."  - Jacob Bronowski

Disclaimer:
	If you perceived opinions in what I have written they are
	probably your own and certainly not Amdahl Corp's.

Bob Snead
Future Computing Technologies
Amdahl Corp.
UUCP: ..!{ihnp4, hplabs, amd, sun, ...}!amdahl!rrs
roy@phri.UUCP (Roy Smith) (07/17/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
> Why do people build their caches to respond to physical addresses instead
> of virtual addresses?

	The scheme Larry describes sure sounds like it would be a win but
for one problem.  How do you deal with invalidating cache lines when some
DMA I/O device writes into the corresponding main memory location?  The
I/O device is generating physical addresses but the cache is keying on
virtual addresses.

	Now that I've posed the problem, I'll throw out some possible
answers.

1) When you store to cache, install the VA as the key and store the PA
   along with the data.  CPU-generated addresses key on VA; I/O-generated
   addresses key on PA for purposes of invalidation.  This makes the
   cache wider and adds more key-match logic.

2) Do I/O with VAs instead of PAs and have DMA go through the VM
   machinery.  This doesn't make the I/O device any more complicated to
   build (in fact, doesn't change it one whit) but adds more complexity
   to the I/O bus adaptor.

	Add to all this the fact that you can (and very well might want
to) have multiple VAs mapping to the same PA.  Nasty wrinkle all around.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
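A Python sketch of answer 1 above (structure and names invented): key the cache on the virtual address for the CPU's fast path, but store the physical address in each line so DMA invalidation, which knows only physical addresses, can find and drop the line.

```python
class DualKeyCache:
    def __init__(self):
        self.lines = {}                 # virtual tag -> (paddr, data)

    def fill(self, vtag, paddr, data):
        self.lines[vtag] = (paddr, data)

    def lookup(self, vtag):
        """CPU path: one probe on the virtual key."""
        entry = self.lines.get(vtag)
        return entry[1] if entry else None

    def invalidate_pa(self, paddr):
        """I/O path: match on the stored physical address - the extra
        key-match logic (and wider lines) the post warns about."""
        self.lines = {v: e for v, e in self.lines.items()
                      if e[0] != paddr}
```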
montnaro@sprite.steinmetz (Skip Montanaro) (07/17/87)
Some folks from Sun presented an article on the virtual address cache
mechanism developed for the Sun-3/200 product line at the recent Usenix
conference in Phoenix.  It presents the pros and cons of this scheme (as
I recall).

       Skip|	ARPA:	montanaro@ge-crd.arpa
  Montanaro|	UUCP:	montanaro@desdemona.steinmetz.ge.com
"How sweet it is!" -- The Great One
fdr@apollo.uucp (Franklin Reynolds) (07/17/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?  Another way to state the question
>is: why not put the VM -> PM translation logic next to (in parallel with)
>the data cache, rather than before it?
>
>If you cache virtual addresses you can present the address to the cache
>as soon as it is generated, with no delay for translation.  At the same
>time you are doing the cache lookup you can be doing the translation in
>case there is a miss.

This idea has merit and some people already build virtual caches.  Isn't
the 68020 I-cache virtual?  I have heard rumors that the caches of the
68030 will be virtual.  However, virtual caches are tricky.  In order to
avoid excessive cache flushing you usually have to include some sort of
address space identification tag for each entry.  You also have to decide
whether you want to support the ability to map different virtual
addresses to the same physical address (a very useful feature for systems
that support shared memory or mapped files).

Franklin Reynolds, Apollo Computer
fdr@apollo.uucp
mit-eddie!apollo!fdr
jroberts@attvcr.UUCP (John Roberts) (07/17/87)
In article <3904@spool.WISC.EDU>, lm@cottage.WISC.EDU (Larry McVoy) writes:
> Here's a question.  Why do people build their caches to respond to physical
> addresses instead of virtual addresses?  Another way to state the question
> is: why not put the VM -> PM translation logic next to (in parallel with)
> the data cache, rather than before it?
>
> Am I missing something or is this the wave of the future?

Actually, the 3B2/600 (and I would imagine many other machines) uses a
virtual cache scheme.  In the case of the 3B2, it's 6K partitioned as 4K
instruction and 2K data (I think).  As to how the potential pitfalls of
this scheme are handled, I don't know.  Perhaps someone with more
detailed knowledge will post something.

BTW, the 600 will have a multi-processor board this fall.  It's a slave,
but it still may complicate things.
-- 
John M. Roberts             AT&T Canada
Vancouver BC                (604) 689-8911
{ihnp4!alberta,uw-beaver}!ubc-vision!attvcr!jroberts
What! Me Worry?  attsayi fsh!would
corbin@encore.UUCP (07/17/87)
In article <3904@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?  Another way to state the question
>is: why not put the VM -> PM translation logic next to (in parallel with)
>the data cache, rather than before it?
>
>Larry McVoy 	    	lm@cottage.wisc.edu  or  uwvax!mcvoy

Take a look at Prime Computer's architecture; they have been doing
virtual caches for over 15 years.  Parallel lookup on cache and TLB is
faster than single-threading them, but this type of design can pose
serious problems with shared data if the architecture of the machine and
the OS is not done right.
-- 
Stephen Corbin
{ihnp4, allegra, linus} ! encore ! corbin
corbin@encore.UUCP (07/17/87)
In article <YUzCb7y00UkaU7k0Rj@andrew.cmu.edu> bader+@andrew.cmu.edu (Miles Bader) writes:
>> Here's a question.  Why do people build their caches to respond to physical
>> addresses instead of virtual addresses?  Another way to state the question
>> is: why not put the VM -> PM translation logic next to (in parallel with)
>> the data cache, rather than before it?
>
>If different processes have different parts of their virtual address space
>mapped to the same physical memory, a physical cache allows them to share the
>same cache entries.  Also, each cache entry in a virtual cache has to have a
>field describing which address map it's from if you don't want to have to
>flush the cache upon context switch, etc.
>
>	-Miles

The `Segmented Address Space' architecture of Prime systems solves the
problem of multiple cached entries of the same data and doesn't require
the address map identifier in the cache.  It works as follows:

	A specific number of segments in the virtual space are used for
	sharing and are common to the address space of all processes in
	the system.  For example, if segment 1000 is a shared segment,
	then multiple processes' virtual segment 1000 will map to the
	same physical segment in memory.  Thus sharing is achieved,
	duplicate cached entries of the same data are avoided, and the
	mapping for the shared data is maintained in one table.

The disadvantage of this approach is that the amount of private memory
each process has available is reduced by the defined amount of shared
space in the system.  For the Prime machines 1/4 of the virtual space is
allocated as shared data (actually 1/2, since the operating system is
embedded in another 1/4 of the virtual space).
-- 
Stephen Corbin
{ihnp4, allegra, linus} ! encore ! corbin
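A Python sketch of the scheme described above (the boundary value and names are illustrative): segment numbers at or above a fixed boundary are shared and mean the same thing in every address space, so a shared datum has one system-wide virtual address and its cache tag needs no address-map identifier.

```python
SHARED_BASE = 0o1000    # segments >= this are common to all processes

def cache_tag(asid, segment, offset):
    """Tag for a segmented virtual cache entry."""
    if segment >= SHARED_BASE:
        return (segment, offset)       # identical tag for every process
    return (asid, segment, offset)     # private data: qualify with ASID
```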
kenm@sci.UUCP (Ken McElvain) (07/17/87)
In article <3904@spool.WISC.EDU>, lm@cottage.WISC.EDU (Larry McVoy) writes:
- Here's a question. Why do people build their caches to respond to physical
- addresses instead of virtual addresses? Another way to state the question
- is: why not put the VM -> PM translation logic next to (in parallel with)
- the data cache, rather than before it?
-
- If you cache virtual addresses you can present the address to the cache
- as soon as it is generated, with no delay for translation.  At the same time you
- are doing the cache lookup you can be doing the translation in case there
- is a miss.
There are quite a few machines that use virtual addresses for the data
cache (Amdahl, Sun, ...).  That way the translation is only needed for
cache misses.  This creates major problems if the virtual address space
can be aliased (two virtual addresses mapping to one physical address),
because I/O is usually done with physical addresses and a reverse
translation is needed.  This is apparently not insoluble, since the
Amdahl has this problem.  Multi-processor systems have a similar
problem.
Another thing you can do is to limit the cache set size to the virtual
page size. Then the address translation can happen at the same time
as the tag set access and finish in time to compare with the output
of the tag rams. This doesn't work too well with machines with small
page sizes like a Vax but can be reasonable if the page size is ~4Kb
and up.
Ken McElvain
decwrl!sci!kenm
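Ken's set-size trick, reduced to arithmetic (an editorial sketch, not from the post): if the cache index must come entirely from the untranslated page-offset bits, a physically-tagged cache that is indexed in parallel with translation can hold at most page_size bytes per way.

```python
def max_parallel_cache_bytes(page_size, ways=1):
    """Largest cache whose index bits fit inside the page offset, so
    indexing can overlap the TLB lookup and the physical tag compare
    happens afterward."""
    return page_size * ways
```

With ~4 KB pages a useful direct-mapped cache fits; with VAX-style 512-byte pages there is very little to work with, matching the remark above.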
ps@celerity.UUCP (Pat Shanahan) (07/20/87)
In article <1762@encore.UUCP> corbin@encore.UUCP (Steve Corbin) writes:
> ...
>The `Segmented Address Space` architecture of Prime systems solves the
>problem of multiple cached entries of the same data and doesn't require
>the address map identifier in the cache.  It works as follows:
>
>	A specific number of segments in the virtual space are used for
>	sharing and are common to the address space of all processes in
>	the system.  For example, if segment 1000 is a share segment then
>	multiple processes virtual segment 1000 will map to the same
>	physical segment in memory.  Thus sharing is achieved, duplicate
>	cached entries of the same data is avoided and the mapping for
>	the shared data is maintained in one table.
> ...

I'm curious about this.  How would one use this to implement, for
example, System V shared memory?  The shared memory interfaces seem to
allow processes to attach the same block of shared memory at different
addresses, and different processes to use the same virtual address for
different blocks of shared memory.
-- 
	ps
	(Pat Shanahan)
	uucp : {decvax!ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ps
	arpa : sdcsvax!celerity!ps@nosc
thomson@uthub.UUCP (07/20/87)
In article <2798@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	The scheme Larry describes sure sounds like it would be a win but
>for one problem.  How do you deal with invalidating cache lines when some
>DMA I/O device writes into the corresponding main memory location?  The I/O
>device is generating physical addresses but the cache is keying on virtual
>addresses.

These I/Os were presumably scheduled by software, and the software
presumably knows where the device write was directed, so there should be
no difficulty in having the processor(s) do a software invalidate of the
appropriate virtual addresses once the I/Os complete.

Note that the transient inconsistency between completion of a device
write to a location and the (possibly much) later software invalidation
does not pose a problem, since the software will already be structured
such that those locations are not read until the I/O operation
terminates.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    utcsri!uthub!thomson, thomson@hub.toronto.edu
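A sketch of that software invalidate (names invented; a real kernel would use a privileged cache-control primitive here): the driver knows the buffer's virtual address and length, so after the I/O completes it drops every cache line overlapping that range.

```python
LINE = 16   # bytes per cache line (illustrative)

def invalidate_range(cache, vstart, nbytes):
    """cache maps line number -> data; drop all lines overlapping
    [vstart, vstart + nbytes), as a driver would after a DMA write."""
    first = vstart // LINE
    last = (vstart + nbytes - 1) // LINE
    for ln in range(first, last + 1):
        cache.pop(ln, None)
```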
welland@cbmvax.UUCP (Bob Welland) (07/24/87)
>Here's a question.  Why do people build their caches to respond to physical
>addresses instead of virtual addresses?
 [ . . . ]
>If you cache virtual addresses you can present the address to the cache
>as soon as it is generated, with no delay for translation.  At the same
>time you are doing the cache lookup you can be doing the translation in
>case there is a miss.
>
>Am I missing something or is this the wave of the future?

There are a few reasons why people use physical address caches instead of
virtual address caches (to reverse the perspective):

1. Cache consistency is very difficult with virtual address caches.  This
   is because virtual addresses are "private" to the process they are
   associated with.  Physical addresses are the "normal form" for the
   system as a whole.  Cache consistency is basically collision
   detection.  To detect a collision you need to compare addresses.
   Normal-form addresses are easy to compare (equality) while
   private-form addresses require a more complex comparison algorithm.

2. Extra tag space is needed for a process ID to distinguish colliding
   virtual addresses from different processes.  It is also necessary to
   flush the cache when you reuse a process ID (or use a big PID field).
   This can be rather ugly.

3. Often it is possible to use the low-order address bits (which in a
   paging system are untranslated) to access the cache in parallel with
   doing the address translation.  In VLSI these paths can be very well
   matched.  Usually you use a small content-addressable memory (CAM) for
   address translation and a large RAM for the tags.  This often means
   that address translation is "free" because it is done in parallel.
   This is easy to do in VLSI but quite difficult with random logic, so
   you do not see this approach taken often.  The reason it is difficult
   in random logic is that it is very difficult (basically impossible) to
   build a CAM, and so you end up using some more elaborate translation
   scheme (i.e. Sun's two-level translation table) that makes translation
   time-consuming.

Most of the people who build very fast caches do them discretely, and so
they end up with the translate-then-cache dilemma described above.  As
VLSI technology evolves, building more complex structures will become
possible, allowing MMU and cache to be one and the same.

So in summary: Yes, virtual caches are the wave of the present but not
(in my mind) the wave of the future.

				Robert Welland

Opinions expressed are my own and not those of Commodore.
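A sketch of the CAM-style TLB mentioned above (names invented; this model searches entries sequentially, where real VLSI compares every entry at once): a small fully-associative translation memory keyed by virtual page number.

```python
class CamTlb:
    def __init__(self, size=8):
        self.entries = []           # (vpn, pfn) pairs, most recent first
        self.size = size

    def insert(self, vpn, pfn):
        self.entries = [(vpn, pfn)] + \
                       [e for e in self.entries if e[0] != vpn]
        del self.entries[self.size:]   # evict least recently inserted

    def lookup(self, vpn):
        for v, p in self.entries:      # in hardware: one parallel compare
            if v == vpn:
                return p
        return None                    # miss: fall back to a table walk
```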
aglew@ccvaxa.UUCP (07/27/87)
...> Physical vs. virtual caches.

Some of you may remember a discussion I started last year about systems
where all processes would live in the same virtual address space.  The
bottom line was that UNIX fork() makes it highly desirable for processes
to be able to duplicate their address space (although there are ways
around it).  This was prompted by the desire to avoid the consistency
problems inherent in a virtual cache.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa
ross@hpihoah.HP.COM (Ross LaFetra) (07/28/87)
By the time I read this, many people had responded on a great many
different machines.  Most of the problems/solutions I'll discuss here
have been seen in pieces elsewhere, but I'll describe them as they
pertain to one machine, the Hewlett-Packard Precision Architecture, of
which the HP9000/Series 840 is an example (that is out there today).

First of all, for further reading, there is a great deal of information
on the hardware, software, architecture, and operating system (UNIX
System V compatible) in the Hewlett-Packard Journal over the last two to
three years.

The HP9000/840 (I'll just call it the 840 from now on) has what are known
as virtual/physical caches (split I and D).  What this means is that the
cache is indexed virtually, but checked (by means of tags) physically.
This has the advantage of using the cache and TLB (virtual-to-physical
translation cache) in parallel, which was the purpose of the original
basenote.  It also has the advantage of keeping the cache tag small,
because of the limited physical address space (32 bits on HPPA - HP
Precision Architecture), rather than the large virtual address space (48
bits on the 840, up to 64 bits on HPPA).

The problem of consistency with shared memory isn't much of a problem.
Since the machine supports a large number of spaces (32-bit segments of
virtual memory), you can assign each piece of shared memory its own
virtual space.  There are 65536 spaces on the 840, so you are not likely
to run out soon.  Thus, you can avoid the need to assign two virtual
addresses to the same physical address.  This is prohibited on HPPA
machines (by software convention).

Multi-processor cache consistency (not implemented on the 840 since it is
a uniprocessor) is not a problem either.  Each processor can broadcast
the virtual address as well as the physical address when accessing
memory.  In reality, it is even simpler: only the cache index needs to be
broadcast.

I/O presents a bit of a problem, but it is solved from the software side.
Since the OS knows what the I/O is going to do, the OS manages the cache
for the I/O system.  There are a couple of opcodes that do this.

HPPA solves a lot of the problems associated with purging the cache and
TLB on process switches by use of its large address space.  Each process
is assigned its own space(s), and no purges of either the cache or TLB
are needed on process switches.  Only when spaces are removed is this
action needed.  This allows the 840 to use very large caches (128kb) and
large TLBs (4k entries).  Thus no time is wasted invalidating and
reloading the cache or TLB on process switches (invalidation is typically
fast in other machines, I believe, but the reloading happens by missing
the cache and TLB.  Please note that physical caches don't need to do
this).  Each process can use its own virtual address zero, because it is
really a virtual offset of zero, where the virtual address is a space and
an offset (each up to 32 bits, for a max of 64 bits in HPPA).

I hope this clarifies some of the issues.  I tried to gather them in one
place.  I'm a little weak on the OS side of things, as I'm just a cache
designer.

Ross La Fetra
hpda!ross
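A sketch of HPPA-style global virtual addressing as described above (the packing is my illustration, not the real encoding): the full virtual address is a space ID concatenated with an offset, so every process gets its own "offset zero" with no collision and no cache or TLB purge on a process switch.

```python
OFFSET_BITS = 32   # up to 32-bit offsets, per the article

def global_va(space, offset):
    """Form a system-wide unique virtual address from (space, offset)."""
    assert 0 <= offset < (1 << OFFSET_BITS)
    return (space << OFFSET_BITS) | offset
```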