fan@ucla-cs.UUCP (04/22/87)
--------------

        This is my first posting, so if there is any mistake, please
excuse.

        I am doing a project on MMU's, and from reading various uP
data books, I have several questions:

        The Intel iAPX 286 has an on-chip MMU.
        The Motorola 68020 has an off-chip MMU (68851).
        What are the important deciding factors in designing a MMU
on-chip or off-chip?

        Three I can think of: execution speed, chip space, and
additional support.

        Execution Speed: In general, on-chip MMU is faster than
off-chip MMU.

        Chip Space: Sometimes, there is not enough space for putting
a MMU on-chip. Sometimes, a cache is implemented instead of a MMU.

        Additional Support: If the MMU is on-chip, then some
additional instructions might be needed. If the MMU is off-chip, then
additional pins might be needed.

        It seems that the trend is putting the MMU on-chip. 68020 has
no on-chip MMU, but 68030 has a subset of MMU. iAPX 386 has MMU on
chip, and so is the National Semiconductor 32532 (I haven't read the
data books yet, so I might be mistaken). Fairchild Clipper has an
off-chip MMU.

        Question 1 : are there any other factors that might affect the
design of the MMU being on-chip or off-chip?

        Question 2 : if there is enough space on the chip, would
everybody put the MMU on-chip?

        Question 3 : if there is only enough room for either a cache
or a MMU, which one will prevail?

Roy Fan                 fan@cs.ucla.edu
tim@ism780c.UUCP (Tim Smith) (04/23/87)
In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>       Question 1 : are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

I read a claim somewhere that said that it is better to use the on-chip
space for floating point, and make the MMU external, rather than have an
on-chip MMU and an off-chip floating point unit.

The argument was that you have to go off chip anyway to access memory,
so it should be possible to make an external MMU as efficient as an
internal one, whereas external floating point will be much slower than
internal floating point.

I have no idea if this is right or not. I prefer an internal MMU because
then the system designer can't leave it out! I can deal with missing
floating point in software. I can't deal with a missing MMU in software.
--
Tim Smith                       "Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim        Heiaha! Heiaha!
Delph or GEnie: Mnementh         Hojotoho! Heiaha!"
Compuserve: 72257,3706
rich@motsj1.UUCP (Rich Goss) (04/24/87)
In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>--------------
>
>       This is my first posting, so if there is any mistake, please
>excuse.
>
>       I am doing a project on MMU's, and from reading various uP
>data books, I have several questions:
>
>       The Intel iAPX 286 has an on-chip MMU.
>       The Motorola 68020 has an off-chip MMU (68851).
>       What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>
>       Three I can think of: execution speed, chip space, and
>additional support.
>
>       Execution Speed: In general, on-chip MMU is faster than
>off-chip MMU.
>
>       Chip Space: Sometimes, there is not enough space for putting
>a MMU on-chip. Sometimes, a cache is implemented instead of a MMU.
>
>       Additional Support: If the MMU is on-chip, then some
>additional instructions might be needed. If the MMU is off-chip, then
>additional pins might be needed.
>
>       It seems that the trend is putting the MMU on-chip. 68020 has
>no on-chip MMU, but 68030 has a subset of MMU. iAPX 386 has MMU on
>chip, and so is the National Semiconductor 32532 (I haven't read the
>data books yet, so I might be mistaken). Fairchild Clipper has an
>off-chip MMU.
>
>       Question 1 : are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?
>
>       Question 2 : if there is enough space on the chip, would
>everybody put the MMU on-chip?
>
>       Question 3 : if there is only enough room for either a cache
>or a MMU, which one will prevail?
>
>Roy Fan                fan@cs.ucla.edu

Other factors to be considered in the choice of any MMU scheme:

One should look at the architecture of the MMU, i.e., segmented
only, demand paged only, or a combination thereof.

One should look at the amount of overhead needed to support
the MMU, i.e., the number of translation tables needed, the
amount of memory space required to support the table descriptors,
and the control bits provided (access, modify, cache inhibit, and
similar flags). Intel does not provide a cache inhibit bit, making
it tough to design an external cache where certain pages should not
be cached (e.g., shared memory between two processors in some
multiprocessing schemes).

One should look at the potential for a particular MMU scheme being
incorporated in future generation of processors. You should look
at the 286 MMU and the 386 MMU. They are not compatible. The MMU
in the 68030 is compatible with the 68851 PMMU chip.

As to whether an MMU should be on chip with the CPU depends on
the application. The 68030 is an excellent choice for
workstations and systems which support multitasking and/or multiuser
operating systems. The 68020 is a good choice for the above
when coupled with the 68851 PMMU or the user's own MMU. Also the
68020 is an excellent choice for embedded controller applications
(e.g., disk, serial communications, LAN, etc.) which do not usually
require an MMU. The 68020 and 68030 are object code compatible,
which makes the software engineers happy. Also, the cost of the
68020 will be coming down to the point where it will make a lot of
sense to use it in embedded controller applications.

-- Rich Goss
Motorola Western Regional Field Applications Engineer for 68000
Family
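
The per-page control bits discussed in the post above can be illustrated with a minimal sketch. It assumes a hypothetical 32-bit page descriptor with 4 KB pages; the bit positions and the names `PageFlags`, `make_descriptor`, and `is_cacheable` are invented for illustration and are not taken from any Intel or Motorola data book. The point is only that an external cache can consult a cache-inhibit bit to skip pages such as memory shared between two processors.

```python
from enum import IntFlag


class PageFlags(IntFlag):
    """Hypothetical control bits in a page descriptor (positions are illustrative)."""
    PRESENT = 1 << 0
    WRITABLE = 1 << 1
    ACCESSED = 1 << 2
    MODIFIED = 1 << 3
    CACHE_INHIBIT = 1 << 4


PAGE_SHIFT = 12  # assumption: 4 KB pages, so the low 12 bits are free for flags


def make_descriptor(frame: int, flags: PageFlags) -> int:
    """Pack a physical frame number and its control bits into one word."""
    return (frame << PAGE_SHIFT) | int(flags)


def frame_of(descriptor: int) -> int:
    """Recover the physical frame number from a descriptor."""
    return descriptor >> PAGE_SHIFT


def is_cacheable(descriptor: int) -> bool:
    """An external cache would bypass any page whose cache-inhibit bit is set."""
    return (descriptor & int(PageFlags.CACHE_INHIBIT)) == 0


# A private page may be cached; a page shared between processors may not.
private_page = make_descriptor(0x1A2, PageFlags.PRESENT | PageFlags.WRITABLE)
shared_page = make_descriptor(
    0x1A3, PageFlags.PRESENT | PageFlags.WRITABLE | PageFlags.CACHE_INHIBIT
)
```

Without such a bit in the descriptor, the same decision has to be made by address decoding outside the MMU, which is the design difficulty the post describes.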
rajiv@im4u.UUCP (04/24/87)
Summary: More suggestions and views on this issue.

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>--------------
>       I am doing a project on MMU's, and from reading various uP
>data books, I have several questions:
>
>       It seems that the trend is putting the MMU on-chip. 68020 has
>no on-chip MMU, but 68030 has a subset of MMU. iAPX 386 has MMU on
>chip, and so is the National Semiconductor 32532 (I haven't read the
>data books yet, so I might be mistaken). Fairchild Clipper has an
>off-chip MMU.
>
>       Question 1 : are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?
>
>       Question 2 : if there is enough space on the chip, would
>everybody put the MMU on-chip?
>
>       Question 3 : if there is only enough room for either a cache
>or a MMU, which one will prevail?
>

I agree about the trend toward putting the MMU on-chip, and I feel that
chip area would be a governing factor in the decision to place the MMU
on the chip. One issue that may be raised is the rigidity of the
translation mechanism in an on-chip MMU if variable paging schemes
(elaborate control) are not available in that hardware. I am not very
familiar with this issue but feel that it might play a role in the
decision for an on-chip or off-chip MMU. Of course, it is still very
attractive (speed gains) to place the MMU on-chip. This would be the
case for full system design products, like the IBM PC RT for instance,
where the translation and page sizes are decided by the same company
that designs the MMU.

As regards the choice between an on-chip cache and an on-chip MMU, I
would choose to place both on the chip, but as the area is limited I
would make a compromise by placing ONLY the TLB of the MMU on-chip
(translation and page tables off-chip) and a smaller cache in the
remaining real estate. The reason for this is that the TLB, if of
reasonable size, would give around 60% hits, and a small cache would
also give at least 60-70% hits; this way we get the best of both the
cache and MMU operations. The above choice is very dependent on the
speedup one can obtain between on-chip and off-chip accesses. But I
feel a compromise would work better than going one way.

Well, those are some of my views. I do not have any magic numbers to
support them but would be happy to receive comments and criticism
regarding them.

Rajiv.
ARPA: rajiv@im4u.utexas.edu
UUCP: {ihnp4,seismo,allegra,ucbvax}!ut-sally!im4u!rajiv
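
The compromise described above depends on how much an on-chip hit saves over an off-chip access. A rough way to reason about it is an average-cost model. The sketch below uses the hit rates from the post (60% for the TLB, 60-70% for the small cache); the cycle counts (`on_chip`, `off_chip`, `table_walk`) are assumptions chosen only for illustration, not measurements of any real part.

```python
def average_cycles(hit_rate: float, hit_cost: float, miss_cost: float) -> float:
    """Expected cost of one lookup given its hit rate and hit/miss costs."""
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


def reference_cost(tlb_hit: float, cache_hit: float,
                   on_chip: float = 1.0, off_chip: float = 4.0,
                   table_walk: float = 8.0) -> float:
    """Average cycles per memory reference: translation plus data access.

    Assumed costs (hypothetical): an on-chip hit takes `on_chip` cycles,
    a cache miss goes off chip for `off_chip` cycles, and a TLB miss
    walks the off-chip page tables for `table_walk` cycles.
    """
    translation = average_cycles(tlb_hit, on_chip, table_walk)
    data = average_cycles(cache_hit, on_chip, off_chip)
    return translation + data


# Small on-chip TLB plus small on-chip cache, using the post's hit rates.
split_design = reference_cost(tlb_hit=0.60, cache_hit=0.65)

# Nothing on chip: every translation and every data access goes off chip.
nothing_on_chip = reference_cost(tlb_hit=0.0, cache_hit=0.0)

print(f"TLB + small cache on chip: {split_design:.2f} cycles/reference")
print(f"nothing on chip:           {nothing_on_chip:.2f} cycles/reference")
```

Under these assumed costs the split design roughly halves the average reference cost, but the result moves a great deal with the on-chip to off-chip ratio, which is the dependence the post points out.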
dan@prairie.UUCP (Daniel M. Frank) (04/25/87)
In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>       What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>
>       Question 1 : are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

   Your list is pretty complete, but I'd like to broaden the discussion
a bit. We can break the speed/pins issue down into two areas: one is
the raw technological problem of pins and propagation delays, the other
is architectural.

   Short of new technologies such as optical computers and brute-force
methods such as ECL and freon cooling, the only way to speed up a
uniprocessor is to increase parallelism. This can be done by pipelining,
which allows multiple instructions to be in various stages of execution
simultaneously, and by increasing the parallelism at each stage of the
pipeline. Two ways to achieve the latter are to do operand validity
checking in parallel with other operations, and to do it as early as
possible. The advantage of doing it in parallel should be clear. The
advantage of doing it early is that we can avoid having to throw many
instructions out of the pipeline.

   Anyway, the more parallelism you want, the more integrated your memory
management hardware has to be with the CPU. If you put it off-chip,
you'll need more pins, and the propagation delays may slow down your
overall cycle time.

   The architectural issue is more subtle. Can we tailor our architecture
in such a way that we can either inform the chip early about our
addressing intentions, or break such information up so that there is
less work to do at critical times? I claim that the 80x86 series does
just that (whether well or badly) by checking segment validity at segment
register load time, leaving only boundary and page presence checking for
the execution of actual references. This is probably less interesting on
the 80386, where segment register loads are bound to be much less
frequent than on its predecessor.
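
The load-time check described in the paragraph above can be sketched as a small model. This is a simplified illustration, not the real 80286/80386 descriptor format: the `Descriptor` fields, the `SegmentRegister` class, and the example table are all hypothetical, and the page presence check mentioned in the post is left out. It shows only the split of work, with the expensive validation done once at segment register load and a bounds check plus an add done per reference.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class SegmentFault(Exception):
    """Raised when a load-time or reference-time check fails."""


@dataclass(frozen=True)
class Descriptor:
    base: int        # linear address where the segment starts
    limit: int       # highest valid offset within the segment
    present: bool
    writable: bool


class SegmentRegister:
    """Caches a validated descriptor so references need only a bounds check."""

    def __init__(self) -> None:
        self.cached: Optional[Descriptor] = None

    def load(self, table: Dict[int, Descriptor], selector: int,
             need_write: bool = False) -> None:
        # The expensive validity checks run once, at register load time.
        descriptor = table.get(selector)
        if descriptor is None:
            raise SegmentFault("selector not in descriptor table")
        if not descriptor.present:
            raise SegmentFault("segment not present")
        if need_write and not descriptor.writable:
            raise SegmentFault("segment is read-only")
        self.cached = descriptor

    def translate(self, offset: int) -> int:
        # Per-reference work: one range comparison and one addition.
        descriptor = self.cached
        if descriptor is None or not 0 <= offset <= descriptor.limit:
            raise SegmentFault("offset outside segment limit")
        return descriptor.base + offset


# Hypothetical descriptor table for illustration.
TABLE = {
    1: Descriptor(base=0x10000, limit=0x0FFF, present=True, writable=True),
    2: Descriptor(base=0x20000, limit=0x00FF, present=False, writable=True),
}

data_segment = SegmentRegister()
data_segment.load(TABLE, selector=1, need_write=True)
linear = data_segment.translate(0x10)   # base 0x10000 plus offset 0x10
```

The fewer segment register loads a program performs, the less this split matters, which matches the remark about the 80386.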
>       Question 2 : if there is enough space on the chip, would
>everybody put the MMU on-chip?

   I suppose so. If there was enough space (and they could cool it!),
they'd try to put EVERYTHING on the chip.

>       Question 3 : if there is only enough room for either a cache
>or a MMU, which one will prevail?

   My knee-jerk response is: it is so hard to really integrate an external
MMU with a pipelined processor, that you'll win by putting the MMU and a
small cache on chip, and putting a larger cache off-chip. I hear the two
level cache worked out pretty well in the Microvax.

   [The preceding was an "architectural discussion". In some circles,
this is also known as a "religious discussion". Consider yourself
warned.]

--
      Dan Frank (w9nk)
      ARPA: dan@db.wisc.edu               ATT: (608) 255-0002 (home)
      UUCP: ... uwvax!prairie!dan              (608) 262-4196 (office)
      SNAILMAIL: 1802 Keyes Ave. Madison, WI 53711-2006
mash@mips.UUCP (John Mashey) (04/25/87)
In article <6047@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
>I read a claim somewhere that said that it is better to use the on-chip
>space for floating point, and make the MMU external, rather than have an
>on-chip MMU and an off-chip floating point unit.
>
>The argument was that you have to go off chip anyway to access memory,
>so it should be possible to make an external MMU as efficient as an
>internal one, whereas external floating point will be much slower than
>internal floating point.

1. There is no right answer, as usual. It depends on what your priorities
are, how much silicon you have, all of the other architectural tradeoffs
you've made, etc, etc, what technology trends you're expecting to track,
etc.

2. There are a couple statements that one can make:
At the current typical state of microprocessor technology [say, 1.2 - 2.0
micron]
        a) If you have on-chip FP, it won't be fast [remember, we think
        fast is a 2-cycle DP add or 5-cycle multiply, not 30 or 50-cycles].
        A serious micro FPU can be bigger than the CPU chip. [Ours
        certainly is!] If we had more die area, our chippers would go
        knock out some more cycles, not cram it on the CPU.
        b) If you don't do an on-chip MMU:
                0) You can build MMUs from fast SRAMs OR
                1) You can have an off-chip MMU that sooner or later
                adds wait states as your systems get faster. [it's been
                interesting to see how performance of the same 16MHz 68K
                has varied according to what's around it]. OR
                2) You will need special integrated cache-MMU parts. OR
                3) You will go with virtual caches, sooner or later.
                (These are not necessarily bad, but there are interesting
                consequences for system design and OS's for some of them.)
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:    408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
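
Point b) 1) above, that an off-chip MMU eventually adds wait states as clocks get faster, follows from simple timing arithmetic. The sketch below assumes a fixed translation delay and a fixed memory access time applied in series, plus a bus cycle of two clocks. All three numbers (40 ns, 100 ns, 2 clocks) and the function name `wait_states` are hypothetical values chosen for illustration, not figures from any data book.

```python
import math


def wait_states(clock_mhz: float, mmu_delay_ns: float, memory_ns: float,
                bus_cycles: int = 2) -> int:
    """Wait states needed when translation and memory access happen in series.

    Assumption: a zero-wait-state bus cycle lasts `bus_cycles` clocks, and
    the address must pass through the external MMU before memory sees it.
    """
    cycle_ns = 1000.0 / clock_mhz
    clocks_needed = math.ceil((mmu_delay_ns + memory_ns) / cycle_ns)
    return max(0, clocks_needed - bus_cycles)


# Same hypothetical MMU and memory, faster and faster CPU clocks.
for mhz in (8, 16, 25):
    states = wait_states(mhz, mmu_delay_ns=40, memory_ns=100)
    print(f"{mhz:>2} MHz: {states} wait state(s)")
```

The external delays stay constant in nanoseconds while the clock period shrinks, so the same MMU that costs nothing at a slow clock costs one or more extra clocks per reference at a fast one.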
davids@well.UUCP (David Schachter) (04/25/87)
Rich Goss of Motorola states the '286 and '386 MMUs are not compatible. This is incorrect. The '386 MMU is a superset of the '286 MMU. (In fact, it's much better than the '286 MMU.) Any code running on the '286 will run on the '386 (unless you used the Intel-reserved fields in the '286 MMU descriptors!)
lawrenc@nvanbc.UUCP (Lawrence Harris) (04/26/87)
**** FLAME ON ****

In article <122@motsj1.UUCP> rich@motsj1.UUCP (Rich Goss) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>>--------------

<<Edited out.>>

The above is all OK, but the rest of this appears to be pure propaganda,
which I have the feeling is normal policy for Motorola, as most of their
advertising seems to run down competitors' products by quoting misleading
information such as what follows. Besides, this is the ncs 32k group and
not a place to advertise Motorola CPUs.

I was involved in a project using the Zilog Z8000 processor a few years
back and received literature from Motorola at that time making fantastic
claims for their chip and virtually lying about the limitations of the
Z8000 (just for example, they claimed the Z8000 could only address 64k
bytes when in fact it was 8 meg, not including split I/D possibilities).

>One should look at the potential for a particular MMU scheme being
>incorporated in future generation of processors. You should look
>at the 286 MMU and the 386 MMU. They are not compatible. The MMU
>in the 68030 is compatible with the 68851 PMMU chip.
>

As far as I am aware (without doing much research) it is a subset, not
a superset (i.e., downward compatible). Further, the Compaq 386 runs
XENIX 286 from the box, so how incompatible can the 386 MMU be? Agreed,
it has more features than the 286, but you don't have to recode
applications before you can use the 386!

>As to whether an MMU should be on chip with the CPU depends on
>the application. The 68030 is an excellent choice for
>workstations and systems which support multitasking and/or multiuser
>operating systems. The 68020 is a good choice for the above
>when coupled with the 68851 PMMU or the users own MMU. Also the
>68020 is an excellent choice for embedded controller applications

Some more edited out here.

>
>-- Rich Goss
>Motorola Western Regional Field Applications Engineer for 68000
>Family

The Z8000 was a fantastic chip for embedded controller applications,
capable of handling the interrupt rate from a floppy disc controller,
for example, without DMA support.

I guess the fact that he works for Motorola gives him an interest in
promoting Motorola's products, but I do get tired of how far they
stretch the truth sometimes.

ps. The above should not be taken to indicate a preference for any of
Motorola or Intel products. I actually prefer the National CPUs myself
(but use both Intel and Motorola).
--
------------------------------------------------------------------------------
UUCP: tectronix!uw-beaver!ubc-vision!van-bc!nvanbc!lawrence
SNAIL: 733 Sylvan Ave., North Vancouver, B.C., Canada, V7R 2E8
PHONE: 1-604-736-9241 (09:00-17:00 PDT)
mcvoy@crys.WISC.EDU (Larry McVoy) (04/26/87)
In article <441@prairie.UUCP> dan@prairie.UUCP (Daniel M. Frank) writes:
(I should note that I'm not really qualified to talk about this, I'm mostly
software. But then, so is Dan...)
# Short of new technologies such as optical computers and brute-force
#methods such as ECL and freon cooling, the only way to speed up a uni-
#processor is to increase parallelism. This can be done by pipelining,
Whoa there. The ONLY way? I beg to differ. Think about caches for a
moment. Most are small, direct mapped, and flushed at context switches.
How much performance would be gained by making them larger, with separate
I&D, fully associative, tagged with process IDs, etc.?
And this bit about optics? Optics? What will that buy you? Sure light
travels fast but converting from electrons to photons is a drag.
And don't pooh-pooh ECL either. There seems to be a steady supply of
new technologies (read about super conductors?).
OK, now let's back off a bit. I'm not disagreeing with the statement about
parallelism - to a certain extent I agree that a lot can be gained that
way. But don't dismiss everything else with one grandiose sweep and expect
me to buy it. It's not anywhere close to as simple as you make it sound.
# Anyway, the more parallelism you want, the more integrated your memory
#management hardware has to be with the CPU. If you put it off-chip,
#you'll need more pins, and the propagation delays may slow down your
#overall cycle time.
Not necessarily. Again, remember the value of a nice big smart cache.
#at critical times? I claim that the 80x86 series does just that (whether
[Flame++ ]
The 80x86 series is a load of upwardly compatible garbage and the whole
world knows it. I doubt there's a CS or ECE person anywhere that can
honestly say they like this architecture. If they do, they should go
back to school. Go look at the instruction sets before you flame me.
The word "orthogonal" does not exist in the Intel vocabulary. The word
"hack" does.
[Flame-- ]
#> Question 2 : if there is enough space on the chip, would
#>everybody put the MMU on-chip?
#
# I suppose so. If there was enough space (and they could cool it!),
#they'd try to put EVERYTHING on the chip.
Also not quite true. Large chips are a drag. They're a drag to lay out,
a drag to manufacture, a drag to cool (as you noted), etc. What you
really want to do is put everything that's in the "main loop" on chip,
leave everything else off chip. For example: suppose you look at a Vax
and realize that the good old polynomial evaluator instruction isn't
used very much. Why not use that chip space for cache and do the poly
func in software or in a slave processor? Take the 801 philosophy
of having to justify every instruction/feature, whatever on the chip.
#> Question 3 : if there is only enough room for either a cache
#>or a MMU, which one will prevail?
#
# My knee-jerk response is: it is so hard to really integrate an external
#MMU with a pipelined processor, that you'll win by putting the MMU and a
#small cache on chip, and putting a larger cache off-chip. I hear the two
#level cache worked out pretty well in the Microvax.
It did? I guess you never had more than 1 active job on your uVax, huh?
Compare a uVax with 3 users to a 680xx with 3 users. I know where I want
to work.
---------
I guess that I really want to say this: Don't throw out flip answers to
hard problems. I *know* that I have a lot to learn in this area and I
try to be especially careful because of it. If you are going to advocate
parallelism, don't do so without noting the difficulty of writing parallel
code. If you're going to dismiss advances in technology, don't do so
without proposing something better.
--
Larry McVoy mcvoy@rsch.wisc.edu or uwvax!mcvoy
"What a wonderful world it is that has girls in it!"
bsteve@gorgo.UUCP (04/27/87)
This is (of course) a very fuzzy question. I would tend to go (for now)
with an off-chip MMU for several reasons:

1) On-board MMUs require micro-bus cycles just like separate MMUs, and
   depending upon the uP architecture may take the same number of cycles.

2) We can't add virtual cache with an on-board MMU. The advantage of
   virtual over physical cache is that it operates in parallel with the
   MMU cycles and returns in nearly half the time on a hit, whereas the
   physical cache always requires a complete MMU cycle.

3) Some applications (small signal processing, etc.) don't really require
   the MMU, so why should one drive up the cost of the uP by adding one
   on-board?

4) Some architectures support stackable multiple MMUs that operate in
   parallel. One obviously cannot do this if the MMU is on-board.

I am sure that there are numerous other reasons why off-chip MMUs are more
desirable.

Steve Blasingame (Oklahoma City)
ihnp4!gorgo!bsteve
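
Reason 2) in the post above can be put in rough numbers. The sketch below is a toy timing model with invented delays (50 ns for the MMU, 60 ns for the cache, 200 ns for memory); the function names are hypothetical. It assumes a physical cache must wait for translation before lookup, while a virtual cache is looked up with the untranslated address as the MMU works alongside it.

```python
def physical_cache_hit_ns(mmu_ns: float, cache_ns: float) -> float:
    """Physical cache: the address must be translated before the lookup."""
    return mmu_ns + cache_ns


def virtual_cache_hit_ns(mmu_ns: float, cache_ns: float) -> float:
    """Virtual cache: lookup uses the virtual address while the MMU runs
    alongside, so on a hit the data is back as soon as the cache responds."""
    return cache_ns


def virtual_cache_miss_ns(mmu_ns: float, cache_ns: float,
                          memory_ns: float) -> float:
    """On a miss, memory starts once the lookup has failed AND translation
    has finished, whichever takes longer."""
    return max(mmu_ns, cache_ns) + memory_ns


# Hypothetical delays, for illustration only.
MMU_NS, CACHE_NS, MEMORY_NS = 50.0, 60.0, 200.0

print("physical hit:", physical_cache_hit_ns(MMU_NS, CACHE_NS), "ns")
print("virtual hit: ", virtual_cache_hit_ns(MMU_NS, CACHE_NS), "ns")
print("virtual miss:", virtual_cache_miss_ns(MMU_NS, CACHE_NS, MEMORY_NS), "ns")
```

When the MMU and cache delays are comparable, the virtual cache hit comes back in roughly half the time, which is the claim in the post. The model ignores the aliasing and context-switch costs that virtual caches bring, which other posts in the thread allude to.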
mash@mips.UUCP (John Mashey) (04/27/87)
In article <441@prairie.UUCP> dan@prairie.UUCP (Daniel M. Frank) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
....
>>      What are the important deciding factors in designing a MMU
>>on-chip or off-chip?
....
>>      Question 2 : if there is enough space on the chip, would
>>everybody put the MMU on-chip?
>
>   I suppose so. If there was enough space (and they could cool it!),
>they'd try to put EVERYTHING on the chip.
>
>>      Question 3 : if there is only enough room for either a cache
>>or a MMU, which one will prevail?
>
>   My knee-jerk response is: it is so hard to really integrate an external
>MMU with a pipelined processor, that you'll win by putting the MMU and a
>small cache on chip, and putting a larger cache off-chip. I hear the two
>level cache worked out pretty well in the Microvax.

As usual, it really depends on what you think you're building. Almost any
cache of any reasonable size is better than having no cache at all, if
you're careful. For cache versus MMU:

1) If you're building controller chips that can easily get along without
an MMU, then you might as well put a small I-cache on board, if nothing
else. Even small I-caches work. [68020]

2) If you're building a "system" chip, then you're awfully tempted to put
the MMU on-chip, assuming there's enough space to make it "big enough" to
get adequate hit rates for the applications you intend to run. Note that
the hit rates of small TLBs are much better than those of small caches.

        2a) To minimize board space, you might put the MMU on-chip, and
        build special cache-rams, or else build special MMU/cache chips
        [Clipper; 78000?] In this case, the tradeoff is to accept a cap
        on performance if you can't beef up the caches when you want to.

        2b) To go for all-out performance, at some cost in board space,
        put the MMU on-chip, use ordinary SRAMs, and also put the cache
        control on-chip [MIPS R2000]. Here, the tradeoff is that the
        minimum board space is slightly larger than 1) or 2a), but you
        have more high-end left by adding more SRAMs.

It is clear that on-chip caches, by themselves, simply CANNOT be made
large enough in current technologies to get into the higher performance
regions. [The following is a generalization. All generalizations are
false:]

ASSUMING YOU WANT TO RUN SUBSTANTIVE PROGRAMS, AND NOT CROAK RUNNING
MULTI-USER UNIX [all this for integer operations; FP has different
ratios]:

2-3 Mips [VAX 11/780, 4.3BSD == 1] seems to be about the limit for systems
with 1K or less cache. Examples:
        : good 16.7MHz 68020 = 2 Mips [Sun3/160]
        : good 20MHz 68030 = 3 Mips [wild guess; I really don't understand
          how the data cache is really going to behave]
        : cacheless 386 = 2 Mips
        Possible exception: newest IBM RT PC, which might be 3-4 Mips

3-5 Mips: Examples:
        : good 25MHz 68020 [with 64K cache, 64-bit buses, in Sun3/260]
          = 4 Mips
        : good 386 with 64K cache = 4 Mips [?]
        : Clipper with 4K+4K caches = ? Mips [can't tell from the
          published numbers] guess about 3-3.5 if kernel-intensive ones
          included
        : good 68030, 25MHz [whenever, not announced at this speed]
          guess = 5 Mips by comparing with 4 Mips, no-wait-state Sun3/260
        : R2000 with 24K cache = 5 Mips; with only 16K, was more like
          4 Mips

I.e., some performance levels REQUIRE external caches. A whimsical way
to put it is that a fast CPU is just a way for fast SRAMs to reach
self-actualization, i.e., max performance. :-)
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:    408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
clif@intelca.UUCP (Clif Purkiser) (04/27/87)
> In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
> >--------------
> >
> >       This is my first posting, so if there is any mistake, please
> >excuse.
> >
> >       I am doing a project on MMU's, and from reading various uP
> >data books, I have several questions:
> >
> >       The Intel iAPX 286 has an on-chip MMU.
> >       The Motorola 68020 has an off-chip MMU (68851).
> >       What are the important deciding factors in designing a MMU
> >on-chip or off-chip?
> >
> >
> >Roy Fan                fan@cs.ucla.edu

(Rich's comments start here)

> Other factors to be considered in the choice of any MMU scheme:
>
> One should look at the architecture of the MMU i.e., segmented
> only, demand paged only, or a combination thereof.

(deleted material)

>
> One should look at the potential for a particular MMU scheme being
> incorporated in future generation of processors. You should look
> at the 286 MMU and the 386 MMU. They are not compatible. The MMU
> in the 68030 is compatible with the 68851 PMMU chip.

Please explain how the 386 MMU is not upwardly compatible with the 80286
MMU. If your statement is correct then it should have taken more than the
few man days that it took to get Xenix 286 and RMX 286 (both OSs used the
286 Protected Mode) working on the very first stepping of the 80386. The
386 MMU is compatible with the 286's MMU.

After reading some 68030 articles I was under the impression that the
030 implemented a subset of the 68851; perhaps I am wrong. More
importantly, the 68030 MMU is totally incompatible with the MMU
architectures of some of your bigger customers (Sun, Apollo, and almost
all of the other 68K Unix manufacturers), who couldn't wait for Mot to
get the off-chip MMU working correctly. Or they didn't want to pay the
performance penalty (added wait states) associated with an off-chip MMU.

> ( deleted Additional ramblings about 68xxx )
> -- Rich Goss
> Motorola Western Regional Field Applications Engineer for 68000
> Family

Responding to Mr Fan's original question.
I believe a very important consideration is that an on-chip MMU allows
binary compatibility between machines. For instance on the Unix System
for the 80386, several ISVs have been shocked to discover that the same
binary disk of their application (like a database) would work on 386
systems with radically different hardware. For example, 386 Unix machines
use a wide variety of buses: MultiBus I, MultiBus II, PC AT bus, and
proprietary buses. But because the 386 has an on-chip MMU and a well
defined file format (COFF), binary compatibility has been achieved
between different 386-based Unix computers.

Contrast this with Unix systems that use processors with off-chip MMUs.
If I want to buy an application for a 68020 machine I have to specify
which machine I am using: Apollo, Masscomp, etc. Even though all of the
machines are running a Unix-like OS and use the 68020. The result is
that Unix ISVs spend most of their time (and make most of their money)
porting applications to different machines, instead of developing new
applications or improving the existing ones.
--
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property. However anyone who wants them can have
them for a nominal fee.
grenley@nsc.UUCP (04/28/87)
In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>       What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>       Three I can think of: execution speed, chip space, and
>additional support.
>       Execution Speed: In general, on-chip MMU is faster than
>off-chip MMU.
>       Chip Space: Sometimes, there is not enough space for putting
>a MMU on-chip. Sometimes, a cache is implemented instead of a MMU.
>       Additional Support: If the MMU is on-chip, then some
>additional instructions might be needed. If the MMU is off-chip, then
>additional pins might be needed.

Don't forget cost. If the MMU must be implemented as a separate chip or
chips, it is expensive. On the other hand, the incremental cost of
silicon, even in 386/68030/32532 class processors, is small in comparison
to overall system cost. For example, our yield on 532s does not increase
dramatically if the MMU is deleted.

>       Question 1 : are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

Biggest factor is: Do you need an MMU? If the processor is targeted at
general computing applications (read Unix) then MMU is req'd, and it
should be on chip. If CPU is for control and other embedded applications,
skip the MMU.

>       Question 2 : if there is enough space on the chip, would
>everybody put the MMU on-chip?

Well, we did. So did Intel. Anybody at Mot care to comment?

>       Question 3 : if there is only enough room for either a cache
>or a MMU, which one will prevail?

I would guess cache. CPU memory requirements are outrunning the ability
of DRAM to keep up.

Disclaimer: I work for NSC, designing systems based on the '532. On the
other hand, I used to work for Intel, selling 286s. The only computer I
spent my own money on is a Macintosh. YOU figure out where my biases are.

Regards,
George Grenley
mhorne@tekfdi.TEK.COM (Mike Horne) (04/28/87)
In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>> (deleted stuff...)
>> -- Rich Goss
>> Motorola Western Regional Field Applications Engineer for 68000
>> Family
>
> (deleted stuff...)
>Clif Purkiser, Intel, Santa Clara, Ca.
>{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

Rich, let's not stretch the truth. Clif, let's be man enough to not pick
a fight. The man wanted info about MMUs, not slanderous disinformation.
Can we please get on with a decent discussion? This junk has spanned 4
newsgroups now!

-MTH KA7AXD
--
---------------------------------------------------------------------------
Michael Horne - KA7AXD                  UUCP: tektronix!tekfdi!honda!mhorne
FDI group, Tektronix, Incorporated      ARPA: mhorne@honda.fdi.tek.com
Day: (503) 627-1666                     HAMNET: ka7axd@k7ifg
mark@applix.UUCP (Mark Fox) (04/28/87)
In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>... I believe a very important
>consideration is that an on-chip MMU allows binary compatibility
>between machines. For instance on the Unix System for the 80386,

blah, blah, blah...

>Contrast this with Unix systems that use processors with off-chip MMUs.
>If I want to buy an application for a 68020 machine I have to specify which
>machine I am using Apollo, Masscomp etc. Even though all of the machines are
>running Unix-like OS and use the 68020...
>
>Clif Purkiser, Intel, Santa Clara, Ca.

Whoa there Clif! Turn off the flames. Since when does the on-chip MMU or
lack thereof matter to Unix ISVs (read that Unix application vendors)??
The reason we have vendor-specific code is primarily because Apollo,
Masscomp etc. have different software environments (Aegis vs RTU). The
fact that their MMUs are different affects only the hardware vendors
themselves and then only their folks who develop and maintain the
operating systems. The people I work with don't care if the MMU is on
the chip or not, only whether one system out-performs another and
whether or not the ugliness of the architecture gets in our way. :-)

Granted, it was nice to see our application running on an early
pre-announced Compaq 386 -- the same binary that ran on an IBM PC/AT --
but why is that any more remarkable than seeing our application running
on both a Sun 2 (68010) and a Sun 3 (68020) without being recompiled or
relinked? The reason for the compatibility between the 386 and 286 is
that the operating system (ie software environment) was identical, not
because the MMU was on-chip.

When we port to a real System V for the 386, all bets are off regarding
binary portability between that and Xenix!
--
Mark Fox
Applix Inc., 112 Turnpike Road, Westboro, MA 01581, (617) 870-0300
uucp: seismo!harvard!halleys!applix!mark
lamaster@pioneer.arpa (Hugh LaMaster) (04/28/87)
In article <4244@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes: >In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes: >> What are the important deciding factors in designing a MMU >>on-chip or off-chip? > >> Question 3 : if there is only enough room for either a cache >>or a MMU, which one will prevail? > >I would guess cache. CPU memory requirements are outrunning the ability >of DRAM to keep up. > > >George Grenley I would like to put in a plug for an on-chip MMU, an on chip (relatively small) instruction cache (works with virtual addresses in the current context only), and an off chip data cache (may not be necessary with enough registers). The reasons: 1) It is nice when people build compatible architecture systems from your chip, as they are more likely to do with an on chip MMU, because it is easier to port software; 2) The kinds of problems that I work with benefit a lot more from an instruction cache than a data cache (Cray and/or Control Data have been building very high performance machines for years with an instruction cache only, and no data cache); 3) You can't put a large cache on a chip anyway; 4) Data caches on chip will complicate multiple processor implementations; 5) If there is more room on the chip after the MMU is on, the next step is to put the ARITHMETIC back on the chip (no extra FPA necessary then), and the next step after that is to divide the ALU into segmented functional units, then add vector instructions with fully segmented functional units, and to make sure you have enough registers, with everything in "RISC style" (no microcode, lots of random logic); 6) Then, after all that is on the chip, and you still have room ( :-) ) put the data cache back on the chip.
Hugh LaMaster, m/s 233-9, NASA Ames Research Center, Moffett Field, CA 94035 Phone: (415)694-6117 UUCP {seismo,topaz,lll-crg,ucbvax}!ames!pioneer!lamaster ARPA lamaster@ames-pioneer.arpa ARPA lamaster@pioneer.arc.nasa.gov "In order to promise genuine progress, the acronym RISC should stand for REGULAR (not reduced) instruction set computer." - Wirth ("Any opinions expressed herein are solely the responsibility of the author and do not represent the opinions of NASA or the U.S. Government")
fan@CS.UCLA.EDU (04/28/87)
In article <58500002@gorgo.UUCP> bsteve@gorgo.UUCP writes: > 1) On-board MMUs require micro-bus cycles just like separate MMUs, > and depending upon the uP architecture may take the same number > of cycles. Don't current uPs, for example the 80386 and 32532, use pipelining to increase throughput? In that case no extra micro-bus cycles would be required. > 2) We can't add virtual cache with an on-board MMU. The advantage of > virtual over physical cache is that it operates in parallel with > the MMU cycles and returns in nearly half the time on a hit, > whereas the physical cache always requires a complete MMU cycle. We could put a small cache on the chip (if there is enough room). Of course, if there isn't enough room, we will need to go off-chip. > 3) Some applications (small signal processing, etc.) don't really > require the MMU, so why should one drive up the cost of the uP > by adding one on-board? Yes, you're right. > 4) Some architectures support stackable multiple MMUs that operate > parallel. One obviously cannot do this if the MMU is on-board. If the MMU is off-chip, it should usually be large enough to ensure a high hit ratio. Increasing the number of MMUs then wouldn't increase performance much, but it would increase board space. I guess that, performance-wise, an on-chip MMU is faster than an off-chip MMU. But then there are just too many factors to be considered. Roy Fan University of California fan@cs.ucla.edu Los Angeles
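The hit-ratio point in the post above can be sketched with a simple average-cost model. This is only an illustration under stated assumptions, not a measurement of any MMU named in the thread: the function name and the cycle counts (1 cycle on a hit, 20 on a miss) are hypothetical.

```python
def avg_translation_cycles(hit_ratio, hit_cycles, miss_cycles):
    """Average cycles per address translation for a translation cache.

    hit_ratio   -- fraction of translations found in the cache (0.0 to 1.0)
    hit_cycles  -- assumed cost of a hit, in cycles
    miss_cycles -- assumed cost of a miss (e.g. a table walk), in cycles
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return hit_ratio * hit_cycles + (1.0 - hit_ratio) * miss_cycles


if __name__ == "__main__":
    # Hypothetical costs: 1 cycle on a hit, 20 cycles on a miss.
    for ratio in (0.90, 0.98, 0.99, 1.00):
        cycles = avg_translation_cycles(ratio, 1, 20)
        print(f"hit ratio {ratio:.2f}: {cycles:.2f} cycles per translation")
```

With these made-up numbers, going from a 90% to a 98% hit ratio saves about 1.5 cycles per translation, while going from 98% to 99% saves only about 0.2. That is the diminishing return the post describes once the hit ratio is already high.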
paul@unisoft.UUCP (04/28/87)
In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes: > >After reading some 68030 articles I was under the impression that >the 030 implemented a subset of the 68851, perhaps I am wrong. ..... > >-- >Clif Purkiser, Intel, Santa Clara, Ca. >{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif > While it is true that the PMMU is a superset of the 68030's MMU, this is in practice not a problem, for Unix kernels anyway. It turns out that the stuff in the PMMU that is not in the 68030 is not generally used by Unix kernels (the accent here is on UNIX kernels; others [ring protection domain based kernels, for example] might use it, as might sophisticated debuggers). One of the reasons for this is that early on everyone had to have MMB compatibility (there were no PMMUs) and all those extra goodies weren't there. In our PMMU/MMB kernels, when we did the MMU code we had no 68030 docs to guide us (just rumors, mostly untrue), and the result was something that will port to a 68030 with less than a week's work. I think that the 68030 being a subset of the PMMU is a non-issue, for Unix anyway. "68030, what me worry?" Paul Campbell UniSoft Systems ..!ucbvax!unisoft!paul
henry@utzoo.UUCP (Henry Spencer) (04/28/87)
> Contrast this with Unix systems that use processors with off-chip MMUs. > If I want to buy an application for a 68020 machine I have to specify which > machine I am using Apollo, Masscomp etc. Even though all of the machines are > running Unix-like OS and use the 68020... C'mon now, Clif, be fair: the MMU has virtually nothing to do with this. The divergences between the various 68020 boxes are in things like object file formats, system-call conventions, and graphics facilities. You may have done a better job on standardizing file formats and call conventions (although in a couple of years, after the 386 is used more widely, you may have cause to eat those words), but simply recompiling cures those ills. The real "porting" problems are things like different graphics hardware and System V vs. Berklix -- hardly the fault of the MMU. Actually, in a different sense it is: the ugliness of the addressing model on your previous processors is the reason why you don't have to worry about these things (yet!), because all the serious divergence took place on better machines! Unless you're really clamping down hard on 386 developers, the same thing will happen to you before too very long. Welcome to the 32-bit world; hope you like it. :-) -- "If you want PL/I, you know Henry Spencer @ U of Toronto Zoology where to find it." -- DMR {allegra,ihnp4,decvax,pyramid}!utzoo!henry
mash@mips.UUCP (04/29/87)
In article <58500002@gorgo.UUCP> bsteve@gorgo.UUCP writes: >This is (of course) a very fuzzy question. I would tend to go (for now) with >an off-chip mmu for several reasons: > 1) On-board MMUs require micro-bus cycles just like separate MMUs, > and depending upon the uP architecture may take the same number > of cycles. Or may not; on-chip is often easier to overlap. > 2) We can't add virtual cache with an on-board MMU. The advantage of > virtual over physical cache is that it operates in parallel with > the MMU cycles and returns in nearly half the time on a hit, > whereas the physical cache always requires a complete MMU cycle. This is 100% not true if the chip has the cache control on it. Recall also that there can be substantial hit-rate losses from virtual caches, as well as serious complexifications [nothing that can't be beaten, but some things get more complex in both hardware and software. There's an interesting paper on the VM for the Sun-3/260 in the next USENIX.] > 3) Some applications (small signal processing, etc.) don't really > require the MMU, so why should one drive up the cost of the uP > by adding one on-board? .... This can be true, but only if the chip is intended for controller applications as a first priority, because it may cost $$ and cycle time to then add an MMU. >I am sure that there are numerous other reasons why off-chip MMUs are more >desirable. On the other hand, at least several vendors [such as Motorola] who used to have the MMU off-chip are now putting it on. Finally, our MIPS R2000s use on-chip TLBs, and they're not exactly slow: if you know a micro with a separate MMU that's faster, and that you can actually buy, please post some info. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
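The disagreement over point 2 above comes down to whether address translation sits in series with the cache lookup. A minimal sketch of that timing argument follows. It assumes three simplified cache organizations and hypothetical latencies; the organization names and the function are made up for illustration, and nothing here describes how any particular chip in the thread actually works.

```python
def hit_latency_ns(organization, t_translate_ns, t_cache_ns):
    """Latency of a cache hit under three simplified organizations.

    "physical_serial"  -- translate first, then look up the physical cache
    "physical_overlap" -- translation overlapped with the cache lookup
                          (assumed possible when control is tightly coupled)
    "virtual"          -- cache looked up directly with the virtual address
    """
    if organization == "physical_serial":
        return t_translate_ns + t_cache_ns
    if organization == "physical_overlap":
        return max(t_translate_ns, t_cache_ns)
    if organization == "virtual":
        return t_cache_ns
    raise ValueError(f"unknown organization: {organization!r}")


if __name__ == "__main__":
    # Hypothetical figures: 40 ns to translate, 40 ns to read the cache.
    for org in ("physical_serial", "physical_overlap", "virtual"):
        print(f"{org}: {hit_latency_ns(org, 40, 40)} ns per hit")
```

Under these assumptions the serial physical cache takes twice as long per hit as the virtual cache, which is the "nearly half the time" claim, while the overlapped physical cache matches the virtual one. The model ignores the hit-rate losses and added complexity of virtual caches that the reply mentions.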
mash@mips.UUCP (04/29/87)
In article <1401@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes: >In article <4244@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes: >>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes: >>> What are the important deciding factors in designing a MMU >>>on-chip or off-chip? >I would like to put in a plug for an on-chip MMU, an on chip (relatively >small) instruction cache (works with virtual addresses in the current >context only), and an off chip data cache (may not be necessary with >enough registers). The reasons: (bunch of reasons... most of which are quite reasonable) >5) If there is more room on the chip after the MMU is on, the next step is to > put the ARITHMETIC back on the chip (no extra FPA necessary then), and the > next step after that is to divide the ALU into segmented functional units, > then add vector instructions with fully segmented functional units, and to > make sure you have enough registers, with everything in "RISC style" (no > microcode, lots of random logic); >6) Then, after all that is on the chip, and you still have room ( :-) ) > put the data cache back on the chip. Given Hugh's general wishes expressed elsewhere [including good FP], the only piece I'd disagree with here is 5), and we're probably not really disagreeing. For some time, it will be very difficult to get a high-performance FP unit on the same chip as (even a RISC) cpu. Our FPU is bigger than our CPU, and if the chippers had dared make it bigger, they would have used the space to eliminate another cycle or two, rather than trying to cram the CPU and FPU together. For some time, you can either have low-medium FP performance on the CPU chip [a legitimate design point], or you can have high-performance FP off-chip. Note: I'm talking about the difference between 2-cycle DP Adds and the 30-50 cycles or more prevalent in coprocessors these days. The only other disagreement [and it's only by omission] is that the argument started as cache versus MMU. 
Hugh extended it logically to include most of the other elements that might be there. There's one that's left out, and it's actually one of the highest priorities, and it really doesn't cost too much: put all the cache control for external caches onto the chip. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
jfh@killer.UUCP (John Haugh) (04/29/87)
Much has been argued in this group about whether or not Intel has a better MMU and how slow Motorola was in getting their MMU out. Custom MMU's are not particularly hard to build, and many companies have done so. Pinnacle Systems/Logic Process Co/Whoever they are next week uses a custom MMU built out of 45ns or 35ns (depending on which machine) static rams. The Pinnacle XL/MPulse 10 (sounds like a merger of 2 bank cards here in Dallas, MPact and Pulse :-) runs a 68K at 12MHz, 0 wait states, and according to the latest rumors the Pinnacle XL020/MPulse 20 runs a 68020 at 16MHz, 1 wait state. Supposedly John Bremsteller has been working on getting the 020 up to 25MHz with only 1 or 2 wait states. So, without an 'on chip MMU', this little company has done about as well as you can. I see no reason to argue about whether the MMU belongs on chip or off, so long as you don't take a performance hit for it. And yes, when someone said an IBM-PC/?? would beat my 6MHz 68000, I too laughed. I have never seen a 286 running more than 4 or 5 users at 8MHz, but I have seen 8 or 9 users on a 68000 at 6MHz, and it just gets better with more memory, faster clock speeds, faster disks, and all of the other new goodies that have come out in the 5 years I've had my machine. - John. (jfh@killer.UUCP) Disclaimer: No disclaimer. Whatcha gonna do, sue me?
henry@utzoo.UUCP (Henry Spencer) (05/07/87)
> Custom MMU's are not particularly hard to build, and many companies have > done so. Pinnacle Systems/Logic Process Co/Whoever they are next week uses > a custom MMU built out of 45ns or 35ns (depending on which machine) static > rams. The Pinnacle XL/MPulse 10 (sounds like a merger of 2 bank cards here > in Dallas, MPact and Pulse :-) runs a 68K at 12MHz, 0 wait states, and > according to the latest rumors the Pinnacle XL020/MPulse 20 runs a 68020 > at 16MHz, 1 wait state. Supposedly John Bremsteller has been working on > getting the 020 up to 25MHz with only 1 or 2 wait states. You will forgive us, I trust, for not being too impressed... The Sun-2, now obsolete, ran a 68K at 12MHz with no wait states. The early Sun-3 models, starting to look dated, run a 68020 at 16MHz with 1.5 wait states (how in @#$%@ do they get half a wait state?...). The Sun-3 200 series runs a 68020 at 25MHz with circa no wait states, out of fast virtual cache. Predictably, it's done with fast static RAMs. This is nothing new, the first Suns date back to circa 1980. -- "If you want PL/I, you know Henry Spencer @ U of Toronto Zoology where to find it." -- DMR {allegra,ihnp4,decvax,pyramid}!utzoo!henry
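The clock rates and wait-state counts traded above can be turned into memory cycle times with a little arithmetic. A rough sketch follows, assuming a minimum bus cycle of 4 clocks for the 68000 and 3 clocks for the 68020 (figures not stated in the thread) and treating a fractional wait state as an average over many accesses; the function name is made up for illustration.

```python
def bus_cycle_ns(clock_mhz, base_clocks, wait_states):
    """Duration of one memory bus cycle in nanoseconds.

    clock_mhz   -- processor clock in MHz
    base_clocks -- clocks in a bus cycle with no wait states (assumed)
    wait_states -- extra clocks per cycle; may be fractional as an average
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return (base_clocks + wait_states) * 1000.0 / clock_mhz


if __name__ == "__main__":
    # (label, clock in MHz, assumed base clocks, wait states from the posts)
    systems = [
        ("68000 @ 12MHz, 0 wait states", 12, 4, 0),
        ("68020 @ 16MHz, 1 wait state", 16, 3, 1),
        ("68020 @ 16MHz, 1.5 wait states", 16, 3, 1.5),
        ("68020 @ 25MHz, 0 wait states", 25, 3, 0),
    ]
    for label, mhz, base, waits in systems:
        print(f"{label}: {bus_cycle_ns(mhz, base, waits):.1f} ns per bus cycle")
```

Under these assumptions the 12MHz 68000 needs memory that responds within a bus cycle of roughly 333 ns, while a 25MHz 68020 with no wait states has only 120 ns, which is consistent with the posts' point that such designs depend on fast static RAM.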
mitch@stride1.UUCP (Thomas P. Mitchell) (05/09/87)
In article <122@motsj1.UUCP> rich@motsj1.UUCP (Rich Goss) writes: >In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes: >>-------------- >> I am doing a project on MMU's, and from reading various uP >>data books, I have several questions: >> >> Question 1 : are there any other factors that might affect the >>design of the MMU being on-chip or off-chip? The answer is system design: what functions do you want your customers to put around your processor, and what will they do with them? Also, what the programmer's model looks like. I was lucky enough to hear a short talk by a gentleman at MIPS. He outlined some of their design goals. Memory translation takes time, in fact a lot of time. Their reduced instruction set gave them enough silicon to build the kind of processor that they felt Unix needed. When the 8080 was born UNIX was a rare beast. So clearly Intel did not have the Unix community in their system design goals. You might also look at a patent by Sun on their MMU; it gives a clue to the timing problems involved in building an MMU. (We use 68010s and 68020s here.) >> >> Question 2 : if there is enough space on the chip, would >>everybody put the MMU on-chip? If there were enough space, everything would be on chip. Gate delay times are much shorter on chip than off chip. Low_Power + CPU + MMU + 100 GB RAM = ;-). Thomas P. Mitchell (mitch@stride1.Stride.COM) Phone: (702) 322-6868 TWX: 910-395-6073 MicroSage Computer Systems Inc. a Division of Stride Micro. Opinions expressed are probably mine.
adam@misoft.UUCP (05/15/87)
In article <319@crys.WISC.EDU> mcvoy@crys.WISC.EDU (Larry McVoy) writes: > >And this bit about optics? Optics? What will that buy you? Sure light >travels fast but converting from electrons to photons is a drag. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ So why bother? Have optical sensors on keyboards, direct light output on your terminal screen. All comms can easily be optical fibre. No need to convert optical disk information into electrical current. The only reason for converting between electrons & photons is interfacing to the old electron- driven computers. -Adam. /* If at first it don't compile, kludge, kludge again.*/
lm@cottage.UUCP (05/20/87)
I sez: >>And this bit about optics? Optics? What will that buy you? Sure light >>travels fast but converting from electrons to photons is a drag. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Adam Quantrill) sez: #So why bother? Have optical sensors on keyboards, direct light output on #your terminal screen. All comms can easily be optical fibre. No need to convert #optical disk information into electrical current. The only reason #for converting between electrons & photons is interfacing to the old electron- #driven computers. # -Adam. # #/* If at first it don't compile, kludge, kludge again.*/ Because, my good sir, optical gates don't work very well yet. And it looks like they may never work very well. In spite of what Sci America or any other "knowledgeable" rag says, optical gates are more than likely pie in the sky. And until they or something else comes along, electrons and computers will stick together. Now, optical busses are another question. It's awfully nice to have a 64 bit bus in this teensy tiny fibre running across your chip. Or acting as a backplane bus. But my original statement stands: conversion is a pain and you want to look very carefully at the tradeoffs before embracing optics. Larry McVoy lm@cottage.wisc.edu or uwvax!mcvoy