[comp.sys.m68k] Question: on-chip or off-chip MMU?

fan@ucla-cs.UUCP (04/22/87)

--------------

	This is my first posting, so if there is any mistake, please
excuse.

	I am doing a project on MMU's, and from reading various uP
data books, I have several questions:

	The Intel iAPX 286 has an on-chip  MMU.
	The Motorola 68020 has an off-chip MMU (68851).
	What are the important deciding factors in designing a MMU
on-chip or off-chip?

	Three I can think of:  execution speed, chip space, and
additional support.

	Execution Speed:  In general, on-chip MMU is faster than
off-chip MMU.

	Chip Space:  Sometimes, there is not enough space for putting
a MMU on-chip.  Sometimes, a cache is implemented instead of a MMU.

	Additional Support:  If the MMU is on-chip, then some
additional instructions might be needed.  If the MMU is off-chip, then
additional pins might be needed.

	It seems that the trend is putting the MMU on-chip.  68020 has
no on-chip MMU, but 68030 has a subset of MMU.  iAPX 386 has MMU on
chip, and so is the National Semiconductor 32532 (I haven't read the
data books yet, so I might be mistaken).  Fairchild Clipper has an
off-chip MMU.

	Question 1 :  are there any other factors that might affect the
design of the MMU being on-chip or off-chip?

	Question 2 :  if there is enough space on the chip, would
everybody put the MMU on-chip?

	Question 3 :  if there is only enough room for either a cache
or a MMU, which one will prevail?

Roy Fan  fan@cs.ucla.edu

tim@ism780c.UUCP (Tim Smith) (04/23/87)

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>	Question 1 :  are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

I read a claim somewhere that said that it is better to use the on-chip
space for floating point, and make the MMU external, rather than have an
on-chip MMU and an off-chip floating point unit.

The argument was that you have to go off chip anyway to access memory,
so it should be possible to make an external MMU as efficient as an
internal one, whereas external floating point will be much slower than
internal floating point.

I have no idea if this is right or not.

I prefer an internal MMU because then the system designer can't leave
it out!  I can deal with missing floating point in software.  I can't
deal with a missing MMU in software.
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delph or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

rich@motsj1.UUCP (Rich Goss) (04/24/87)

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>--------------
>
>	This is my first posting, so if there is any mistake, please
>excuse.
>
>	I am doing a project on MMU's, and from reading various uP
>data books, I have several questions:
>
>	The Intel iAPX 286 has an on-chip  MMU.
>	The Motorola 68020 has an off-chip MMU (68851).
>	What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>
>	Three I can think of:  execution speed, chip space, and
>additional support.
>
>	Execution Speed:  In general, on-chip MMU is faster than
>off-chip MMU.
>
>	Chip Space:  Sometimes, there is not enough space for putting
>a MMU on-chip.  Sometimes, a cache is implemented instead of a MMU.
>
>	Additional Support:  If the MMU is on-chip, then some
>additional instructions might be needed.  If the MMU is off-chip, then
>additional pins might be needed.
>
>	It seems that the trend is putting the MMU on-chip.  68020 has
>no on-chip MMU, but 68030 has a subset of MMU.  iAPX 386 has MMU on
>chip, and so is the National Semiconductor 32532 (I haven't read the
>data books yet, so I might be mistaken).  Fairchild Clipper has an
>off-chip MMU.
>
>	Question 1 :  are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?
>
>	Question 2 :  if there is enough space on the chip, would
>everybody put the MMU on-chip?
>
>	Question 3 :  if there is only enough room for either a cache
>or a MMU, which one will prevail?
>
>Roy Fan  fan@cs.ucla.edu

Other factors to be considered in the choice of any MMU scheme:

One should look at the architecture of the MMU, i.e., segmented
only, demand paged only, or a combination thereof.

One should look at the amount of overhead needed to support the
MMU, i.e., the number of translation tables needed, the amount of
memory space required to support the table descriptors, and the
control bits provided (access, modify, cache inhibit, etc.).
Intel does not provide a cache inhibit bit, making it tough
to design an external cache where certain pages should not be
cached (e.g., shared memory between two processors in some
multiprocessing schemes).
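
As a rough illustration of the control bits mentioned above, here is a
minimal sketch of how an external cache controller might consult a
cache-inhibit flag in a page descriptor. The bit positions and the names
PageFlags and may_cache are hypothetical, chosen only for the example; they
are not the layout of the 68851 or of any Intel part.

```python
from enum import IntFlag


class PageFlags(IntFlag):
    # Hypothetical bit positions, for illustration only.
    VALID = 0x01
    WRITE_PROTECT = 0x02
    USED = 0x04           # "access" bit, set on any reference
    MODIFIED = 0x08       # set on a write
    CACHE_INHIBIT = 0x10  # page must bypass the external cache


def may_cache(descriptor: int) -> bool:
    """Decide whether an external cache may hold data from this page."""
    flags = PageFlags(descriptor & 0x1F)  # low five bits carry the flags
    return bool(flags & PageFlags.VALID) and not (flags & PageFlags.CACHE_INHIBIT)


private_page = PageFlags.VALID | PageFlags.USED
shared_page = PageFlags.VALID | PageFlags.CACHE_INHIBIT  # e.g. shared between two CPUs
print(may_cache(private_page), may_cache(shared_page))
```

Without such a bit in the descriptor, the system designer has to mark
uncacheable regions some other way, for instance by decoding physical
address ranges in external logic.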

One should look at the potential for a particular MMU scheme being
incorporated in future generations of processors. You should look
at the 286 MMU and the 386 MMU. They are not compatible. The MMU
in the 68030 is compatible with the 68851 PMMU chip.

Whether an MMU should be on chip with the CPU depends on
the application. The 68030 is an excellent choice for
workstations and systems which support multitasking and/or multiuser
operating systems. The 68020 is a good choice for the above
when coupled with the 68851 PMMU or the user's own MMU. Also, the
68020 is an excellent choice for embedded controller applications
(e.g., disk, serial communications, LAN, etc.) which do not
usually require an MMU. The 68020 and 68030 are object code compatible,
which makes the software engineers happy. Also, the cost of the
68020 will be coming down to the point where it will make a lot
of sense to use it in embedded controller applications.

-- Rich Goss
Motorola Western Regional Field Applications Engineer for 68000
Family

rajiv@im4u.UUCP (04/24/87)

Summary:More suggestions and views on this issue.

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>--------------
>	I am doing a project on MMU's, and from reading various uP
>data books, I have several questions:
>
>	It seems that the trend is putting the MMU on-chip.  68020 has
>no on-chip MMU, but 68030 has a subset of MMU.  iAPX 386 has MMU on
>chip, and so is the National Semiconductor 32532 (I haven't read the
>data books yet, so I might be mistaken).  Fairchild Clipper has an
>off-chip MMU.
>
>	Question 1 :  are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?
>
>	Question 2 :  if there is enough space on the chip, would
>everybody put the MMU on-chip?
>
>	Question 3 :  if there is only enough room for either a cache
>or a MMU, which one will prevail?
>

I agree that the trend is toward putting the MMU on-chip, and I feel that
chip area would be the governing factor in the decision to place the MMU on
the chip.

An issue may be raised regarding the rigidity of the translation mechanisms
of on-chip MMUs if variable paging schemes (elaborate control) are not
available in the hardware. I am not very familiar with this issue but feel that
it might play a role in the decision for an on-chip or off-chip MMU.
Of course, it is still very attractive (speed gains) to place the MMU on-chip.
This would be the case for full system design products, like the IBM RT PC for
instance, where the translation and page sizes are decided by the same company
that designs the MMU.

As far as the choice between an on-chip cache and an on-chip MMU is concerned,
I would choose to place both on the chip, but as the area is limited I
would compromise by placing ONLY the TLB of the MMU on-chip (translation
and page tables off-chip) and a smaller cache in the remaining real estate.
The reason is that a TLB of reasonable size would give around 60%
hits and a small cache would also give at least 60-70% hits, so we get the
best of both the cache and MMU operations.
This choice is very dependent on the speedup one can obtain between
on-chip and off-chip accesses. But I feel a compromise would work
better than going all one way.
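
The trade-off above can be put into rough numbers. This is only a
back-of-the-envelope sketch: the hit rates are the ones quoted in the
paragraph, while the cycle costs ON_CHIP and OFF_CHIP are assumptions made
up for the example, not measurements of any real part.

```python
def effective_cycles(hit_rate, hit_cost, miss_cost):
    """Average cost of one lookup, weighted by how often it hits."""
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Assumed costs in cycles (hypothetical, for illustration only).
ON_CHIP = 1
OFF_CHIP = 4

translate = effective_cycles(0.60, ON_CHIP, OFF_CHIP)  # on-chip TLB, tables off-chip
fetch = effective_cycles(0.65, ON_CHIP, OFF_CHIP)      # small on-chip cache
compromise = translate + fetch
all_off_chip = OFF_CHIP + OFF_CHIP                     # neither structure on-chip

print(f"compromise: {compromise:.2f} cycles per reference")
print(f"all off-chip: {all_off_chip} cycles per reference")
```

The gap between the two totals shrinks as OFF_CHIP approaches ON_CHIP,
which is the dependence on on-chip versus off-chip speed noted above.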

Well, those are some of my views. I do not have any magic numbers to support
them but would be happy to receive comments and criticism regarding them.

Rajiv.

ARPA: rajiv@im4u.utexas.edu
UUCP: {ihnp4,seismo,allegra,ucbvax}!ut-sally!im4u!rajiv

dan@prairie.UUCP (Daniel M. Frank) (04/25/87)

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>	What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>
>	Question 1 :  are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

   Your list is pretty complete, but I'd like to broaden the discussion
a bit.  We can break the speed/pins issue down into two areas:  one is
the raw technological problem of pins and propagation delays, the other
is architectural.

   Short of new technologies such as optical computers and brute-force
methods such as ECL and freon cooling, the only way to speed up a uni-
processor is to increase parallelism.  This can be done by pipelining,
which allows multiple instructions to be in various stages of execution
simultaneously, and by increasing the parallelism at each stage of the
pipeline.  Two ways to achieve the latter are to do operand validity
checking in parallel with other operations, and to do it as early as 
possible.  The advantage of doing it in parallel should be clear.  The
advantage of doing it early is that we can avoid having to throw many
instructions out of the pipeline.

   Anyway, the more parallelism you want, the more integrated your memory
management hardware has to be with the CPU.  If you put it off-chip,
you'll need more pins, and the propagation delays may slow down your
overall cycle time.

   The architectural issue is more subtle.  Can we tailor our architecture
in such a way that we can either inform the chip early about our addressing
intentions, or break such information up so that there is less work to do
at critical times?  I claim that the 80x86 series does just that (whether
well or badly) by checking segment validity at segment register load time,
leaving only boundary and page presence checking for the execution of actual
references.  This is probably less interesting on the 80386, where segment
register loads are bound to be much less frequent than on its predecessor.
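
   The split described above can be sketched in a few lines.  This is a
simplified model, not the actual 80x86 descriptor format:  the classes
Descriptor and SegmentRegister, and the fields shown, are hypothetical names
for the example.  The expensive validity checks happen once in load(), and
each reference then pays only for a limit comparison (paging is left out).

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when a segment load or a memory reference is rejected."""


@dataclass(frozen=True)
class Descriptor:
    base: int      # linear address where the segment starts
    limit: int     # highest valid offset within the segment
    present: bool  # False if the segment is not in memory


class SegmentRegister:
    def __init__(self):
        self.cached = None  # descriptor copied in at load time

    def load(self, table, selector):
        """Slow path: validate the selector once, when the register is loaded."""
        desc = table.get(selector)
        if desc is None:
            raise ProtectionFault("selector not in descriptor table")
        if not desc.present:
            raise ProtectionFault("segment not present")
        self.cached = desc

    def translate(self, offset):
        """Fast path: each reference needs only the limit check."""
        if self.cached is None:
            raise ProtectionFault("segment register not loaded")
        if not 0 <= offset <= self.cached.limit:
            raise ProtectionFault("offset beyond segment limit")
        return self.cached.base + offset


table = {0x08: Descriptor(base=0x10000, limit=0xFFFF, present=True)}
ds = SegmentRegister()
ds.load(table, 0x08)
print(hex(ds.translate(0x0020)))
```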

>	Question 2 :  if there is enough space on the chip, would
>everybody put the MMU on-chip?

   I suppose so.  If there was enough space (and they could cool it!), 
they'd try to put EVERYTHING on the chip.

>	Question 3 :  if there is only enough room for either a cache
>or a MMU, which one will prevail?

   My knee-jerk response is:  it is so hard to really integrate an external
MMU with a pipelined processor, that you'll win by putting the MMU and a
small cache on chip, and putting a larger cache off-chip.  I hear the two
level cache worked out pretty well in the Microvax.

   [The preceding was an "architectural discussion".  In some circles, this
is also known as a "religious discussion".  Consider yourself warned.]

-- 
      Dan Frank (w9nk)
	ARPA: dan@db.wisc.edu			ATT: (608) 255-0002 (home)
	UUCP: ... uwvax!prairie!dan		     (608) 262-4196 (office)
	SNAILMAIL: 1802 Keyes Ave. Madison, WI 53711-2006

mash@mips.UUCP (John Mashey) (04/25/87)

In article <6047@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
>I read a claim somewhere that said that it is better to use the on-chip
>space for floating point, and make the MMU external, rather than have an
>on-chip MMU and an off-chip floating point unit.
>
>The argument was that you have to go off chip anyway to access memory,
>so it should be possible to make an external MMU as efficient as an
>internal one, whereas external floating point will be much slower than
>internal floating point.

1. There is no right answer, as usual.
It depends on what your priorities are, how much silicon you have,
all of the other architectural tradeoffs you've made, etc, etc,
what technology trends you're expecting to track, etc.

2.  There are a couple statements that one can make:
At the current typical state of microprocessor technology
[say, 1.2 - 2.0 micron]
	a) If you have on-chip FP, it won't be fast [remember, we think
	fast is a 2-cycle DP add or 5-cycle multiply, not 30 or 50-cycles].
	A serious micro FPU can be bigger than the CPU chip.
	[Ours certainly is!]  If we had more die area, our chippers
	would go knock out some more cycles, not cram it on the CPU.

	b) If you don't do an on-chip MMU:
		0) You can build MMUs from fast SRAMs
		OR
		1) You can have an off-chip MMU that sooner or later
		adds wait states as your systems get faster. [it's been
		interesting to see how performance of the same 16MHz 68K
		has varied according to what's around it].
		OR
		2) You will need special integrated cache-MMU parts.
		OR
		3) You will go with virtual caches, sooner or later.
	(These are not necessarily bad, but there are interesting consequences
	for system design and OS's for some of them.)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

davids@well.UUCP (David Schachter) (04/25/87)

Rich Goss of Motorola states the '286 and '386 MMUs are not compatible.  This
is incorrect.  The '386 MMU is a superset of the '286 MMU.  (In fact, it's
much better than the '286 MMU.)  Any code running on the '286 will run on the
'386 (unless you used the Intel-reserved fields in the '286 MMU descriptors!)

lawrenc@nvanbc.UUCP (Lawrence Harris) (04/26/87)

**** FLAME ON ****

In article <122@motsj1.UUCP> rich@motsj1.UUCP (Rich Goss) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>>--------------

<<Edited out.>>

The above is all OK, but the rest of this appears to be pure propaganda,
which I have the feeling is normal policy for Motorola, as most of their
advertising seems to run down competitors' products by quoting misleading
information such as what follows.  Besides, this is the nsc 32k group and
not a place to advertise Motorola CPUs.

I was involved in a project using the Zilog Z8000 processor a few years
back and received literature from Motorola at that time making fantastic
claims for their chip and virtually lying about the limitations of the
Z8000 (just for example, they claimed the Z8000 could only address 64K bytes
when in fact it was 8 meg, not including split I/D possibilities).

>One should look at the potential for a particular MMU scheme being
>incorporated in future generation of processors. You should look
>at the 286 MMU and the 386 MMU. They are not compatible. The MMU
>in the 68030 is compatible with the 68851 PMMU chip.
>
As far as I am aware (without doing much research) it is a subset, not a
superset (i.e., downward compatible).  Further, the Compaq 386 runs XENIX 286
out of the box, so how incompatible can the 386 MMU be?  Agreed, it has more
features than the 286, but you don't have to recode applications before you
can use the 386!

>As to whether an MMU should be on chip with the CPU depends on
>the application. The 68030 is an excellent choice for
>workstations and systems which support multitasking and/or multiuser
>operating systems. The 68020 is a good choice for the above
>when coupled with the 68851 PMMU or the users own MMU. Also the
>68020 is an excellent choice for embedded controller applications

Some more edited out here.
>
>-- Rich Goss
>Motorola Western Regional Field Applications Engineer for 68000
>Family

The Z8000 was a fantastic chip for embedded controller applications, capable
of handling the interrupt rate from a floppy disc controller, for example,
without DMA support.

I guess the fact that he works for motorola gives him an interest in
promoting motorola's products, but I do get tired of how far they stretch
the truth sometimes.

ps. The above should not be taken to indicate a preference for any of
    motorola or intel products.  I actually prefer the national cpus
    myself (but use both intel and motorola).
-- 
------------------------------------------------------------------------------
UUCP:  tectronix!uw-beaver!ubc-vision!van-bc!nvanbc!lawrence
SNAIL: 733 Sylvan Ave., North Vancouver, B.C., Canada, V7R 2E8
PHONE: 1-604-736-9241 (09:00-17:00 PDT)

mcvoy@crys.WISC.EDU (Larry McVoy) (04/26/87)

In article <441@prairie.UUCP> dan@prairie.UUCP (Daniel M. Frank) writes:

(I should note that I'm not really qualified to talk about this, I'm mostly
software.  But then, so is Dan...)

#   Short of new technologies such as optical computers and brute-force
#methods such as ECL and freon cooling, the only way to speed up a uni-
#processor is to increase parallelism.  This can be done by pipelining,

Whoa there. The ONLY way?  I beg to differ.  Think about caches for a 
moment.  Most are small, direct mapped, and flushed at context switches.
How much performance would be gained by making them larger, separate I&D,
full associative, contain process id's, etc.  

And this bit about optics?  Optics?  What will that buy you?  Sure light
travels fast but converting from electrons to photons is a drag.

And don't pooh-pooh ECL either.  There seems to be a steady supply of
new technologies (read about super conductors?).

OK, now let's back off a bit.  I'm not disagreeing with the statement about
parallelism - to a certain extent I agree that a lot can be gained that
way.  But don't dismiss everything else with one grandiose sweep and expect
me to buy it.  It's not anywhere close to as simple as you make it sound.

#   Anyway, the more parallelism you want, the more integrated your memory
#management hardware has to be with the CPU.  If you put it off-chip,
#you'll need more pins, and the propagation delays may slow down your
#overall cycle time.

Not necessarily.  Again, remember the value of a nice big smart cache.

#at critical times?  I claim that the 80x86 series does just that (whether

[Flame++ ]

The 80x86 series is a load of upwardly compatible garbage and the whole
world knows it.  I doubt there's a CS or ECE person anywhere that can 
honestly say they like this architecture.  If they do, they should go 
back to school.  Go look at the instruction sets before you flame me.
The word "orthogonal" does not exist in the Intel vocabulary.  The word 
"hack" does.

[Flame-- ]

#>	Question 2 :  if there is enough space on the chip, would
#>everybody put the MMU on-chip?
#
#   I suppose so.  If there was enough space (and they could cool it!), 
#they'd try to put EVERYTHING on the chip.

Also not quite true.  Large chips are a drag.  They're a drag to lay out,
a drag to manufacture, a drag to cool (as you noted), etc.  What you
really want to do is put everything that's in the "main loop" on chip,
leave everything else off chip.  For example: suppose you look at a Vax
and realize that the good old polynomial evaluator instruction isn't 
used very much.  Why not use that chip space for cache and do the poly 
func in software or in a slave processor?  Take the 801 philosophy
of having to justify every instruction/feature, whatever on the chip.

#>	Question 3 :  if there is only enough room for either a cache
#>or a MMU, which one will prevail?
#
#   My knee-jerk response is:  it is so hard to really integrate an external
#MMU with a pipelined processor, that you'll win by putting the MMU and a
#small cache on chip, and putting a larger cache off-chip.  I hear the two
#level cache worked out pretty well in the Microvax.

It did?  I guess you never had more than 1 active job on your uVax, huh?
Compare a uVax with 3 users to a 680xx with 3 users.  I know where I want
to work.

---------

I guess that I really want to say this:  Don't throw out flip answers to
hard problems.  I *know* that I have a lot to learn in this area and I
try to be especially careful because of it.  If you are going to advocate
parallelism, don't do so without noting the difficulty of writing parallel
code.  If you're going to dismiss advances in technology, don't do so
without proposing something better.  
-- 
Larry McVoy 	        mcvoy@rsch.wisc.edu  or  uwvax!mcvoy

"What a wonderful world it is that has girls in it!"

authorplaceholder@gorgo.UUCP.UUCP (04/27/87)

This is (of course) a very fuzzy question. I would tend to go (for now) with
an off-chip mmu for several reasons:

	1) On-board MMUs require micro-bus cycles just like separate MMUs,
	   and depending upon the uP architecture may take the same number
	   of cycles.

	2) We can't add virtual cache with an on-board MMU. The advantage of
	   virtual over physical cache is that it operates in parallel with
	   the MMU cycles and returns in nearly half the time on a hit,
	   whereas the physical cache always requires a complete MMU cycle.

	3) Some applications (small signal processing, etc.) don't really
	   require the MMU, so why should one drive up the cost of the uP
	   by adding one on-board?

	4) Some architectures support stackable multiple MMUs that operate
	   in parallel. One obviously cannot do this if the MMU is on-board.

I am sure that there are numerous other reasons why off-chip MMUs are more
desirable.
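
Point 2 can be illustrated with a toy latency model. It is only a sketch:
the function name access_cycles and the cycle counts are assumptions for
the example, and it ignores the aliasing and context-switch problems that
virtual caches bring with them.

```python
def access_cycles(virtual, hit, mmu=1, cache=1, memory=4):
    """Cycles for one reference under a simple serial/parallel model.

    A physical cache is indexed with the translated address, so the MMU
    cycle must finish before the cache lookup can start. A virtual cache
    is indexed with the untranslated address, so translation overlaps the
    lookup and is only needed if the lookup misses.
    """
    lookup = max(mmu, cache) if virtual else mmu + cache
    return lookup if hit else lookup + memory


print("physical cache hit:", access_cycles(virtual=False, hit=True))
print("virtual cache hit: ", access_cycles(virtual=True, hit=True))
print("physical cache miss:", access_cycles(virtual=False, hit=False))
print("virtual cache miss: ", access_cycles(virtual=True, hit=False))
```

With equal MMU and cache times, the virtual-cache hit takes half as long as
the physical-cache hit, which is the "nearly half the time" figure in
point 2.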

	Steve Blasingame (Oklahoma City)
	ihnp4!gorgo!bsteve

mash@mips.UUCP (John Mashey) (04/27/87)

In article <441@prairie.UUCP> dan@prairie.UUCP (Daniel M. Frank) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
....
>>	What are the important deciding factors in designing a MMU
>>on-chip or off-chip?
....
>>	Question 2 :  if there is enough space on the chip, would
>>everybody put the MMU on-chip?
>
>   I suppose so.  If there was enough space (and they could cool it!), 
>they'd try to put EVERYTHING on the chip.
>
>>	Question 3 :  if there is only enough room for either a cache
>>or a MMU, which one will prevail?
>
>   My knee-jerk response is:  it is so hard to really integrate an external
>MMU with a pipelined processor, that you'll win by putting the MMU and a
>small cache on chip, and putting a larger cache off-chip.  I hear the two
>level cache worked out pretty well in the Microvax.

As usual, it really depends on what you think you're building.
Almost any cache of any reasonable size is better than having no cache
at all, if you're careful.  For cache versus MMU:
1) If you're building controller chips that can easily get along without
an MMU, then you might as well put a small I-cache on board, if nothing else.
Even small I-caches work.
	[68020]
2) If you're building a "system" chip, then you're awfully tempted to
put the MMU on-chip, assuming there's enough space to make it "big enough"
to get adequate hit rates for the applications you intend to run.
Note that the hit rates of small TLBs are much better than those of
small caches.
	2a) To minimize board space, you might put the MMU on-chip, and
	build special cache-rams, or else build special MMU/cache chips
	[Clipper; 78000?] In this case, the tradeoff is to accept a cap
	on performance if you can't beef up the caches when you want to.
	2b) To go for all-out performance, at some cost in board space,
	put the MMU on-chip, use ordinary SRAMs, and also put the cache
	control on-chip [MIPS R2000].  Here, the tradeoff is that the
	minimum board space is slightly larger than 1) or 2a), but you
	have more high-end left by adding more SRAMs.
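
One way to see why a small TLB hits more often than a small cache with the
same number of entries: each TLB entry covers a whole page, while each
cache line covers a few bytes. The sketch below runs a toy trace through a
fully associative LRU structure of each kind. The sizes (16 entries, 4K
pages, 16-byte lines) and the looping sequential trace are assumptions for
the example, not the parameters of any chip named above, and real programs
have far messier reference patterns.

```python
from collections import OrderedDict


def hit_rate(trace, entries, granule):
    """Hit rate of a fully associative LRU structure.

    `granule` is the number of bytes one entry covers: the page size for
    a TLB, the line size for a cache.
    """
    lru = OrderedDict()
    hits = 0
    total = 0
    for addr in trace:
        total += 1
        tag = addr // granule
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)         # mark as most recently used
        else:
            if len(lru) >= entries:
                lru.popitem(last=False)  # evict least recently used
            lru[tag] = True
    return hits / total if total else 0.0


# Toy trace: four sequential passes over a 32 KB region, one word at a time.
trace = [addr for _ in range(4) for addr in range(0, 32 * 1024, 4)]

tlb_rate = hit_rate(trace, entries=16, granule=4096)  # 16 entries reach 64 KB
cache_rate = hit_rate(trace, entries=16, granule=16)  # 16 lines reach 256 bytes
print(f"TLB hit rate:   {tlb_rate:.4f}")
print(f"cache hit rate: {cache_rate:.4f}")
```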

It is clear that on-chip caches, by themselves, simply CANNOT be made large
enough in current technologies to get into the higher performance regions.
[The following is a generalization.  All generalizations are false:]
ASSUMING YOU WANT TO RUN SUBSTANTIVE PROGRAMS, AND NOT CROAK RUNNING
MULTI-USER UNIX:  [all this for integer operations; FP has different
ratios]:

2-3 Mips [VAX 11/780, 4.3BSD == 1] seems to be about the limit for
systems with  1K or less cache.
	Examples: good 16.7MHz 68020 = 2 Mips [Sun3/160]
		: good 20MHz 68030 = 3 Mips [wild guess; I really don't 
			understand how the data cache is really going to behave]
		: cacheless 386 = 2 Mips
	Possible exception: newest IBM RT PC, which might be 3-4Mips

3-5 Mips: Examples:
	: good 25MHz 68020 [with 64K cache, 64-bit buses, in Sun3/260] = 4 Mips
	: good 386 with 64K cache = 4 Mips [?]
	: Clipper with 4K+4K caches = ? Mips [can't tell from the published
	numbers]  guess about 3-3.5 if kernel-intensive ones included
	: good 68030, 25MHz [whenever, not announced at this speed] guess
	= 5Mips by comparing with 4Mips, no-wait-state Sun3/260
	: R2000 with 24K cache = 5 Mips; with only 16K, was more like 4 Mips
I.e., some performance levels REQUIRE external caches.
A whimsical way to put it is that a fast CPU is just a way for
fast SRAMs to reach self-actualization, i.e., max performance. :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

clif@intelca.UUCP (Clif Purkiser) (04/27/87)

> In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
> >--------------
> >
> >	This is my first posting, so if there is any mistake, please
> >excuse.
> >
> >	I am doing a project on MMU's, and from reading various uP
> >data books, I have several questions:
> >
> >	The Intel iAPX 286 has an on-chip  MMU.
> >	The Motorola 68020 has an off-chip MMU (68851).
> >	What are the important deciding factors in designing a MMU
> >on-chip or off-chip?
> >
> >
> >Roy Fan  fan@cs.ucla.edu
(Rich's comments start here)
> Other factors to be considered in the choice of any MMU scheme:
> 
> One should look at the architecture of the MMU i.e., segmented
> only, demand paged only, or a combination thereof.
(deleted material) 
> 
> One should look at the potential for a particular MMU scheme being
> incorporated in future generation of processors. You should look
> at the 286 MMU and the 386 MMU. They are not compatible. The MMU
> in the 68030 is compatible with the 68851 PMMU chip.

Please explain how the 386 MMU is not upwardly compatible with the
80286 MMU.  If your statement is correct then it should have taken
more than the few man days that it took to get Xenix 286 and RMX 286
(both OSs used the 286 Protected Mode) working on the very first stepping
of the 80386.  The 386 MMU is compatible with the 286's MMU.

After reading some 68030 articles I was under the impression that
the 030 implemented a subset of the 68851; perhaps I am wrong.
More importantly, the 68030 MMU is totally incompatible with the
MMU architectures of some of your bigger customers (Sun, Apollo, and almost
all of the other 68K Unix manufacturers) who couldn't wait for Mot
to get the off-chip MMU working correctly, or who didn't want to
pay the performance penalty (added wait states) associated with an
off-chip MMU.

>
( deleted Additional ramblings about 68xxx ) 
> -- Rich Goss
> Motorola Western Regional Field Applications Engineer for 68000
> Family

Responding to Mr. Fan's original question:  I believe a very important
consideration is that an on-chip MMU allows binary compatibility
between machines.  For instance, on the Unix System for the 80386,
several ISVs have been shocked to discover that the same binary disk
of their application (like a database) would work on 386 systems
with radically different hardware.  For example, 386 Unix machines
use a wide variety of buses:  MultiBus I, MultiBus II, PC AT bus, and
proprietary buses.  But because the 386 has an on-chip MMU and a
well defined file format (COFF), binary compatibility has been achieved
between different 386-based Unix computers.

Contrast this with Unix systems that use processors with off-chip MMUs.
If I want to buy an application for a 68020 machine, I have to specify which
machine I am using (Apollo, Masscomp, etc.), even though all of the machines
are running a Unix-like OS and use the 68020.  The result is that Unix ISVs
spend most of their time (and make most of their money) porting applications
to different machines, instead of developing new applications or
improving the existing ones.

-- 
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property.  However anyone who wants them can have 
them for a nominal fee.

grenley@nsc.UUCP (04/28/87)

In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>	What are the important deciding factors in designing a MMU
>on-chip or off-chip?
>	Three I can think of:  execution speed, chip space, and
>additional support.
>	Execution Speed:  In general, on-chip MMU is faster than
>off-chip MMU.
>	Chip Space:  Sometimes, there is not enough space for putting
>a MMU on-chip.  Sometimes, a cache is implemented instead of a MMU.
>	Additional Support:  If the MMU is on-chip, then some
>additional instructions might be needed.  If the MMU is off-chip, then
>additional pins might be needed.

Don't forget cost.  If the MMU must be implemented as a separate chip
or chips, it is expensive.  On the other hand, the incremental cost
of silicon, even in 386/68030/32532 class processors, is small in 
comparison to overall system cost. For example, our yield on 532s
does not increase dramatically if the MMU is deleted.

>	Question 1 :  are there any other factors that might affect the
>design of the MMU being on-chip or off-chip?

Biggest factor is: Do you need an MMU?  If the processor is targeted at
general computing applications (read Unix) then MMU is req'd, and it should
be on chip.  If CPU is for control and other embedded applications, skip
the MMU.

>	Question 2 :  if there is enough space on the chip, would
>everybody put the MMU on-chip?

Well, we did.  So did Intel.  Anybody at Mot care to comment?

>	Question 3 :  if there is only enough room for either a cache
>or a MMU, which one will prevail?

I would guess cache.  CPU memory requirements are outrunning the ability
of DRAM to keep up.

Disclaimer: I work for NSC, designing systems based on the '532.  On the
other hand, I used to work for Intel, selling 286s.  The only computer
I spent my own money on is a Macintosh.  YOU figure out where my biases are.

Regards,

George Grenley

mhorne@tekfdi.TEK.COM (Mike Horne) (04/28/87)

In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>> (deleted stuff...)
>> -- Rich Goss
>> Motorola Western Regional Field Applications Engineer for 68000
>> Family
>
> (deleted stuff...)
>Clif Purkiser, Intel, Santa Clara, Ca.
>{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

Rich, let's not stretch the truth.  Clif, let's be man enough to not pick
a fight.  The man wanted info about MMUs, not slanderous disinformation.

Can we please get on with a decent discussion?  This junk has spanned 4
newsgroups now!

						-MTH
						 KA7AXD

-- 
---------------------------------------------------------------------------
Michael Horne - KA7AXD                  UUCP: tektronix!tekfdi!honda!mhorne
FDI group, Tektronix, Incorporated      ARPA: mhorne@honda.fdi.tek.com
Day: (503) 627-1666                     HAMNET: ka7axd@k7ifg

mark@applix.UUCP (Mark Fox) (04/28/87)

In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>... I believe a very important
>consideration is that an on-chip MMU allows binary compatibility 
>between machines.  For instance on the Unix System for the 80386,

	blah, blah, blah...

>Contrast this with Unix systems that use processors with off-chip MMUs.
>If I want to buy an application for a 68020 machine I have to specify which 
>machine I am using Apollo, Masscomp etc. Even though all of the machines are 
>running Unix-like OS  and use the 68020...
>
>Clif Purkiser, Intel, Santa Clara, Ca.

Whoa there Clif! Turn off the flames.

Since when does the on-chip MMU or lack thereof matter to Unix
ISVs (read that Unix application vendors)?? The reason we have vendor-specific
code is primarily because Apollo, Masscomp etc. have different software
environments (Aegis vs RTU). The fact that their MMUs are different affects
only the hardware vendors themselves and then only their folks who develop and
maintain the operating systems. The people I work with don't care if the MMU
is on the chip or not, only whether one system out-performs another and
whether or not the ugliness of the architecture gets in our way. :-)

Granted, it was nice to see our application running on an early pre-announced
Compaq 386 -- the same binary that ran on an IBM PC/AT -- but why is that
any more remarkable than seeing our application running on both a Sun 2
(68010) and a Sun 3 (68020) without being recompiled or relinked?

The reason for the compatibility between the 386 and 286 is that the operating
system (ie software environment) was identical, not because the MMU was on-chip.
When we port to a real System V for the 386, all bets are off regarding binary
portability between that and Xenix!
-- 
                                    Mark Fox
       Applix Inc., 112 Turnpike Road, Westboro, MA 01581, (617) 870-0300
                    uucp:  seismo!harvard!halleys!applix!mark

lamaster@pioneer.arpa (Hugh LaMaster) (04/28/87)

In article <4244@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>>	What are the important deciding factors in designing a MMU
>>on-chip or off-chip?


>
>>	Question 3 :  if there is only enough room for either a cache
>>or a MMU, which one will prevail?
>
>I would guess cache.  CPU memory requirements are outrunning the ability
>of DRAM to keep up.
>
>
>George Grenley


I would like to put in a plug for an on-chip MMU, an on chip (relatively
small) instruction cache (works with virtual addresses in the current
context only), and an off chip data cache (may not be necessary with
enough registers).  The reasons:

1)  It is nice when people build compatible architecture systems from your
    chip, as they are more likely to do with an on chip MMU, because it is
    easier to port software;

2)  The kinds of problems that I work with benefit a lot more from an
    instruction cache than a data cache (Cray and/or Control Data have been
    building very high performance machines for years with an instruction
    cache only, and no data cache);

3)  You can't put a large cache on a chip anyway;

4)  Data caches on chip will complicate multiple processor implementations;

5)  If there is more room on the chip after the MMU is on, the next step is to
    put the ARITHMETIC back on the chip (no extra FPA necessary then), and the
    next step after that is to divide the ALU into segmented functional units,
    then add vector instructions with fully segmented functional units, and to
    make sure you have enough registers, with everything in "RISC style" (no
    microcode, lots of random logic);

6)  Then, after all that is on the chip, and you still have room ( :-)  )
    put the data cache back on the chip.  



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

fan@CS.UCLA.EDU (04/28/87)

In article <58500002@gorgo.UUCP> bsteve@gorgo.UUCP writes:
>	1) On-board MMUs require micro-bus cycles just like separate MMUs,
>	   and depending upon the uP architecture may take the same number
>	   of cycles.

	Don't current uPs, for example the 80386 and 32532, use
pipelining to increase throughput?  If translation is overlapped that
way, there might be no extra micro-bus cycles required.

>	2) We can't add virtual cache with an on-board MMU. The advantage of
>	   virtual over physical cache is that it operates in parallel with
>	   the MMU cycles and returns in nearly half the time on a hit,
>	   whereas the physical cache always requires a complete MMU cycle.

	We could put a small cache on the chip (if there is enough
room).  Of course, if there isn't enough room, we will need to go
off-chip.

>	3) Some applications (small signal processing, etc.) don't really
>	   require the MMU, so why should one drive up the cost of the uP
>	   by adding one on-board?

	Yes, you're right.

>	4) Some architectures support stackable multiple MMUs that operate
>	   parallel. One obviously cannot do this if the MMU is on-board.

	If the MMU is off-chip, it should usually be large enough to
ensure a high hit ratio.  Then increasing the number of MMUs
wouldn't increase performance much.  However, it would increase
the board space.
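
	The diminishing return can be put into rough numbers.  A minimal
sketch in Python, assuming made-up costs (1 cycle when the translation
is already held in the MMU, 20 cycles when the tables must be walked);
neither figure comes from any data book:

```python
def effective_translation_cycles(hit_ratio, hit_cycles, miss_cycles):
    """Average cycles per address translation for a given hit ratio."""
    return hit_ratio * hit_cycles + (1.0 - hit_ratio) * miss_cycles


# Hypothetical costs: 1 cycle on a hit, 20 cycles on a miss.
for ratio in (0.90, 0.98, 0.99):
    cycles = effective_translation_cycles(ratio, 1, 20)
    print("hit ratio %.2f -> %.2f cycles per translation" % (ratio, cycles))
```

	With these assumed costs, going from a 90% to a 98% hit ratio saves
far more than going from 98% to 99%, which is the sense in which a
second MMU buys little once the first one is big enough.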

	I guess that, performance-wise, an on-chip MMU is faster than an
off-chip MMU.  But there are just too many factors to consider.

Roy Fan
University of California    fan@cs.ucla.edu
Los Angeles

paul@unisoft.UUCP (04/28/87)

In article <2581@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>
>After reading some 68030 articles I was under the impression that
>the 030 implemented a subset of the 68851, perhaps I am wrong.  
 .....
>
>-- 
>Clif Purkiser, Intel, Santa Clara, Ca.
>{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif
>

	While it is true that the PMMU is a superset of the 68030's MMU, this
is in practice not a problem, for Unix kernels anyway.  It turns out that the
stuff in the PMMU that is not in the 68030 is not generally used by Unix kernels
(the accent here is on UNIX kernels; others [ring protection domain based
kernels, for example] might use it, as might sophisticated debuggers).  One of
the reasons for this is that early on everyone had to have MMB compatibility
(there were no PMMUs) and all those extra goodies weren't there.  When we did
the MMU code in our PMMU/MMB kernels we had no 68030 docs to guide us (just
rumors, mostly untrue) and the result was something that will port to a 68030
with less than a week's work.  I think that the 68030 being a subset of the
PMMU is a non-issue, for Unix anyway.

	"68030, what me worry?"

	Paul Campbell
	UniSoft Systems
	..!ucbvax!unisoft!paul

henry@utzoo.UUCP (Henry Spencer) (04/28/87)

> Contrast this with Unix systems that use processors with off-chip MMUs.
> If I want to buy an application for a 68020 machine I have to specify which 
> machine I am using: Apollo, Masscomp etc. Even though all of the machines are 
> running Unix-like OS  and use the 68020...

C'mon now, Clif, be fair:  the MMU has virtually nothing to do with this.
The divergences between the various 68020 boxes are in things like object
file formats, system-call conventions, and graphics facilities.  You may
have done a better job on standardizing file formats and call conventions
(although in a couple of years, after the 386 is used more widely, you may
have cause to eat those words), but simply recompiling cures those ills.
The real "porting" problems are things like different graphics hardware
and System V vs. Berklix -- hardly the fault of the MMU.  Actually, in
a different sense they are:  the ugliness of the addressing model on your
previous processors is the reason why you don't have to worry about these
things (yet!), because all the serious divergence took place on better
machines!  Unless you're really clamping down hard on 386 developers, the
same thing will happen to you before too very long.  Welcome to the
32-bit world; hope you like it. :-)
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

mash@mips.UUCP (04/29/87)

In article <58500002@gorgo.UUCP> bsteve@gorgo.UUCP writes:
>This is (of course) a very fuzzy question. I would tend to go (for now) with
>an off-chip mmu for several reasons:
>	1) On-board MMUs require micro-bus cycles just like separate MMUs,
>	   and depending upon the uP architecture may take the same number
>	   of cycles.
Or may not; on-chip is often easier to overlap.

>	2) We can't add virtual cache with an on-board MMU. The advantage of
>	   virtual over physical cache is that it operates in parallel with
>	   the MMU cycles and returns in nearly half the time on a hit,
>	   whereas the physical cache always requires a complete MMU cycle.
This is 100% not true if the chip has the cache control on it.
Recall also that there can be substantial hit-rate losses from virtual
caches, as well as serious complexifications [nothing that can't be beaten,
but some things get more complex in both hardware and software.  There's
an interesting paper on the VM for the Sun-3/260 in the next USENIX.]
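
To make the timing argument concrete, here is a rough sketch in Python.
The nanosecond figures are invented for illustration; the only point is
the difference between doing translation and cache lookup back to back
and doing them side by side:

```python
def hit_time_serial(translate_ns, cache_ns):
    """Cache lookup cannot start until the MMU has produced an address."""
    return translate_ns + cache_ns


def hit_time_overlapped(translate_ns, cache_ns):
    """Translation and cache lookup run side by side; the slower one wins."""
    return max(translate_ns, cache_ns)


# Hypothetical timings, not taken from any real part.
translate_ns, cache_ns = 40, 50
print("serial:     %d ns" % hit_time_serial(translate_ns, cache_ns))
print("overlapped: %d ns" % hit_time_overlapped(translate_ns, cache_ns))
```

When the two times are about equal, the overlapped case is roughly half
the serial one, which matches the "nearly half the time" claim.  Whether
a given design gets the overlap depends on how its cache is indexed and
controlled, not simply on whether the cache tags are virtual.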

>	3) Some applications (small signal processing, etc.) don't really
>	   require the MMU, so why should one drive up the cost of the uP
>	   by adding one on-board? ....
This can be true; but only if the chip is intended for controller applications
as a first priority, because it may cost $$ and cycle time to then add
an MMU.

>I am sure that there are numerous other reasons why off-chip MMUs are more
>desirable.

On the other hand, at least several vendors [such as Motorola] who used
to have the MMU off-chip are now putting it on.  Finally, our MIPS R2000s
use on-chip TLBs, and they're not exactly slow: if you know a
micro with a separate MMU that's faster, and that you can actually buy,
please post some info.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.UUCP (04/29/87)

In article <1401@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
>In article <4244@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>>>	What are the important deciding factors in designing a MMU
>>>on-chip or off-chip?

>I would like to put in a plug for an on-chip MMU, an on chip (relatively
>small) instruction cache (works with virtual addresses in the current
>context only), and an off chip data cache (may not be necessary with
>enough registers).  The reasons:
	(bunch of reasons... most of which are quite reasonable)

>5)  If there is more room on the chip after the MMU is on, the next step is to
>    put the ARITHMETIC back on the chip (no extra FPA necessary then), and the
>    next step after that is to divide the ALU into segmented functional units,
>    then add vector instructions with fully segmented functional units, and to
>    make sure you have enough registers, with everything in "RISC style" (no
>    microcode, lots of random logic);

>6)  Then, after all that is on the chip, and you still have room ( :-)  )
>    put the data cache back on the chip.  

Given Hugh's general wishes expressed elsewhere [including good FP],
the only piece I'd disagree with here is 5), and we're probably not
really disagreeing.  For some time, it will be very difficult to get
a high-performance FP unit on the same chip as (even a RISC) cpu.
Our FPU is bigger than our CPU, and if the chippers had dared make it
bigger, they would have used the space to eliminate another cycle or
two, rather than trying to cram the CPU and FPU together.  For some time,
you can either have low-medium FP performance on the CPU chip [a
legitimate design point], or you can have high-performance FP off-chip.
Note: I'm talking about the difference between 2-cycle DP Adds and the
30-50 cycles or more prevalent in coprocessors these days.
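
To see why that gap matters, a back-of-the-envelope sketch in Python.
The 10% share of DP adds and the 1 cycle for every other instruction
are assumptions for illustration, and 40 cycles is simply picked from
the middle of the 30-50 range above:

```python
def cycles_per_instruction(fp_add_fraction, fp_add_cycles, other_cycles=1.0):
    """Average cycles per instruction for a mix of DP adds and everything else."""
    return fp_add_fraction * fp_add_cycles + (1.0 - fp_add_fraction) * other_cycles


# Hypothetical instruction mix: 10% double-precision adds.
fast = cycles_per_instruction(0.10, 2)    # 2-cycle DP add
slow = cycles_per_instruction(0.10, 40)   # coprocessor-style DP add
print("fast FP: %.2f cycles/instruction" % fast)
print("slow FP: %.2f cycles/instruction" % slow)
print("slowdown: %.1fx" % (slow / fast))
```

Under those assumptions the slow adds dominate the run time even though
they are only a tenth of the instructions.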

The only other disagreement [and it is only by omission], is that
the argument started as cache versus MMU.  Hugh extended it logically
to include most of the other elements that might be there.  There's
one that's left out, and it's actually one of the highest priorities,
and it really doesn't cost too much: put all the cache control for
external caches onto the chip. 
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

jfh@killer.UUCP (John Haugh) (04/29/87)

Much has been argued in this group about whether or not Intel has a better
MMU and how slow Motorola was in getting their MMU out.

Custom MMU's are not particularly hard to build, and many companies have
done so.  Pinnacle Systems/Logic Process Co/Whoever they are next week uses
a custom MMU built out of 45ns or 35ns (depending on which machine) static
rams.  The Pinnacle XL/MPulse 10 (sounds like a merger of 2 bank cards here
in Dallas, MPact and Pulse :-) runs a 68K at 12MHz, 0 wait states, and
according to the latest rumors the Pinnacle XL020/MPulse 20 runs a 68020
at 16MHz, 1 wait state.  Supposedly John Bremsteller has been working on
getting the 020 up to 25MHz with only 1 or 2 wait states.

So, with a lack of 'on chip MMU' this little company has done about as
well as you can get.  I see no reason to argue about whether the MMU
belongs on chip or off, so long as you don't take a performance hit for
it.

And yes, when someone said an IBM-PC/?? would beat my 6MHz 68000, I too
laughed.  I have never seen a 286 running more than 4 or 5 users at 8MHz,
but I have seen 8 or 9 users on a 68000 at 6MHz, and it just gets better
with more memory, faster clock speeds, faster disks, and all of the other
new goodies that have come out in the 5 years I've had my machine.

- John.		(jfh@killer.UUCP)

Disclaimer:
	No disclaimer.  Whatcha gonna do, sue me?

henry@utzoo.UUCP (Henry Spencer) (05/07/87)

> Custom MMU's are not particularly hard to build, and many companies have
> done so.  Pinnacle Systems/Logic Process Co/Whoever they are next week uses
> a custom MMU built out of 45ns or 35ns (depending on which machine) static
> rams.  The Pinnacle XL/MPulse 10 (sounds like a merger of 2 bank cards here
> in Dallas, MPact and Pulse :-) runs a 68K at 12MHz, 0 wait states, and
> according to the latest rumors the Pinnacle XL020/MPulse 20 runs a 68020
> at 16MHz, 1 wait state.  Supposedly John Bremsteller has been working on
> getting the 020 up to 25MHz with only 1 or 2 wait states.

You will forgive us, I trust, for not being too impressed...  The Sun-2,
now obsolete, ran a 68K at 12MHz with no wait states.  The early Sun-3
models, starting to look dated, run a 68020 at 16MHz with 1.5 wait states
(how in @#$%@ do they get half a wait state?...).  The Sun-3 200 series
runs a 68020 at 25MHz with circa no wait states, out of fast virtual cache.
Predictably, it's done with fast static RAMs.  This is nothing new; the
first Suns date back to circa 1980.
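
For comparison, the figures above can be turned into bus cycle times.
A rough sketch in Python, assuming a minimum bus cycle of 4 clocks for
the 68000 and 3 clocks for the 68020 (check the data sheets before
relying on this), and treating a fractional wait state as an average:

```python
def bus_cycle_ns(clock_mhz, base_clocks, wait_states):
    """Length of one bus cycle in ns: (base clocks + wait states) / clock rate."""
    return (base_clocks + wait_states) * 1000.0 / clock_mhz


# (label, clock in MHz, assumed base clocks per bus cycle, wait states)
machines = [
    ("68000 @ 12MHz, 0 wait states", 12, 4, 0),
    ("68020 @ 16MHz, 1.5 wait states", 16, 3, 1.5),
    ("68020 @ 25MHz, 0 wait states", 25, 3, 0),
]
for label, mhz, base, waits in machines:
    print("%-32s %.1f ns per bus cycle" % (label, bus_cycle_ns(mhz, base, waits)))
```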
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

mitch@stride1.UUCP (Thomas P. Mitchell) (05/09/87)

In article <122@motsj1.UUCP> rich@motsj1.UUCP (Rich Goss) writes:
>In article <5635@shemp.UCLA.EDU> fan@CS.UCLA.EDU (Roy Fan) writes:
>>--------------
>>	I am doing a project on MMU's, and from reading various uP
>>data books, I have several questions:
>>
>>	Question 1 :  are there any other factors that might affect the
>>design of the MMU being on-chip or off-chip?

The answer is system design: what functions do you want your
customers to put around your processor, and what will they do
with them?  Also what the programmer's model looks like.

I was lucky enough to hear a short talk by a gentleman at MIPS.
He outlined some of their design goals.  Memory translation takes
time, in fact a lot of time.  Their reduced instruction set gave
them enough silicon to build the kind of processor that they felt
Unix needed.  When the 8080 was born, UNIX was a rare beast.  So
clearly Intel did not have the Unix community in their system
design goals.  You might also look at a patent by Sun on their
MMU; it gives a clue toward the problems with time when building
an MMU.  (We use 68010s and 68020s here.)

>>
>>	Question 2 :  if there is enough space on the chip, would
>>everybody put the MMU on-chip?

If there was enough space everything would be on chip.  Gate
delay times are much shorter on chip than off chip. Low_Power + CPU
+ MMU + 100 GB RAM = ;-).

Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone:	(702) 322-6868 TWX:	910-395-6073
MicroSage Computer Systems Inc. a Division of Stride Micro.
Opinions expressed are probably mine.

adam@misoft.UUCP (05/15/87)

In article <319@crys.WISC.EDU> mcvoy@crys.WISC.EDU (Larry McVoy) writes:
>
>And this bit about optics?  Optics?  What will that buy you?  Sure light
>travels fast but converting from electrons to photons is a drag.
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So why bother? Have optical sensors on keyboards, direct light output on
your terminal screen. All comms can easily be optical fibre. No need to convert
optical disk information into electrical current. The only reason
for converting between electrons & photons is interfacing to the old electron-
driven computers.
       -Adam.

/* If at first it don't compile, kludge, kludge again.*/

lm@cottage.UUCP (05/20/87)

I sez:
>>And this bit about optics?  Optics?  What will that buy you?  Sure light
>>travels fast but converting from electrons to photons is a drag.
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
(Adam Quantrill) sez:
#So why bother? Have optical sensors on keyboards, direct light output on
#your terminal screen. All comms can easily be optical fibre. No need to convert
#optical disk information into electrical current. The only reason
#for converting between electrons & photons is interfacing to the old electron-
#driven computers.
#       -Adam.
#
#/* If at first it don't compile, kludge, kludge again.*/

Because, my good sir, optical gates don't work very well yet.  And it looks
like they may never work very well.  In spite of what Sci American or any
other "knowledgeable" rag says, optical gates are more than likely pie
in the sky.  And until they or something else comes along, electrons and
computers will stick together.

    Now, optical busses are another question.  It's awfully nice to have 
a 64 bit bus in this teensy tiny fibre running across your chip.  Or acting
as a backplane bus.  But my original statement stands: conversion is a pain
and you want to look very carefully at the tradeoffs before embracing
optics.

Larry McVoy 	        lm@cottage.wisc.edu  or  uwvax!mcvoy