[comp.sys.m68k] move sr/move ccr: is bigger better?

conklin@msudoc.UUCP (01/29/87)

I regret the ignorance I show in this request, but it seems as if I'm
missing the tip of an iceberg and since it can make the difference in
the computer system I am looking at this week (to purchase) I figure I
better ask.
 
This is a junkie to junkie question. That is, somewhere out there there
is a person who is -into- the 68xxx family, knows half the machines out
using them, and knows translations. For this qualified
individual, I ask, "What is the difference between the 68000, the 68010
and the 68020? And throw in the 68030 for good looks."
 
I'm very familiar, at many levels, with the current slew of 68000
machines (Macs, Lisas, STs). I'm also, on a work basis, very
familiar with the results of 68020s through the fleets of
Suns all over the University. So where does the 68010 come in?
 
I was under the impression that the 68010 had some degree of an on-board
MMU. However, I also hear that the 68xxx family MMU is external, and
thus represents a partial bottleneck in backtracking compared to the
Intel 80x86 (x for a bad word) family which, albeit rudely, DOES give
one control over 64k segments.
 
If the 68010 has an on-board MMU, it would seem it would make it an
-infinitely- preferable chip to the 68000. What about the 68020, is it a
super of the 68010 as well such that it can 'do what it does and more?'
 
The ultimate reason for this query is because, after seeing the OS-9,
AmigaDOS (Also multitasking, since it slows _multiple_ programs to a
crawl,) Micro-C shell and other assorted 'multitasking' setups,
I have come to the conclusion that, well, simply put, 
Unix won't run on it, reasonably. Granted, if it looks
like Unix, acts like Unix, and does like Unix, it is Unix, but it's just
not Unix enough for me. My hangup is with an MMU. Recently, I was
offered a chance at a 68010 system (2 meg of RAM) w/Unix; is this a baby
Sun in a box? Or a Sinclair on steroids? How well can something like
this, under whatever version of Unix, handle bringing in 12 300k
programs and letting them all talk to each other? (Memory can obviously
be expanded if necessary.)
 
Terry Conklin, Club Net Coordinator	"Where BBSing is more than a
..!ihnp4!msudoc!conklin			 hobby; we're on a mission from
conklin@mich-state.edu			 god!"

Club LANS (517) 372-3131		Club II (313) 334-8877

holloway@drivax.UUCP (02/02/87)

In article <1090@msudoc.UUCP> conklin@msudoc.UUCP (Terry Conklin) writes:
>           "What is the difference between the 68000,the 68010
>and the 68020? And throw in 68030 for good looks."

The 68008 has an eight-bit data bus, but is otherwise identical to the 68000,
which has a 16-bit data bus. The 68000 cannot recover from address and bus
errors, which means that paging schemes which depend upon the ability to
recover from bus errors won't work.

The 68010 saves enough info on the stack to recover from bus errors, and even
tell you where and how they happen, and allows you to complete the instruction
with some of your own hardware, talk to an MMU, etc. (Address errors, too. It
is possible to write an interrupt handler to allow you to access words on odd
byte boundaries, for example). So the big 68010 difference is, it allows you
to add an external MMU for dynamic paging schemes. (It is also somewhat faster,
and allows pipeline optimization).

The 68020 does all this, and more. It has instruction caching, which means
that for short loops, the 68020 need never go to external memory for the
code. It also has a very slick interface to co-processors, like floating point
chips and MMUs (which are still external). The 68020 is a lot faster than
any of the previous generations.

From what I hear, the 68030 will have expanded caches, plus, possibly, data
caching.

-- 
....!ucbvax!hplabs!amdahl!drivax!holloway
My bologna has a first name, it's Jimbob. My bologna has a second name, it's
Boltwangle. But it prefers to be called Jim.

mash@mips.UUCP (02/03/87)

In article <862@drivax.UUCP> holloway@drivax.UUCP (Bruce Holloway) writes:
>
>From what I hear, the 68030 will have expanded caches, plus, possibly, data
>caching.

The 68030 is described as having both I-cache and D-cache on chip,
filled with burst-mode accesses, i.e., whenever they come out,
they will have [I think] 2 256-byte caches, each with 16 lines of 16 bytes each.
It will be interesting to see whether people turn the data-caching on or
not: depending on the benchmark and memory design, a tiny data cache
can actually make a system run slower, unlike the more usual speedup from
(even a small) I-cache. Just out of curiosity, does anybody out there
have any simulations for a 68K with this cache design?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

campbell@sauron.UUCP (02/04/87)

In article <109@winchester.mips.UUCP> mash@winchester.UUCP (John Mashey) writes:
> [...]
>It will be interesting to see whether people turn the data-caching on or
>not: depending on the benchmark and memory design, a tiny data cache
>can actually make a system run slower, unlike the more usual speedup from
>(even a small) I-cache. Just out of curiosity, does anybody out there
>have any simulations for a 68K with this cache design?

I believe that this holds only if the miss penalty for the small D-cache
causes one or more wait states to be induced when referencing the missed
location(1).  Since the best case access time of the MC68030 to external
memory is two wait states (synchronous mode) with or without the D-cache,
I don't believe that the D-cache can cause a penalty in performance.

If anyone can think of cases in which this might not be true (i.e., cases
in which a small D-cache can cause a performance penalty under the stated
conditions) I'd appreciate your posting examples.  Thanks.

(1) This assumes a write-through cache and a relatively inexpensive
    cache flush operation (or that task switches are infrequent).
-- 
						Mark Campbell
						{}!ncsu!ncrcae!sauron!campbell

mash@mips.UUCP (02/08/87)

In article <822@sauron.Columbia.NCR.COM> campbell@sauron.UUCP (Mark Campbell) writes:
>In article <109@winchester.mips.UUCP> mash@winchester.UUCP (John Mashey) writes:
>>It will be interesting to see whether people turn the data-caching on or
>>not: depending on the benchmark and memory design, a tiny data cache
>>can actually make a system run slower, unlike the more usual speedup from
>>(even a small) I-cache. Just out of curiosity, does anybody out there
>>have any simulations for a 68K with this cache design?
>
>I believe that this holds only if the miss penalty for the small D-cache
>causes one or more wait states to be induced when referencing the missed
>location(1).  Since the best case access time of the MC68030 to external
>memory is two wait states (synchronous mode) with or without the D-cache
>I don't believe that the D-cache can cause a penalty in performance.
>
>If anyone can think of cases in which this might not be true (i.e., cases
>in which a small D-cache can cause a performance penalty under the stated
>conditions) I'd appreciate your posting examples.  Thanks.

Here is a very simple analysis, partially derived from what people said about
the Moto presentation at ICCD in October, i.e., that the D-cache hit rate
was around 50% [if this is wrongly quoted, please tell me; I was not there].

let X = number of cycles to fetch 1 word from memory outside the chip
let Y = number of cycles to fetch 4 words [the way the 68030 D-cache works]
let Z = number of cycles to fetch 1 word from on-chip D-cache
let M = Miss rate in D-cache [0..1.0]

then (grossly):
	cost to fetch data without cache : X
	cost to fetch with cache on: (1-M) * Z + M * Y
		(i.e., part of the time you hit, and each time it costs Z,
		and part of the time you miss, in which case it costs Y.)
One can assume that Z < X < Y.

Let's assume that Z = 0 (best case).  Thus, the 2 cases reduce to:
	cost without cache: X
	cost with cache: M*Y
Thus, if X < M*Y, it is better not to use the cache.
For example, if X is 2, and Y is 4, and M is .5, then it's equal.
However, if Y is even 5, or if Z is not zero, then you do better without
the cache.

All of this is NOT intended to indicate real numbers, but to show that
you have to compute the miss-rate, and that high miss-rates may cost you.
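The break-even arithmetic above can be sketched in a few lines of code. The values below are the illustrative ones from this post (plus the ~50% miss rate attributed to the ICCD presentation), not measured 68030 timings:

```python
# Cost model from the analysis above, per data fetch:
#   without cache: X cycles
#   with cache:    (1 - M) * Z + M * Y cycles
# X, Y, Z, M are the post's symbols; the numbers are illustrative only.

def cost_without_cache(X):
    return X

def cost_with_cache(M, Y, Z):
    return (1 - M) * Z + M * Y

X, Y, Z = 2, 4, 0   # 1-word external fetch, 4-word line fill, free on-chip hit
M = 0.5             # ~50% D-cache miss rate, as quoted above

print(cost_without_cache(X))      # 2
print(cost_with_cache(M, Y, Z))   # 2.0 -- break-even with the cache on

# A slightly slower line fill tips the balance against the cache:
print(cost_with_cache(M, 5, Z))   # 2.5 -- worse than no cache
```

As the post says, the point is not these particular numbers but that X < M*Y makes the cache a net loss.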

OK, now perhaps some real examples.  The 68030 D-cache contains 16 lines
of 16 bytes each, direct-mapped, i.e., it wraps around every 256 bytes,
so that the line starting at 0 and the line starting at 256 cannot both
be present at once.  If you have programs whose access is primarily sequential,
then all is well.  If not, then you may continually be fetching data
that doesn't get a chance to be used before it is kicked out of the cache,
but which cost you cycles to get.  Examples:
	a) Vector-processing code where the vectors line up in memory
	clashing with each other.
	b) Kernel code, which often walks all over memory looking at just
	a few bits in each structure.
Note: the miss rates in I-caches are almost always much better than for
D-caches, hence even a small I-cache usually wins, mainly due to linearity
of access.  Even a small D-cache will probably help function-return time,
but it may not help the rest of the code much.
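The clash effect is easy to see with a toy simulation. This models only the geometry described above (16 direct-mapped lines of 16 bytes); it is not a real 68030 model, and the access patterns are made up for illustration:

```python
# Toy direct-mapped cache matching the geometry described above:
# 16 lines of 16 bytes (256 bytes total). Byte addresses 256 apart
# map to the same line and evict each other.

LINE_SIZE = 16
NUM_LINES = 16

def simulate(addresses):
    """Return the miss count for a sequence of byte addresses."""
    tags = [None] * NUM_LINES          # one tag per cache line
    misses = 0
    for addr in addresses:
        line = (addr // LINE_SIZE) % NUM_LINES
        tag = addr // (LINE_SIZE * NUM_LINES)
        if tags[line] != tag:          # miss: fill the line
            misses += 1
            tags[line] = tag
    return misses

# Sequential access: one miss per 16-byte line, then hits.
seq = list(range(0, 256))
print(simulate(seq))                   # 16 misses in 256 accesses

# Two "vectors" 256 bytes apart, accessed alternately: they share
# lines and thrash, so every single access misses.
clash = [a for i in range(0, 256, 4) for a in (i, i + 256)]
print(simulate(clash))                 # 128 misses in 128 accesses
```

The second pattern is the vector-clash case (a) above: each fill costs a 4-word fetch that is evicted before it can be reused.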

In general, intuition on any of this is highly suspect [that's why I asked in
the first place if people had simulated this particular cache on 68K address
traces].

Bottom line: you must be careful with high-miss-rate caches.  If there is
any penalty for filling the cache (over just fetching the data), then
a cache can actually reduce performance.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086