[net.micro.68k] yet another 68k complaint

wescott@ncrcae.UUCP (12/24/83)

Yet another 68k problem.  We have found the following
bug/feature (it was quite an annoyance finding it too!).
The movem instruction in the 68k, when reading from memory
into registers, does an additional read cycle at the end
of the transfer, addressing one word beyond what is required.

No harm done in most cases, when the use is to save/restore
registers on the stack. But the movem op has also been used
to implement some fast bulk move routines (like blt, which
came with the MIT compiler, I believe.) If the routine
happens to use its movem op right up to the edge of a
segment the extra read can cause a bus error where
one is not expecting it.
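
To make the edge case concrete, here is a minimal sketch in C of the check a
bulk-move routine would have to make. It only models the behaviour described
above (one extra word read past the data); the function names and the
"seg_end" convention are hypothetical, not taken from blt or any real library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the behaviour described in the post: a MOVEM.L
 * memory-to-register load of `nregs` registers starting at `src` is
 * assumed to touch one extra 16-bit word just past the data it needs. */
static uint32_t movem_last_byte_touched(uint32_t src, unsigned nregs)
{
    uint32_t needed = (uint32_t)nregs * 4u;   /* bytes actually loaded  */
    return src + needed + 2u - 1u;            /* plus the extra word    */
}

/* True when the load, extra read included, stays below seg_end.
 * seg_end is assumed to be the first address that would bus-error. */
static bool movem_load_is_safe(uint32_t src, unsigned nregs, uint32_t seg_end)
{
    return movem_last_byte_touched(src, nregs) < seg_end;
}
```

With a segment ending at 0x10000, a 12-register load starting 48 bytes
before the end is flagged as unsafe, while the same load started two bytes
lower is fine. A copy routine could use such a check to finish the last
chunk with plain moves.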


Michael Wescott, NCR Corp., Columbia SC
..duke!mcnc!ncsu!ncrcae!wescott

rpw3@fortune.UUCP (12/28/83)

#R:ncrcae:-103300:fortune:6600008:000:4718
fortune!rpw3    Dec 28 06:22:00 1983

This response is for the groady low-level hardware types. All of you
purists go look the other way. (Somewhat chatty/flamey)
------------------------------------------------------------------------

One of the nice things about a 68000 when you first look at it is its
clean hardware structure -- I sort of think of it as a 16-bit Z80 or PDP-8
(by which I mean a general workhorse). In fact, it's so neat and cheap
that one considers using it for a sequencer/controller in all sorts of
little hardware applications where one might consider some counters
and a state machine or some 2901's and a 2911. I mean, after all, where
else do you find a 32-bit adder, 16 32-bit registers, a sequencer,
an instruction decoder, and that many bits of nano-code all in 64 pins?
(Even if it does just have a 16-bit data path.) Just buying that many
bits of ROM will cost you more!

The MIT/NU/TRIX guys even built a debugging monitor that needed NO RAM;
all of the state was in the registers. Etc. etc.

Also, it fetches/executes at memory speed (for most instructions), so
real-time code can be estimated by counting memory accesses, just like
a PDP-8. (Watch out for MUL/DIV and long shifts.)

Use it as a DMA chip! What other DMA chip can read its channel program
out of main memory, do looping and branching and protocol parsing?
(Ans: Intel 8089, but it's badly flawed by an overly weak set of
arithmetic and logic. Examples: the only way to shift left is to add
something to itself. You CAN'T shift right.  However, the 8089 IS a
good model for how to use a 68000.)

And as a DMA engine, it can't be beat! (assuming the data rates are adequate)

Ah, there's the rub. Many of the hardware designers' little tricks for
maximizing the data rates of controllers depend critically on the exact
sequence of accesses by said controller. (E.g., see Don Lancaster's
"Cheap Video Cookbook"). Such things as fetching forced NOPs from a
data area (to use the PC as a counter), shadow registers (hardware
under RAM), partial decodes (so a block move can fetch from a FIFO),
all depend on knowing the exact sequence and side effects of each
bus cycle of each instruction.

With a Z80 or a PDP-8 it was easy. With a PDP-11 it was harder. With
the 8086/8088 it was well nigh impossible, because of the huge pre-fetch
queue. With the 68000, it's harder than one would like, but not impossible.
For all good reasons, Motorola is reluctant to exactly spell out the order
and number of bus cycles. If they did so, and we depended on that, they
couldn't continually change the microcode to make it "better". (To be
fair, they fixed a goodly number of bugs that way, faster than re-designing
the chip would have been.)

Here are some tricks that may help:

	1. Don't touch hardware with a CLR. (It reads before it writes)
	   Use "movl #0,xxx"
	
	2. If you must use a MOVEM to fetch from a partially-decoded
	   hardware address, fetch an extra word (or long), so that
	   the double-fetch of the last one happens beyond the edge
	   of the sensitive area. (That means, of course, you must leave
	   "guard bands" of addresses that DON'T bus-timeout.) Ignore
	   the extra garbage in the n+1 register.
	
	3. Don't ever ever ever depend on what's pushed on the stack
	   during an interrupt. The order changes with the stack position.
	
	4. Don't count on seeing an interrupt vector fetched before the
	   old PC is saved. In fact, don't use interrupts at all.
	   (Complicates trying to predict cycles)

	5. If you must use shadow/forced "fetch NOP" techniques, watch
	   out for the pre-fetch. It can vary.

	6. Use the function code outputs FC<0:2> to distinguish accesses
	   (but see #7).

	7. Remember that the high address lines are all 1's during
	   an interrupt acknowledge. You may be able to ignore the FC bits.
	
	8. Don't use MUL or DIV. They can drastically increase interrupt
	   latency. Don't use MOVEM except to move highest priority data.
	   (Multiply by small constants with shift-and-add/subtract)
	
	9. Use a 68010 and "loop mode" instead of a 68000 and "movem".
	   You get close to 100% of bus bandwidth memory-to-memory
	   compared to ~85%.
	
	10. Avoid using variable speed devices (that delay DTACK).
	   Sequences of accesses change when DTACK is delayed, due
	   to micro-code overlap changing.
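
Tip 8's "shift-and-add/subtract" can be sketched in C as below. The
constants 10 and 7 and the function names are picked purely for
illustration; whether a given compiler turns these into shifts and adds
rather than a MUL is something to check in the generated code.

```c
#include <stdint.h>

/* x * 10 computed as x*8 + x*2: two shifts and an add, no MUL. */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x * 7 computed as x*8 - x: one shift and a subtract, no MUL. */
static uint32_t mul7(uint32_t x)
{
    return (x << 3) - x;
}
```

Both routines wrap modulo 2^32 exactly as an unsigned multiply would, so
they are drop-in replacements for the constant multiplies they stand for.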

If you don't try to be as clever as we all were in the old days, things
can still work o.k. Remember, a 10 MHz 68000 can fetch NOP's at 40 megabits/s,
and a 10 MHz 68010 can block transfer memory-to-memory at nearly 20 Mbit/s.
That's a lot of bandwidth for your controller to use.
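
The arithmetic behind those figures, as a rough sketch in C. It assumes
the minimum 68000 bus cycle of 4 clocks moving 16 bits, which is not
spelled out in the post; the function names and the "efficiency_pct"
parameter are made up for the example.

```c
#include <stdint.h>

/* Assumption: each bus cycle takes at least 4 clocks and moves 16 bits. */
enum { CLOCKS_PER_BUS_CYCLE = 4, BITS_PER_BUS_CYCLE = 16 };

/* Peak rate, in bits/s, when every bus cycle fetches a useful word. */
static uint64_t fetch_bits_per_sec(uint64_t clock_hz)
{
    return clock_hz / CLOCKS_PER_BUS_CYCLE * BITS_PER_BUS_CYCLE;
}

/* Memory-to-memory copy: each word costs one read and one write cycle.
 * efficiency_pct is the share of bus cycles that actually move data. */
static uint64_t copy_bits_per_sec(uint64_t clock_hz, unsigned efficiency_pct)
{
    return fetch_bits_per_sec(clock_hz) / 2 * efficiency_pct / 100;
}
```

At 10 MHz this gives 40 Mbit/s for straight fetches and 20 Mbit/s as the
ceiling for copies. Plugging in the ~85% from tip 9 puts a movem-based
copy somewhere around 17 Mbit/s under the same assumptions.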

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065