wescott@ncrcae.UUCP (12/24/83)
Yet another 68k problem. We have found the following bug/feature (and it was quite an annoyance finding it, too!). The movem instruction in the 68k, when reading from memory into registers, does an additional read cycle at the end of the transfer, addressing one word beyond what is required. No harm is done in most cases, where movem is used to save/restore registers on the stack. But the movem op has also been used to implement some fast bulk-move routines (like blt, which came with the MIT compiler, I believe). If such a routine happens to use its movem op right up to the edge of a segment, the extra read can cause a bus error where one is not expecting it.

Michael Wescott, NCR Corp., Columbia SC
..duke!mcnc!ncsu!ncrcae!wescott
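One way a bulk-move routine might sidestep the extra read, sketched in C rather than 68k assembler: use the block path only while the word after the block still lies inside the source, and finish the tail with plain word moves. This only models the access pattern described above; it is not the MIT blt routine, and block_load, copy_words, BLOCK_WORDS and high_water are names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 8   /* hypothetical: words moved per modelled movem */

/* Model bookkeeping: number of source words touched so far,
 * counting the modelled extra access after each block load. */
static size_t high_water;

/* Models a movem-style load of n words starting at src[base].
 * The copy itself touches n words; the extra trailing access to
 * src[base + n] is only recorded, not actually performed. */
static void block_load(uint16_t *dst, const uint16_t *src, size_t base, size_t n)
{
    memcpy(dst, src + base, n * sizeof *dst);
    if (base + n + 1 > high_water)
        high_water = base + n + 1;
}

/* Copies count words, assuming src[count] may lie past the segment edge. */
static void copy_words(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i = 0;

    high_water = 0;

    /* Block path only while at least one more word follows the block,
     * so the modelled extra access stays inside src[0..count-1]. */
    while (count - i > BLOCK_WORDS) {
        block_load(dst + i, src, i, BLOCK_WORDS);
        i += BLOCK_WORDS;
    }

    /* Tail: word-at-a-time moves, which read exactly what they copy. */
    for (; i < count; i++) {
        dst[i] = src[i];
        if (i + 1 > high_water)
            high_water = i + 1;
    }
}
```

After a call such as copy_words(dst, src, 20), high_water should never exceed the word count, which is the property that keeps the read off the far side of the segment. The cost is that up to BLOCK_WORDS words at the end go through the slow path.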
rpw3@fortune.UUCP (12/28/83)
#R:ncrcae:-103300:fortune:6600008:000:4718
fortune!rpw3 Dec 28 06:22:00 1983

This response is for the groady low-level hardware types. All of you purists go look the other way. (Somewhat chatty/flamey)

------------------------------------------------------------------------

One of the nice things about a 68000 when you first look at it is its clean hardware structure -- I sort of think of it as a 16-bit Z80 or PDP-8 (by which I mean a general workhorse). In fact, it's so neat and cheap that one considers using it for a sequencer/controller in all sorts of little hardware applications where one might otherwise consider some counters and a state machine, or some 2901's and a 2911. I mean, after all, where else do you find a 32-bit adder, 16 32-bit registers, a sequencer, an instruction decoder, and that many bits of nano-code all in 64 pins? (Even if it does just have a 16-bit data path.) Just buying that many bits of ROM will cost you more! The MIT/NU/TRIX guys even built a debugging monitor that needed NO RAM; all of the state was in the registers. Etc. etc.

Also, it fetches/executes at memory speed (for most instructions), so real-time code can be estimated by counting memory accesses, just like a PDP-8. (Watch out for MUL/DIV and long shifts.)

Use it as a DMA chip! What other DMA chip can read its channel program out of main memory, and do looping and branching and protocol parsing? (Ans: Intel 8089, but it's badly flawed by an overly weak set of arithmetic and logic. Examples: the only way to shift left is to add something to itself. You CAN'T shift right. However, the 8089 IS a good model for how to use a 68000.) And as a DMA engine, it can't be beat! (assuming the data rates are adequate)

Ah, there's the rub. Many of the hardware designers' little tricks for maximizing the data rates of controllers depend critically on the exact sequence of accesses by said controller. (E.g., see Don Lancaster's "Cheap Video Cookbook".)
Such things as fetching forced NOPs from a data area (to use the PC as a counter), shadow registers (hardware under RAM), and partial decodes (so a block move can fetch from a FIFO) all depend on knowing the exact sequence and side effects of each bus cycle of each instruction. With a Z80 or a PDP-8 it was easy. With a PDP-11 it was harder. With the 8086/8088 it was well nigh impossible, because of the huge pre-fetch queue. With the 68000, it's harder than one would like, but not impossible.

For all good reasons, Motorola is reluctant to spell out exactly the order and number of bus cycles. If they did so, and we depended on that, they couldn't continually change the microcode to make it "better". (To be fair, they fixed a goodly number of bugs that way, faster than re-designing the chip would have been.)

Here are some tricks that may help:

1. Don't touch hardware with a CLR. (It reads before it writes.) Use "movl #0,xxx".
2. If you must use a MOVEM to fetch from a partially-decoded hardware address, fetch an extra word (or long), so that the double-fetch of the last one happens beyond the edge of the sensitive area. (That means, of course, you must leave "guard bands" of addresses that DON'T bus-timeout.) Ignore the extra garbage in the n+1 register.
3. Don't ever ever ever depend on what's pushed on the stack during an interrupt. The order changes with the stack position.
4. Don't count on seeing an interrupt vector fetched before the old PC is saved. In fact, don't use interrupts at all. (They complicate trying to predict cycles.)
5. If you must use shadow/forced "fetch NOP" techniques, watch out for the pre-fetch. It can vary.
6. Use F<0:2> to distinguish accesses (but see #7).
7. Remember that the high address lines are all 1's during an interrupt acknowledge. You may be able to ignore the F bits.
8. Don't use MUL or DIV. They can drastically increase interrupt latency. Don't use MOVEM except to move highest priority data.
   (Multiply by small constants with shift-and-add/subtract.)
9. Use a 68010 and "loop mode" instead of a 68000 and "movem". You get close to 100% of bus bandwidth memory-to-memory, compared to ~85%.
10. Avoid using variable speed devices (that delay DTACK). Sequences of accesses change when DTACK is delayed, due to micro-code overlap changing.

If you don't try to be as clever as we all were in the old days, things can still work o.k. Remember, a 10 MHz 68000 can fetch NOPs at 40 megabits/s, and a 10 MHz 68010 can block transfer memory-to-memory at nearly 20 Mbit/s. That's a lot of bandwidth for your controller to use.

Rob Warnock
UUCP: {sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD: (415)595-8444
USPS: Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
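For trick 8, the shift-and-add/subtract idea looks roughly like this in C. The constants 10 and 7 and the names mul10 and mul7 are just illustrative picks, and a given compiler may or may not turn these into shifts and adds on its own, so check the generated code if the timing matters.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 = x*8 + x*2: two shifts and an add instead of a multiply. */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x * 7 = x*8 - x: one shift and a subtract. */
static uint32_t mul7(uint32_t x)
{
    return (x << 3) - x;
}
```

Both wrap modulo 2^32 exactly as an ordinary unsigned multiply would, so they are drop-in replacements for x * 10 and x * 7 on uint32_t values.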
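The bandwidth figures quoted above can be reproduced with a little arithmetic, assuming a 4-clock bus cycle with no wait states and a 16-bit data path. A memory-to-memory move needs one read and one write per word, so its ceiling is half the fetch rate. The function names are made up for this sketch, and the 85% case simply plugs in the rough movem figure from trick 9.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CLOCKS_PER_BUS_CYCLE = 4,   /* assumption: no wait states */
    BUS_WIDTH_BITS       = 16   /* 16-bit data path */
};

/* Peak bits/s when every bus cycle is a fetch. */
static uint64_t fetch_bits_per_sec(uint64_t clock_hz)
{
    return clock_hz / CLOCKS_PER_BUS_CYCLE * BUS_WIDTH_BITS;
}

/* Memory-to-memory bits/s: each word costs a read plus a write,
 * scaled by the percentage of bus cycles that actually move data. */
static uint64_t copy_bits_per_sec(uint64_t clock_hz, uint64_t bus_use_percent)
{
    return fetch_bits_per_sec(clock_hz) / 2 * bus_use_percent / 100;
}
```

At 10 MHz this gives 40 Mbit/s for fetches, a 20 Mbit/s ceiling for block moves at 100% bus use, and about 17 Mbit/s at 85%.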