meulenbr@cstw01.UUCP (Frans Meulenbroeks) (01/02/89)
Happy New Year.

I've been upgrading my 68000 system by replacing the 68000 with a 68020 using 16-bit bus cycles (plus some glue logic to generate signals such as E and VMA). The system runs at the same clock frequency. The upgrade works, but leaves the following question: the expected performance gain from the cache is not realised. I expected the 020 to run at about the same speed as the 68000 with the cache disabled, and about 20-30% faster with the cache enabled. However, this does not hold.

The system (an Atari ST) has video memory accesses interleaved with CPU accesses: every 4 clock cycles the video makes a memory access. Normally this runs quite smoothly. The upgraded system shows some performance degradation due to the different timing. Measurements with an analyser have led to the following theory:

Somewhere in the 020 user's manual it is stated that instruction prefetch is always a longword operation. On a 16-bit bus this results in two bus cycles. In the simplest case two instructions are fetched. One is executed during the prefetch. The other cannot complete in time and needs an extra clock cycle. This extra cycle, however, delays the next instruction prefetch, because starting it earlier would collide with a video access; the result is additional delay.

My questions:
- is the above scenario a plausible explanation? (The instruction used in the test was a NOP.)
- will the 020 start instruction execution while the second bus cycle is still in progress?
- if I want to speed up the 020 by going to 16 MHz (synchronously with the system clock), which signals should I pay attention to? I know that I should look at DSACK/DTACK, since they may be asserted before data are valid on a read cycle. AS also looks like something to look at. Does anyone know of other signals to keep an eye on? Any suggestions about the problems that may occur on write cycles?

Many thanks!
--
Frans Meulenbroeks (meulenbr@cst.prl.philips.nl)
Centre for Software Technology
(or try: ...!mcvax!philmds!prle!cst!meulenbr)
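The slot effect Frans describes can be illustrated with a toy model. This is a sketch under loud assumptions, not a measurement of the real ST: it assumes a CPU bus cycle takes 4 clocks and, because of the video interleave, may only start on a 4-clock boundary, and that instruction execution does not overlap bus cycles. The names `next_slot` and `total_clocks` are hypothetical.

```python
# Toy model (assumptions, not ST measurements):
#  - a CPU bus cycle may only start on a multiple of SLOT_CLOCKS,
#    because the video takes the bus in between
#  - one 16-bit bus cycle lasts BUS_CYCLE_CLOCKS
#  - internal execution does not overlap bus cycles
SLOT_CLOCKS = 4
BUS_CYCLE_CLOCKS = 4


def next_slot(t, slot=SLOT_CLOCKS):
    """Round clock time t up to the next slot boundary."""
    return -(-t // slot) * slot


def total_clocks(steps, slot=SLOT_CLOCKS, bus_cycle=BUS_CYCLE_CLOCKS):
    """steps: list of (bus_cycles, exec_clocks) pairs.

    Each step performs its bus cycles (each waiting for a free slot),
    then spends exec_clocks of internal work with the bus idle.
    """
    t = 0
    for bus_cycles, exec_clocks in steps:
        for _ in range(bus_cycles):
            t = next_slot(t, slot)  # wait for the CPU's slot
            t += bus_cycle
        t += exec_clocks
    return t


if __name__ == "__main__":
    # Longword prefetch = 2 bus cycles, then 1 extra execution clock.
    print(total_clocks([(2, 1)] * 3))          # with slot alignment
    print(total_clocks([(2, 1)] * 3, slot=1))  # no interleave
    print(total_clocks([(2, 0)] * 3))          # no extra clock
```

Under these assumptions, three longword prefetches each followed by one extra execution clock take 33 clocks, against 27 with no slot alignment and 24 with no extra clock: each extra clock that spills past a boundary costs up to four clocks, because the next prefetch must wait for the following slot.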
bjh@motsj1.UUCP (Brad Holtzinger) (01/03/89)
meulenbr@cst.UUCP writes:
>
>My questions:
>- is the above scenario a plausible explanation (the instruction used in
>  the test was a NOP).

Your choice of instruction is a poor one. The NOP instruction on the 68020 does more than just nothing: it is also used as a pipeline synchronizer. All possibility of instruction overlap disappears when a NOP is executed. Typically the bus control state machine is decoupled from the execution unit (allowed to operate in parallel with it), so it can complete the write bus cycle of one instruction while execution of the next instruction starts. The next instruction would not be started if its operand were located in memory (i.e., all reads are synchronization points in the 68020 microcode).

>- will the 020 start instruction execution while the second bus cycle is
>  still in progress?

It should, but your choice of instruction may negate any measurable effect.

I hope that this is helpful.
--
Brad Holtzinger
Western Region Systems Engineering Manager
Motorola Microcomputer Division
1150 Kifer Road, Sunnyvale, CA 94086
UUCP: {hplabs, mot, oakhill}!motsj1!bjh
Telephone: +1 408-991-7340
bruce@blue.gwd.tek.com (Bruce Robertson) (01/04/89)
I've also worked with this sort of arrangement in the past, and I found that you only just break even with the cache enabled, and lose dramatically with the cache off.

One thing that will cause you trouble is instructions not aligned on longword boundaries. For example, the following instruction sequence takes only two fetches on the 68000, but a total of four on the 68020 with 16-bit memory:

    0x02:   moveq   #1,d0
    0x04:   bra.b   <somewhere>

The words at locations 0x00 and 0x06 are fetched even though they are never used. If the branch target is misaligned, you have yet another wasted fetch. Also, I can't remember this for sure, but I think the 68020 may prefetch the instruction past the branch, on the assumption that you aren't going to branch, and that's two more 16-bit accesses wasted if you *do* branch.

> - will the 020 start instruction execution while the second bus cycle is
>   still in progress?

I can't say for sure, but it seems extremely unlikely, since it's the bus interface that does the dynamic bus sizing, transparently to the rest of the processor.
--
Bruce Robertson
bruce@blue.gwd.tek.com
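Bruce's fetch count can be checked with a small model. This sketch assumes the 68000 reads only the instruction words it needs, while every 68020 instruction fetch reads the whole aligned longword, costing two bus cycles on a 16-bit port. It ignores cache hits and the possible fetch past a branch; the function names are hypothetical.

```python
# Simplified fetch model (assumptions, not a cycle-exact simulation):
#  - 68000: one 16-bit bus cycle per instruction word actually used
#  - 68020 on a 16-bit port: each needed instruction word pulls in its
#    whole aligned longword = two 16-bit bus cycles
#  - no cache hits, no lookahead past branches


def aligned_longwords(word_addrs):
    """Set of longword-aligned addresses covering the given word addresses."""
    return {a & ~3 for a in word_addrs}


def fetches_68000(word_addrs):
    return len(set(word_addrs))


def fetches_020_16bit(word_addrs):
    return 2 * len(aligned_longwords(word_addrs))


def wasted_words(word_addrs):
    """Word addresses fetched by the 68020 but never used."""
    fetched = set()
    for lw in aligned_longwords(word_addrs):
        fetched.update((lw, lw + 2))
    return sorted(fetched - set(word_addrs))


if __name__ == "__main__":
    seq = [0x02, 0x04]  # moveq #1,d0 ; bra.b <somewhere>
    print(fetches_68000(seq), fetches_020_16bit(seq))
    print([hex(a) for a in wasted_words(seq)])
```

For the two-instruction sequence at 0x02 and 0x04 this gives two bus cycles on the 68000 against four on the 68020, with the words at 0x00 and 0x06 fetched but unused, matching the count above.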
daveh@cbmvax.UUCP (Dave Haynie) (01/04/89)
In article <322@cstw01.UUCP>, meulenbr@cstw01.UUCP (Frans Meulenbroeks) says:
> I've been upgrading my 68000 system by replacing the 68000 with a 68020
> using 16 bit bus cycles (and the addition of some glue to generate
> things like E and VMA).
> The system runs on the same clock frequency.
> The upgrade works, but leaves the following question:
> The expected performance gain due to the cache is not realised.
> I expected the 020 to run about the same speed as the 68000 with the
> cache disabled and about 20-30% faster due to the cache.

No, the '020 will actually run slower than the 68000 in most cases on a 16-bit bus with the cache disabled. As you guessed below, the cache prefetch is to blame: whether the cache is enabled or not, the '020 always fetches a full longword for instruction fetches. Half of the prefetched longword may not be used, especially if the cache is turned off. With the cache on, you get the advantage of caching the whole longword, plus the normal benefits of a cache, which generally make the '020 come out faster, though depending on the application, maybe not all that much faster. The '030 extends this "feature" to data fetches as well, so you will find that the '030 with caches disabled on a 16-bit bus runs a little slower than the '020.

> - is the above scenario a plausible explanation (the instruction used in
> the test was a NOP).

Yup.

> - will the 020 start instruction execution while the second bus cycle is
> still in progress?

I don't think so. Though I've never actually proven it, based on the way I think the cache fetch mechanism works, this would be impossible.

> - if I want to speed up the 020 by going to 16 Mhz (synchronously with
> the system clock) at which signals should I pay attention.
> I know that I should look at DSACK/DTACK since they may be asserted
> before data are valid on a read cycle. Also AS looks like something
> to look at. Anyone knows about other signals to keep an eye on?
> Any suggestions about the problems that may occur on write cycles??

You may be able to simplify the design based on Atari ST particulars, but in general you have to emulate the important edges of the 68000 bus cycle. These include:

- Clock external /AS during S2, and /UDS-/LDS during S2 (read) or S4 (write).
- Only sample /DTACK on the falling edge of S4.
- Sample data on the falling edge of S6. Depending on what the ST and any peripherals may do, you may have to latch it. You'll also have to coordinate your assertion of /DSACK1 with this, making sure your '020 will have its data in time.
- Always end your external cycles on S7. I end my internal cycles slightly after S7.
- You may have to clock /BG on the 68000's 8 MHz clock, depending on the assumptions of the system (e.g., how dependent the external system is on the 68000 specs).
- You may have to sample the /IPL0-2 lines on an 8 MHz clock, again depending on the external system.

That should be enough to get it doing something interesting.
--
Dave Haynie "The 32 Bit Guy"
Commodore-Amiga "The Crew That Never Rests"
{uunet|pyramid|rutgers}!cbmvax!daveh PLINK: D-DAVE H BIX: hazy
Amiga -- It's not just a job, it's an obsession
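Dave's edge list can be turned into absolute times, which helps when placing the glue logic on a 16 MHz grid. This sketch assumes S0 begins at t = 0 on the 8 MHz system clock and that each 68000 state lasts half a clock (62.5 ns), so a full S0-S7 cycle is 500 ns. It also assumes the 16 MHz '020 clock is phase-aligned with the 8 MHz clock at the start of S0. Events that happen "during" a state are placed at the start of that state; sampling edges are placed at the end of the state. All names are hypothetical.

```python
from fractions import Fraction


def state_window_ns(state, clock_mhz=8):
    """(start, end) of 68000 state S<state> in ns; S0 starts at t = 0.

    Assumes each state lasts half a clock period.
    """
    half = Fraction(1000, clock_mhz) / 2
    return state * half, (state + 1) * half


def event_time_ns(state, at_end, clock_mhz=8):
    """Start of the state for 'during' events, end of it for sampling edges."""
    start, end = state_window_ns(state, clock_mhz)
    return end if at_end else start


def to_clocks(ns, clock_mhz):
    """Express a time in ns as a number of periods of a clock_mhz clock."""
    return ns * Fraction(clock_mhz, 1000)


# (description, state, True if the event is at the end of the state)
EVENTS = [
    ("/AS asserted (no earlier than)", 2, False),
    ("/UDS,/LDS asserted, read", 2, False),
    ("/UDS,/LDS asserted, write", 4, False),
    ("/DTACK sampled (falling edge)", 4, True),
    ("read data sampled (falling edge)", 6, True),
    ("external cycle ends", 7, True),
]

if __name__ == "__main__":
    for name, state, at_end in EVENTS:
        t = event_time_ns(state, at_end)
        print(f"{name:34s} S{state}  {float(t):6.1f} ns  "
              f"{to_clocks(t, 16)} x 16 MHz clocks")
```

Under these assumptions, /DTACK is sampled 312.5 ns (five 16 MHz clocks) into the cycle and read data 437.5 ns (seven 16 MHz clocks) in, and the whole external cycle spans four 8 MHz clocks.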