gnu@sun.uucp (John Gilmore) (06/15/85)
> From Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca:
> If Mot had gone with pipelined address/data on the 68020R16,
> I'd guess that their memory access times (addr->data) would go from
> 115ns to 170ns.  However, they may use pipelining internally to access
> their cache, so they can never allow this extra margin for system
> designers (does anyone know if this is true?).

I think the 68020 drives the address of prefetches (if there's not
already a cycle on the bus) but will not assert address strobe if it
hits the cache.  AS doesn't come out until the addresses are stable
anyway, so the cache lookup is overlapped with the address driver
propagation delay (and setup time on whoever's receiving the
addresses).  Serious MMUs start to translate the address before AS
anyway, so it actually helps to not have to latch the address, since
as fast as the CPU can drive it, the MMU can start looking it up,
rather than having it sit on the wrong side of a latch until a strobe
comes out.

In a 180ns memory cycle it's VERY hard (both for CPU and for memory
subsystem) to run with Ken's proposed 170ns addr->data times.  It's
clear that the 68020 can access memory faster than dynamic RAM can
respond.  There are plenty of solutions developed for mainframes (which
have had the same problem for a long time); the on-chip instruction
cache is one of them.  Ken's overlapping technique may be one that the
68020 design precludes.  Got any stats on how many 286 designs use the
technique, and how much time is really saved (e.g. is addr->data really
the bottleneck)?
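The tightness Gilmore is pointing at can be made concrete with simple subtraction. A sketch using the 180 ns cycle and 170 ns addr->data figures from the posts (everything else here is just arithmetic, not from any datasheet):

```python
# Gilmore's point in numbers: a 170 ns addr->data path inside a 180 ns
# memory cycle leaves only 10 ns of slack for data setup at the CPU.
MEMORY_CYCLE_NS = 180   # full memory cycle, from the post
ADDR_TO_DATA_NS = 170   # Ken's proposed addr->data time, from the post

data_setup_margin = MEMORY_CYCLE_NS - ADDR_TO_DATA_NS
print(data_setup_margin)  # 10 -- almost no margin for the data path
```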
kds@intelca.UUCP (Ken Shoemaker) (07/01/85)
> I think the 68020 drives the address of prefetches (if there's not
> already a cycle on the bus) but will not assert address strobe if it
> hits the cache.  AS doesn't come out until the addresses are stable
> anyway, so the cache lookup is overlapped with the address driver
> propagation delay (and setup time on whoever's receiving the
> addresses).  Serious MMUs start to translate the address before AS
> anyway, so it actually helps to not have to latch the address, since

regardless of the timing of the address strobe, you really can't start
looking up addresses for either your cache or your MMU until the
addresses are guaranteed stable on the address bus, or have I missed
some major advance in non-deterministic logic?

> In a 180ns memory cycle it's VERY hard (both for CPU and for memory
> subsystem) to run with Ken's proposed 170ns addr->data times.  It's
> clear that the 68020 can access memory faster than dynamic ram can
> respond.  There are plenty of solutions developed for mainframes (which

I agree that it is not the easiest thing in the world to build a 170ns
memory system.  I would think it is obvious that it is even more
difficult to build a system that allows only 115ns to do the same
thing...

> have had the same problem for a long time); the on-chip instruction
> cache is one of them.  Ken's overlapping technique may be one that the
> 68020 design precludes.  Got any stats on how many 286 designs use the
> technique, and how much time is really saved (e.g. is addr->data really
> the bottleneck)?

I don't know how many 286 designs actually use the technique of running
two memory cycles at the same time (although the 8207 DRAM controller
supports doing this), but my main point was that by providing addresses
earlier in a bus cycle (i.e., before the bus cycle even begins!) you
gain the address bus drive time (from the CPU) in the address-to-data
time, since once in the bus cycle you don't have to wait for the CPU to
drive the capacitive loads on the address pins.  Although addr->data
may not be the ONLY bottleneck in a system, it is a very significant
one, and by providing more of it while running the bus at the same
speed you can't help but get a faster system, since the alternative is
to add wait states.
-- 
...and I'm sure it wouldn't interest anybody outside of a small circle
of friends...

Ken Shoemaker, Microprocessor Design for a large, Silicon Valley firm
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds

---the above views are personal.  They may not represent those of the
employer of its submitter.
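The wait-state trade-off Shoemaker describes can be sketched numerically. A toy budget in which every number is assumed for illustration (a roughly 16 MHz part with 62.5 ns clocks, a 3-clock zero-wait-state bus cycle, and 30 ns to drive the capacitive address loads); only the 170 ns access time comes from the thread:

```python
import math

# Toy bus-cycle budget (all numbers illustrative, not from any datasheet).
CLOCK_NS = 62.5          # one clock at ~16 MHz
BUS_CYCLE_CLOCKS = 3     # clocks in a zero-wait-state bus cycle
ADDR_DRIVE_NS = 30       # time for the CPU to drive the address pins

def addr_to_data_budget(pipelined: bool) -> float:
    """Time left for memory (address valid -> data required), in ns."""
    total = CLOCK_NS * BUS_CYCLE_CLOCKS
    # With pipelined addresses the drive time is paid *before* the bus
    # cycle begins, so it no longer eats into the memory budget.
    return total if pipelined else total - ADDR_DRIVE_NS

def wait_states_needed(mem_access_ns: float, pipelined: bool) -> int:
    """Whole extra clocks needed for a memory of the given access time."""
    deficit = mem_access_ns - addr_to_data_budget(pipelined)
    return max(0, math.ceil(deficit / CLOCK_NS))

print(addr_to_data_budget(False))                  # 157.5 ns without pipelining
print(addr_to_data_budget(True))                   # 187.5 ns with pipelining
print(wait_states_needed(170, False))              # 1 -- must add a wait state
print(wait_states_needed(170, True))               # 0 -- pipelining absorbs it
```

Under these made-up numbers the same 170 ns memory needs a wait state on the non-pipelined bus but none on the pipelined one, which is exactly the "alternative is to add wait states" argument.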
ken@turtlevax.UUCP (Ken Turkowski) (07/02/85)
In article <9@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:
>> I think the 68020 drives the address of prefetches (if there's not
>> already a cycle on the bus) but will not assert address strobe if it
>> hits the cache.  AS doesn't come out until the addresses are stable
>> anyway, so the cache lookup is overlapped with the address driver
>> propagation delay (and setup time on whoever's receiving the
>> addresses).  Serious MMUs start to translate the address before AS
>> anyway, so it actually helps to not have to latch the address, since
>
>regardless of the timing of the address strobe, you really can't start
>looking up addresses for either your cache, or for your MMU until the
>addresses are guaranteed stable on the address bus, or have I missed
>some major advance in non-deterministic logic?

What you have missed is a sense of timing.  The address to the on-chip
cache is stable 50 nS or so before the address on the bus.  There are
two sets of drivers between the two: one set on chip to drive the
outside world, and the other to buffer the addresses to devices off the
board.  There's probably at least 25 nS for each of them.  On top of
that, most busses require some amount of setup time before the address
strobe, something like another 50 nS.  So here we have 100 nS between
the time when addresses to the cache are stable and the time when the
address strobe is to be asserted.
-- 
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,hplabs,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA
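Turkowski's budget can be added up directly. All figures below are his estimates from the post (not datasheet values):

```python
# The window in which the on-chip cache lookup is "free": from
# cache-stable address to assertion of the address strobe (AS).
ON_CHIP_DRIVER_NS = 25   # on-chip drivers for the outside world
BOARD_BUFFER_NS = 25     # buffers to devices off the board
AS_SETUP_NS = 50         # bus setup time required before the strobe

# The two driver stages are why the cache sees the address ~50 nS
# before the bus does; the setup requirement adds another 50 nS.
cache_lead_ns = ON_CHIP_DRIVER_NS + BOARD_BUFFER_NS
overlap_ns = cache_lead_ns + AS_SETUP_NS

print(cache_lead_ns)  # 50  -- cache-stable to bus-stable address
print(overlap_ns)     # 100 -- cache-stable address to AS assertion
```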
phil@amd.UUCP (Phil Ngai) (07/10/85)
>addresses).  Serious MMUs start to translate the address before AS
>anyway, so it actually helps to not have to latch the address, since
>as fast as the CPU can drive it, the MMU can start looking it up, rather

But you can't use the address until it is valid, which is the
definition of AS anyhow.

You also seem to be confusing latch and register.  A latch such as the
74373 allows data to flow through it while enabled and holds its output
while disabled.  A register such as the 74374 stores its input on a
clock edge.

On the 8086, for example, you would use a 74373, and the address would
be available at the time it was valid from the uP plus the prop delay
of the latch, which is comparable to the prop delay of the address
buffers you would need in a real system anyway.  ALE (address latch
enable) is not the gating item it would be if a register were used.  If
a 74374 were used, the address would be available a prop delay after
ALE went inactive, which in turn is some setup time after address is
valid.
-- 
This is only my opinion and an unofficial one at that.

Phil Ngai (408) 749-5720
UUCP: {decwrl,ihnp4,allegra}!amdcad!phil
ARPA: amdcad!phil@decwrl.ARPA
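Phil's latch-versus-register distinction can be put in numbers. A sketch with assumed delays: the 12 ns propagation delays and 20 ns ALE setup below are illustrative, not '373/'374 datasheet values:

```python
# Latch ('373) vs. register ('374) on an 8086-style address bus
# (all delay values assumed for illustration).
ADDR_VALID_NS = 0    # reference point: uP address becomes valid
LATCH_PROP_NS = 12   # '373 transparent-latch propagation delay (assumed)
REG_PROP_NS = 12     # '374 register clock-to-output delay (assumed)
ALE_SETUP_NS = 20    # address-valid to ALE-inactive setup time (assumed)

# Transparent latch ('373): data flows through while enabled, so the
# downstream address appears one prop delay after it is valid from the
# uP -- ALE is not the gating item.
latch_available = ADDR_VALID_NS + LATCH_PROP_NS

# Edge register ('374): nothing appears until the clock edge, which is
# itself a setup time after the address becomes valid.
register_available = ADDR_VALID_NS + ALE_SETUP_NS + REG_PROP_NS

print(latch_available, register_available)  # 12 32
# The '373 path wins by exactly the ALE setup time.
assert register_available - latch_available == ALE_SETUP_NS
```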