phil@amdcad.UUCP (Phil Ngai) (02/01/87)
In article <1701@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>The 68012 is a version of the 68010 which has a 31-bit address bus,
>packaged in a chip carrier (or pin grid array?). It's actually the
>same chip, but the package has more wires coming out. The 68010
>couldn't add more wires because it was designed to be plugged in
>anywhere a 68000 goes. This was a stopgap for people who wanted a
>large virtual address space (>16 megs) but couldn't wait for the
>68020. I don't know any popular machine that uses them.

I was going to say, you must mean physical address, not virtual, but then I remembered the 68K puts out virtual addresses since it has no MMU.

>The 68030's MMU is the mediocre one that they build for the 68020; it
>does slow Vax-style page table lookup in main memory and has more features
>and options than you ever cared to read about.

Any prediction on how fast a TLB miss is handled? I seem to recall the VAX 780, which does it in hardware, takes about 4 microseconds while MIPS, which does it in software, takes from 1-2 microseconds for a micro-TLB miss. I don't know how long a regular TLB miss takes. Many people are shocked at the idea but looking at the bottom line, "how long does it take", software TLB refill doesn't seem like such a bad idea.

Any one know how fast the other chips are at TLB refills? Intel, NSC, Fairchild, which I assume do it in hardware?

--
They also surf who only stand on waves.
Phil Ngai +1 408 982 7840
UUCP: {ucbvax,decwrl,hplabs,allegra}!amdcad!phil
ARPA: amdcad!phil@decwrl.dec.com
bjorn@alberta.UUCP (02/02/87)
In article <14561@amdcad.UUCP>, phil@amdcad.UUCP (Phil Ngai) writes:
> I was going to say, you must mean physical address, not virtual, but
> then I remembered the 68K puts out virtual addresses since it has
> no MMU.

The processor has nothing to do with the distinction between virtual and physical addresses in this case. That distinction is enforced by the MMU and the operating system.

Bjorn R. Bjornsson
alberta!bjorn
mash@mips.UUCP (02/02/87)
In article <14561@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
(regarding 68030)
>Any prediction on how fast a TLB miss is handled? I seem to recall the
>VAX 780, which does it in hardware, takes about 4 microseconds while
>MIPS, which does it in software, takes from 1-2 microseconds for a
>micro-TLB miss. I don't know how long a regular TLB miss takes. Many
>people are shocked at the idea but looking at the bottom line, "how
>long does it take", software TLB refill doesn't seem like such a bad
>idea.
>Any one know how fast the other chips are at TLB refills? Intel, NSC,
>Fairchild, which I assume do it in hardware?

1) A MIPS micro-TLB refill is actually 1 cycle: it's a refill of the tiny on-chip TLB from the 64-entry larger on-chip one.

2) A normal TLB refill is 9-10 instructions (convenient form is slightly different between 4.3 and V.3), + 0-5 cycles for a data-cache miss, + 2-4 cycles of pipeline breakage/time to get into refill routine. This totals 11-19 cycles, assuming NO I-cache misses in the refill routine. Each such I-cache miss costs 5 cycles [on the 5MIPS board/memory design], so the worst case is about 60 cycles [7.5 microsecs]. On the average, the actual I-cache cost is 1-2 cycles, yielding 13-21 cycles. Anyway, the bottom line is a little under 2 microseconds total penalty.

In any case, for user level programs, this all costs about 1-2% of user execution time, even on fairly large programs, i.e., it's almost down in the noise with regard to performance. I.e., as long as it's fast enough, you can concentrate on making it have the behavior desired by the O.S., and then go worry about other things, like cache design. For example, cache miss overhead is a much larger performance issue: cache misses can easily eat up 10-50% of the cycles, depending on the design and the program.

3) "Many people are shocked at the idea" : I hope this is passing: after all, the same technique is used on HP Spectrums [for sure], and on Celerity boxes [I think]. It does depend on having fast exception handling: if that is not possible, it is probably better to use microcode.

4) Note that Data-cache miss penalties for fetching page-table entries account for 25-30% of the penalty above. This is relevant: in high-performance systems, even if the microcode is instantaneous, you still have 1-2 memory references, which are there whether you do it in hardware or software.

--
-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
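
[Editorial sketch, not part of the original post.] The cycle budget in item 2 can be checked with a few lines of arithmetic. The sketch below only adds up the components quoted in the post and converts cycles to time, assuming the 125 ns cycle (8 MHz clock) mentioned elsewhere in this thread. The function names are made up for illustration.

```python
# Back-of-envelope check of the software TLB refill budget quoted above.
# Assumption: 125 ns per cycle (8 MHz), as stated elsewhere in the thread.
CYCLE_NS = 125


def total_cycles(instructions, dcache_miss, pipeline, icache_miss=0):
    """Sum the refill cost components, all expressed in cycles."""
    return instructions + dcache_miss + pipeline + icache_miss


def to_microseconds(cycles, cycle_ns=CYCLE_NS):
    """Convert a cycle count to microseconds."""
    return cycles * cycle_ns / 1000.0


# Best case: 9 instructions, no D-cache miss, 2 cycles of pipeline breakage.
best = total_cycles(9, 0, 2)
# Worst case without I-cache misses: 10 instructions, 5-cycle D-cache miss,
# 4 cycles of pipeline breakage.
worst = total_cycles(10, 5, 4)

print("no I-cache misses:", best, "to", worst, "cycles")
print("in microseconds:", to_microseconds(best), "to", to_microseconds(worst))
# The post's stated worst case with I-cache misses is about 60 cycles.
print("60 cycles =", to_microseconds(60), "microseconds")
```

The 11-19 cycle range and the 7.5 microsecond worst case both fall out of the quoted figures directly.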
phil@amdcad.UUCP (02/03/87)
In article <108@winchester.mips.UUCP> mash@winchester.UUCP (John Mashey) writes:
>
>1) A MIPS micro-TLB refill is actually 1 cycle: it's a refill of the tiny
>on-chip TLB from the 64-entry larger on-chip one.

When I saw the following text:

"TLB REFILL IN SOFTWARE. Normal UTLBMISS refills are reasonably fast (10-14 125 nS cycles, or 1.2-1.7 microseconds in an 8MHz, 5 mips system).",

I assumed UTLBMISS was micro TLB but now that I go back I see that's for a miss in kuseg. Sorry.

--
They also surf who only stand on waves.
Phil Ngai +1 408 982 7840
UUCP: {ucbvax,decwrl,hplabs,allegra}!amdcad!phil
ARPA: amdcad!phil@decwrl.dec.com
henry@utzoo.UUCP (Henry Spencer) (02/04/87)
> 3) "Many people are shocked at the idea" : I hope this is passing: after
> all, the same technique is used on HP Spectrums [for sure], and
> on Celerity boxes [I think]...

Yes, the Celerity boxes use it. Or at least, they did back when I read the manual for one, when we were thinking of buying it.

There is some variation in how one handles the problem of making sure that the TLB-miss handler does not get TLB misses. I dimly recall that the MIPS hardware makes it easy to reserve some TLB slots for the system. On the Celerity machines the kernel just has to be careful, as I recall.

If you want a really shocking idea, consider the latest notion from Cheriton and his bunch at Stanford. The CPU runs out of a cache, which is addressed by virtual address (virtual address includes a process id to avoid flushing on every context switch) and has very long cache lines (e.g. 128 bytes -- *bytes*, not *bits*). There is also a small amount of local memory which is accessible only in kernel mode. Cache misses are handled entirely in software, with a bit of hardware help for fast data moving.

Notice that there is no MMU! The virtual->real translation is needed only for handling cache misses and can be done entirely by the software. The only real issue is whether software cache handling will be efficient enough. Given a large cache with a big line size, it just might work. There are some other neat tweaks to do cache-coherence handling for a multiprocessor system. Very new and very experimental, but a nice idea.

--
Legalize    Henry Spencer @ U of Toronto Zoology
freedom!    {allegra,ihnp4,decvax,pyramid}!utzoo!henry
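
[Editorial sketch, not part of the original post.] The idea described above, a cache tagged by (process id, virtual address) whose misses are serviced by software that does the virtual->real translation only at miss time, can be modelled in a few lines. This is a toy under stated assumptions, not the Stanford design: the class name, the 4096-byte page size, the dictionary page tables, the read-only access path, and the FIFO eviction are all hypothetical choices made for illustration. Only the 128-byte line size comes from the post.

```python
# Toy model of a virtually addressed cache with software miss handling.
LINE_BYTES = 128    # line size mentioned in the post
PAGE_BYTES = 4096   # assumption: page size is not given in the post


class SoftwareManagedCache:
    """Cache tagged by (pid, virtual line number); read-only for brevity."""

    def __init__(self, page_tables, memory, num_lines=64):
        self.page_tables = page_tables  # hypothetical: {pid: {vpage: ppage}}
        self.memory = memory            # bytearray standing in for main memory
        self.num_lines = num_lines
        self.lines = {}                 # (pid, vline) -> bytes of one line
        self.misses = 0

    def read(self, pid, vaddr):
        """Hit path: no address translation is performed here."""
        tag = (pid, vaddr // LINE_BYTES)
        if tag not in self.lines:
            self._miss_handler(pid, tag)
        return self.lines[tag][vaddr % LINE_BYTES]

    def _miss_handler(self, pid, tag):
        """Software miss path: the only place virtual->real translation runs."""
        self.misses += 1
        vline_base = tag[1] * LINE_BYTES
        vpage, offset = divmod(vline_base, PAGE_BYTES)
        # A KeyError here plays the role of a page fault.
        ppage = self.page_tables[pid][vpage]
        paddr = ppage * PAGE_BYTES + offset
        if len(self.lines) >= self.num_lines:
            # Evict the oldest inserted line (dicts keep insertion order).
            self.lines.pop(next(iter(self.lines)))
        self.lines[tag] = bytes(self.memory[paddr:paddr + LINE_BYTES])


if __name__ == "__main__":
    mem = bytearray(2 * PAGE_BYTES)
    mem[PAGE_BYTES + 5] = 42
    mem[5] = 7
    # Two processes map the same virtual page 0 to different physical pages.
    cache = SoftwareManagedCache({1: {0: 1}, 2: {0: 0}}, mem)
    print(cache.read(1, 5), cache.read(2, 5), cache.misses)
```

Because the process id is part of the tag, the two processes in the demo can use the same virtual address without a flush between them, which is the property the post points out.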