lance@Ricerca.orc.olivetti.com (Lance Berc) (03/19/89)
As has been said, the i860 has two on-chip virtual caches (4k I + 8k D) and a
64-entry TLB, all of which need to be invalidated when context switching (and
sometimes when the memory map is changed). The D-cache has to be flushed as
well as invalidated (code is treated as immutable; self-modifying code won't
work unless it lives in non-cached pages). Intel estimates that at 33 MHz
flushing the D-cache takes on average 30 usec (30% - 50% dirty) and 60 usec
worst case. I believe these numbers assume no-wait-state memory (the fastest
possible 5-2-2-2 CPU-to-memory write cycles).

I'd be interested in seeing some numbers on the frequency of both context
switching and interrupt handling in `typical' state-of-the-art machines under
some sort of well-defined load (such as compiles using local disks under Unix
on Sun-3s, Sun-4s, MIPS boxes, etc). This might help determine just how
important raw context switch times are. Fast context switches are important,
but it seems that standard Unix time quanta are not shrinking as the amount
of work done per quantum increases. So maybe the percentage of CPU time spent
context switching, versus the amount of time spent doing `useful' work, is
becoming small enough that the raw context switch time is becoming less
significant.

The i860 seems to favor using silicon to gain sheer straight-line speed at
the expense of some performance in the curves. Sounds like a good trade-off
to me, but it depends on where you drive...

Lance Berc                 lance@orc.olivetti.com       Beer as an alternate
Olivetti Research Center   lance%orc.uucp@unix.sri.com  currency!
Menlo Park, California     (415) 496-6248
< These opinions bear no resemblance to those of Ing. C. Olivetti & C. SpA. >
rpw3@amdcad.AMD.COM (Rob Warnock) (03/23/89)
In article <39485@oliveb.olivetti.com> (Lance Berc) writes:
+---------------
| Intel estimates that at 33MHz flushing the D-cache takes on average
| 30usec (30% - 50% dirty) and 60usec worst case. I believe that these
| numbers assume no wait-state memory (fastest possible 5 2 2 2 CPU to
+---------------

Similarly, because of the very large register file, the Am29000 appears at
first to have a problem with full context switching (*not* system calls or
interrupts; those continue the stack-cache discipline), since you have to
save/restore 160 of the 192 registers (if 32 are reserved to the kernel).
But at 25 MHz, using the load/store-multiple instructions and burst-mode
memories (using normal static-column DRAMs bank-interleaved 2:1, still
cheap), you can save the old user's full register set and load up the new
user's full set in 12.8 microseconds. [The TLB has PIDs, so no flush there.]

I just don't see 20-50 VUPS types of systems needing to do tens of thousands
of full context switches per second, at least not in general-purpose
timesharing...

Rob Warnock
Systems Architecture Consultant
UUCP:     {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:      (415) 572-2607
USPS:     627 26th Ave, San Mateo, CA 94403
mash@mips.COM (John Mashey) (03/24/89)
In article <24958@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>Similarly, because of the very large register file, the Am29000 appears
>at first to have a problem with full context switching (*not* system calls...
>I just don't see 20-50 VUPS type of systems needing to do tens of thousands
>of full context switches per second, at least not in general-purpose
>timesharing...

A lightning look at busy machines around here showed 60-120 cs/sec. If a
switch only takes 30-60 microsec, that's 1.8-7.2 millisec per second, or
well under one percent. Now, that's the easy part.

The hard parts are:

1) figuring out how often you must flush the caches because you change a
   mapping in the kernel [maybe the Sun folks can say something on this; I
   recall them talking about tuning to avoid unnecessary flushes in virtual
   caches], and

2) figuring out what the aggregate cache-miss-rate impact is of flushing the
   caches more often. (I have no idea; it is surely load-dependent, and
   there are probably some nice papers sitting around waiting to be done.)

The modest size of on-chip caches makes this less detrimental, in terms of
what you're losing by flushing them. As they get bigger, it will get more
noticeable, especially for OS performance itself, which REALLY likes big
caches, since it has bad locality.

-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
lance@orc.olivetti.com (Lance Berc) (03/25/89)
The 30 usec average, 60 usec worst-case i860 cache-flushing times did not
include the rest of the context switch. I believe that without save/restore
of the FPU a full switch is in the 60-90 usec range (this is mostly a
guess). Using John's 60-120 switch/sec guesstimate we're still in the 1-2
percent range, which should be acceptable in a multitasking situation.

Saving and restoring the FPU state is pretty hairy; the manual's example
runs about one hundred instructions. The time required will depend heavily
on the memory subsystem characteristics, since there probably won't be any
I- or D-cache hits here. Multitasking number-crunching applications probably
isn't a good idea unless they are given larger timeslices. No surprise.

Lance Berc                 lance@orc.olivetti.com
Olivetti Research Center   lance%orc.uucp@unix.sri.com
(415) 496-6248             <standard disclaimer>
conte@bach.csg.uiuc.edu (Tom Conte) (03/28/89)
About physical i-caches: it is not clear to me that a context switch may not
render lines in even a physical i-cache invalid. In a processor where the
i-fetch unit is what fills the i-cache, there is a chance that after a
context switch the OS will map a different page (of instructions, perhaps)
into a physical page frame that still has some lines in the i-cache. The
pager usually updates (or its data runs through) the d-cache; hence, these
updates won't get to the i-cache. I see three ways around this: have the
pager flush the i-cache (selectively or always), use a hardware PID with the
(*physical*) i-cache, or somehow run paging of instruction pages through the
i-cache.

Most of the arguments against always-flushing the i-cache are irrelevant
anyway, since modern on-chip i-caches are too small to preserve any lines of
a process between context switches.

Tom Conte    Computer Systems Group, Coordinated Science Lab
             University of Illinois, Urbana-Champaign, Illinois
uucp: ...!uiucdcs!uicsrd!conte     internet: conte@bach.csg.uiuc.edu
...The opinions expressed are my own, of course.
keith@mips.COM (Keith Garrett) (03/30/89)
In article <665@garcon.cso.uiuc.edu> conte@bach.csg.uiuc.edu.UUCP (Tom Conte) writes:
>About physical i-caches: it is not clear to me that a context switch
>may not render lines in even physical i-caches invalid. In a processor
>where the i-fetch unit is what fills the i-cache, there is a chance
>that after a context switch the OS will map a different page (of
>instructions, perhaps) into a physical page frame that has some lines
>in the i-cache.

This is a page swap, not a context switch. You have to flush/invalidate
both physical and virtual caches (TLBs also) for this, but the frequency
should be a lot lower.

Keith Garrett    "This is *MY* opinion, OBVIOUSLY"
UUCP: keith@mips.com or {ames,decwrl,prls}!mips!keith
USPS: Mips Computer Systems, 930 Arques Ave, Sunnyvale, Ca. 94086