firth@sei.cmu.edu (Robert Firth) (09/07/88)
In article <58@zeno.MN.ORG> gene@zeno.UUCP (Gene H. Olson) writes:

>* The Motorola, Intel, MIPS, SPARC, HP, and IBM RISC
>  architectures are incredibly similar.  In their basic
>  instruction sets, none of them has any significant
>  advantages over the other.

Sorry, Gene, I'm going to disagree with your first point, and hence
with your conclusions.  These machines differ in many respects.

(a) Some have register window systems.  This is a disastrous design
    error that will ultimately doom them.  In particular, the greatly
    increased context-switch time, and the unpredictability in the
    cost of a simple procedure call, make register-window machines
    unsuitable for hard real-time applications.

(b) Some have elaborate and expensive non-RISC features.  One machine
    on your list has ADDRESS MODE computation times that can take from
    1 to 5 cycles, good grief.  Some have those good old "high level
    language support" instructions that 20 years' experience have
    proved a total loss.

(c) Some have imprecise exception states that make both true recovery
    semantics and true continuation ('Ada-like') semantics almost
    impossible to realise.  (One gives you precise exception states if
    you slow the machine down by about 2.5, I believe.)

(d) Some come with manufacturer-designed procedure calling sequences
    that are wired into virtually all the system software and hence
    almost inescapable.  They are also gruesomely inefficient.

And, of course, I do believe some of the above machines have
significant technical advantages over the others.  No prizes for
guessing which.  But we should also remember that technical excellence
is not the only thing determining success!
news@amdcad.AMD.COM (Network News) (09/07/88)
In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
| In article <58@zeno.MN.ORG> gene@zeno.UUCP (Gene H. Olson) writes:
|
| >* The Motorola, Intel, MIPS, SPARC, HP, and IBM RISC
| >  architectures are incredibly similar.  In their basic
| >  instruction sets, none of them has any significant
| >  advantages over the other.
|
| Sorry, Gene, I'm going to disagree with your first point, and hence
| with your conclusions.  These machines differ in many respects.
|
| (a) Some have register window systems.  This is a disastrous design
|     error that will ultimately doom them.  In particular, the greatly
|     increased context-switch time, and the unpredictability in the
|     cost of a simple procedure call, make register-window machines
|     unsuitable for hard real-time applications.

Oh, I suppose that by the same reasoning, any machine with caches,
virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I
guess it's back to the old TMS9900 architecture, with no registers to
get in the way of that fast context switch and predictability.  ;-)

How did you measure this "greatly increased context switch time"?
There is typically a whole lot more going on during a true context
switch than dumping and restoring register contents.  In addition, it
is often interrupt latency, not context-switch time, that is
important.  Here, many "register window RISCs" like the Am29000,
SPARC, and 80960 have an advantage, in that there is typically a
window or reserved register area for the interrupt handler to run in
without saving *any* registers.

-- Tim Olson
Advanced Micro Devices
(tim@delirun.amd.com)
firth@sei.cmu.edu (Robert Firth) (09/08/88)
In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu I wrote:

   (a) Some have register window systems.  This is a disastrous design
       error that will ultimately doom them.  In particular, the greatly
       increased context-switch time, and the unpredictability in the
       cost of a simple procedure call, make register-window machines
       unsuitable for hard real-time applications.

In article <22860@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:

   Oh, I suppose that by the same reasoning, any machine with caches,
   virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I
   guess it's back to the old TMS9900 architecture with no registers to
   get in the way of that fast context switch and predictability.  ;-)

Yes, machines with caches do indeed cause problems in implementing
hard real-time systems; this was brought out in some of the reports of
the MIPS assessment funded by RADC.  Virtual memory is hardly an
issue, since the majority of real-time systems do not use it (wisely,
in my view).

The TI9900 is indeed an example worth studying.  It had a
context-switch time of less than 10 usec using early-1970s technology.
Last month I attended a presentation of a new "RISC" machine with a
20 MHz clock that couldn't do half as well.

Tim continues:

   How did you measure this "greatly increased context switch time?"
   There is typically a whole lot more going on during a true context
   switch than dumping and restoring register contents.  In addition,
   many times it is interrupt latency, not context switch time, that is
   important.  Here, many "register window RISCs" like the Am29000,
   SPARC, and 80960 have an advantage, in that typically there is a
   window or reserved register area for the interrupt handler to run in
   without saving *any* registers.

And in response:

There is NOT a whole lot more going on during a context switch than
the register save and restore.  Setting up the dynamic environment for
a high-level-language task normally means just changing the registers
and restoring any condition codes.  A few machines really blow it by
having a lot of FPU state (e.g. the MC68000) or by requiring tasks to
use different memory maps (1750A), but on clean machines the major
part of the work is the save and restore of the on-chip registers.
The more there are, the longer this takes.

The idea of having separate interrupt registers is not new to
register-window (or RISC) machines; the PE3200 had them 15 years ago.
I agree that they are a good idea in some applications.  But what true
real-time systems want, in most cases, is interrupts that change the
scheduler state, and hence that are followed by a true context switch.
(The PE3200 does very badly here.)  For most applications it is not
enough just to have fast "in and out" interrupts; you must also have
the fast context switch.

Even without a parallel register set, you can go a long way by
reserving a couple of general registers for the "in and out" interrupt
handlers.  If, of course, your compiler cooperates.  But using the
normal register window for interrupts seems crazy: if the interrupt
occurs at the wrong call depth (1/4 of the time, say), then responding
to it will take several times as long, since 128 (or whatever)
registers will be spilled to give it a window of 32, of which it might
use 4.  This is negative leverage with a vengeance!
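To put rough numbers on that worst case, here is a minimal
back-of-the-envelope model in C.  The figures (a fixed trap-entry
overhead and one cycle per register spilled) are assumptions for
illustration only, not measurements of any particular machine.

    #include <stdio.h>

    /* Hypothetical cost model for interrupt entry on a register-window
     * machine: if the window file happens to be full when the interrupt
     * arrives, the whole file must be spilled before the handler gets a
     * window, even though the handler itself uses only a few registers. */
    #define TRAP_OVERHEAD   10   /* assumed fixed trap-entry cycles      */
    #define FILE_SIZE      128   /* registers in the on-chip window file */
    #define HANDLER_REGS     4   /* registers the handler actually uses  */

    int main(void)
    {
        int best  = TRAP_OVERHEAD + HANDLER_REGS;  /* a window is free     */
        int worst = TRAP_OVERHEAD + FILE_SIZE;     /* file full: spill all */

        printf("best case : %3d cycles\n", best);
        printf("worst case: %3d cycles (%.1f times the best case)\n",
               worst, (double)worst / best);
        return 0;
    }

The point of the model is only the ratio: the same interrupt costs
roughly ten times as much when it happens to arrive at the wrong call
depth, which is exactly the kind of jitter a hard real-time design has
to budget for.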
garner@gaas.Sun.COM (Robert Garner) (09/09/88)
> But what true real time systems want, in most cases, is interrupts
> that change the scheduler state, and hence that are followed by a
> true context switch.
> ...
> But using the normal register window for interrupts seems crazy:
> if the interrupt occurs at the wrong call depth
> (1/4 of the time, say) then responding to it will take several times as
> long, since 128 (or whatever) registers will be spilled to give it a
> window of 32, of which it might use 4.  This is negative leverage with a
> vengeance!

Assuming that an interrupt always causes a context switch, then in
order to achieve minimal context-switching latency, just save a SINGLE
window on the context switch (16 registers in SPARC); the rest can be
saved later.  (Note that the work of window saves, which write
procedure PC, FP, and stack data into memory, must be accomplished in
all architectures at SOME point in time between context switches.)

Also, the SPARC register windows can be managed differently in a
particular real-time application: every other window in SPARC can be
marked invalid in the privileged Window Invalid Mask register.  This
yields "number-of-windows/2" 40-register groups, where each group
comprises 32 registers plus 8 trap-handler registers dedicated to a
real-time task.  Tasks are protected from each other via the Window
Invalid Mask.  Changing the Current Window Pointer accomplishes a
process switch among the active groups.  (Of course, processes in this
scheme are compiled with a "single, traditional register set" model.)

	- rg
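As a rough illustration of the bookkeeping this scheme needs, here is
a minimal sketch in C.  NWINDOWS, write_wim(), and write_cwp() are
hypothetical stand-ins for the implementation-defined window count and
the privileged writes to the Window Invalid Mask and Current Window
Pointer; a real system would do this in a few instructions of
supervisor code.

    /* Partition the register windows into fixed per-task groups by
     * marking every other window invalid, as described above.          */
    #define NWINDOWS 8                        /* assumed window count    */

    extern void write_wim(unsigned mask);     /* hypothetical: write WIM */
    extern void write_cwp(unsigned cwp);      /* hypothetical: write CWP */

    /* Mark the odd-numbered windows invalid, leaving NWINDOWS/2 groups,
     * one per real-time task, separated by "guard" windows.             */
    void partition_windows(void)
    {
        unsigned mask = 0;
        unsigned w;
        for (w = 1; w < NWINDOWS; w += 2)
            mask |= 1u << w;
        write_wim(mask);
    }

    /* Switching to task n is just repointing the CWP at its group;
     * no registers are saved or restored.                               */
    void switch_to_task(unsigned n)
    {
        write_cwp(2 * n);
    }

The attraction for real-time work is that the switch itself touches no
memory at all; the cost is that the number of resident tasks is fixed
by the number of windows.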
news@amdcad.AMD.COM (Network News) (09/09/88)
In article <67551@sun.uucp> garner@sun.UUCP (Robert Garner) writes:
| also, the SPARC register windows can be managed differently in a particular
| real-time application: every other window in SPARC can be marked invalid
| in the privileged Window Invalid Mask register.  this yields
| "number-of-windows/2" 40-register groups, where each group comprises
| 32 registers plus 8 trap handler registers dedicated to a real-time task.
| tasks are protected from each other via the Window Invalid Mask.  changing
| the Current Window Pointer accomplishes a process switch among the active
| groups.  (of course, processes in this scheme are compiled with a "single,
| traditional register set" model.)

This scheme is also present in the Am29000 register model, since the
register file can be protected in groups of 16 registers.  However,
current compilers support only the stack-cache model, since it
provides the highest performance in most applications.  Do the SPARC
compilers support both stack-cache and register-bank calling
conventions?

-- Tim Olson
Advanced Micro Devices
(tim@crackle.amd.com)
aglew@urbsdc.Urbana.Gould.COM (09/09/88)
..> I am very much enjoying the discussion about Real Time issues
..> between Firth, Olson, and Garner.

Firth says that register windows are "a disastrous design error that
will ultimately doom them".  I tend to agree with this for RT systems,
and even for conventional systems the evidence is now beginning to
indicate that good register allocators with large register sets can
beat register windows.  But I don't want to talk about general-purpose
systems; let's talk about RT.

> Oh, I suppose that by the same reasoning, any machine with caches,
> virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
> it's back to the old TMS9900 architecture with no registers to get in
> the way of that fast context switch and predictability.  ;-)
>
>Yes, machines with caches do indeed cause problems in implementing hard
>real time systems; this was brought out in some of the reports of the
>MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
>the majority of real time systems do not use it (wisely, in my view).

Having worked in the "soft" real-time market for a bit, listening
avidly to the words of my seniors, I have learned some things.

First, there do exist dedicated real-time systems, hard or soft, that
do not need the things you want on a conventional system, like caches
or virtual memory.  I conjecture that these are mostly to be found in
the low end (e.g. factory automation) and the high end of the market.

However, in the middle range of the market, minicomputers and
super-minis, there are a lot of people who want *BOTH* real-time and
conventional performance.  Some do not want them at the same time -
e.g. a computer site that "officially" buys a computer to run a
simulation that takes over all the computers on site for maybe a few
hours to a day per month, but also wants to use the computers that run
this dedicated simulation for regular engineering and office work
during the rest of the time.

Others want RT and conventional capabilities at the same time, because
somebody has to develop on the machine, make reports, etc., and the
company doesn't want the hassle of handling two totally different
development and target systems.  Different configurations, perhaps,
but even that isn't always acceptable.

Finally, there is the class of customers that I see on the horizon,
that wants RT as an aspect of conventional systems - i.e. where
conventional is the emphasis, not RT.  E.g. "I want RT UNIX on my
workstation", so I can control sensors scattered through my house,
etc.  The RT part may be on a separate processor, or even in a
separate box, but I want it seamlessly integrated with my normal
computing environment.

The first two classes of conventional+RT customers I know to be real;
the third is projection.  In this situation, what we need is not a
machine that throws away conventional features like cache and virtual
memory, but one that makes it possible to get those things out of the
way.

>also, the SPARC register windows can be managed differently in a particular
>real-time application: every other window in SPARC can be marked invalid
>in the privileged Window Invalid Mask register.  this yields
>"number-of-windows/2" 40-register groups, where each group comprises
>32 registers plus 8 trap handler registers dedicated to a real-time task.
>tasks are protected from each other via the Window Invalid Mask.  changing
>the Current Window Pointer accomplishes a process switch among the active
>groups.  (of course, processes in this scheme are compiled with a "single,
>traditional register set" model.)

This mode of using register windows is one of the most attractive to
the RT side of me.  Note that the AMD29000 can do it this way too;
i.e. you are basically using the register windows as non-overlapping,
disjoint register files.

But then the conventional side of me takes over.  Larger register
files, for windows or otherwise, imply slower registers.  Is it
worthwhile?  Probably not.  Disjoint register files at least imply the
possibility of powering off some sections of the file - although I
suppose that could be done for register windows too.  The really big
win will come if someone makes a register file of N sets of M
registers that can turn off, or disable, or not require address lines
to, the inactive N-1 sets, so that the active set of M registers runs
at a speed comparable to that of a register file that has only M
registers.
robert@beatnix.UUCP (Robert Olson) (09/09/88)
ELXSI sells a high-end multiprocessor into the realtime marketplace.
By high end I mean VAX-MIPS performance from 7 MIPS to 250 MIPS, up to
2 GB of memory, and so forth.  By realtime I mean event driven, with
frame times of perhaps as little as 250 microseconds, although most
customers are running frame times of 5 milliseconds to 20
milliseconds.  Many of the issues you raise in your note are ones
which we encounter with our customers.

In article <6930@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu I wrote:
>
>   (a) Some have register window systems.  This is a disastrous design
>       error that will ultimately doom them.  In particular, the greatly
>       increased context-switch time, and the unpredictability in the
>       cost of a simple procedure call, make register-window machines
>       unsuitable for hard real-time applications.

Predictability of response times (jitter) is crucial for most of the
applications we run.  In general the computer is running some
mathematical approximation of the real world.  The application
developers generally make their codes consume 90% - 95% of the cycles
in the frame.  Jitter must be taken out of the cycles available to the
application.  Hence, in realtime design you assume the worst-case
jitter, even if it only happens once an hour or so.  Those (mostly)
wasted cycles give the application developer heartburn.

>In article <22860@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>
>   Oh, I suppose that by the same reasoning, any machine with caches,
>   virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
>   it's back to the old TMS9900 architecture with no registers to get in
>   the way of that fast context switch and predictability.  ;-)
>
>Yes, machines with caches do indeed cause problems in implementing hard
>real time systems; this was brought out in some of the reports of the
>MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
>the majority of real time systems do not use it (wisely, in my view).

In the ELXSI architecture there is only virtual memory, in the sense
that the instruction set only allows memory references relative to
your process' page map.  We do allow you to freeze down pieces of your
address space in main memory and, for that matter, in the cache.  The
cache on the 6460 CPU is 1 MB and can be partitioned among several
processes in a static fashion, although the default is for all
processes to share the cache.  While some of our crustier users find
virtual memory concepts disturbingly avant-garde, the ability to
freeze things in the cache and main memory makes them feel better.
There are substantial advantages to the protection from unplanned
"interprocess communication" (i.e., wild writes into unintentionally
shared memory).  I speak for the company when I say that our customers
do very time-critical applications while using virtual memory.  Like
any tool, you need to understand the implications of using it and the
ways to overcome the negative side effects for your application.

>The TI9900 is indeed an example worth studying.  It had a context switch
>time of less than 10usec using early 1970s technology.  Last month I
>attended a presentation of a new "RISC" machine with a 20 MHz clock that
>couldn't do half as well.

On the 6460, the context-switch time is about 3 microseconds.  Total
response time to an external interrupt, including a context switch, is
about 10 microseconds.  If you mutter the right incantations, that can
be a guaranteed response time, even with timesharing going on in other
CPUs.  One of the secrets (actually, not so secret) is the use of
sixteen process-context register sets on the CPU.  There is a simple,
strict, priority-driven scheduler to manage those register sets,
unconditionally running the highest-priority task.  A context switch
involves running the scheduler, saving the state of the CPU from the
current process, and selecting the other set of registers.  Needless
to say, we are pretty proud of these numbers in a large-scale system.

>Tim continues:
>
>   How did you measure this "greatly increased context switch time?"  There
>   is typically a whole lot more going on during a true context switch than
>   dumping and restoring register contents.  In addition, many times it is
>   interrupt latency, not context switch time, that is important.  Here,
>   many "register window RISCs" like the Am29000, SPARC, and 80960 have an
>   advantage, in that typically there is a window or reserved register area
>   for the interrupt handler to run in without saving *any* registers.

Virtually all of our customers run multiprocess simulations.  Many of
them are doing flight simulators.  One development team will simulate
the engines, one group will interface to the cockpit controls, one
group will simulate the flight computer(s), and so forth.  Sometimes
the black boxes are real ones, hooked up over 1553 or similar external
busses; sometimes they are software simulations.  Efficient context
switching is essential to their application.  Every cycle counts, and
we look for ways to avoid saving anything that doesn't absolutely need
saving.

>And in response:
>
>There is NOT a whole lot more going on during a context switch than the
>register save and restore.  Setting up the dynamic environment for a
>high-level language task normally implies just changing the registers
>and restoring any condition codes.  A few machines really blow it by
>having a lot of FPU state (e.g. the MC68000) or by requiring tasks to
>use different memory maps (1750A), but on clean machines the major part
>of the work is the save and restore of the on-chip registers.  The more
>there are, the longer this takes.

I agree with this statement.  (Incidentally, we do not have condition
codes, although there is a status word to be saved.)

It is possible for realtime users to both have a modern computer and
get their job done.  We offer access to realtime from Unix, we support
virtual memory, the operating system is message driven rather than
shared memory, and people program in Pascal, Fortran, C, Ada, and so
forth.  What you have to do is give the realtime user the ability to
guarantee certain attributes of his environment, such as memory access
times, device access times, and so forth.  While there are things we
still have to do to improve our abilities this way, I think the number
of successful applications which have been built using our equipment
is proof that important, demanding applications can take advantage of
many of the advances readers of this group have developed in the last
decade.
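For readers who have not built one, the scheduler being described is
about as simple as schedulers get; here is a minimal sketch in C.  The
table size, the field names, and the select_register_set() hook
standing in for the hardware register-set switch are all assumptions
for illustration, not ELXSI's actual code.

    /* Strict-priority scheduling over a fixed number of on-chip
     * register sets: find the highest-priority runnable task and point
     * the CPU at its register set.  No register contents are copied.  */
    #define NSETS 16

    struct task {
        int runnable;                /* nonzero if ready to run          */
        int priority;                /* smaller number = higher priority */
    };

    struct task task_table[NSETS];   /* one entry per register set       */

    extern void select_register_set(int set);  /* hypothetical hw hook   */

    void schedule(void)
    {
        int best = -1;
        int i;
        for (i = 0; i < NSETS; i++) {
            if (task_table[i].runnable &&
                (best < 0 ||
                 task_table[i].priority < task_table[best].priority))
                best = i;
        }
        if (best >= 0)
            select_register_set(best);
    }

Because the loop is over a small fixed table and the switch itself is
just a register-set selection, the cost of schedule() is easy to
bound, which is what makes the guaranteed-response-time claim
plausible.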
rpw3@amdcad.AMD.COM (Rob Warnock) (09/10/88)
In article <6930@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
+---------------
| Even without a parallel register set, you can go a long way by reserving a
| couple of general registers for the "in and out" interrupt handlers.  If,
| of course, your compiler cooperates.  But using the normal register window
| for interrupts seems crazy: if the interrupt occurs at the wrong call depth
| (1/4 of the time, say) then responding to it will take several times as
| long, since 128 (or whatever) registers will be spilled to give it a
| window of 32, of which it might use 4.  This is negative leverage with a
| vengeance!
+---------------

Well, I don't know what machine you have in mind, but for the Am29000
(which has 128 "local" registers) it doesn't work that way.  The 29k
has completely *variable*-sized register windows, and you spill
exactly what is needed.  Thus, an interrupt sequence which uses 4
local registers will spill/fill (save/restore) exactly 4 of them, and
an interrupt sequence which uses 37 registers (because of subroutine
call depth or whatever) will save/restore exactly 37.

It is important to note that because of the variable window size, for
normal subroutine calls fills are *not* paired dynamically with
spills, but occur only when needed, giving a 128-word "hysteresis" in
the spilling and filling.  The same is almost true for interrupts,
except that on returning from an interrupt you must do a final fill at
the end which restores the register file to the state it had on entry.
(As it turns out, this is automatic due to a trick in the way the
registers are set up on entry to the interrupt.  Details posted upon
request...)

*As an optimization*, the software designer may choose to explicitly
pre-spill some number of registers on every interrupt, thus trading
off the cost of the explicit save/restore against the slightly higher
overhead of the implicit spill/fill mechanism when only a small number
of registers is needed.  This pre-spill is not mandated by the
hardware, but is something you might do while tuning a completed
system.  It decreases the average interrupt overhead, leaves the worst
case the same, and may [or may not -- it depends] slightly increase
the minimum overhead.  (The same tuning can be applied to system
calls, if desired, and both forms have been used in the System-V and
4.3 ports to the 29000.)

While a large register file *does* increase full context-switch time
somewhat (but not as badly as you might fear, given that you have
load/store-multiple instructions and burst-capable memories), a
variable-sized register window such as used in the Am29000 (similar to
the original Berkeley RISC's registers) can provide *excellent*
interrupt and subroutine-call performance, enough so to more than make
up for the increased context-switch time.  This also does mean that it
is better to run critical real-time code as an interrupt rather than
as a heavyweight process, ...but this has always been true.

Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403
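The spill-exactly-what-you-need behavior is easier to see with a toy
model than in prose.  The sketch below tracks only the bookkeeping;
the sizes, the function names, and the idea of passing the caller's
frame size to ret() are illustrative assumptions, not the 29k's actual
trap handlers or calling convention.

    /* Toy model of a variable-sized register window ("stack cache"):
     * on call, spill only as many registers as are needed to make room;
     * on return, fill only if the caller's frame is no longer resident.
     * The difference between spills and fills is the "hysteresis".      */
    #define FILE_SIZE 128            /* on-chip local registers          */

    static int resident = 0;         /* registers currently on chip      */
    static int spills   = 0;         /* registers written out so far     */
    static int fills    = 0;         /* registers read back so far       */

    void call(int frame_regs)        /* enter a frame of frame_regs regs */
    {
        int overflow = resident + frame_regs - FILE_SIZE;
        if (overflow > 0) {          /* spill exactly the shortfall      */
            spills   += overflow;
            resident -= overflow;
        }
        resident += frame_regs;
    }

    void ret(int frame_regs, int caller_regs)   /* return to the caller  */
    {
        resident -= frame_regs;
        if (resident < caller_regs) {           /* caller was spilled    */
            fills   += caller_regs - resident;  /* fill only its frame   */
            resident = caller_regs;
        }
    }

Running a deep call chain through this model shows the hysteresis: a
burst of calls spills registers once, and the matching returns fill
nothing until the call depth actually unwinds past what is still
resident.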
bwong@sundc.UUCP (Brian Wong) (09/11/88)
A previous poster voiced his opinion that slow context-switch time
would doom any register-window RISC machine.  While I can agree that
having a lot of CPU state will certainly make it much harder to make a
good hard real-time machine, it is not at all clear that a "good hard
real-time machine" will necessarily be what a large part of the
[workstation, minicomputer, pc, minisuper] market will want.

-- 
Brian Wong           Sun Microsystems          bwong@sun.com
Vienna, Va.  703-883-1243
blackman@eniac.seas.upenn.edu (David Blackman) (09/11/88)
In article <5692@sundc.UUCP> bwong@sundc.UUCP (Brian Wong) writes:
>
>While I can agree that having a lot of CPU state will certainly make it
>much harder to make a good hard real-time machine, it is not at all clear
>that a "good hard real-time machine" will necessarily be what a large part
>of the [workstation, minicomputer, pc, minisuper] market will want.
>

I don't know if you are objecting to "good", "hard", or "real-time",
but I would argue that you certainly want "good" and "real-time"
machines.

Real-time, predictable performance is one of the most important
advantages that a workstation affords.  The large variance in response
time on normal time-sharing computers was one of the factors which
inspired the development of workstations.  Jim Morris from Xerox PARC
said one of the advantages of the Alto is that it doesn't run faster
at night.  This point seems to have been ignored/overlooked by most
workstation manufacturers.  For example, I use a diskless workstation
whose file system is stored on another workstation.  My response times
are highly dependent on the load of the file-system workstation.

The workstation offers the potential of allowing users [NOT "kernel
hackers"] to write software that requires response time in the range
of 100 us - 1 ms.  This was impossible with conventional time-sharing
computers.  You may be using a remote procedure call.  You may have
written your own driver for a serial I/O port.  You may have just
interfaced a CD-ROM player to your workstation and be writing a driver
for it.  You may be trying to drive a 60 ppm laser printer.  You may
be trying to send/receive speech over a network in real time.  Or you
may be experimenting with a new network protocol.  In all cases, the
system must provide the facilities for users to write software that
has high performance, can keep up with most external devices and
events, and has uniform response time.  This sounds like real-time to
me.

Blackman@eniac.seas.upenn.edu
bwong@sundc.UUCP (Brian Wong) (09/12/88)
In article <5116@netnews.upenn.edu>, blackman@eniac.seas.upenn.edu (David Blackman) writes:
>
> I don't know if you are objecting to "good", "hard", or "real-time",
> but I would argue that you certainly want "good" and "real-time"
> machines.
>
> Real-time, predictable performance is one of the most important advantages
> that a workstation affords.  The large variance in response time on normal
> time-sharing computers was one of the factors which inspired the development
> of workstations.  ... one of the advantages of the
> Alto is that it doesn't run faster at night...
  [... stuff deleted...]
> In all cases, the system must provide the facilities for users to
> write software that has high performance, can keep up with most external
> devices and events, and has uniform response time.  This sounds like
> real-time to me.

[I've edited down]

Perhaps I was asleep during my college classes, but to me,
realtime !necessarily= highPerformance.  Quick perceptual response and
high performance in general are certainly goals for all workstation
design engineers.  But I don't think that the (strict) requirements of
real time are necessary in the general case.

Don't get me wrong, I'm not trying to say that realtime isn't
necessary.  Just that it's overkill in a whole lot of situations, and
that perhaps the engineering decisions involved in designing
hardware/software shouldn't always be weighted toward realtime.

-- 
Brian Wong           Sun Microsystems          bwong@sun.com
Vienna, Va.  703-883-1243
robert@beatnix.UUCP (Robert Olson) (09/12/88)
> However, in the middle range of the market, minicomputers and super-minis,
>there are a lot of people who want *BOTH* real-time and conventional
>performance.  Some do not want them at the same time - e.g. a computer site
>that "officially" buys a computer to run a simulation that takes over all
>the computers on site for maybe a few hours to a day per month, but also
>wants to use the computers that run this dedicated simulation for regular
>engineering and office work during the rest of the time.
>
> Others want RT and conventional capabilities at the same time, because
>somebody has to develop on the machine, make reports, etc., and the company
>doesn't want the hassle of handling two totally different development and
>target systems.  Different configurations, perhaps, but even that isn't
>always acceptable.

A large number of our customers fall into the above categories.

>> ... stuff about cute tricks with SPARC register windows...
>
>This mode of using register windows is one of the most attractive to the RT
>side of me.  Note that the AMD29000 can do it this way too; i.e. you are
>basically using the register windows as non-overlapping, disjoint register
>files.
> But then the conventional side of me takes over.  Larger register files,
>for windows or otherwise, imply slower registers.  Is it worthwhile?
>Probably not.  Disjoint register files at least imply the possibility of
>powering off some sections of the file - although I suppose that could be
>done for register windows too.  The really big win will come if someone
>makes a register file of N sets of M registers that can turn off, or
>disable, or not require address lines to, the inactive N-1 sets, so that
>the active set of M registers runs at a speed comparable to that of a
>register file that has only M registers.

There are a couple of interesting things about our new CPU which were
designed specifically for the customers described above.

First, the megabyte cache is partitionable among up to 8 processes
(actually, it's slightly more complicated than that).  It is a
direct-mapped cache, so the interested realtime programmer can
statically allocate his data, if so desired.  ("Hard" realtime people
eat nails for lunch.)  The scheduling issues were discussed in my
previous note.

The other thing, the implications of which I don't yet fully
understand, is that access time to data in the cache is the same as
access to data in a register.  The instruction set allows one of the
source operands to be a generalized address of the usual sort - i.e.,
base, base + displacement, base + index + displacement, and so forth.
The access time for data in the cache is one cycle, without regard for
the complexity of the address mode.  Since everything (practically
every instruction) is one cycle, you are encouraged to use the most
complex address modes and most powerful instructions that make sense,
as they squeeze out RISCish instructions that would consume extra
cycles.  They also seem to reduce the demand for registers by reducing
the penalty for not having something in a register, although there is
still a penalty.

Naturally, these are expensive RAMs and an expensive CPU.  My point is
that, in our CPU at least, there are interesting things going on in
the never-ending war between the levels of the memory hierarchy.
beyer@houxs.UUCP (J.BEYER) (09/12/88)
In article <5708@sundc.UUCP>, bwong@sundc.UUCP (Brian Wong) writes:
>
> Perhaps I was asleep during my college classes, but to me,
> realtime !necessarily= highPerformance.  Quick perceptual response and
> high performance in general are certainly goals for all workstation
> design engineers.  But I don't think that the (strict) requirements of
> real time are necessary in the general case.

What I learned about designing real-time systems (which I haven't done
for many years now) is that the results must be available SOON ENOUGH.
Whether this meant seconds or nanoseconds depended upon the
application.  If a machine were too fast, software could always delay
the presentation of the results until the load was able to absorb it.
Of course, there are better and worse ways to provide the needed delay
(if there were a need to delay an early output at all).

-- 
Jean-David Beyer
A.T.&T., Holmdel, New Jersey, 07733
houxs!beyer
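Delaying the presentation of an early result is itself a standard
real-time trick for removing jitter from the consumer's point of view.
Here is a minimal sketch, written with POSIX timer calls purely for
concreteness; the 5 ms frame time and the compute_result()/present()
stubs are assumptions for illustration.

    /* Compute as early as you like, but present the result at the frame
     * boundary, so the downstream consumer sees a fixed-rate output.    */
    #include <time.h>

    #define FRAME_NS (5 * 1000 * 1000L)      /* assumed 5 ms frame time  */

    extern double compute_result(void);      /* hypothetical frame work   */
    extern void   present(double value);     /* hypothetical output step  */

    void frame_loop(void)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_MONOTONIC, &deadline);

        for (;;) {
            double r = compute_result();     /* may finish early          */

            /* advance the deadline by one frame and sleep until then     */
            deadline.tv_nsec += FRAME_NS;
            if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_nsec -= 1000000000L;
                deadline.tv_sec  += 1;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
            present(r);                      /* fixed-phase presentation  */
        }
    }

The sleep, of course, only hides jitter in the computation; it does
nothing for the case where the computation finishes late, which is the
case the rest of this thread worries about.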
koopman@a.gp.cs.cmu.edu (Philip Koopman) (09/12/88)
In article <5116@netnews.upenn.edu>, blackman@eniac.seas.upenn.edu (David Blackman) writes:
> Real time, predictable performance is one of the most important advantages
> that a workstation affords.  The large variance in response time on normal
> time sharing computers was one of the factors which inspired the development
> of workstations.
......
> The workstation offers the potential of allowing users [NOT "kernel
> hackers"] to write software that requires response time in the range of
> 100 us - 1 ms.  This was impossible with conventional time sharing computers.

I agree that real-time control demands predictable performance.
However, there are different time scales involved here.  For events
that don't require more than a couple of instructions at the 1 ms
time scale, you're right, workstations and the RISC chips do just
fine.  However, for tighter timetables and more processing, most
workstations aren't quick enough.

Below a certain threshold, cache misses, pipeline breaks, etc. can't
be averaged out into a "MIPS rating".  If you must respond to an
interrupt with a fairly complex task within 100 us, that gives you
1000 clocks at 10 MHz.  If you have only 10-20 instructions, you're
all set.  If you have 700-900 instructions to process within that
timeframe, unpredictability at a fine-grain level (i.e. cache misses
based on what task you were running last, branch-target-table
hits/misses, etc.) will eat you alive!

A predictable, consistent machine at 10 MIPS may be worth a whole lot
more than a machine that bursts at 40-50 MIPS in a real-time control
environment.  Average performance is a useless figure in this case.
What matters is absolute worst-case performance when meeting
deadlines.

  Phil Koopman                koopman@maxwell.ece.cmu.edu   Arpanet
  5551 Beacon St.
  Pittsburgh, PA  15217
  PhD student at CMU and sometime consultant to Harris Semiconductor.
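To make the arithmetic above concrete, here is a small C program that
works the same budget.  The instruction count, the ideal CPI, and the
miss-rate and miss-penalty figures are illustrative assumptions, not
measurements of any particular machine.

    #include <stdio.h>

    /* 100 us at 10 MHz is a budget of 1000 clocks.  A modest worst-case
     * cache-miss assumption is enough to blow it, even though the
     * average-case figure looks comfortable.                           */
    int main(void)
    {
        double clock_mhz    = 10.0;
        double deadline_us  = 100.0;
        double budget       = deadline_us * clock_mhz;  /* 1000 clocks  */

        double instructions = 800.0;   /* work inside the deadline      */
        double cpi          = 1.0;     /* ideal cycles per instruction  */
        double miss_rate    = 0.05;    /* assumed worst-case miss rate  */
        double miss_penalty = 10.0;    /* assumed cycles per miss       */

        double best  = instructions * cpi;
        double worst = best + instructions * miss_rate * miss_penalty;

        printf("budget     : %4.0f clocks\n", budget);
        printf("best case  : %4.0f clocks (%s)\n", best,
               best > budget ? "deadline missed" : "deadline met");
        printf("worst case : %4.0f clocks (%s)\n", worst,
               worst > budget ? "deadline missed" : "deadline met");
        return 0;
    }

With these numbers the average case fits easily, but the worst case is
1200 clocks against a 1000-clock budget, which is exactly the point:
the only figure a hard real-time designer can use is the worst case.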
paul@unisoft.UUCP (n) (09/12/88)
In article <22890@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>+---------------
>
>Well, I don't know what machine you have in mind, but for the Am29000
>(which has 128 "local" registers) it doesn't work that way.  The 29k
>has completely *variable*-sized register windows, and you spill exactly
>what is needed.  Thus, an interrupt sequence which uses 4 local registers
>will spill/fill (save/restore) exactly 4 of them, and an interrupt sequence
>which uses 37 registers (because of subroutine call depth or whatever)
>will save/restore exactly 37.
> .....
>
>Rob Warnock
>Systems Architecture Consultant

Rob of course didn't tell you how long it actually takes to
burst-transfer all 192 registers to memory (if you really do have to
save them all...): at 30 MHz (33 ns/cycle), 192 * 0.033 -> 6.3 us
(6.4 us actually, if you count a 2-3 cycle burst setup time).  Not too
bad!!

The typical time a kernel spends looking for the next process to
execute, plus changing the memory map on a process switch, easily
dwarfs this (hell, interrupt-acknowledge time on most modern buses is
around 1 us).  Maybe 4-5 years from now this will become a big issue,
but by then the silicon will be that much faster anyway.

	Paul Campbell

-- 
Paul Campbell, UniSoft Corp. 6121 Hollis, Emeryville, Ca
E-mail: ..!{ucbvax,hoptoad}!unisoft!paul
Nothing here represents the opinions of UniSoft or its employees (except me)
"Nuclear war doesn't prove who's Right, just who's Left" (ABC news 10/13/87)
eugene@eos.UUCP (Eugene Miya) (09/13/88)
In ACM SIGPLAN Notices, vol. 17, no. 9, Sept. 1982, Alan Perlis wrote
in the article "Epigrams on Programming":

> You can measure a programmer's perspective by noting his attitude on the
> continuing vitality of FORTRAN.

I say: You can measure a person's perspective by noting whether he
thinks a VAX is a "mainframe."

You are welcome to make other bumper-sticker computer science (as
Bentley calls it).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?!  HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene