[comp.arch] Are all RISCs the same?

firth@sei.cmu.edu (Robert Firth) (09/07/88)

In article <58@zeno.MN.ORG> gene@zeno.UUCP (Gene H. Olson) writes:

>*	The Motorola, Intel, MIPS, SPARC, HP, and IBM RISC
>	architectures are incredibly similar.  In their basic
>	instruction sets, none of them has any significant
>	advantages over the other.

Sorry, Gene, I'm going to disagree with your first point, and hence
with your conclusions.  These machines differ in many respects.

(a) Some have register window systems.  This is a disastrous design
    error that will ultimately doom them.  In particular, the greatly
    increased context-switch time, and the unpredictability in the
    cost of a simple procedure call, make register-window machines
    unsuitable for hard real time applications.

(b) Some have elaborate and expensive non-RISC features.  One machine
    in your above list has ADDRESS MODE computations that can take
    from 1 to 5 cycles, good grief.  Some have those good old "high level
    language support" instructions that 20 years' experience has proved
    a total loss.

(c) Some have imprecise exception states that make both true recovery
    semantics and true continuation ('Ada-like') semantics almost impossible
    to realise.  (One gives you precise exception states if you slow the
    machine down by a factor of about 2.5, I believe.)

(d) Some come with manufacturer-designed procedure calling sequences that
    are wired into virtually all the system software and hence almost
    inescapable.  They are also gruesomely inefficient.

And, of course, I do believe some of the above machines have significant
technical advantages over the others.  No prizes for guessing which. But
we should also remember that technical excellence is not the only thing
determining success!

news@amdcad.AMD.COM (Network News) (09/07/88)

In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
| In article <58@zeno.MN.ORG> gene@zeno.UUCP (Gene H. Olson) writes:
| 
| >*	The Motorola, Intel, MIPS, SPARC, HP, and IBM RISC
| >	architectures are incredibly similar.  In their basic
| >	instruction sets, none of them has any significant
| >	advantages over the other.
| 
| Sorry, Gene, I'm going to disagree with your first point, and hence
| with your conclusions.  These machines differ in many respects.
| 
| (a) Some have register window systems.  This is a disastrous design
|     error that will ultimately doom them.  In particular, the greatly
|     increased context-switch time, and the unpredictability in the
|     cost of a simple procedure call, make register-window machines
|     unsuitable for hard real time applications.

Oh, I suppose that by the same reasoning, any machine with caches,
virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
it's back to the old TMS9900 architecture with no registers to get in
the way of that fast context switch and predictability.  ;-)

How did you measure this "greatly increased context switch time"?  There
is typically a whole lot more going on during a true context switch than
dumping and restoring register contents.  In addition, many times it is
interrupt latency, not context switch time, that is important.  Here,
many "register window RISCs" like the Am29000, SPARC, and 80960 have an
advantage, in that typically there is a window or reserved register area
for the interrupt handler to run in without saving *any* registers.

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)

firth@sei.cmu.edu (Robert Firth) (09/08/88)

In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu I wrote:

 (a) Some have register window systems.  This is a disastrous design
     error that will ultimately doom them.  In particular, the greatly
     increased context-switch time, and the unpredictability in the
     cost of a simple procedure call, make register-window machines
     unsuitable for hard real time applications.

In article <22860@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:

  Oh, I suppose that by the same reasoning, any machine with caches,
  virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
  it's back to the old TMS9900 architecture with no registers to get in
  the way of that fast context switch and predictability.  ;-)

Yes, machines with caches do indeed cause problems in implementing hard
real time systems; this was brought out in some of the reports of the
MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
the majority of real time systems do not use it (wisely, in my view).

The TI9900 is indeed an example worth studying.  It had a context switch
time of less than 10usec using early 1970s technology.  Last month I
attended a presentation of a new "RISC" machine with a 20 MHz clock that
couldn't do half as well.

Tim continues:

  How did you measure this "greatly increased context switch time"?  There
  is typically a whole lot more going on during a true context switch than
  dumping and restoring register contents.  In addition, many times it is
  interrupt latency, not context switch time, that is important.  Here,
  many "register window RISCs" like the Am29000, SPARC, and 80960 have an
  advantage, in that typically there is a window or reserved register area
  for the interrupt handler to run in without saving *any* registers.


And in response:

There is NOT a whole lot more going on during a context switch than the
register save and restore.  Setting up the dynamic environment for a
high-level language task normally implies just changing the registers
and restoring any condition codes.  A few machines really blow it by
having a lot of FPU state (e.g. the MC68000) or by requiring tasks to
use different memory maps (the 1750A), but on clean machines the major part
of the work is the save and restore of the on-chip registers.  The more
there are, the longer this takes.
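Firth's picture of a "clean" context switch can be sketched in a few lines of C. The register count and structure layout below are illustrative assumptions, not any particular machine's:

```c
#include <string.h>

#define NREGS 32  /* illustrative; more registers means a longer switch */

struct context {
    unsigned long regs[NREGS];
    unsigned long status;          /* condition codes / status word */
};

/* Save the live register file into 'from', restore 'to' into it.
   The work is proportional to NREGS -- Firth's point exactly. */
void context_switch(struct context *from, struct context *to,
                    unsigned long live[NREGS])
{
    memcpy(from->regs, live, sizeof from->regs);   /* save */
    memcpy(live, to->regs, sizeof to->regs);       /* restore */
}
```

On a register-window machine the same save/restore must cover the whole on-chip file, which is why the cost scales with the number of windows.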

The idea of having separate interrupt registers is not new to register
window (or RISC) machines; the PE3200 had them 15 years ago.  I agree that
they are a good idea in some applications.  But what true real time systems
want, in most cases, is interrupts that change the scheduler state, and
hence that are followed by a true context switch.  (The PE3200 does very
badly here).  For most applications, it is not enough just to have fast
"in and out" interrupts; you must also have the fast context switch.

Even without a parallel register set, you can go a long way by reserving a
couple of general registers for the "in and out" interrupt handlers.  If,
of course, your compiler cooperates.  But using the normal register window
for interrupts seems crazy: if the interrupt occurs at the wrong call depth
(1/4 of the time, say) then responding to it will take several times as
long, since 128 (or whatever) registers will be spilled to give it a
window of 32, of which it might use 4.  This is negative leverage with a
vengeance!

garner@gaas.Sun.COM (Robert Garner) (09/09/88)

>  But what true real time systems want, in most cases, is interrupts
>  that change the scheduler state, and hence that are followed by a
>  true context switch.
>  ...
>  But using the normal register window for interrupts seems crazy:
>  if the interrupt occurs at the wrong call depth
>  (1/4 of the time, say) then responding to it will take several times as
>  long, since 128 (or whatever) registers will be spilled to give it a
>  window of 32, of which it might use 4.  This is negative leverage with a
>  vengeance!

assuming that an interrupt always causes a context switch, then in
order to achieve minimal context-switch latency, just save a SINGLE window
on a context switch (16 registers in SPARC).  the rest can be saved later.
(note that the work of window saves, which write procedure PC,
FP, and stack data into memory, must be accomplished in all architectures
at SOME point in time between context switches.)
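Garner's lazy-save idea amounts to bounding the immediate cost of a switch at one window, deferring the rest (e.g. to later overflow traps). A minimal sketch, with illustrative counts:

```c
#define WINDOW_REGS 16          /* one SPARC window */

/* registers that must be saved immediately at the switch:
   a constant, independent of call depth */
int saved_at_switch(void)
{
    return WINDOW_REGS;
}

/* registers whose save is deferred until after the switch */
int deferred(int dirty_windows)
{
    return (dirty_windows - 1) * WINDOW_REGS;
}
```

The point is that `saved_at_switch` is constant: the latency-critical path no longer depends on how many windows happen to be dirty.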

also, the SPARC register windows can be managed differently in a particular
real-time application:  every other window in SPARC can be marked invalid
in the privileged Window Invalid Mask register.  this yields
"number-of-windows/2" 40-register groups, where each group comprises
32 registers plus 8 trap handler registers dedicated to a real-time task.
tasks are protected from each other via the Window Invalid Mask.  changing the
Current Window Pointer accomplishes a process switch among the active groups.
(of course, processes in this scheme are compiled with a "single, traditional
register set" model.)

	- rg
 

news@amdcad.AMD.COM (Network News) (09/09/88)

In article <67551@sun.uucp> garner@sun.UUCP (Robert Garner) writes:
| also, the SPARC register windows can be managed differently in a particular
| real-time application:  every other window in SPARC can be marked invalid
| in the privileged Window Invalid Mask register.  this yields
| "number-of-windows/2" 40-register groups, where each group comprises
| 32 registers plus 8 trap handler registers dedicated to a real-time task.
| tasks are protected from each other via the Window Invalid Mask.  changing the
| Current Window Pointer accomplishes a process switch among the active groups.
| (of course, processes in this scheme are compiled with a "single, traditional
| register set" model.)

This scheme is also present in the Am29000 register model, since the
register file can be protected in groups of 16 registers.  However,
current compilers support only the stack-cache model, since it provides
the highest performance in most applications.

Do the SPARC compilers support both stack-cache and register-bank
calling conventions?


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

aglew@urbsdc.Urbana.Gould.COM (09/09/88)

..> I am very much enjoying the discussion about Real Time issues
..> between Firth, Olson, and Garner. 

Firth says that register windows are "a disastrous design error that
will ultimately doom them". I tend to agree with this for RT systems,
and even for conventional systems the evidence is now beginning to 
indicate that good register allocators with large register sets 
can beat register windows.
    But, I don't want to talk about general purpose systems;
let's talk about RT.

>  Oh, I suppose that by the same reasoning, any machine with caches,
>  virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
>  it's back to the old TMS9900 architecture with no registers to get in
>  the way of that fast context switch and predictability.  ;-)
>
>Yes, machines with caches do indeed cause problems in implementing hard
>real time systems; this was brought out in some of the reports of the
>MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
>the majority of real time systems do not use it (wisely, in my view).

Having worked in the "soft" real-time market for a bit, listening avidly
to the words of my seniors, I have learned some things. First, there do
exist dedicated real time systems, hard or soft, that do not need the 
things that you want on a conventional system, like caches or virtual
memory. I conjecture that these are mostly to be found in the low end
(eg. factory automation) and high end of the market.
    However, in the middle range of the market, minicomputers and super-minis,
there are a lot of people who want *BOTH* real-time and conventional 
performance. Some do not want it at the same time - eg. a computer site
that "officially" buys a computer to run a simulation that takes over all
the computers on site for maybe a few hours to a day per month -- but also
wants to use the computers that run this dedicated simulation for regular
engineering and office work during the rest of the time.
    Others want RT and conventional capabilities at the same time, because
somebody has to develop on the machine, make reports, etc., and the company
doesn't want the hassle of handling two totally different development and
target systems. Different configurations, perhaps, but even that isn't
always acceptable.
    Finally, there is the class of customers that I see on the horizon,
that wants RT as an aspect of conventional systems - ie. where conventional
is the emphasis, not RT. Eg. "I want RT UNIX on my workstation" - so I can
control sensors scattered through my house, etc. The RT part may be on a
separate processor, or even in a separate box, but I want it seamlessly
integrated with my normal computing environment.
    The first two classes of conventional+RT customers I know to be real;
the third is projection.

In this situation, what we need is not a machine that throws away conventional
features like cache and virtual memory, but one that makes it possible to get
those things out of the way.

>also, the SPARC register windows can be managed differently in a particular
>real-time application:  every other window in SPARC can be marked invalid
>in the privileged Window Invalid Mask register.  this yields
>"number-of-windows/2" 40-register groups, where each group comprises
>32 registers plus 8 trap handler registers dedicated to a real-time task.
>tasks are protected from each other via the Window Invalid Mask.  changing the
>Current Window Pointer accomplishes a process switch among the active groups.
>(of course, processes in this scheme are compiled with a "single, traditional
>register set" model.)

This mode of using register windows is one of the most attractive to the RT 
side of me. Note that the AMD29000 can do it this way too. Ie. you are basically
using the register windows as non-overlapping disjoint register files.
    But then the conventional side of me takes over. Larger register files,
for windows or otherwise, imply slower registers. Is it worthwhile? Probably not.
Disjoint register files at least imply the possibility of powering off some
sections of the file - although I suppose that it could be done for 
register windows. The really big win will come if someone makes a register
file of N sets of M registers, that can turn off or disable or not require
address lines to the inactive N-1 sets, so that the active set of M registers
runs at a speed comparable to that of a register file that only has M regs.

robert@beatnix.UUCP (Robert Olson) (09/09/88)

ELXSI sells a high end multiprocessor into the realtime marketplace.  By high
end I mean VAX MIPS performance from 7 MIPS to 250 MIPS, up to 2 GB memory
and so forth.  By realtime I mean event driven, with frame times of perhaps
as little as 250 microseconds, although most customers are running frame times
of 5 milliseconds to 20 milliseconds.  Many of the issues you raise in your
note are ones which we encounter with our customers.

In article <6930@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu I wrote:
>
> (a) Some have register window systems.  This is a disastrous design
>     error that will ultimately doom them.  In particular, the greatly
>     increased context-switch time, and the unpredictability in the
>     cost of a simple procedure call, make register-window machines
>     unsuitable for hard real time applications.
Predictability of response times (jitter) is crucial for most of the 
applications we run.  In general the computer is running some mathematical
approximation of the real world.  The application developers generally  
make their codes consume 90% - 95% of the cycles in the frame.  Jitter must
be taken out of the cycles available to the application.  Hence, in realtime
design you assume the worst case jitter, even if it only happens once an
hour or so.  Those (mostly) wasted cycles give the application developer
heartburn.

>
>In article <22860@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>
>  Oh, I suppose that by the same reasoning, any machine with caches,
>  virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
>  it's back to the old TMS9900 architecture with no registers to get in
>  the way of that fast context switch and predictability.  ;-)
>
>Yes, machines with caches do indeed cause problems in implementing hard
>real time systems; this was brought out in some of the reports of the
>MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
>the majority of real time systems do not use it (wisely, in my view).

In the ELXSI architecture there is only virtual memory, in the sense that 
the instruction set only allows memory references relative to your process'
page map.  We do allow you to freeze down pieces of your address space in
main memory and, for that matter, in the cache.  The cache on the 6460
CPU is 1 MB and can be partitioned among several processes in a static
fashion, although the default is for all processes to share the cache.
While some of our crustier users find virtual memory concepts disturbingly
avant-garde, the ability to freeze things in the cache and main memory
makes them feel better.  There are substantial advantages to the protection
from unplanned "interprocess communication" (i.e., wild writes into 
unintentionally shared memory).  I speak for the company when I say that our
customers do very time critical applications while using virtual memory.  Like
any tool, you need to understand the implications of using it and the ways to
overcome the negative side effects for your application.

>
>The TI9900 is indeed an example worth studying.  It had a context switch
>time of less than 10usec using early 1970s technology.  Last month I
>attended a presentation of a new "RISC" machine with a 20 MHz clock that
>couldn't do half as well.

On the 6460, the context switch time is about 3 microseconds.  Total response
time to an external interrupt, including a context switch, is about 10 
microseconds.  If you mutter the right incantations, that can be guaranteed
response time, even with timesharing going on in other CPUs.  One of the 
secrets (actually, not so secret) is the use of sixteen process context
register sets on the CPU.  There is a simple strict priority driven scheduler
to manage those register sets, unconditionally running the highest priority
task.  Context switch involves running the scheduler, settling the state of
the CPU from the current process, and selecting the other set of registers.
Needless to say, we are pretty proud of these numbers in a large scale system.

>
>Tim continues:
>
>  How did you measure this "greatly increased context switch time"?  There
>  is typically a whole lot more going on during a true context switch than
>  dumping and restoring register contents.  In addition, many times it is
>  interrupt latency, not context switch time, that is important.  Here,
>  many "register window RISCs" like the Am29000, SPARC, and 80960 have an
>  advantage, in that typically there is a window or reserved register area
>  for the interrupt handler to run in without saving *any* registers.
>
Virtually all of our customers run multiprocess simulations.  Many of them are
doing flight simulators.  One development team will simulate the engines, one
group will interface to the cockpit controls, one group will simulate the 
flight computer(s), and so forth.  Sometimes the black boxes are real ones, 
hooked up over 1553 or similar external busses, sometimes they are software 
simulations.  Efficient context switch is essential to their application.
Every cycle counts, and we look for ways to avoid saving anything that doesn't
absolutely need saving.

>
>And in response:
>
>There is NOT a whole lot more going on during a context switch than the
>register save and restore.  Setting up the dynamic environment for a
>high-level language task normally implies just changing the registers
>and restoring any condition codes.  A few machines really blow it by
>having a lot of FPU state (eg the MC68000) or by requiring tasks to
>use different memory maps (1750a), but on clean machines the major part
>of the work is the save and restore of the on-chip registers.  The more
>there are, the longer this takes.
>

I agree with this statement.  (Incidentally, we do not have condition codes,
although there is a status word to be saved.)


It is possible for realtime users to have both a modern computer and get their
job done.  We offer access to realtime from Unix, we support virtual memory,
the operating system is message driven rather than shared memory, people 
program in Pascal, Fortran, C and Ada and so forth.  What you have to do
is give the realtime user the ability to guarantee certain attributes of his
environment, such as memory access times, device access times and so forth.
While there are things we still have to do to improve our abilities this 
way, I think the number of successful applications which have been built using
our equipment is proof that important, demanding applications can take 
advantage of many of the advances readers of this group have developed in the
last decade.

rpw3@amdcad.AMD.COM (Rob Warnock) (09/10/88)

In article <6930@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
+---------------
| Even without a parallel register set, you can go a long way by reserving a
| couple of general registers for the "in and out" interrupt handlers.  If,
| of course, your compiler cooperates.  But using the normal register window
| for interrupts seems crazy: if the interrupt occurs at the wrong call depth
| (1/4 of the time, say) then responding to it will take several times as
| long, since 128 (or whatever) registers will be spilled to give it a
| window of 32, of which it might use 4.  This is negative leverage with a
| vengeance!
+---------------

Well, I don't know what machine you have in mind, but for the Am29000
(which has 128 "local" registers) it doesn't work that way. The 29k
has completely *variable*-sized register windows, and you spill exactly
what is needed.  Thus, an interrupt sequence which uses 4 local registers
will spill/fill (save/restore) exactly 4 of them, and an interrupt sequence
which uses 37 registers (because of subroutine call depth or whatever)
will save/restore exactly 37.

It is important to note that because of the variable window size, for
normal subroutine calls fills are *not* paired dynamically with spills,
but occur only when needed, giving a 128-word "hysteresis" in the spilling
and filling. The same is almost true for interrupts, except that on returning
from an interrupt you must do a final fill at the end which restores the
register file to the state it had on entry. (As it turns out, this is
automatic due to a trick in the way the registers are set up on entry
to the interrupt.  Details posted upon request...)

*As an optimization*, the software designer may choose to explicitly
pre-spill some number of registers on every interrupt, thus trading
off the costs of the explicit save/restore versus the slightly higher
overhead of the implicit spill/fill mechanism when only a small number
of registers is needed. This pre-spill is not mandated by the hardware,
but is something you might do while tuning a completed system. It
decreases the average interrupt overhead, leaves worst-case the same,
and may [or may not -- it depends] slightly increase the minimum overhead.
(The same tuning can be applied to system calls, if desired, and both
forms have been used in the System-V and 4.3 ports to the 29000.)

While a large register file *does* increase full context-switch time somewhat
(but not as badly as you might fear, given that you have load/store-multiple
instructions and burst-capable memories), a variable-sized register window
such as used in the Am29000 (similar to the original Berkeley RISC's registers)
can provide *excellent* interrupt and subroutine-call performance, enough so
to more than make up for the increased context-switch time. This also does
mean that it is better to run critical real-time code as an interrupt rather
than as a heavy-weight process, ...but this has always been true.


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

bwong@sundc.UUCP (Brian Wong) (09/11/88)

A previous poster voiced his opinion that slow context switch time would
doom any register-window RISC machine.

While I can certainly agree that having a lot of cpu state will
make it much harder to make a good hard real time machine, it is not at all
clear that a "good hard real time machine" will necessarily be what a large
part of the [workstation, minicomputer, pc, minisuper] market will want.

-- 
Brian Wong					Sun Microsystems
bwong@sun.com					Vienna, Va.  703-883-1243

blackman@eniac.seas.upenn.edu (David Blackman) (09/11/88)

In article <5692@sundc.UUCP> bwong@sundc.UUCP (Brian Wong) writes:
>
>While I can certainly agree that having a lot of cpu state will
>make it much harder to make a good hard real time machine, it is not at all
>clear that a "good hard real time machine" will necessarily be what a large
>part of the [workstation, minicomputer, pc, minisuper] market will want.
>

I don't know if you are objecting to "good", "hard", or "real-time" but
I would argue that you certainly want "good" and "real-time" machines.

Real time, predictable performance is one of the most important advantages
that a workstation affords.  The large variance in response time on normal
time sharing computers was one of the factors which inspired the development
of workstations.  Jim Morris from Xerox PARC said one of the advantages of the
Alto is that it doesn't run faster at night.  This point seems to have been
ignored/overlooked by most workstation manufacturers.  For example, I use
a diskless workstation whose file system is stored on another workstation.
My response times are highly dependent on the load of the file system
workstation.  

The workstation offers the potential of allowing users [NOT "kernel
hackers"] to write software that requires response time in the range of
100 us - 1 ms.  This was impossible with conventional time sharing computers.

You may be using a remote procedure call.  You may have written your
own driver for a serial I/O port.  You may have just interfaced a CD-ROM
player to your workstation and are writing a driver for it.  You may
be trying to drive a 60 ppm laser printer.  You may be trying to send/
receive speech over a network in real time.  Or, you may be experimenting
with a new network protocol.

In all cases, the system must provide the facilities for users to
write software that has high performance, can keep up with most external
devices and events, and have uniform response time.  This sounds like
real-time to me.

Blackman@eniac.seas.upenn.edu

bwong@sundc.UUCP (Brian Wong) (09/12/88)

In article <5116@netnews.upenn.edu>, blackman@eniac.seas.upenn.edu (David Blackman) writes:
> 
> I don't know if you are objecting to "good", "hard", or "real-time" but
> I would argue that you certainly want "good" and "real-time" machines.
> 
> Real time, predictable performance is one of the most important advantages
> that a workstation affords.  The large variance in response time on normal
> time sharing computers was one of the factors which inspired the development
> of workstations.  ... one of the advantages of the
> Alto is that it doesn't run faster at night...
[... stuff deleted...]
> In all cases, the system must provide the facilities for users to
> write software that has high performance, can keep up with most external
> devices and events, and have uniform response time.  This sounds like
> real-time to me.
> 

[I've edited down] 

Perhaps I was asleep during my college classes, but to me,
realtime !necessarily= highPerformance.  Quick perceptual response and high
performance in general are certainly goals for all workstation design
engineers.  But I don't think that the (strict) requirements of real time
are necessary in the general case.

Don't get me wrong, I'm not trying to say that realtime isn't necessary.
Just that it's overkill in a whole lot of situations, and that perhaps the
engineering decisions involved in designing hardware/software shouldn't
always be weighted toward realtime.


-- 
Brian Wong					Sun Microsystems
bwong@sun.com					Vienna, Va.  703-883-1243

robert@beatnix.UUCP (Robert Olson) (09/12/88)

>    However, in the middle range of the market, minicomputers and super-minis,
>there are a lot of people who want *BOTH* real-time and conventional 
>performance. Some do not want it at the same time - eg. a computer site
>that "officially" buys a computer to run a simulation that takes over all
>the computers on site for maybe a few hours to a day per month -- but also
>wants to use the computers that run this dedicated simulation for regular
>engineering and office work during the rest of the time.
>    Others want RT and conventional capabilities at the same time, because
>somebody has to develop on the machine, make reports, etc., and the company
>doesn't want the hassle of handling two totally different development and
>target systems. Different configurations, perhaps, but even that isn't
>always acceptable.

A large number of our customers fall into the above categories.

>> ... stuff about cute tricks with SPARC register windows...
>
>This mode of using register windows is one of the most attractive to the RT 
>side of me. Note that the AMD29000 can do it this way too. Ie. you are basically
>using the register windows as non-overlapping disjoint register files.
>    But then the conventional side of me takes over. Larger register files,
>for windows or other, implies slower registers. Is it worthwhile? Probably not.
>Disjoint register files at least imply the possibility of powering off some
>sections of the file - although I suppose that it could be done for 
>register windows. The really big win will come if someone makes a register
>file of N sets of M registers, that can turn off or disable or not require
>address lines to the inactive N-1 sets, so that the active set of M registers
>runs at a speed comparable to that of register file that only has M regs.

There are a couple of interesting things about our new CPU which were designed
specifically for the customers described above.  First, the megabyte cache is
partitionable among up to 8 processes (actually, it's slightly more complicated
than that).  It is a direct-mapped cache, so the interested realtime
programmer can statically allocate his data, if so desired.  ("Hard" realtime
people eat nails for lunch.)  The scheduling issues were discussed in my 
previous note.

The other thing, the implications of which I don't yet fully understand, is
that access time to data in cache is the same as access to data in a register.
The instruction set allows one of the source operands to be a generalized
address of the usual sort - i.e., base, base + displacement, base + index
plus displacement, and so forth.  The access time for data in the cache is
one cycle, without regard for the complexity of the address mode.  Since
everything (practically every instruction) is one cycle, you are encouraged
to use the most complex address modes and most powerful instructions that 
make sense, as they squeeze out RISCish instructions that would consume extra
cycles.  They also seem to reduce the demand for registers by reducing the
penalty for not having something in a register, although there is still a
penalty.  Naturally, these are expensive RAMs and an expensive CPU.  My point
is that, in our CPU at least, there are interesting things going on in the
never-ending war between the levels of the memory hierarchy.

beyer@houxs.UUCP (J.BEYER) (09/12/88)

In article <5708@sundc.UUCP>, bwong@sundc.UUCP (Brian Wong) writes:
> 
> Perhaps I was asleep during my college classes, but to me,
> realtime !necessarily= highPerformance.  Quick perceptual response and high
> performance in general are certainly goals for all workstation design
> engineers.  But I don't think that the (strict) requirements of real time
> are necessary in the general case.

What I learned in designing real-time systems (which I haven't done
for many years now) is that the results must be available SOON ENOUGH.
Whether that means seconds or nanoseconds depends upon the application.
If a machine is too fast, software can always delay the presentation
of the results until the load is able to absorb them. Of course, there are
better and worse ways to provide the needed delay (if there is any need
to delay an early output at all).
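The point that an early result can simply be held back until its deadline can be sketched as a small pacing helper; the names and units here are illustrative, not from any real system:

```c
#include <assert.h>
#include <stdint.h>

/* Real time means "soon enough": a result computed early is held
   until its deadline.  Given the time a result became available and
   its deadline (in any consistent unit), return how long to delay
   before presenting it.  A late result gets no delay -- pacing
   cannot repair a missed deadline. */
static uint64_t presentation_delay(uint64_t ready_at, uint64_t deadline)
{
    return (ready_at < deadline) ? deadline - ready_at : 0;
}
```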


-- 
Jean-David Beyer
A.T.&T., Holmdel, New Jersey, 07733
houxs!beyer

koopman@a.gp.cs.cmu.edu (Philip Koopman) (09/12/88)

In article <5116@netnews.upenn.edu>, blackman@eniac.seas.upenn.edu (David Blackman) writes:
> Real time, predictable performance is one of the most important advantages
> that a workstation affords.  The large variance in response time on normal
> time sharing computers was one of the factors which inspired the development
> of workstations.
......
> The workstation offers the potential of allowing users [NOT "kernel
> hackers"] to write software that requires response time in the range of
> 100 us - 1 ms.  This was impossible with conventional time sharing computers.

I agree that real time control demands predictable performance.  However,
there are different time scales involved here.  For events that don't
require more than a couple of instructions at the 1 ms time-scale, you're
right, workstations and the RISC chips do just fine.

However, for tighter time-tables and more processing, most workstations
aren't quick enough.  Down below a certain threshold, cache misses, pipeline
breaks, etc. can't be averaged out into a "MIPS rating".  If you must
respond to an interrupt with a fairly complex task within 100 us, that
gives you 1000 clocks at 10 MHz.  If you have only 10-20 instructions,
you're all set.  If you have 700-900 instructions to process within
that timeframe, unpredictability at a fine-grain level
(i.e. cache misses based on what task you were running last, branch
target table hits/misses, etc.) will eat you alive!
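Koopman's arithmetic (100 us at 10 MHz gives 1000 clocks) generalizes to a simple budget check. A minimal sketch, using worst-case rather than average cycles per instruction, since that is what deadlines care about:

```c
#include <assert.h>
#include <stdint.h>

/* Cycles available to a handler that must finish within deadline_us
   on a clock_mhz processor: us * cycles-per-us. */
static uint32_t cycle_budget(uint32_t deadline_us, uint32_t clock_mhz)
{
    return deadline_us * clock_mhz;
}

/* Worst-case, not average, cycles per instruction decide whether a
   handler of n_instr instructions can meet its deadline. */
static int meets_deadline(uint32_t n_instr, uint32_t worst_cpi,
                          uint32_t deadline_us, uint32_t clock_mhz)
{
    return n_instr * worst_cpi <= cycle_budget(deadline_us, clock_mhz);
}
```

With 900 instructions in the 1000-cycle budget, a worst-case CPI of even 2 (one cache miss or pipeline break every other instruction) already blows the deadline, which is the "eat you alive" point above.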

A predictable, consistent machine at 10 MIPS may be worth a whole
lot more than a machine that bursts at 40-50 MIPS in a real-time
control environment.  Average performance is a useless figure in this case.
What matters is absolute worst-case performance when meeting deadlines.

  Phil Koopman                koopman@maxwell.ece.cmu.edu   Arpanet
  5551 Beacon St.
  Pittsburgh, PA  15217    
PhD student at CMU and sometime consultant to Harris Semiconductor.

paul@unisoft.UUCP (n) (09/12/88)

In article <22890@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>+---------------
>
>Well, I don't know what machine you have in mind, but for the Am29000
>(which has 128 "local" registers) it doesn't work that way. The 29k
>has completely *variable*-sized register windows, and you spill exactly
>what is needed.  Thus, an interrupt sequence which uses 4 local registers
>will spill/fill (save/restore) exactly 4 of them, and an interrupt sequence
>which uses 37 registers (because of subroutine call depth or whatever)
>will save/restore exactly 37.
>
 	.....
>
>Rob Warnock
>Systems Architecture Consultant
>


Rob of course didn't tell you how long it actually takes to burst transfer
all 192 registers to memory (if you really do have to save them all ....)


	at 30MHz (33nS/cycle) 192*0.033 -> 6.3 uS (6.4uS actually if
		you count a 2-3 cycle burst setup time)

not too bad!! the typical time a kernel spends looking for the next process
to execute, plus changing the memory map on a process switch, easily
dwarfs this (hell, interrupt acknowledge time on most modern buses is around
1uS). Maybe 4-5 years from now this will become a big issue, but by then
the silicon will be that much faster anyway.
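Paul's figures can be reproduced with the same back-of-envelope arithmetic; the setup term is the 2-3 cycle burst setup he mentions:

```c
#include <assert.h>

/* Nanoseconds to burst-spill n_regs registers at one register per
   cycle, plus a fixed burst setup, at the given cycle time. */
static double spill_time_ns(int n_regs, int setup_cycles, double cycle_ns)
{
    return (n_regs + setup_cycles) * cycle_ns;
}
```

At 33 ns per cycle, 192 registers give 6336 ns (6.3 uS); adding 3 setup cycles gives 6435 ns, matching the "6.4uS actually" in the post.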

		Paul Campbell


-- 
Paul Campbell, UniSoft Corp. 6121 Hollis, Emeryville, Ca
	E-mail:		..!{ucbvax,hoptoad}!unisoft!paul  
Nothing here represents the opinions of UniSoft or its employees (except me)
"Nuclear war doesn't prove who's Right, just who's Left" (ABC news 10/13/87)

eugene@eos.UUCP (Eugene Miya) (09/13/88)

In ACM SIGPLAN Notices, vol. 17, no. 9, Sept. 1982, Alan Perlis
wrote in the article Epigrams on Programming:

> You can measure a programmer's perspective by noting his attitude on the
> continuing vitality of FORTRAN.

I say:
You can measure a person's perspective by noting whether he thinks
a VAX is a "mainframe."

You are welcome to make other bumper sticker computer science (as
Bentley calls it).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene