[comp.arch] Do RISC Compilers Consider Multiprogramming?

davis@mcnc.org (Mark C. Davis) (05/03/88)

(This is a repost.  If you have seen this before, I appologize, but my
first posting did not make it to some nearby sites).

This is an extension of the discussion of using RISC processors for real
time jobs.  Real time is actually large order multiprogramming
with deadlines (as averse to large order multiprogramming without
deadlines - i.e. timesharing).

A central assumption for RISC is that modern optimizing compilers can
produce efficient code for such machines.  These compilers further
assume that they can predict the state of the machine, eg. register 1
contains a certain variable or keeping inner loop code close together
will make it more likely for a cache hit.

These assumptions seem quite reasonable for uniprocessors and low order
multiprogramming (which I will arbitrarily define as < 20 context
switches / sec which on my Sun 3/60 corresponds to a load average
(processes waiting to run) of < 0.50)).

However, on a high order multiprogramming machine ( > 100 context
switches / sec) assumptions about machine state may no longer be
correct.  Data may need to be reloaded into registers before continuing
(admittedly an operating system task, but one that must be done if data
is kept in registers) and code may no longer be in the cache.

My questions for comp.arch are these:

        1. Do any RISC compilers consider the effects of frequent
           context switch on their optimizations?

        2. Does large order multiprogramming invalidate a primary
           assumption of RISC so that other architectures are better for
           this application?

Disclaimer: I believe that RISC is the best solution to low order
multiprogramming (like my workstation).  But, there does seem to be this
application of uniprocessor high order multiprogramming that will not go
away (people wanting to put 40 users on a sun 4 and process control
with zillions of independent processes).

Thanks for you thoughts - Mark (davis@cs.unc.edu or decvax!mcnc!davis)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)

[ discussion of generating a good compiler which preserves state in a
  multi processor environment ]
   
  Interesting thought from the past... the GE 600 series allowed
interrupts on an instruction fetch from an even location (since the bus
was 72 bits wide that was every other instruction). This resulted in the
ability to preserve context throught a few instructions.

  What are the ramification of doing the same thing on a newer
processor? If you knew that you could keep all the balls in the air for
two (or 2^N) instructions could you write a better compiler? Would a half
instruction increase in interrupt latency affect thruput adversely?
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

johnl@ima.ISC.COM (John R. Levine) (05/05/88)

In article <10707@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>  Interesting thought from the past... the GE 600 series allowed
>interrupts on an instruction fetch from an even location (since the bus
>was 72 bits wide that was every other instruction). This resulted in the
>ability to preserve context throught a few instructions.
>
>  What are the ramification of doing the same thing on a newer
>processor? ...

Actually, each 635 instruction had a LOCK bit that let you write
interrupt-proof sections as long as you wanted, subject to a lockup timer that
would blow you away after a few milliseconds. User mode (a/k/a slave mode,
ugh) programs could do this as well as the system. But I don't think that
either the LOCK bit or the instruction pairing locked out memory accesses by
any of the other processors, making the lock bit of only moderate use in
most systems which contained multiple CPUs.

Modern processors don't have a LOCK bit because they have better uses for
their instruction bits. These days you can get the same effect by preceding
your locked section by an "interrupts off" instruction and following it with
an "interrupts on" instruction. This is adequate to implement critical
sections on uniprocessors but not on multiprocessors. On multiprocessors it
seems to be useful to write code sort of like this:

	interrupts off
	set global lock in memory	; typical special instruction
	... manipulate locked structure ...
	release global lock
	interrupts on

You turn off the interrupts to make sure that the global lock is not held an
unduly long time because an interrupt occurs in the middle. The manpulation is
usually only a handful of instructions, so it's quicker to wait for someone
else to release the lock than to do any sort of queueing to wait for it.

(It's more complicated, if you can't get the lock right away you have to
turn interrupts back on while you wait, but I'm sure you get the idea.)
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw

webber@constance.rutgers.edu (Bob Webber) (05/06/88)

In article <620@speedy.mcnc.org>, davis@mcnc.org (Mark C. Davis) writes:
> ...
> Disclaimer: I believe that RISC is the best solution to low order
> multiprogramming (like my workstation).  But, there does seem to be this
> application of uniprocessor high order multiprogramming that will not go
> away (people wanting to put 40 users on a sun 4 and process control
> with zillions of independent processes).

The Mips R2000 (and presumably the R3000) has a large register set that
can be organized as stack caches (perhaps even overlapping register
sets?) or as separate process contexts.  Clearly the designers wanted
to make both options available to the software people.  Whether actual
systems based on them really do much with the feature in terms of
making it available to the system users, however, I don't know.

-------- BOB (webber@aramis.rutgers.edu ; rutgers!aramis.rutgers.edu!webber)

p.s., in case it wasn't clear from context, the R2000/3000 are generally
viewed as RISC chips although doubtless somewhere there is someone whose
definition of RISC they don't fit -- the infinite variety of the universe
is astounding, but why so much of that variety was concentrated in humanity
is outright puzzling.

ian@esl.UUCP (Ian Kaplan) (05/07/88)

In article <620@speedy.mcnc.org> davis@mcnc.org (Mark C. Davis) writes:
>
[text deleted]
>A central assumption for RISC is that modern optimizing compilers can
>produce efficient code for such machines.  These compilers further
>assume that they can predict the state of the machine, eg. register 1
>contains a certain variable or keeping inner loop code close together
>will make it more likely for a cache hit.
[more text deleted]
>However, on a high order multiprogramming machine ( > 100 context
>switches / sec) assumptions about machine state may no longer be
>correct.  Data may need to be reloaded into registers before continuing
>(admittedly an operating system task, but one that must be done if data
>is kept in registers) and code may no longer be in the cache.
>

  To properly attack this question I think that we must first
carefully define what is meant by RISC.  Three features come to mind:

     - Reduced instruction set (e.g., most instructions executed in 1 
       cycle)
     - Relatively large numbers of registers (e.g., register windows)
     - Optimizing compilers that perform code reordering

A side issue that has become increasingly important for RISC machines
because of their high bus/memory usage is that of cache.

Reduced Instruction Set

  The reduced instruction feature is a non-issue for "hard" real time
systems.  If anything, implementing instructions in logic rather than
microcode so that most instructions are executed in one cycle is good
for real time systems.  

Cache as a RISC Issue

  In hard real time systems caches can be a problem whether the CPU is
RISC or CISC.  As others on the net have commented, this results from the
non-deterministic execution time introduced by the chance of cache
miss.

Registers

  The effect of a large number of registers in a processor used in a
hard real time environment is an interesting question.  Aside from
issues of silicon real estate, this question can be divorced from the
RISC question.  As Prof. Jensen at CMU has pointed out, you can build
CISC processors with large register files too.  The compilers for a
processor with a large register file may not only attempt to optimize
register usage, but also cache usage.  Mark's point is that in a hard
real time environment, the assumptions the compiler makes in
optimizing for register and cache usage (e.g., loop optimization for
example) may be incorrect, since the program is constantly being
interrupted by higher priority processes.  In this case, a large
number of "live" registers could actually slow the system down, since
the registers might need to be stored on context switch.

  RISC processors can provide separate sets of registers for user
processes and the operating system.  This avoids saving registers on
context switch to the OS.  RISC processors can also provide separate
"register banks" for the different processes.  This is an old
technique that goes back to the Xerox Dorado.  When a context switch
takes place, the register bank is simply switched.  Of course the
context for the process may not be loaded into registers, but if a
sufficient number of register banks are provided, most commonly used
processes will tend to be cached in registers.

Cache

  A similar technique can be used for the cache.  Many systems have a
separate cache partitions for the operating system, so system calls do
not cause cache flushes.  If the cache is big enough and uses physical
address cache tags, the cache does not have to be flushed on context
switch and the data needed by commonly executed processes will tend to
be in the cache.

john@anasaz.UUCP (John Moore) (05/07/88)

In article <1002@ima.ISC.COM% johnl@ima.UUCP (John R. Levine) writes:
%%  Interesting thought from the past... the GE 600 series allowed
%%interrupts on an instruction fetch from an even location (since the bus
%%was 72 bits wide that was every other instruction). This resulted in the
%%ability to preserve context throught a few instructions.
This, by the way, let to a rather bizarre bug in an old GECOS-3 module.
Seems that a module had been written where the programmer saved all
registers with one instruction (it's been too long... don't remember the op).
Much later the code wanted to restore all registers. However, on a
multiprocessing 635 the processor number had to be in register 7 when
in master mode. So... the clever programmer preceded the multiple load
instruction with one which stored register seven into the register save
area. ... I later added code to the module, causing the register store to slide
from an even boundary (where it was safe due to the effect mentioned above)
to an odd boundary. Naturally, at some point the module was switched from
on processor to another BETWEEN THOSE TWO INSTRUCTIONS. The operating system
kernel crashed when it discovered that it had two processors numbered 1 and
no processor zero. So.... use something other than even/odd addresses if
you don't want to drive folks crazy! (and, if you are a programmer, don't
count on such a sneaky trick!)
-- 
John Moore (NJ7E)   hao!noao!mcdsun!nud!anasaz!john
(602) 870-3330 (day or evening)
The opinions expressed here are obviously not mine, so they must be
someone else's.

mills-c@cypress.cis.ohio-state.edu (Christopher Mills) (05/09/88)

#undef LINEEATER

In article <10707@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>   
>  Interesting thought from the past... the GE 600 series allowed
>interrupts on an instruction fetch from an even location (since the bus
>was 72 bits wide that was every other instruction). This resulted in the
>ability to preserve context throught a few instructions.
>
>  What are the ramification of doing the same thing on a newer
>processor?

>	bill davidsen		(wedu@ge-crd.arpa)


	Hmm.  Doesn't the Transputer already do this?  Context switches (for
low priority tasks) are permitted only after unconditional jumps, loops, and
instructions that can cause the task to block (I/O).  The state of the
evaluation registers are undefined after these instructions, so the context
switch consists of only the PC and the Workspace Pointer.  The compiler doesn't
usually care about the evaluation registers during a branch, and there is
always a chance for a context switch each time through a loop.  I think it's a
good solution, concidering the goals of the Transputer, but then again, what
do I know?


				Chris Mills
				mills@baloo.eng.ohio-state.edu

#include <stddisclamer.h>

tim@amdcad.AMD.COM (Tim Olson) (05/17/88)

In article <May.6.09.24.49.1988.13843@constance.rutgers.edu> webber@constance.rutgers.edu (Bob Webber) writes:
| The Mips R2000 (and presumably the R3000) has a large register set that
| can be organized as stack caches (perhaps even overlapping register
| sets?) or as separate process contexts.  Clearly the designers wanted
| to make both options available to the software people.  Whether actual
| systems based on them really do much with the feature in terms of
| making it available to the system users, however, I don't know.

Uhhh.. I think you may have your processors confused, here.  The Am29000
is the processor that has the large register file which can be used as a
stack cache or as multiple banks (purely a software convention).

All of our current software generates code for the stack-cache model, as
that is the best in terms of performance, and matches most applications
better (most real-time applications have more than the 8 tasks allowable
simultaneously in the register file, so context-switch time, while
on average very small, may sometimes be much larger.)

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

mash@mips.COM (John Mashey) (05/18/88)

In article <21653@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>In article <May.6.09.24.49.1988.13843@constance.rutgers.edu> webber@constance.rutgers.edu (Bob Webber) writes:
>| The Mips R2000 (and presumably the R3000) has a large register set that...

>Uhhh.. I think you may have your processors confused, here.  The Am29000
>is the processor that has the large register file which can be used as a...

Yes, for sure.  The R[23]000 has 32 integer registers (although you don;t
need to save/restore the one that's always zero), and the equivalent of
32 32-bit FP ones.

The comment missed the target by a couple blocks in Sunnyvale :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086