[comp.arch] RISC & context switches

hascall@atanasoff.cs.iastate.edu (John Hascall) (02/10/89)

  One of the benefits of a simple instruction set (RISC) is that it
  frees up chip area for more registers.  I think some papers have
  proposed register counts > 100.  What is the largest number of
  general purpose registers in an existing chip?

  What I am curious about is, what (if any) special techniques can
  be employed to prevent a large performance hit at context switch
  time (i.e., saving all those registers for the current process,
  and restoring them for the new process)?

  Do they just rely on the PCBs (process control blocks) being in the
  (data) cache?

  What about a special cache for PCBs?  Is it worth it?  Is it
  workable?

  I seem to recall there was (is?) a TI processor which had all of
  its registers in memory except 1 register which pointed to
  the other registers, so a context switch was just save/restore
  that one register.  Could a similar concept be implemented
  with all the registers in the chip?

  Consider a machine with, say, 32 GP registers.  Suppose further that
  the processor was built with, say, 544 (32 + (32*16)) GP registers and
  a special PID (process index) register.  Process slots 0-15 are
  reserved for "real-time" processes (when a new process is
  created, it will not use one of those slots unless it requests
  it).  Now, at context switch time if the "outgoing" process has
  an index of 0-15 no save is needed, and if the "incoming" process
  has an index also in the range of 0-15 no restore is needed either.
  For a process whose index is 16+ the 17th register set is used,
  and is saved/restored as in a "normal" system.

  It seems to me that such a scheme would take little extra hardware
  (other than the extra registers).  I just pulled the number 16 out
  of the air, any power of 2 would be as easily implemented--perhaps
  enough that on a workstation most or all of the processes could
  have a "real-time" slot.


                 PID register           program specified register number
      +-+-+-+-+-+-+-+   +-+-+-+-+-+     +-+-+-+-+-+
      | | | | | | |  ...  | | | | |     | | | | | |
      +-+-+-+-+-+-+-+   +-+-+-+-+-+     +-+-+-+-+-+
       | | | | | | | ... | | | | |       | | | | |
       \ or together to  / \ concatenate to form /
        \ form "use#17" /   \ actual register # /
         \ signal      /     \ (if ~use#17)    /
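
A rough sketch of the mapping in the diagram above, in Python.  It
assumes the low four PID bits pick one of the 16 dedicated sets and
that any higher PID bit being set forces the shared 17th set; the
function names are made up for illustration and nothing here
describes real hardware.

```python
REGS_PER_SET = 32          # program-visible registers (5-bit register number)
FAST_SLOTS = 16            # process slots 0-15 each own a dedicated set
SHARED_SET = FAST_SLOTS    # the 17th set (index 16), shared by everyone else


def use_17th(pid: int) -> bool:
    """OR together every PID bit above the low four: nonzero means slot 16+."""
    return (pid >> 4) != 0


def physical_register(pid: int, reg: int) -> int:
    """Map (PID, program-specified register number) to one of 544 registers."""
    if not 0 <= reg < REGS_PER_SET:
        raise ValueError("register number must fit in 5 bits")
    bank = SHARED_SET if use_17th(pid) else (pid & 0xF)
    return (bank << 5) | reg   # concatenate bank number and register number


def needs_save_restore(pid: int) -> bool:
    """Only processes running on the shared 17th set are saved and restored."""
    return use_17th(pid)
```

With this mapping, a switch between two processes in slots 0-15 only
changes the PID register; the 32 registers of the shared set are the
only ones that ever go to memory.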


A hare-brained scheme or what?

  John Hascall
  ISU Comp Center

robertb@june.cs.washington.edu (Robert Bedichek) (02/11/89)

In article <784@atanasoff.cs.iastate.edu> 
             hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
> <some lines deleted>
>
>  I seem to recall there was (is?) a TI processor which had all of
>  its registers in memory except 1 register which pointed to
>  the other registers, so a context switch was just save/restore
>  that one register.  Could a similar concept be implemented
>  with all the registers in the chip?

I believe that you are thinking of the TI9900, one of the first 
16-bit microprocessors.  It was very slow, I think at least partly
because it kept its registers in memory.

>
> <proposal to have register banks which switch at context switch>
>
>  It seems to me that such a scheme would take little extra hardware
>  (other than the extra registers).  I just pulled the number 16 out
>  of the air, any power of 2 would be as easily implemented--perhaps
>  enough that on a workstation most or all of the processes could
>  have a "real-time" slot.

Yes, but the extra hardware for the registers takes a lot of silicon
area!  Some Xerox machines had such a scheme.  They had something like
8 register banks and could do a context switch in a few cycles.  A
large semiconductor company copied this idea in an IO processor that I
think will never see the light of day.

>A hare-brained scheme or what?

Well, I don't think it's hare-brained; it makes sense if you want very
fast context switch time.  But having lots of register banks is very
expensive in silicon area and in register access time.  Register files
tend to be multiported, so each bit takes a lot of area.  Increasing
the size of the register file will often lead to an increased cycle
time or an increase in the number of cycles to do a basic operation.

To pick an example:

The Motorola 88000 has 31 general registers.  If you added your
register bank idea to this machine you would get very little benefit
when running UNIX.  There are so many other things that have to be done
on a context switch that the time to save the 31 general registers is
insignificant.

Btw, I think the designers of the 88k made *excellent* trade-offs in
its design.  After spending a year working on system software for the
machine, there is almost nothing that I would change.

	Rob Bedichek

colwell@mfci.UUCP (Robert Colwell) (02/12/89)

In article <784@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>
>  One of the benefits of a simple instruction set (RISC) is that it
>  frees up chip area for more registers.  I think some papers have
>  proposed register counts > 100.  What is the largest number of
>  general purpose registers in an existing chip?

You're apparently talking about single-chip micros.  That's not the
only domain in which RISC/CISC concepts are interesting, and I think
once you leave the single-chip domain, your premise isn't obviously
correct.

>  I seem to recall there was (is?) a TI processor which had all of
>  its registers in memory except 1 register which pointed to
>  the other registers, so a context switch was just save/restore
>  that one register.  Could a similar concept be implemented
>  with all the registers in the chip?

I think this was the TI 9900, the first 16-bit micro, which for some reason
didn't seem to catch on very well.  It did indeed have all its registers in
main memory.  And this isn't as dumb an idea as it first appears -- you need
far fewer address bits to refer to a register than to memory addresses, so
having "registers" that reside in memory is still better than no "registers"
at all.  The Bellmac-8 microprocessor from Bell Labs, ca 1977-1979, borrowed
this idea.  I'm not sure of the TI chip, but Bell's also had the overlapped
sliding register window for parameter-passing that later showed up again in
the RISC-I from Berkeley.
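
To make the workspace-pointer idea concrete, here is a small Python
model.  It assumes 16 registers of 16 bits each, laid out
consecutively in memory starting at the workspace pointer; the class
and method names are invented for illustration, and the program
counter and status word that a real chip must also keep somewhere are
ignored.

```python
WORD_BYTES = 2            # 16-bit registers
NUM_REGS = 16             # R0..R15 form one "workspace"


def register_address(wp: int, n: int) -> int:
    """Memory address of register Rn for the workspace starting at wp."""
    if not 0 <= n < NUM_REGS:
        raise ValueError("register number must be 0..15")
    return wp + WORD_BYTES * n


class Cpu:
    def __init__(self, wp: int):
        self.wp = wp          # the only on-chip state that names the registers
        self.memory = {}      # sparse model of main memory, keyed by byte address

    def read(self, n: int) -> int:
        return self.memory.get(register_address(self.wp, n), 0)

    def write(self, n: int, value: int) -> None:
        self.memory[register_address(self.wp, n)] = value & 0xFFFF

    def switch_workspace(self, new_wp: int) -> int:
        """Point at another workspace; nothing is copied.  Returns the old WP."""
        old, self.wp = self.wp, new_wp
        return old
```

An instruction needs only 4 bits to name one of these registers,
versus a full address to name an arbitrary memory word, which is the
encoding advantage described above.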

The Bellmac-8 also had one of the nicest assemblers I've seen -- had lots of
high level constructs like if-then-else, while, do-until, switch, etc.  If
you really wanted one-for-one mapping of code to machine you didn't have to
use those features, but it was often very nice to have them.

Bob Colwell               ..!uunet!mfci!colwell
Multiflow Computer     or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090

kyriazis@rpics (George Kyriazis) (02/12/89)

In article <7239@june.cs.washington.edu> robertb@uw-june.UUCP (Robert Bedichek) writes:
>In article <784@atanasoff.cs.iastate.edu> 
>             hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>> <some lines deleted>
>>
>>  I seem to recall there was (is?) a TI processor which had all of
>>  its registers in memory except 1 register which pointed to
>>  the other registers, so a context switch was just save/restore
>>  that one register...
>
>I believe that you are thinking of the TI9900, one of the first 
>16-bit microprocessors.  It was very slow, I think at least partly
>because it kept its registers in memory.
>

No, it wasn't slow because of that.  It wasn't optimised at all.
It had 4 non-overlapping clocks, and the internal algorithms were
terribly slow.  If you are thinking of the TI99/4A, yes it was much
slower simply because it was expanding each bus cycle into 6 (!!).
An 8/16-bit successor of the 9900, the 9995, was faster than the 8088,
and the 99000 (built to fight the 68000) was benchmarking better than
the 68000 (at least that's what they claim).  

I really liked that architecture, but I guess that it wasn't enough :-)
Oh well..

  George Kyriazis
  kyriazis@turing.cs.rpi.edu
  kyriazis@rdrc.rpi.edu
------------------------------

henry@utzoo.uucp (Henry Spencer) (02/12/89)

In article <784@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>  I seem to recall there was (is?) a TI processor which had all of
>  its registers in memory except 1 register which pointed to
>  the other registers, so a context switch was just save/restore
>  that one register.  Could a similar concept be implemented
>  with all the registers in the chip?

You can use the AMD 29000 that way, in fact, although doing register
windows is more popular in Unix environments.  If you dedicate a set of
16 registers to each process, and dedicate most of the global registers
to saving the rest of the state for the processes, you can have 8 processes
running with a context-switch time of something like 17 cycles.
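
A sketch of the partitioning arithmetic, in Python.  It takes the
figure of 128 local registers from Tim Olson's post in this thread
and assumes each process sees its 16 registers relative to a
per-process base; the function names are hypothetical and the cycle
count is not modelled.

```python
LOCAL_REGS = 128     # local register count quoted elsewhere in this thread
SET_SIZE = 16        # registers dedicated to each process


def max_resident_processes() -> int:
    """How many processes can keep their registers on chip at once."""
    return LOCAL_REGS // SET_SIZE


def base_for(slot: int) -> int:
    """First local register of a process slot; a switch just changes this base."""
    if not 0 <= slot < max_resident_processes():
        raise ValueError("no register set for that slot")
    return slot * SET_SIZE


def absolute_register(slot: int, n: int) -> int:
    """Absolute local-register index of register n as seen by process `slot`."""
    if not 0 <= n < SET_SIZE:
        raise ValueError("register number must be 0..15")
    return (base_for(slot) + n) % LOCAL_REGS
```

128 / 16 gives the 8 resident processes mentioned above.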
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bradb@ai.toronto.edu (Brad Brown) (02/12/89)

In article <7239@june.cs.washington.edu> robertb@uw-june.UUCP (Robert Bedichek) writes:
>In article <784@atanasoff.cs.iastate.edu> 
>             hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>> <some lines deleted>
>>
>>  I seem to recall there was (is?) a TI processor which had all of
>>  its registers in memory except 1 register which pointed to
>>  the other registers, so a context switch was just save/restore
>>  that one register.  Could a similar concept be implemented
>>  with all the registers in the chip?

This is not a bad idea if you have the silicon to do it (as other posters
have pointed out.)  Actually it's been used in some designs.  The most
interesting is actually the IBM 8100, a fast transaction processing machine
which is kind of old and has now been discontinued.  

The 8100 had a total of 1024 registers, divided up into banks of 32 registers.
That means 32 processes could each have their own context and you could
switch between processes REALLY fast.

There is a somewhat related problem when you make a subroutine call --
the calling function usually has to save its registers so it gets its
"context" restored when the function returns.  Machines like MIPS have
made use of their very large number of registers (192?) by having a pointer
to one of the registers that is effectively the base pointer for the
stack of registers that the currently executing function can use.  When
you want to make a function call you just advance the pointer past the
registers that you are using, zap arguments into the registers just
after the pointer, and branch to the function.  (Of course it's more
complicated than that, but you can see where the time savings come from...)

					(-:  Brad Brown  :-)
					bradb@ai.toronto.edu

moore%cdr.utah.edu@wasatch.UUCP (Tim Moore) (02/13/89)

In article <89Feb12.125852est.10867@ephemeral.ai.toronto.edu> bradb@ai.toronto.edu (Brad Brown) writes:

)There is a somewhat related problem when you make a subroutine call --
)the calling function usually has to save its registers so it gets its
)"context" restored when the function returns.  Machines like MIPS have
)made use of their very large number of registers (192?) by having a pointer
)to one of the registers that is effectively the base pointer for the
)stack of registers that the currently executing function can use.

You're confusing MIPS and SPARC here. The MIPS chips have a fairly
conventional set of general registers; SPARC has a large file of
registers that are divided into "windows" in the manner you describe.


			-Tim Moore
	4560 M.E.B.		   internet:moore@cs.utah.edu
	University of Utah	   ABUSENET:{ut-sally,hplabs}!utah-cs!moore
	Salt Lake City, UT 84112

hascall@atanasoff.cs.iastate.edu (John Hascall) (02/13/89)

In article <89Feb12.125852est.10867@ephemeral.ai.toronto.edu> bradb@ai.toronto.edu (Brad Brown) writes:
>In article <7239@june.cs.washington.edu> robertb@uw-june.UUCP (Robert Bedichek) writes:
>>In article <784@atanasoff.cs.iastate.edu> 
>>             hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>>> <some lines deleted>

>There is a somewhat related problem when you make a subroutine call --
>the calling function usually has to save its registers so it gets its
>"context" restored when the function returns.  Machines like MIPS have
>made use of their very large number of registers (192?) by having a pointer
>to one of the registers that is effectively the base pointer for the
>stack of registers that the currently executing function can use. ....

   This was part of my question... I take it, at context switch the MIPS
   processor has to save and restore all those registers (at least as
   far "up" as the "topmost" register in use--potentially all of them).
   Doesn't that mean roughly 400 memory accesses (assuming 192 is correct),
   all at once--just the sort of thing RISC is supposed to avoid?
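
The "roughly 400" figure is just one store per outgoing register plus
one load per incoming register.  As a trivial Python check (the
function name is made up, and the 192 is the unconfirmed count from
the quoted post):

```python
def switch_memory_accesses(regs_saved: int, regs_restored: int) -> int:
    """One store per outgoing register plus one load per incoming register."""
    return regs_saved + regs_restored


# Worst case when every one of 192 registers is live on both sides.
worst_case = switch_memory_accesses(192, 192)
```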

   What effect (if any) does this have on the suitability of these processors
   for "real-time" systems?

   John Hascall
   ISU Comp Center

mikes@oakhill.UUCP (Mike Schultz) (02/13/89)

In article <640@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>In article <784@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>
>>  I seem to recall there was (is?) a TI processor which had all of
>>  its registers in memory except 1 register which pointed to
>>  the other registers, so a context switch was just save/restore
>>  that one register.  Could a similar concept be implemented
>>  with all the registers in the chip?
>
>I think this was the TI 9900, the first 16-bit micro, which for some reason
>didn't seem to catch on very well.  

Probably because they couldn't figure out that if you give hardware away to 
universities, then you grow people who know TI when they graduate and take
that to the marketplace.  They also tended to be very business- and
industrial-oriented.  IMHO.

>It did indeed have all its registers in
>main memory.  And this isn't as dumb an idea as it first appears -- you need
>far fewer address bits to refer to a register than to memory addresses, so
>having "registers" that reside in memory is still better than no "registers"
>at all.  

Also consider that the 9900 was simply a single chip version of the TI 990 
mini computer.  I'm not sure of all my facts here, but when it was introduced,
the 990's CPU speed was not all that far from the memory speed, thus the
penalty wasn't that much.  Later, as memory became slower compared to the CPU,
they cached the current register set into fast static RAM on the CPU board
and flushed them to memory as needed.  (I'm told that it made for some 
interesting hardware considering that programs could, and did, go to the 
memory address of a register to fiddle with the low order byte of the 
register.)

Mike Schultz
mikes@oakhill.UUCP

bradb@ai.toronto.edu (Brad Brown) (02/14/89)

In article <792@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>In article <> bradb@ai.toronto.edu (Brad Brown) writes:
>>There is a somewhat related problem when you make a subroutine call --
>>the calling function usually has to save its registers so it gets its
>>"context" restored when the function returns.  Machines like MIPS have
>>made use of their very large number of registers (192?) by having a pointer
>>to one of the registers that is effectively the base pointer for the
>>stack of registers that the currently executing function can use. ....
>
>   This was part of my question... I take it, at context switch the MIPS
>   processor has to save and restore all those registers (at least as
>   far "up" as the "topmost" register in use--potentially all of them).
>   Doesn't that mean roughly 400 memory accesses (assuming 192 is correct),
>   all at once--just the sort of thing RISC is supposed to avoid?
>
>   What effect (if any) does this have on the suitability of these processors
>   for "real-time" systems?

[As some people have pointed out, I got mixed up between MIPS and SPARC --
my comments above should apply to SPARC...]

I think the idea is that in most systems there are a *lot* more function
calls than full context switches, which are quite different from the point
of view of the amount of work that has to be done.  If you can save some
time on the function calls then you can afford to waste a little more
on the time to save the registers for a full context switch.  

I don't know whether this would be a big performance hit for real-time
systems.  Perhaps there are ways of knowing how many registers are
actually in use and saving them in a burst.  Perhaps there are ways
of handling some kinds of real-time events by just allocating a new
register window.  Perhaps this would work for some "lightweight"
interrupts, though it's obviously unsuitable for a full context switch.

					(-:  Brad Brown  :-)
					bradb@ai.toronto.edu

tim@crackle.amd.com (Tim Olson) (02/14/89)

In article <1101@wasatch.UUCP> moore%cdr.utah.edu.UUCP@wasatch.UUCP (Tim Moore) writes:
| In article <89Feb12.125852est.10867@ephemeral.ai.toronto.edu> bradb@ai.toronto.edu (Brad Brown) writes:
| 
| )There is a somewhat related problem when you make a subroutine call --
| )the calling function usually has to save its registers so it gets its
| )"context" restored when the function returns.  Machines like MIPS have
| )made use of their very large number of registers (192?) by having a pointer
| )to one of the registers that is effectively the base pointer for the
| )stack of registers that the currently executing function can use.
| 
| You're confusing MIPS and SPARC here. The MIPS chips have a fairly
| conventional set of general registers; SPARC has a large file of
| registers that are divided into "windows" in the manner you describe.

I think he was talking about the Am29000 (192 registers).  The 29k has
64 globals and 128 locals, all of which are accessible by the
instructions.  An internal stack pointer allows a register-window
implementation that uses variable-sized windows (tailored to the size of
each individual function's needs), rather than the fixed-sized windows
of the SPARC.
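
A toy Python model of variable-sized windows over the 128 local
registers.  The direction of stack growth, the class name, and the
overflow behaviour are assumptions made for illustration; spilling to
memory is not modelled, and the fixed-window comparison simply
charges 16 registers per call.

```python
LOCAL_REGS = 128   # local register count given in the post


class RegisterStack:
    """A stack pointer into a circular file of local registers."""

    def __init__(self):
        self.sp = 0            # index of the current frame's register 0
        self.frames = []       # sizes of the active frames, innermost last

    def call(self, frame_size: int) -> None:
        """Allocate exactly as many registers as the callee asks for."""
        if self.in_use() + frame_size > LOCAL_REGS:
            raise OverflowError("register file full: a real system would spill")
        self.sp = (self.sp - frame_size) % LOCAL_REGS
        self.frames.append(frame_size)

    def ret(self) -> None:
        """Release the innermost frame."""
        self.sp = (self.sp + self.frames.pop()) % LOCAL_REGS

    def in_use(self) -> int:
        return sum(self.frames)

    def absolute(self, n: int) -> int:
        """Absolute index of local register n in the current frame."""
        return (self.sp + n) % LOCAL_REGS


def fixed_window_usage(depth: int, window: int = 16) -> int:
    """Registers consumed by `depth` calls when every call takes a full window."""
    return depth * window
```

Three nested calls needing 4, 10 and 6 registers occupy 20 registers
here, where three fixed 16-register steps would occupy 48.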


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

schmitz@fas.ri.cmu.edu (Donald Schmitz) (02/14/89)

In article Robert Bedichek writes:
>In article John Hascall writes:
>> <proposal to have register banks which switch at context switch>
>>
>>  It seems to me that such a scheme would take little extra hardware
>>  (other than the extra registers).  I just pulled the number 16 out
>>  of the air, any power of 2 would be as easily implemented--perhaps
>>  enough that on a workstation most or all of the processes could
>>  have a "real-time" slot.
>
>Yes, but the extra hardware for the registers takes a lot of silicon
>area!  Some Xerox machines had such a scheme.  They had something like
>8 register banks and could do a context switch in a few cycles.  A
>large semiconductor company copied this idea in an IO processor that I
>think will never see the light of day.
>
>>A hare-brained scheme or what?

A similar thread went around a year ago, and I came up with the idea of CPUs
with externally addressable register/state files, plus a "scheduling CPU".
The "scheduler" would make the CPUs context switch by externally halting
them, dumping/updating their register file via a DMA or block xfer operation
to fast memory used as a PCB cache (via the hardware interface to the
register file), and then restarting them.

The real win is not so much the reduced context switch time, but the ability
to run the scheduling process on a dedicated CPU in parallel with the "real"
processes.  The extra cycles available for scheduling can (hopefully) be
used for more sophisticated scheduling algorithms.  This would be a real win
in a multi CPU system, as "real" processes could be scheduled to avoid
conflicts for system resources, such as main memory bandwidth and disk
accesses.  The hardware cost of this is an extra data/address path to the
register file, plus some additional multiplexing of the chip pins - not
insignificant in a really high perf CPU but much less costly than multiple
register files.  If you don't want to build mutant chips, you can do a
similar thing with conventional processors, shared memory, interrupts and
software, without quite the savings in the raw context switch time (but
still a win in scheduling time and hopefully a big win in overall
utilization).

Anyway, I got 2 or 3 responses from places working on such systems, although
I still haven't seen one released.

Don Schmitz	(schmitz@fas.ri.cmu.edu)
--

robertb@june.cs.washington.edu (Robert Bedichek) (02/15/89)

In article <4274@pt.cs.cmu.edu> schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
>
>A similar thread went around a year ago, and I came up with the idea of CPUs
>with externally addressable register/state files, plus a "scheduling CPU".
>The "scheduler" would make the CPUs context switch by externally halting
>them, dumping/updating their register file via a DMA or block xfer operation
>to fast memory used as a PCB cache (via the hardware interface to the
>register file), and then restarting them.
>
>The real win is not so much the reduced context switch time, but the ability
>to run the scheduling process on a dedicated CPU in parallel with the "real"
>processes.  The extra cycles available for scheduling can (hopefully) be
>used for more sophisticated scheduling algorithms.  This would be a real win
>in a multi CPU system, as "real" processes could be scheduled to avoid
>conflicts for system resources, such as main memory bandwidth and disk
>accesses.  The hardware cost of this is an extra data/address path to the
>register file, plus some additional multiplexing of the chip pins - not
>insignificant in a really high perf CPU but much less costly than multiple
>register files.  

If the processor is halted while the dumping of registers is going on
then you don't need any extra data paths to the registers.  The CDC
6600 did what you describe, its PP (Peripheral Processors) made the
processor do an "exchange jump", where the registers were swapped with
an image in memory.  I don't know where the scheduling algorithm was
done though.  The 6600 was considerably easier to program than the
PP's, so I suspect that it was done on the 6600.  (The relative
difficulty of programming is generally a problem with dedicated
special purpose attached processors, such as IO processors.  It can be
done, of course, but faced with the decision of where to implement some
new feature, system programmers tend to put it on the main CPU.)
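
The essence of an exchange of registers with a memory image can be
sketched in a few lines of Python.  This is only the swap itself; the
function name is invented, and the actual layout of the 6600's
register image is not modelled.

```python
def exchange_jump(registers: list, memory: list, base: int) -> None:
    """Swap the live registers with the saved image at memory[base:base+n]."""
    n = len(registers)
    if base < 0 or base + n > len(memory):
        raise IndexError("register image falls outside memory")
    for i in range(n):
        registers[i], memory[base + i] = memory[base + i], registers[i]
```

Because it is a swap, doing it a second time with the same base puts
the original process back.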

And if the processor is going to be waiting while its registers are
dumped, why not just have the processor do the dumping ... and now
the scheme has degenerated to the software solution.

I don't see any advantage to your scheme in current general purpose
systems.  If you want to run the scheduling algorithm in parallel, then
why not just run it on another "real processor"?  Why statically
allocate a machine to an activity unless it is a big win in doing so?

>If you don't want to build mutant chips, you can do a
>similar thing with conventional processors, shared memory, interrupts and
>software, without quite the savings in the raw context switch time (but
>still a win in scheduling time and hopefully a big win in overall
>utilization).

Right, but what's the difference between this (degenerating to having
everything done in software) and what is done "conventionally" on
shared memory multiprocessors (e.g., Sequent)?

>
>Anyway, I got 2 or 3 responses from places working on such systems, although
>I still haven't seen one released.
>
>Don Schmitz	(schmitz@fas.ri.cmu.edu)
>-- 

	Rob

    "Live to code
     Code to live"

beg, plea to all: run spell on your text before posting

petolino%joe@Sun.COM (Joe Petolino) (02/15/89)

>>  I seem to recall there was (is?) a TI processor which had all of
>>  its registers in memory except 1 register which pointed to
>>  the other registers, so a context switch was just save/restore
>>  that one register.  Could a similar concept be implemented
>>  with all the registers in the chip?

>You can use the AMD 29000 that way, in fact, although doing register
>windows is more popular in Unix environments.  If you dedicate a set of
>16 registers to each process, and dedicate most of the global registers
>to saving the rest of the state for the processes, you can have 8 processes
>running with a context-switch time of something like 17 cycles.

This same trick could be used with SPARC, too, for example if you were
writing a real-time OS that needed fast, predictable context switch timing.
The 'Current Window Pointer' (CWP) is a field of the PSR - writing a new
value into the PSR gives you a whole new set of window registers, preserving
the old register values.

For those not familiar, here's a quick overview of the way SPARC registers
work:  there are eight global registers (one of them, g0, is hard-wired as
a constant 0), plus a circular file of windowed registers.  The size of this
register file is implementation-dependent (it's 112 registers on the Sun4
chip).  At any one time, the processor has access to a 'window' of 24 of
these registers, starting at the one pointed to by the CWP field of the PSR
(the CWP always points to a register whose number is a multiple of 16).  The
CWP can change (- or +, mod the size of the register file) in increments of 16
registers, in response to two instructions (save and restore) which are
normally used in conjunction with the instructions that do procedure calls
and returns.  Thus, the 24-register window of a called routine overlaps its
caller's 24-register window by eight registers.  When a trap occurs, the CWP
automatically moves up by sixteen registers.  You can think of it as a
poor-man's stack cache - the poverty part is that the stack pointer can only
move in increments of 16, the CPU can only look at the top 24 words of the
stack, and it has a finite size that must be managed by the OS.  That
management is facilitated by the 'Window Invalid Mask' (WIM), a special
register with a bit for each possible value of the CWP.  If a save or restore
instruction would cause the CWP to decrement or increment to a value whose
corresponding WIM bit is 1, then that instruction traps, and the OS must free
up some registers (and update the WIM) before continuing.
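
The overview above can be modelled in a few lines of Python.  This is
a sketch, not a description of any particular chip: it treats the CWP
as a window index (register number divided by 16), assumes save
decrements and restore increments it, and raises an exception where
the hardware would trap.  The class and method names are invented.

```python
class WindowedRegisters:
    def __init__(self, total: int = 112):
        if total % 16 != 0:
            raise ValueError("file size must be a multiple of 16")
        self.file = [0] * total          # circular file of windowed registers
        self.nwindows = total // 16      # number of legal CWP values
        self.cwp = 0                     # Current Window Pointer (window index)
        self.wim = 0                     # Window Invalid Mask, one bit per CWP value

    def index(self, r: int) -> int:
        """Physical index of window register r (0..23) in the current window."""
        if not 0 <= r < 24:
            raise ValueError("a window exposes 24 registers")
        return (self.cwp * 16 + r) % len(self.file)

    def _move(self, step: int) -> None:
        new_cwp = (self.cwp + step) % self.nwindows
        if (self.wim >> new_cwp) & 1:
            raise RuntimeError("window trap: OS must free registers, update WIM")
        self.cwp = new_cwp

    def save(self) -> None:
        """Move to the next window, as done around procedure entry."""
        self._move(-1)

    def restore(self) -> None:
        """Move back to the previous window, as done around procedure return."""
        self._move(+1)
```

Comparing the physical indices visible before and after a save shows
the eight shared registers that carry parameters between caller and
callee.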

Note that, in an application where fast context switches between a small
number of processes were the most important factor, you wouldn't even use the
WIM.  You'd write all the code without save and restore instructions (note
that these operations are *not* part of the call/return instructions), and
instead use normal loads and stores to save the state of the registers across
procedure calls.  The OS could then allocate 32 consecutive registers (i.e.
two adjacent CWP values) for each process: one 24-register window to run in,
and another 8 (in the next window above) for trap handlers to use.
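
A Python sketch of that static allocation, using the 112-register
file mentioned above.  It assumes process slot i runs with CWP = 2*i
and that "the next window above" is CWP + 1; the function names are
hypothetical.

```python
TOTAL = 112          # windowed registers in the file
PER_PROCESS = 32     # two adjacent CWP values per process


def slots() -> int:
    """Number of processes that can stay resident."""
    return TOTAL // PER_PROCESS


def cwp_for(slot: int) -> int:
    """CWP value a process runs with; a context switch writes it into the PSR."""
    if not 0 <= slot < slots():
        raise ValueError("no window pair for that slot")
    return 2 * slot


def run_registers(slot: int) -> range:
    """The 24 registers of the process's own window."""
    start = 16 * cwp_for(slot)
    return range(start, start + 24)


def trap_registers(slot: int) -> range:
    """The 8 registers of the next window that no process window overlaps."""
    start = 16 * cwp_for(slot) + 24
    return range(start, start + 8)
```

With 112 registers this leaves room for three resident processes,
with 16 registers left over.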

-Joe Petolino

"I don't work for Marketing.  Nobody told me to write this.  As far as I
know, it's all true!."

andrew@eve.oz (Andrew McRae) (02/15/89)

From article <89Feb12.125852est.10867@ephemeral.ai.toronto.edu>, by bradb@ai.toronto.edu (Brad Brown):
>> [ Discussion about multiple register sets ]
> 
>This is not a bad idea if you have the silicon to do it (as other posters
>have pointed out.)  Actually it's been used in some designs.  The most
>interesting is actually the IBM 8100, a fast transaction processing machine
>which is kind of old and has now been discontinued.

The Concurrent Computer Corp. 3200 series has multiple register
sets (up to 16), the operative set selected by a 4 bit field in the
processor status register. Generally these are not used to speed
context switching between processes, but to allocate one set to the
user level processes, and the other sets to the OS. Each interrupt
level had a different register set, so that no register saving had
to occur at interrupt service time, and there was no need to save
user registers during kernel operations.
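
As an illustration of selecting a register set from a status-word
field, here is a Python sketch.  The field position, the 16 registers
per set, and the mapping of interrupt level to set number are all
assumptions made for the example, not details of the 3200 series.

```python
NUM_SETS = 16
REGS_PER_SET = 16        # assumed for illustration; not given in the post
SET_FIELD_SHIFT = 4      # hypothetical position of the 4-bit field


def current_set(psw: int) -> int:
    """Extract the 4-bit register-set number from the status word."""
    return (psw >> SET_FIELD_SHIFT) & 0xF


def with_set(psw: int, reg_set: int) -> int:
    """Return a status word that selects reg_set, other bits unchanged."""
    if not 0 <= reg_set < NUM_SETS:
        raise ValueError("register set must be 0..15")
    return (psw & ~(0xF << SET_FIELD_SHIFT)) | (reg_set << SET_FIELD_SHIFT)


class Machine:
    def __init__(self):
        self.psw = 0
        self.sets = [[0] * REGS_PER_SET for _ in range(NUM_SETS)]

    def reg(self, n: int) -> int:
        return self.sets[current_set(self.psw)][n]

    def set_reg(self, n: int, value: int) -> None:
        self.sets[current_set(self.psw)][n] = value

    def interrupt(self, level: int) -> int:
        """Enter a handler on its own register set; returns the old status word."""
        old = self.psw
        self.psw = with_set(self.psw, level)
        return old

    def resume(self, old_psw: int) -> None:
        """Return from the handler by restoring the saved status word."""
        self.psw = old_psw
```

The interrupted code's registers are untouched while the handler
runs, so nothing is saved or restored beyond the status word.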

This register swapping was
tied up with the architecture (e.g. at interrupt time some of
the registers had useful values stored in them by the microcode
such as vector number, device address, previous status/program
counter etc). Not being a stack based machine, the register
swapping tended to be fundamental to the way the OS did things (I'm
speaking of course for the native OS/32, not Unix).

I'm surprised that this idea (different register sets for different
interrupt levels) has not been taken up in some of the more modern
architectures, but there are problems (try doing a splX, and see
if your registers hold the same values...). 

Andrew McRae			inet:	andrew@megadata.oz{.au}
Megadata Pty Ltd,		uucp:	..!uunet!munnari!megadata.oz!andrew
NSW    AUSTRALIA


des@inmos.co.uk (David Shepherd) (02/18/89)

In article <640@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>In article <784@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>>  I seem to recall there was (is?) a TI processor which had all of
>>  its registers in memory except 1 register which pointed to
>>  the other registers, so a context switch was just save/restore
>>  that one register.  Could a similar concept be implemented
>>  with all the registers in the chip?
>
>I think this was the TI 9900, the first 16-bit micro,

The INMOS transputer has a similar idea. It has a 3 deep register stack, an
instruction pointer and a workspace pointer that points into memory and 
(currently) 4k of on chip RAM. Loading and storing to on chip RAM relative to
the workspace gives you 16 fast (1 cycle store, 2 cycle load) "registers" and
256 slightly slower (2 cycle store, 3 cycle load) "registers". Switching from
one concurrent process to another only involves storing the instruction pointer,
workspace pointer, adding the descheduled process to the end of the scheduling
queue and taking the new one off the front.
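
The switch described above amounts to a FIFO queue of
(workspace pointer, instruction pointer) pairs.  A minimal Python
sketch, with invented class names and no attempt to model the
hardware's actual queue representation:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Process:
    workspace: int      # workspace pointer: where this process's "registers" live
    iptr: int           # saved instruction pointer


class Scheduler:
    def __init__(self, current: Process):
        self.current = current
        self.ready = deque()         # FIFO scheduling queue

    def add(self, process: Process) -> None:
        self.ready.append(process)

    def deschedule(self, iptr_now: int) -> Process:
        """Save the instruction pointer, queue the old process, run the next."""
        self.current.iptr = iptr_now
        self.ready.append(self.current)
        self.current = self.ready.popleft()
        return self.current
```

No register contents are copied: each process's working values stay
in memory at its own workspace.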

>							which for some reason
>didn't seem to catch on very well. 

hmmm ... perhaps it didn't have a decent C compiler either ;-)

david shepherd
INMOS ltd

disclaimer: any opinions expressed above are mine -- so don't steal them

khb%chiba@Sun.COM (chiba) (02/18/89)

In article <103@eve.oz> andrew@eve.oz (Andrew McRae) writes:
>From article <89Feb12.125852est.10867@ephemeral.ai.toronto.edu>, by bradb@ai.toronto.edu (Brad Brown):
>>> [ Discussion about multiple register sets ]
>> 
.....
>The Concurrent Computer Corp. 3200 series has multiple register
>sets (up to 16), the operative set selected by a 4 bit field in the
>processor status register. Generally these are not used to speed
...
>tied up with the architecture (e.g. at interrupt time some of
>the registers had useful values stored in them by the microcode
>such as vector number, device address, previous status/program
>counter etc). Not being a stack based machine, the register
>swapping tended to be fundamental to the way the OS did things (I'm
>speaking of course for the native OS/32, not Unix).
>
>I'm surprised that this idea (different register sets for different
>interrupt levels) has not been taken up in some of the more modern
>architectures, but there are problems (try doing a splX, and see
>if your registers hold the same values...). 

The IBM Series/1 also had this sort of set up (4 sets, as best I can
recall). 

More modern architectures haven't done this because most folks aren't
trying to optimize that kind of context switch (at least none of the
popular benchmarks are testing for it). It _is_ handy for certain
types of real time tasking...so perhaps folks deeply involved in that
area can comment.

Wearing my application programmer hat, I'd rather have all those
registers for _my_ task, if they can be usefully employed. 
Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus