[comp.arch] Do RISC Compilers Consider Multipro

aglew@urbsdc.Urbana.Gould.COM (05/05/88)

>Disclaimer: I believe that RISC is the best solution to low order
>multiprogramming (like my workstation).  But, there does seem to be this
>application of uniprocessor high order multiprogramming that will not go
>away (people wanting to put 40 users on a sun 4 and process control
>with zillions of independent processes).

Agree with your concern - many RISCs, especially those with register 
windows, have greatly increased context switch penalties.

May I point out, though, that increased machine state is less characteristic
of the MIPS line of RISC processors, typically with no more than 32 registers?

A TLB adds no state if you are willing to do without virtual memory, which many
real-time (RT) applications do (certainly, they don't want paging). However, memory
mapping is nice for improving reliability by preventing processes from stepping on
each other, since mapping is often the only memory protection mechanism provided.
    Hey, are there any segment oriented MMUs out there for RT applications
- ones that don't necessarily translate addresses, but that provide protection
based on process id, and do not require flushes on context switch? Seems to
make sense...

Also may I point out - one reason that many RT applications are structured as
zillions of processes is that the processors they were running on had small 
address spaces - like, 128K. Give them a larger address space, and many RT
applications can be restructured to take advantage of it.



Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
    aglew@gould.com     	- preferred, if you have MX records
    aglew@xenurus.gould.com     - if you don't
    ...!ihnp4!uiucuxc!ccvaxa!aglew  - paths may still be the only way
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

guy@gorodish.Sun.COM (Guy Harris) (05/06/88)

> Agree with your concern - many RISCs, especially those with register 
> windows, have greatly increased context switch penalties.

Note that while a particular machine, running a particular version
of a particular operating system, may have a higher context switch time than
a different machine, this may not necessarily be due primarily to the first
machine having lots more registers.

For example, the current virtual-address-cache Sun machines spend lots of their
context-switch time flushing the old U page from the cache (I think this was
mentioned in one of the Usenix or EUUG papers on the new SunOS 4.0 virtual
memory system).  Even on Sun-4s, this time substantially dominates the
register-window-flush time.  Arranging to have the U area at different virtual
addresses in different processes will probably be a much bigger win than
getting rid of the register windows.

mash@mips.COM (John Mashey) (05/06/88)

In article <52183@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
....
>For example, the current virtual-address-cache Sun machines spend lots of their
>context-switch time flushing the old U page from the cache (I think this was
>mentioned in one of the Usenix or EUUG papers on the new SunOS 4.0 virtual
>memory system).  Even on Sun-4s, this time substantially dominates the
>register-window-flush time.  Arranging to have the U area at different virtual
>addresses in different processes will probably be a much bigger win than
>getting rid of the register windows.

There was a good article on this in the Summer 1987 Usenix (Phoenix),
although describing the virtual cache issue in the Sun3/2xx (same issue).
Guy is right on: you really have to add up ALL of the effects for
context-switching, many of which are hard to measure.
My list of effects, for context-switching amongst N processes:

1) Time to save whatever state of the previous process must be saved.

2) Time to load in as much of the state of the new process as necessary.

3) Time to flush/setup memory management hardware.

4) Time to do any cache-flushing needed.

5) Time to execute the OS code paths needed.

Note that 1-4 are mostly architecture-related, while 5) is a function of
the OS choice (i.e., vanilla UNIXes may be rather different than real-time
OS's).  Items 1&2 are mostly instruction-set architecture dependent.
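
The five items above can be added up in a small cost model. This is only a sketch, not something from the original post: the class name, field names, and the sample numbers are all made up for illustration, and real values would have to be measured on a particular machine and OS.

```python
from dataclasses import dataclass


@dataclass
class SwitchCost:
    """Per-switch cost components in microseconds (items 1-5 of the list)."""
    save_state_us: float    # 1) save state of the outgoing process
    load_state_us: float    # 2) load state of the incoming process
    mmu_setup_us: float     # 3) flush/set up memory management hardware
    cache_flush_us: float   # 4) any cache flushing needed
    os_path_us: float       # 5) OS code paths executed

    def isa_us(self) -> float:
        # Items 1 and 2: mostly instruction-set-architecture dependent.
        return self.save_state_us + self.load_state_us

    def architecture_us(self) -> float:
        # Items 1 through 4: mostly architecture-related.
        return self.isa_us() + self.mmu_setup_us + self.cache_flush_us

    def total_us(self) -> float:
        # Everything, including the OS-dependent item 5.
        return self.architecture_us() + self.os_path_us


# Placeholder numbers, chosen only to exercise the arithmetic (not measurements).
example = SwitchCost(3.0, 3.0, 10.0, 50.0, 100.0)
print(example.isa_us(), example.architecture_us(), example.total_us())
```

Splitting the total this way makes it easy to see how small the register save/restore share can be once MMU, cache, and OS costs are counted.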

Note that one might add some implicit effects that may well cause
programs to run at different speeds in a multiprogramming environment:

6) Linearity or lack thereof in 2), 3), and 4).  For example,
a VAX is linear on 3), because the TLB has entries for 1 user process,
and you flush it each time.  In systems with M contexts, you may notice
nonlinearities of overhead for 3)  when you start rapidly switching
amongst >M processes. (This happens with any such resource).
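
A toy simulation of item 6 may make the nonlinearity concrete. It assumes strict round-robin switching among N processes and a hardware resource holding M contexts with least-recently-used replacement; both are assumptions made for this sketch (the post does not specify a replacement policy), and the function name is hypothetical.

```python
from collections import OrderedDict


def context_reloads(num_processes: int, num_contexts: int, switches: int) -> int:
    """Count how often a process's context must be reloaded.

    Processes are scheduled round-robin; the hardware keeps up to
    num_contexts contexts resident and evicts the least recently used one.
    """
    resident = OrderedDict()  # pid -> True, ordered from oldest to newest use
    reloads = 0
    for i in range(switches):
        pid = i % num_processes
        if pid in resident:
            resident.move_to_end(pid)          # hit: mark as most recently used
        else:
            reloads += 1                       # miss: context must be loaded
            if len(resident) >= num_contexts:
                resident.popitem(last=False)   # evict least recently used
            resident[pid] = True
    return reloads


# One context (the VAX-like case): every switch between two processes reloads.
print(context_reloads(2, 1, 100))
# Four contexts: fine for four processes, but a fifth makes every switch miss.
print(context_reloads(4, 4, 400), context_reloads(5, 4, 400))
```

Under these assumptions the overhead is flat up to M processes and jumps as soon as N exceeds M, which is the kind of cliff the item describes.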

7) For cached systems, the impact of multiple processes and the kernel
upon the cache hit ratio of any given process.  This can depend a lot
on cache design, although it doesn't matter much for small caches.
As machines get faster, and caches bigger, the nature of the cache
design may become relevant, although relatively little is published
on this.  However, we're getting to the point where useful programs are
actually smaller than caches, and it is nice if programs don't
unnecessarily stomp on each other in the cache.  In a multi-process
environment with multiple instances of the same programs being used,
either the text segments should be shared in the cache,
or one should go to shared libraries.  (The latter is probably useful
for some virtual cache designs that make the former hard.)

As can be seen from this discussion, there is very little that has
anything to do with RISC versus CISC, especially in the long run.
Given a similar OS, the first-order effects are likely to come
from MMU & cache design effects.

It is interesting to note that uncached machines are likely to be
more linear (with increased processes) than cached ones, whether RISC or CISC,
i.e., an uncached machine can switch amongst processes with zero cache-related
penalties, whereas a cached one will usually take at least some hit.
Note: more linear does not necessarily equal faster! (thank goodness)

Thus, to answer the question that started all of this:
	a) Certainly some RISC machine designs carefully considered
	multiprogramming (HP Precision and MIPS R2000 are certainly
	examples, and I'm sure there are more.)
	b)  However, it doesn't matter much: most of the first-order
	effects have nothing to do with RISC versus CISC.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

henry@utzoo.uucp (Henry Spencer) (05/08/88)

> Agree with your concern - many RISCs, especially those with register 
> windows, have greatly increased context switch penalties.

However, they may be able to get more done between context switches, by
virtue of the extra performance bought by that extra state.  This may well
end up reducing the average degree of multiprogramming and the number of
context switches.  Context-switch penalty BY ITSELF is significant only
if you're interested in latency to the near-total exclusion of throughput.
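
The throughput side of this can be put as simple arithmetic: if a process runs for some time between switches, the fraction of the machine lost to switching is cost / (cost + run time). A minimal sketch follows; the function name and the sample figures are illustrative assumptions, not measurements.

```python
def switch_overhead_fraction(switch_cost_us: float, run_time_us: float) -> float:
    """Fraction of total time spent context switching.

    switch_cost_us: time consumed by one context switch
    run_time_us:    useful work done between consecutive switches
    """
    if switch_cost_us < 0 or run_time_us < 0:
        raise ValueError("times must be non-negative")
    total = switch_cost_us + run_time_us
    if total == 0:
        raise ValueError("at least one time must be positive")
    return switch_cost_us / total


# A 100 us switch after 9900 us of work costs about 1% of throughput,
# but a waiting process still sees the full 100 us as added latency.
print(switch_overhead_fraction(100.0, 9900.0))
```

The same switch cost that is negligible for throughput is paid in full by whichever process is waiting to run, which is why it matters mainly for latency.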
-- 
NASA is to spaceflight as            |  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.          | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

mash@mips.COM (John Mashey) (05/09/88)

In article <1988May8.022544.17676@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> Agree with your concern - many RISCs, especially those with register 
>> windows, have greatly increased context switch penalties.

>However, they may be able to get more done between context switches, by
>virtue of the extra performance bought by that extra state.  This may well
>end up reducing the average degree of multiprogramming and the number of
>context switches.  Context-switch penalty BY ITSELF is significant only
>if you're interested in latency to the near-total exclusion of throughput.

As has been noted earlier, lots of other factors (UNIX kernel code in
general, cache & TLB manipulation) may well consume much more time
than loading/storing registers.  On a 25MHz R3000, saving or restoring
the (32ish) CPU regs takes about 2-3 microseconds, as does save/restore of
the FP regs (which only happens when needed).   In a UNIX environment,
that is just plain irrelevant, compared to the rest.
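
For scale, the 2-3 microsecond figure can be converted into clock cycles. The sketch below uses only the numbers quoted above, plus the assumption that "32ish" means exactly 32 registers; the helper name is made up.

```python
def cycles_for(duration_us: int, clock_mhz: int) -> int:
    """Clock cycles elapsed: microseconds times cycles per microsecond."""
    return duration_us * clock_mhz


CLOCK_MHZ = 25   # R3000 clock rate quoted in the post
REGISTERS = 32   # "32ish" in the post; exactly 32 is assumed here

low_cycles = cycles_for(2, CLOCK_MHZ)
high_cycles = cycles_for(3, CLOCK_MHZ)

# Total cycles for the save (or restore), then cycles per register.
print(low_cycles, high_cycles)
print(low_cycles / REGISTERS, high_cycles / REGISTERS)
```

That is on the order of a couple of cycles per register, which is consistent with the claim that register traffic is a small part of a UNIX context switch.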

However, total context-switch penalty is hardly irrelevant in some
multi-user environments where you have bunches of people running
vi/emacs, etc., or X clients, i.e., where there are lots of processes that
handle a keystroke or two, do a small amount of work, and then
block waiting for the next character.

joe@rb-dc1.UUCP (Joe Hollinger) (05/10/88)

In article <28200140@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>
>>Disclaimer: I believe that RISC is the best solution to low order
>>multiprogramming (like my workstation).  But, there does seem to be this
>>application of uniprocessor high order multiprogramming that will not go
>>away (people wanting to put 40 users on a sun 4 and process control
>>with zillions of independent processes).
>
>Agree with your concern - many RISCs, especially those with register 
>windows, have greatly increased context switch penalties.

	This is not always true, however.  The Celerity processor
	maintains multiple register windows.  Each bank of the 
	register window set is associated with a process.  Context
	switching is often faster than on more traditional processors.

	Another solution to this problem is flushing and filling the
	register file in the background. Often context switching is
	a laborious enough procedure that this type of operation can
	proceed in parallel with the other housekeeping that has to
	be done.  I don't know of any RISC designs that do this.

	I'm not disagreeing, just pointing out the alternatives.


					Joe Hollinger