[comp.arch] WISC

cquenel@polyslo.CalPoly.EDU (24 more school days) (05/04/89)

In 9680 yair@tybalt.caltech.edu.UUCP (Yair Zadik) sez:
>A couple of years ago there was an article in Byte about a proposed design
>which they called WISC for Writeable Instruction Set Computer.  The idea
>was to do a RISC or microcoded processor which had an on board memory 
>containing macros which behaved like normal instructions (I guess it was
>on EEPROM-like memory).  That way, each compiler could optimize the 
>instruction set for its language.  The end result (theoretically) is that
>you get the efficiency of RISC with the memory bandwidth of CISC.  I haven't
>heard anything else about it.  Is anyone out there working on such a processor or is
>it just a bad idea?
>
>Yair Zadik
>yair@tybalt.caltech.edu

I believe the Clipper CPU chip does this, but it is not settable for each
process.  It is fixed for a given chip.  This set of built-in subroutines
was used for testing, some floating point (I think), and whatever code
they didn't want to have to fetch through the i-cache.

As has already been pointed out, doing this on a process specifiable
level would be a hassle for context switching.  If you already have
an icache and a dcache on-chip, you could have a micro-code cache
as well.  If you didn't worry about "mixing" them, you could have
room for 2 or 4 sets of micro-code and have each set tagged with the
process ID in some way.  You could either fault in each set of micro-code
in chunks, or do the whole thing at one time.
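A rough sketch of the tagged micro-code cache described above, in Python purely for illustration.  The class name, the LRU replacement policy and the `loader` callback are all assumptions (nothing here describes the Clipper or any real chip).  On a tag miss it faults the whole set in at once, which is the "whole thing at one time" option; faulting in chunks would just mean tagging smaller pieces.

```python
from collections import OrderedDict


class MicrocodeCache:
    """Hypothetical on-chip cache holding a few micro-code sets,
    each tagged with the process ID that owns it."""

    def __init__(self, num_sets=4, loader=None):
        self.num_sets = num_sets  # room for, e.g., 2 or 4 sets
        # Stand-in for reading a process's micro-code from memory.
        self.loader = loader or (lambda pid: f"ucode-for-{pid}")
        self.sets = OrderedDict()  # pid -> micro-code, least recently used first
        self.faults = 0

    def activate(self, pid):
        """On a context switch, make pid's micro-code resident."""
        if pid in self.sets:
            self.sets.move_to_end(pid)  # tag hit: mark most recently used
        else:
            self.faults += 1  # tag miss: fault the whole set in at once
            if len(self.sets) >= self.num_sets:
                self.sets.popitem(last=False)  # evict least recently used set
            self.sets[pid] = self.loader(pid)
        return self.sets[pid]


cache = MicrocodeCache(num_sets=2)
for pid in [1, 2, 1, 3, 1, 2]:  # made-up context-switch trace
    cache.activate(pid)
print(cache.faults, list(cache.sets))  # 4 [1, 2]
```

With only two slots, the third process forces an eviction, so the context-switch cost shows up as micro-code faults rather than as a full reload on every switch.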

Another alternative would be to have a fixed number of sets of
micro-code, tailored for (for example)

	The kernel : with micro-code routines for
		     process management (load/store process context)
		     virtual memory management (cache flushing etc)

	pascal	   : maybe a different calling sequence

	fortran	   : with more tolerant alignment for handling
		     COMMON block difficulties.

	LISP	   : get ideas from the bujillion micro-coded
		     LISP-stations on the market.

	C	   : be sure to put those nasty str* routines
		     in micro-code !


	etc etc etc


Anyway, micro-code that wasn't configured into the system could be
replaced by software traps to slower conventional routines.

The site could decide which sets of micro-code they wanted to install
at kernel-link time.
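A minimal sketch of the trap fallback, assuming a simple dispatch where any operation not installed at kernel-link time traps to a software routine.  `Machine`, `SOFTWARE_ROUTINES` and the `STRLEN` operation are hypothetical names chosen for illustration, and Python's `len` merely stands in for a micro-coded version.

```python
def strlen_sw(s):
    """Slower conventional routine: count characters one at a time."""
    n = 0
    for _ in s:
        n += 1
    return n


# Software routines the trap handler can fall back on (hypothetical).
SOFTWARE_ROUTINES = {"STRLEN": strlen_sw}


class Machine:
    def __init__(self, installed):
        # Micro-coded operations the site chose at kernel-link time.
        self.installed = dict(installed)
        self.traps = 0

    def execute(self, op, *args):
        if op in self.installed:
            return self.installed[op](*args)  # fast path: micro-code
        self.traps += 1  # unimplemented-instruction trap
        return SOFTWARE_ROUTINES[op](*args)  # slow path: software emulation


with_ucode = Machine(installed={"STRLEN": len})  # len stands in for micro-code
without_ucode = Machine(installed={})
print(with_ucode.execute("STRLEN", "hello"), with_ucode.traps)  # 5 0
print(without_ucode.execute("STRLEN", "hello"), without_ucode.traps)  # 5 1
```

Both configurations compute the same answer; only the speed (here, the trap count) differs, which is what lets each site pick its micro-code sets without breaking programs.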

Well, what do people think ?

I know it's a lot of trouble, but you could relieve a lot of the pressure
that RISC puts on the icache (and consequently the memory bus).

--chris

All I ask of life is a constant and exaggerated sense of my own importance.
-- 
@---@  -----------------------------------------------------------------  @---@
\. ./  | Chris (The Lab Rat) Quenelle      cquenel@polyslo.calpoly.edu |  \. ./
 \ /   |  You can keep my things, they've come to take me home -- PG   |   \ / 
==o==  -----------------------------------------------------------------  ==o==

bcase@cup.portal.com (Brian bcase Case) (05/05/89)

>Well, what do people think [about WISC]?

>I know it's a lot of trouble, but you could relieve a lot of the pressure
>that RISC puts on the icache (and consequently the memory bus).

Oh, OH, OH MY GOD, my instruction cache is about to BLOW UP from all that
pressure!!  It's gonna blow!  RUN FOR YOUR LIVES!!  [Oh come on, it's just
a joke.  I'm not flaming.]

The point I want to make is that if RISC (or anything else) keeps the 
instruction cache 100% busy, THAT IS GOOD, NOT BAD.  This means that the
trouble spent building the damn thing was really worth it, to say it one
way.  This is not "pressure."  The fact is that instruction caches work
really well.  There is no need to "help them out" or "relieve a lot of
the pressure put on them."  People seem to think that cache designs and
bus designs are right at the breaking point.  We're just lucky that we
were able to build satisfactory caches and buses for this generation of
machines, but, "Oh m'God, what about the next generation?  There's just
nothing left, we've run out of trickery!"  This is not true.  Cache and
bus designs are right at the edge of what is needed for a particular
generation of machines *BY DESIGN*.  To build something radically beyond
what is needed is a waste of money, and designers know it.  Sure, this
is hard stuff at 50 MHz and beyond.  But to think that we need to
go back to CISC or soft machines as a solution is probably not right.
To me, arguing for CISC in the face of high-frequency implementation
problems is equivalent to saying:  "These machines is runnin' too fast;
what say let's slow 'em down agin."

David Letterman:  "We are very near the end of civilization."

koopman@a.gp.cs.cmu.edu (Philip Koopman) (05/05/89)

In article <10978@polyslo.CalPoly.EDU>, cquenel@polyslo.CalPoly.EDU (24 more school days) writes:
> In 9680 yair@tybalt.caltech.edu.UUCP (Yair Zadik) sez:
> >A couple of years ago there was an article in Byte about a proposed design
> >which they called WISC for Writeable Instruction Set Computer.  The idea
> >was to do a RISC or microcoded processor which had an on board memory 
> >containing macros which behaved like normal instructions (I guess it was
> >on EEPROM-like memory).  That way, each compiler could optimize the 
> >instruction set for its language.  The end result (theoretically) is that
> >you get the efficiency of RISC with the memory bandwidth of CISC.  I haven't
> >heard anything else about it.  Is anyone out there working on such a processor or is
> >it just a bad idea?
> >Yair Zadik
> >yair@tybalt.caltech.edu

WISC Technologies built two such processors, a 16-bit and a 32-bit processor.
The Byte article described the 16-bit processor in somewhat generic terms.
The technology has since been licensed to Harris Semiconductor, and forms
the basis for their 32-bit RTX (Real Time Express) chip now in development.

> As has already been pointed out, doing this on a process specifiable
> level would be a hassle for context switching.  If you already have
> an icache and a dcache on-chip, you could have a micro-code cache
> as well.  If you didn't worry about "mixing" them, you could have
> room for 2 or 4 sets of micro-code and have each set tagged with the
> process ID in some way.  You could either fault in each set of micro-code
> in chunks, or do the whole thing at one time.

The original processors used RAM control store and required a host processor.
Real, live chips that could be used in a stand-alone mode are being built
that have ROM for a core instruction set and RAM for application-specific
instructions.  Since the primary application is real-time embedded control,
it is *not* important to worry about managing the control store RAM, since
the programmer gets to determine the instruction set at compile time.
This is not at all like a multi-user workstation environment, where contention
for a limited amount of control store by several large programs written
by different companies can be a problem.
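A small model of the ROM-core-plus-RAM split, assuming a stack machine for concreteness (the post doesn't say so) and a made-up opcode layout where 0x00-0x7F is the fixed ROM core set and 0x80 and up is writable.  `ControlStore`, `ROM_LIMIT` and the micro-operations are hypothetical; the point is only that application instructions are defined once at load time and never contend with other programs.

```python
ROM_LIMIT = 0x80  # hypothetical: opcodes below this are the ROM core set


# Core micro-operations on an operand stack (a stack machine is assumed).
def dup(s):
    s.append(s[-1])


def add(s):
    b, a = s.pop(), s.pop()
    s.append(a + b)


def mul(s):
    b, a = s.pop(), s.pop()
    s.append(a * b)


class ControlStore:
    def __init__(self):
        self.rom = {0x01: [dup], 0x02: [add], 0x03: [mul]}  # fixed in silicon
        self.ram = {}  # filled in by the application when it is loaded

    def define(self, opcode, micro_ops):
        """Add an application-specific instruction in the RAM region."""
        if opcode < ROM_LIMIT:
            raise ValueError(f"opcode {opcode:#04x} belongs to the ROM core set")
        self.ram[opcode] = list(micro_ops)

    def execute(self, opcode, stack):
        table = self.rom if opcode < ROM_LIMIT else self.ram
        for uop in table[opcode]:
            uop(stack)
        return stack


cs = ControlStore()
cs.define(0x80, [dup, mul, add])  # new instruction: y + x*x, x on top
print(cs.execute(0x80, [3, 4]))  # [19]
```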

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  5551 Beacon St.
  Pittsburgh, PA  15217    
Recent PhD graduate at CMU and sometime consultant to Harris Semiconductor.
I speak only for me, etc.  But, I am the one who wrote the Byte Article.
--

kds@blabla.intel.com (Ken Shoemaker) (05/06/89)

I think the problem isn't one to get solved by downloadable microcode.  As
discussed previously, this is really almost the same as just having a RISC
with a good high-speed instruction cache.  Complex instructions implemented
as just branches into the high-speed downloadable store don't really go any faster.
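A back-of-the-envelope model of that claim, assuming every operation takes one cycle and that both the i-cache and the downloadable store always hit in a single cycle.  `run_cost` is a hypothetical helper: under these assumptions the cycle count is identical, and the only difference is how many instruction words cross the fetch path, which is the memory-bandwidth argument from the original question.

```python
def run_cost(invocations, ops_per_macro, via_control_store):
    """Return (cycles, instruction words fetched from the i-cache).

    Assumes every operation takes one cycle and that both the i-cache and
    the downloadable control store always hit in a single cycle.
    """
    cycles = invocations * ops_per_macro  # the same work gets done either way
    if via_control_store:
        fetched = invocations  # one complex-instruction word per invocation
    else:
        fetched = invocations * ops_per_macro  # one RISC word per operation
    return cycles, fetched


print(run_cost(1000, 5, via_control_store=True))  # (5000, 1000)
print(run_cost(1000, 5, via_control_store=False))  # (5000, 5000)
```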

However, if you have complex instructions, then this really is low-hanging
fruit for superscalar implementations, because a single instruction
specifies multiple things to do.  If you can do all these things in a single
clock, voila.  However, solving this problem requires adding more functional
blocks, e.g., adders, address paths, register ports, etc., to the 
implementation.
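A toy illustration of why the extra functional blocks matter, assuming the operations packed into one complex instruction are independent of each other (real instructions often aren't).  `issue_cycles` and the sample instruction are hypothetical: with one adder, two adds need two clocks; with a second adder the whole instruction fits in one.

```python
from collections import Counter
from math import ceil


def issue_cycles(ops, units):
    """Cycles needed to perform the independent operations in one instruction.

    ops:   operation kinds the instruction specifies, e.g. ["add", "add", "shift"]
    units: number of functional units of each kind
    """
    need = Counter(ops)
    return max(ceil(count / units[kind]) for kind, count in need.items())


ops = ["add", "add", "shift"]  # hypothetical complex instruction
print(issue_cycles(ops, {"add": 1, "shift": 1}))  # 2
print(issue_cycles(ops, {"add": 2, "shift": 1}))  # 1
```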
------------
I've decided to take George Bush's advice and watch his press conferences
	with the sound turned down...			-- Ian Shoales
Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|pur-ee|hacgate|oliveb}!intelca!mipos3!kds

schow@bnr-public.uucp (Stanley Chow) (05/08/89)

In article <17933@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>Well, what do people think [about WISC]?
>
>>I know it's a lot of trouble, but you could relieve a lot of the pressure
>>that RISC puts on the icache (and consequently the memory bus).
>
>The point I want to make is that if RISC (or anything else) keeps the 
>instruction cache 100% busy, THAT IS GOOD, NOT BAD. 

Actually, it is bad.

Keeping the i-cache 100% busy means it is the bottleneck, so the original
argument applies.
Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831		     ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
I am just a small cog in a big machine. I don't represent nobody.

bcase@cup.portal.com (Brian bcase Case) (05/08/89)

>>The point I want to make is that if RISC (or anything else) keeps the 
>>instruction cache 100% busy, THAT IS GOOD, NOT BAD. 
>
>Actually, it is bad.
>Keeping the i-cache 100% busy means it is the bottleneck, so the original
>argument applies.

If 100% busy means that the instruction cache is always missing, yes, it
is the bottleneck.  If 100% busy means that the instruction cache is
delivering an instruction to the processor every cycle, no, it is not the
bottleneck.  Assuming the processor can execute 1 instruction per cycle and
the instruction cache is delivering 1 instruction per cycle (a safe
assumption for a RISC machine with a Harvard implementation), the
instruction cache is NOT the bottleneck.  The instruction cache and
processor are working at their maximum rates, a fact which indicates a
good design.

I thought it was clear that by 100% busy, I meant delivering an instruction
to the processor on every cycle (or very nearly so, which is the case for
most RISC machines).
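The distinction between the two meanings of "100% busy" can be put in a simple pipeline model.  This is a sketch under stated assumptions (fetch and execute overlap perfectly, and each i-cache miss stalls everything for a fixed penalty); `ipc` and its parameters are hypothetical, and the miss numbers are made up.

```python
def ipc(fetch_rate, exec_rate, miss_rate=0.0, miss_penalty=0):
    """Instructions per cycle in a simple pipeline model.

    fetch_rate:   instructions the i-cache delivers per cycle when it hits
    exec_rate:    instructions the processor can execute per cycle
    miss_rate:    i-cache misses per instruction
    miss_penalty: stall cycles per miss
    """
    base_cpi = max(1 / fetch_rate, 1 / exec_rate)  # slower stage sets the pace
    return 1 / (base_cpi + miss_rate * miss_penalty)


print(ipc(1, 1))  # 1.0: cache busy every cycle, processor never waits
print(ipc(2, 1))  # 1.0: a faster cache buys nothing
print(ipc(1, 2))  # 1.0: processor could do 2, so here the cache is the bottleneck
print(ipc(1, 1, miss_rate=0.05, miss_penalty=10))  # about 0.67: busy with misses
```

Only in the last two cases does improving the cache help; in the first, both sides are running at their maximum rates.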

Similarly, if the processor's instructions specify two source registers and
one destination register and the register file can source two operands and
sink one operand, the register file is not the bottleneck, even for a long
sequence of arithmetic/logical/shift ops in which the register file is kept
100% busy.
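The register-file analogy reduces to a port count.  A minimal sketch, assuming every operation reads two registers and writes one; `port_limited_rate` is a hypothetical helper that ignores everything except ports.

```python
def port_limited_rate(read_ports, write_ports, reads_per_op=2, writes_per_op=1):
    """Operations per cycle the register file can sustain, ignoring everything else."""
    return min(read_ports // reads_per_op, write_ports // writes_per_op)


print(port_limited_rate(2, 1))  # 1: matches a one-op-per-cycle pipeline
print(port_limited_rate(4, 2))  # 2: what a two-issue machine would need
```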