[comp.arch] RISC vs CISC on Low-End Processors

crowl@cs.rochester.edu (Lawrence Crowl) (05/10/88)

In article <492@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
>Widgets in plastic boxes don't pass FCC at high clock rates without a lot of
>layout headaches and extraneous R's and C's.  Keep the clock down around 4
>MHz!  The bus width is a big deal.  We're talking parts count here.  Ideally:
>1 CPU, 1 RAM, 1 ROM, 1 Glue, and peripheral chips.  The 16 bit bus gets the
>nod only because RAM/ROM requirements exceed current state of the art in
>RAM/ROM chips.  Go beyond 128K ROM and you're talking two chips.  Go beyond
>32K (static) RAM and you're also talking two chips. I'm looking at a bunch of
>consumer type applications that have outgrown the 8088 level of CPU
>performance, and are moving into the 64K to 128K bytes range of RAM, and the
>256K to 512K bytes range of ROM.  

Let me point out the differences between the low-end processors outlined above
and typical RISC processors.

1) The clock rate is limited.  This means that instruction execution rate is
   also limited.
2) The memory bandwidth is limited.  This is a consequence of the low clock, a
   narrow bus, cheap memory, and no external caches.
3) Program space is limited.  Programmers will always be asked to cram as much
   function as possible into minimal hardware.
4) Data space is limited.
5) The processor chip area is limited.  Processor cost is related to area, so
   cheaper processors have smaller areas.

Point 1 argues for maximum work per clock cycle.  Both CISC and stack
architectures provide noticeably less work per clock cycle than RISC
architectures.

Points 2 and 3 argue for a densely coded instruction set.  This reduces both
the memory bandwidth required to execute the program and the space required to
store it.  Both CISC and stack architectures generally provide dense
instruction sets; RISC processors generally do not.
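
To make the density argument concrete, here is a hedged sketch.  The
instruction sequences and byte counts in the comments are my own illustrative
assumptions about a generic mem-to-mem CISC and a generic fixed-32-bit
load/store RISC, not measurements of any particular chip.

	/* One common statement form: a = a + b. */
	int a, b;

	void example(void)
	{
	    a = a + b;
	    /* Hypothetical mem-to-mem CISC:
	     *    add  a, b          ; 1 instruction, perhaps 4-6 bytes
	     * Hypothetical load/store RISC:
	     *    load  r1, a        ; 4 bytes
	     *    load  r2, b        ; 4 bytes
	     *    add   r1, r1, r2   ; 4 bytes
	     *    store r1, a        ; 4 bytes   (16 bytes total)
	     */
	}

	int main(void)
	{
	    a = 2; b = 3;
	    example();
	    return a == 5 ? 0 : 1;
	}

Under those assumptions the RISC spends roughly three to four times the code
bytes, which is exactly what points 2 and 3 penalize.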

Points 2 and 4 indicate that the processor should support variables of
multiple sizes, perhaps even bit fields.  This is more an attribute of CISC
architectures than of RISC.
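
For example (a minimal C sketch; bit-field layout is implementation-defined,
and the struct names here are mine):

	#include <stdio.h>

	/* Packing flags into bit fields can cut data space on a
	 * RAM-limited system.  A CISC with bit-field instructions can
	 * also update such fields in fewer instructions than a RISC
	 * that must shift and mask. */
	struct flags_packed {
	    unsigned int ready : 1;
	    unsigned int error : 1;
	    unsigned int mode  : 3;
	};

	struct flags_loose {
	    int ready;
	    int error;
	    int mode;
	};

	int main(void)
	{
	    printf("packed: %u bytes, loose: %u bytes\n",
	           (unsigned)sizeof(struct flags_packed),
	           (unsigned)sizeof(struct flags_loose));
	    return 0;
	}

On a typical compiler the packed version occupies a single word where the
loose version takes three.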

Point 5 indicates that complex or area-consuming features should be avoided.
RISC and stack architectures typically require much less area than CISC.  Here
we have a conflict: point 2 argues for internal caches, but point 5 argues
against them.

These points taken together seem to indicate that we want neither RISC nor
CISC, but the appropriate compromise.  The CRISP processor appears to have
addressed this compromise well.  I do not know enough about the architecture
to say whether or not it meets the requirements, but it appears much closer
than many other architectures.   
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

rick@pcrat.UUCP (Rick Richardson) (05/11/88)

In article <9561@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
> ...
>These points taken together seem to indicate that we want neither RISC nor
>CISC, but the appropriate compromise.  The CRISP processor appears to have
>addressed this compromise well.  I do not know enough about the architecture
>to say whether or not it meets the requirements, but it appears much closer
>than many other architectures.   

The last time we talked to the CRISP people, the processor price was
an order of magnitude too high, and there was a definite lean in
their attitude towards the high end.  Granted, this was before MIPSCo
and SPARC, and they may have come back to earth once they saw the
competition.



-- 
		Rick Richardson, President, PC Research, Inc.

(201) 542-3734 (voice, nights)   OR     (201) 834-1378 (voice, days)
uunet!pcrat!rick (UUCP)			rick%pcrat.uucp@uunet.uu.net (INTERNET)

koopman@a.gp.cs.cmu.edu (Philip Koopman) (05/11/88)

In article <9561@sol.ARPA>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
> ...
> These points taken together seem to indicate that we want neither RISC nor
> CISC, but the appropriate compromise.  The CRISP processor appears to have
> addressed this compromise well.  I do not know enough about the architecture
> to say whether or not it meets the requirements, but it appears much closer
> than many other architectures.   

How about stack architectures?  They seem to meet the criteria you
set forth.  Does anyone have arguments for or against them?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Phil Koopman             5551 Beacon St.             ~
~                           Pittsburgh, PA  15217       ~
~  koopman@faraday.ece.cmu.edu   (preferred address)    ~ 
~  koopman@a.gp.cs.cmu.edu                              ~
~                                                       ~
~  Disclaimer: I'm a PhD student at CMU, and I do some  ~
~              work for WISC Technologies.              ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

jk3k+@andrew.cmu.edu (Joe Keane) (05/12/88)

> How about stack architectures?  They seem to meet the criteria you set forth.
>  Does anyone have arguments for or against them?

I like them.  They tend to have very compact code (at the cost of more
decoding), which is not the trend these days.  A neat thing is that you can
make the size of the register file transparent.  Want to speed up the machine?
Add another 64 registers.  I swear half the operands in VAX code are byte
offsets off the frame pointer (although GCC gets more into registers).

On a different note, how about the Fairchild Clipper?  It looks like about the
right compromise (for a high-end machine).  My biggest complaint is that they
don't have delayed branches `because it makes it hard on the compiler'.  The
current implementation should be able to use at least one.

--Joe

crowl@cs.rochester.edu (Lawrence Crowl) (05/12/88)

In article <1658@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
)In article <9561@sol.ARPA>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
)>... These points taken together seem to indicate that we want neither RISC
)>nor CISC, but the appropriate compromise.  The CRISP processor appears to
)>have addressed this compromise well. ...
)
)How about stack architectures?  They seem to meet the criteria you
)set forth.  Does anyone have arguments for or against them?

Stack architectures fail to meet the criteria on one point: they require a
high instruction execution rate, because each instruction does little work.
The criteria impose a low clock rate, which cannot support a high instruction
rate.
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

johnw@astroatc.UUCP (John F. Wardale) (05/26/88)

In article <1658@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
>How about stack architectures?  They seem to meet the criteria you
>set forth.  Does anyone have arguments for or against them?

People claim stack machines can give you fast execution and dense code.
I have two arguments against this (from "The Case Against Stack-Oriented
Instruction Sets", G. Myers @ IBM, SIGARCH Computer Architecture News,
August 1977, and other stuff):

1: Code Size
	Several studies yield overwhelming evidence that almost all code
	takes one of these three forms:
	1: a=b    2: a=a+b    3: a=b+c    (+ is an operation)

	Stack-based code gains NOTHING in these cases.  Mem-to-mem code wins!
2: Speed (pipelining / scheduling)
	Consider a = (b+c) * (e+f)  (push,push,+,push,push,+,*,pop).
	Since all the operands and results share 2 or 3 entries on the
	stack, it's very hard (impossible?) to have any parallelism.
	(The third push must wait for the + to replace two operands
	 with one result.)

	With a register-based machine (RISC if you like) you have/use
	lots of registers to avoid these conflicts (hazards).  In other
	words, register (RISC) machines encourage scheduling; stack
	machines prevent it.  (Scheduled register code could be:
	4 loads, +, +, *, store [the two adds in parallel]; see the
	sketch below.)
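
Here is a minimal sketch of that dependence chain in C (the toy opcodes and
the stack layout are mine, purely for illustration, not any real machine):

	#include <stdio.h>

	/* Toy stack machine evaluating a = (b+c) * (e+f).  Every op
	 * reads or writes the top of the one shared stack, so the ops
	 * form a single dependence chain; none can be overlapped. */
	int stack[8], sp = 0;

	void push(int v) { stack[sp++] = v; }
	void add(void)   { sp--; stack[sp-1] += stack[sp]; }
	void mul(void)   { sp--; stack[sp-1] *= stack[sp]; }
	int  pop(void)   { return stack[--sp]; }

	int main(void)
	{
	    int b = 1, c = 2, e = 3, f = 4, a;

	    push(b); push(c); add();   /* b+c must complete...      */
	    push(e); push(f); add();   /* ...before these can start */
	    mul();
	    a = pop();

	    /* Register version: r1 = b+c and r2 = e+f are independent,
	     * so a scheduler could overlap the loads and the two adds:
	     *     r1 = b + c;   r2 = e + f;   a = r1 * r2;          */
	    printf("a = %d\n", a);    /* prints a = 21 */
	    return 0;
	}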

Scientific advances in SW and HW have provided us better ways of doing
things.  These ways just happen to NOT be very compatible with stack
architectures.  (The Model T [car] was good in its day, but they're
not used much any more.)

Comments welcome.  If you want to flame, flame yourself... my asbestos
underwear wore out last month.

-- 
					John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!

crowl@cs.rochester.edu (Lawrence Crowl) (05/27/88)

In article <1035@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>People claim stack machines can give you fast execution and dense code.
>I have two arguments against this (from "The Case Against Stack-Oriented
>Instruction Sets", G. Myers @ IBM, SIGARCH Computer Architecture News,
>August 1977, and other stuff):

The proposal for stack machines was in the context of low-end processors.  One
key feature of such a machine is low bandwidth to memory.

>Code Size: Several studies yield overwhelming evidence that almost all code
>    takes one of these three forms: a=b  a=a+b  a=b+c  (+ is an operation)

You forgot a[i] and p->a, which are compiled as expressions but do often
appear in the programs behind the "overwhelming evidence" cited above.  I
think the Burroughs stack machines compiled to half the size of their
contemporaries.
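
A quick illustration (a hedged sketch: the stack sequence in the comment is
the obvious naive compilation with made-up opcode names, not the output of
any real compiler):

	#include <stdio.h>

	struct node { int key; };

	/* "a[i] = p->key" looks like the simple form a = b, but both
	 * the subscript and the pointer dereference are expressions.
	 * A naive stack compilation might be:
	 *     push a; push i; index; push p; field key; fetch; store
	 * exactly the kind of sequence stack encodings keep short. */
	void copy_key(int a[], int i, struct node *p)
	{
	    a[i] = p->key;
	}

	int main(void)
	{
	    int a[4] = {0};
	    struct node n = { 42 };
	    copy_key(a, 2, &n);
	    printf("a[2] = %d\n", a[2]);
	    return 0;
	}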

>Speed (pipelining / scheduling): Since all the operands and results share 2 or
>    3 entries on the stack, it's very hard to have any parallelism.  With a
>    register machine you have lots of registers to avoid these conflicts.  In
>    other words register machines encourage scheduling; stack machines prevent
>    it.

The pipelining argument is moot when the memory bandwidth to the processor is
low enough that the processor has spare time to process instructions.
Remember, the processor is feeding off relatively slow memory.  Instruction
scheduling would only mean that the processor spent a larger fraction of its
time waiting on memory.  Hardly worth the effort.
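
A rough back-of-envelope in C (my numbers, taken from the 4 MHz, 16-bit-bus
part described at the top of the thread, and optimistically assuming one bus
transfer per clock):

	#include <stdio.h>

	int main(void)
	{
	    double clock_hz  = 4.0e6;  /* 4 MHz clock                 */
	    double bus_bytes = 2.0;    /* 16-bit bus, one xfer/clock  */

	    /* Peak memory bandwidth: about 8 MB/s.  A fixed 32-bit
	     * RISC instruction needs two bus cycles just to fetch,
	     * so the bus, not the pipeline, sets the pace. */
	    printf("peak bandwidth: %.1f MB/s\n",
	           clock_hz * bus_bytes / 1.0e6);
	    return 0;
	}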

>Scientific advances in SW and HW have provided us better ways of doing things;
>these ways just happen to NOT be very compatible with stack architectures.
>(The Model T [car] was good in its day, but they're not used much any more.)

Advances have provided us with ALTERNATIVES, each more suited to certain
environments and technologies.  The stack machine was proposed in the context
of an environment different from the one in which you are evaluating it.
"Better" is relative.

-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

ok@quintus.UUCP (Richard A. O'Keefe) (05/27/88)

In article <1035@astroatc.UUCP>, johnw@astroatc.UUCP (John F. Wardale) writes:
> 1: Code Size
> 	Several studies yield overwhelming evidence that almost all
> 	code takes one of these three forms:
> 	1: a=b    2: a=a+b    3: a=b+c    (+ is an operation)
> 
> 	Stack-based code gains NOTHING in these cases.  Mem-to-mem code wins!
(a) This may be true of Fortran and Pascal.  It is less true of C, and it is
    not true of functional languages.
(b) I looked at a couple of dozen small chunks of Pascal-type code once
    (they were out of a Lisp interpreter, as it happens) and found that
    B6700 (stack) code was quite a bit denser than DEC-10 (reg-mem) code
    and, surprise, was denser than VAX-11 (mem-mem) code.  The one thing
    which would have improved the B6700 would have been fusing comparison
    instructions into branches.

mch@computing-maths.cardiff.ac.uk (Major Kano) (05/27/88)

In article <9561@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:

(* stuff deleted *)

>These points taken together seem to indicate that we want neither RISC nor
>CISC, but the appropriate compromise.  The CRISP processor appears to have
>addressed this compromise well.  I do not know enough about the architecture
>to say whether or not it meets the requirements, but it appears much closer
>than many other architectures.   

   Sounds OK at first.  Would anyone who can please remind me just what the
CRISP architecture is?  I HAVE heard about it, but that was long ago.  E-mail
please, no need to clutter up the net.  If I get enough replies, I'll
summarise.

Thanks very much in advance;
regards,
-- 
Martin C. Howe, University College Cardiff | "C", adj; means  | I'm Motorhead; 
mch@vax1.computing-maths.cardiff.ac.uk.    | "write-only".    | Remember me now,
-------------------------------------------+------------------+ Motorhead;     
These opinions are mine, but YOU can have them for a few $$ ! | ALL RIGHT ! 

martin@felix.UUCP (Martin McKendry) (06/02/88)

In article <10074@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1035@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>>People claim stack machines can give you fast execution and dense code.
>>I have two arguments against this (from "The Case Against Stack-Oriented
>>Instruction Sets", G. Myers @ IBM, SIGARCH Computer Architecture News,
>>August 1977, and other stuff):
>
>The proposal for stack machines was in the context of low-end processors.  One
>key feature of such a machine are that there is a low bandwidth to memory.
>
>>Code Size: Several studies yield overwhelming evidence that almost all code
>>    takes one of these three forms: a=b  a=a+b  a=b+c  (+ is an operation)
>
>You forgot a[i] and p->a, which are compiled as expressions but do often
>appear in the programs behind the "overwhelming evidence" cited above.  I
>think the Burroughs stack machines compiled to half the size of their
>contemporaries.

I spent almost the entirety of 1986 working out of Burroughs World
Headquarters in Detroit, with the express purpose of evaluating such
claims as this and others, in the great 'stack machine' (Burroughs-style)
vs. current technology debate.  Rest assured that, whether or not the claim
of smaller code size was true at some time in the past (1958?), it is not
true today.  We did experiments comparing code size against IBM 360, VAX,
and MIPS instructions.  In most cases we compared Fortran, 'scientific'
(Dhrystone-style) code, and Cobol.  In no case that I recall did the
Burroughs instructions win by any margin (if at all).  Of course, the
ad-hoc, recursive-descent, non-optimizing Burroughs compilers might have
had something to do with it.

In fact, if there was ever anything that this particular stack
architecture did better, the advantage was lost by the start of
the 1970's.  I suspect the only thing it ever did better was 'virtual
memory'.  But this came at huge cost, because the 'descriptors' (pointers
to segments/pages) contained the page-presence bits, so you could not
optimize references through them.  (The hardware wouldn't let you anyway.)
This led to indirection chains of great length that were unoptimizable
by software.  Once IBM came along with their paging, Burroughs machines
were slower, took more code space, and cost more to build than competitive
machines.

--
Martin S. McKendry;    FileNet Corp;	{hplabs,trwrb}!felix!martin
Strictly my opinion; all of it

stevew@nsc.nsc.com (Steve Wilson) (06/03/88)

In article <2838@louie.udel.EDU> rminnich@udel.EDU (Ron Minnich) writes:
>   I realize now in hindsight that the company was slowly withdrawing
>resources from the divisions (like mine) that built stack machines. 
>It looks like a good decision to me ....
>
>-- 
>ron (rminnich@udel.edu)
I don't want to start any religious wars comparing the Burroughs
MCP against the IBM JCL that was available in the late 1970's.  But
I didn't find the MCP that hard to use, and it did support multiprogramming
in a fairly robust way, in my experience.

As for Burroughs (aka Unisys) not being interested in stack machines
anymore, well, they sure seem to be concentrating pretty hard on the
A-series boxes.  Last time I checked, that series was stack-based.

Steve Wilson
National Semiconductor

[ Universal disclaimer goes here! ]