[net.arch] RISC vs VAX

stern@inmet.UUCP (11/17/84)

In response to "Is it possible to build a RISC in the same class as a VAX?"
I'd have to say the answer is yes and no.  If you want to talk about integer
computations, procedure entry/exit, and data movement, by all means yes.
In theory, it would be possible to design and implement a RISC with a
performance in the same class as a VAX -- meaning only that the RISC could
shuffle data, access memory with a *few* addressing modes (direct and register
indirect are probably the easiest) and zip through some integer computation   
code faster than a VAX 780.  RISC architecture is not suited for floating
point operations, or matrix multiplications, since the arithmetic steps
produce a very tight bottleneck.  In my somewhat young opinion, a RISC is
ideal for LISP hacking -- most of what you are doing is searching, matching,
and chasing pointers.  In this application, a RISC beats a VAX hands-down.
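
As a rough illustration (the code below is a sketch of my own, not anything
lifted from an actual RISC project), here is the kind of loop I mean -- each
iteration is just a couple of register-indirect loads, a compare, and a
branch, with no need for elaborate addressing modes:

    #include <stddef.h>

    struct cell {
        int          key;
        struct cell *next;
    };

    /* walk a linked list looking for a key; return the node or NULL */
    struct cell *find(struct cell *p, int key)
    {
        while (p != NULL) {
            if (p->key == key)      /* register-indirect load, compare */
                return p;
            p = p->next;            /* chase the pointer */
        }
        return NULL;
    }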

The answer to the original question is also "no," due to the problem of
i/o on a RISC.  Attaching an ACIA (or similar character-oriented i/o device)
to the RISC could be a big difficulty -- the cycle times of the two devices
are different by one or two orders of magnitude.  You can either halt the RISC
while doing character i/o (boo hiss) or use the RISC as an attached processor:
plug it into a spare UNIBUS slot and let a VAX share its memory.  The i/o
problem is then solved: whatever OS you have running on the VAX does the
i/o, and the RISC does the nasty computation.  Perhaps a better question 
would be "Do you want to run a RISC with the power of a VAX stand-alone?"
I think it might be better to plug eight or sixteen RISC boards into a VAX,
rehack UNIX to download microcode for appropriate tasks, and thereby 
save VAX cycles for cycle-hungry tasks.  Example:  Write a search routine
to get downloaded onto a RISC.  Put the relevant data in the common VAX/RISC
memory and let the RISC rip.  Mr. VAX goes and deals with somebody else's
process, and the search job finishes *faster* than if the VAX had done it.
This isn't just a marvel of parallel processing -- the RISC really would
complete the task faster than a VAX could.  They just can't perform *every*
task faster than a VAX.
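
To make that scheme a little more concrete, here is a sketch of the host side
in C.  The mailbox layout, the field names, and the busy-wait are all invented
for illustration; a real design would need proper memory arbitration and
synchronization between the two processors.

    #include <stddef.h>

    /* shared region visible to both the VAX and the RISC board */
    struct mailbox {
        volatile int  go;          /* host sets to 1 when work is ready */
        volatile int  done;        /* RISC sets to 1 when it finishes   */
        int           key;         /* value to search for               */
        size_t        count;       /* number of elements in data[]      */
        int           data[1024];  /* data the RISC will scan           */
        volatile long result;      /* index of the match, or -1         */
    };

    /* hand the search to the attached processor and collect the answer;
       a real host would go run somebody else's process instead of spinning */
    long offload_search(struct mailbox *mb, const int *src, size_t n, int key)
    {
        size_t i;

        for (i = 0; i < n && i < 1024; i++)   /* copy into shared memory */
            mb->data[i] = src[i];
        mb->count = i;
        mb->key   = key;
        mb->done  = 0;
        mb->go    = 1;                        /* tell the RISC to start  */

        while (!mb->done)                     /* stand-in for useful work */
            ;
        return mb->result;
    }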


	Hal Stern
	Intermetrics, Inc
	{ihnp4, harpo, esquire, ima}!inmet!stern

henry@utzoo.UUCP (Henry Spencer) (11/20/84)

> ...  RISC architecture is not suited for floating
> point operations, or matrix multiplications, since the arithmetic steps
> produce a very tight bottleneck.  ...

Why?  It is true that the *existing* RISC machines have no floating-point
support, but then neither does the 68000.  This does not imply that one
cannot build an effective floating-point-crunching system around either.

> ...[also, a RISC can't be a VAX] due to the problem of
> i/o on a RISC.  Attaching an ACIA (or similar character-oriented i/o device)
> to the RISC could be a big difficulty -- the cycle times of the two devices
> are different by one or two orders of magnitude.  ...

Uh, Hal, the cycle times of a 780 and an ACIA are also a little bit
different...  So you solve the problem the same way as on the VAX:  if
you want to pump lots of data through the i/o system, the i/o system has
to do some of the work for you.  Actually, this is roughly what your
suggested solution amounts to, except that a VAX cpu is an inordinately
expensive -- and not all that speedy -- i/o processor.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

stern@inmet.UUCP (11/29/84)

[got the asbestos jumpsuit on?]

(1) I said that RISCs are not suited for floating point/matrix multiplication
    because: (a) attaching a floating point coprocessor to a RISC perverts
    the very idea of a reduced instruction set; and (b) the original question
    asked if a RISC could be produced in the same class as a VAX.  No RISC
    can perform every function of a 780 faster than a 780 because there are
    tradeoffs made when you give up instruction set complexity for speed.
    Usually floating point stuff is the first to go -- except when you make
    a purely floating-point processor, in which case everything has been
    given up.  You can't have your floating point cake and eat it too.
     
(2) Yes, an ACIA and a VAX have different cycle times.  So do a washing
    machine and an ECL flip-flop.  But if you build a fancy interface to
    the ECL flip-flops, the washing machine still won't be able to talk
    to them.  Having a nice DZ-11 multiplexor on a VAX is great -- it provides
    intelligent communications processing, so the VAX doesn't burn cycles
    doing low-level ACIA handshaking.  Now how could a RISC talk to a DZ?
    Letting them share a memory space might be the best thing, but gee,
    then you might need another processor to arbitrate memory accesses, and
    then it starts to look like plugging a RISC into a VAX again.  My point
    is that if you want a RISC to do i/o *without another processor* then
    you have to add instructions/wait states/more complexity to get something
    with a 50-nanosecond cycle time to talk to a device with a 1-microsecond
    cycle time.  Comparing a RISC to a VAX in terms of performance, when the
    RISC has two or three half- or quarter-VAXen doing its i/o, is cheating.
    Hey, my Toyota can outrace the space shuttle when you strap two solid
    rocket boosters to it.
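
    To make the cycle-time complaint concrete, here is roughly what
    per-character output looks like when the fast processor gets no help at
    all.  The register addresses and the ready bit below are invented for
    the example; they do not describe a real ACIA or DZ-11.

        #define UART_STATUS  ((volatile unsigned char *) 0xFFFF0000)
        #define UART_DATA    ((volatile unsigned char *) 0xFFFF0001)
        #define UART_READY   0x01   /* transmitter can take a character */

        void putc_polled(char c)
        {
            /* with a 50-nanosecond processor and a device that is orders
               of magnitude slower, this loop burns a great many cycles per
               character -- cycles a stand-alone RISC has nothing else to
               do with unless it gets interrupts or DMA */
            while ((*UART_STATUS & UART_READY) == 0)
                ;                   /* spin waiting for the device */
            *UART_DATA = (unsigned char) c;
        }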

--Hal Stern

{ihnp4, harpo, esquire, ima}!inmet!stern

guy@rlgvax.UUCP (Guy Harris) (11/30/84)

> (1) I said that RISCs are not suited for floating point/matrix multiplication
>     because: (a) attaching a floating point coprocessor to a RISC perverts
>     the very idea of a reduced instruction set; and (b) the original question
>     asked if a RISC could be produced in the same class as a VAX.  No RISC
>     can perform every function of a 780 faster than a 780 because there are
>     tradeoffs made when you give up instruction set complexity for speed.
>     Usually floating point stuff is the first to go -- except when you make
>     a purely floating-point processor, in which case everything has been
>     given up.  You can't have your floating point cake and eat it too.

Anybody have any data on how a "strict RISC" (i.e., one cycle per instruction,
which rules out multiply unless you have a parallel multiplier and rules
out divide unless somebody has a non-iterative divide algorithm) does on
jobs requiring a lot of multiplies and divides (which it has to do with
library routines)?  If it can do them as well as, say, an 11/780, it might be
able to do software floating point comparably well, too.  Basically, think of RISC
instructions as vertical microcode; if you can do it fast with vertical
microcode, you can do it fast on a RISC as long as you can avoid going
to memory for the instructions every time.
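
For what it's worth, here is a sketch of the sort of library multiply routine
such a machine would fall back on -- classic shift-and-add, where every step
is the kind of single-cycle add, shift, test, and branch a strict RISC handles
well.  (The code is an illustration only, not taken from any real RISC's
runtime library.)

    /* software multiply by shift-and-add */
    unsigned long soft_mul(unsigned long a, unsigned long b)
    {
        unsigned long product = 0;

        while (b != 0) {
            if (b & 1)          /* low bit set: add in the current partial */
                product += a;
            a <<= 1;            /* shift the multiplicand */
            b >>= 1;            /* shift the multiplier   */
        }
        return product;
    }

A 32-bit multiply done that way is a few dozen trips around a loop of simple
instructions; the open question is whether that keeps pace with the 780's
microcoded multiply.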

Also, attaching a floating point coprocessor doesn't "pervert the very idea
of a reduced instruction set".  If you can't do FP fast on a RISC, there's
no need to go to instruction set complexity except for the floating-point
instructions.  A RISC with an FP chip might still be faster than a
(more expensive, in cost or chip real-estate or whatever) CISC.  I believe
a lot of the reason for a RISC is that in non-number-crunching applications
a lot of the cycles go to "bureaucratic overhead" (moving things from memory/
registers into other memory/registers, testing things, etc.) and that a RISC
may be able to deal with that as well as or better than a CISC.  Number-
crunching may have a different character (although I remember a study by
Knuth of FORTRAN programs that found that a surprisingly large number of
lines were exactly that kind of "bureaucratic overhead" - "j = j + 1" counts
as overhead if the only purpose "j" serves is to point to an element in an
array full of the numbers you're crunching); if it does, maybe a "different
horses for different courses" approach would be more cost-effective.
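
To illustrate the kind of "bureaucratic overhead" I mean (my example, not one
from the Knuth study), consider a simple summation loop.  Only the add into
the running total is the "real" crunching; the index update, the load, the
test, and the branch are all bookkeeping of exactly the sort a RISC is built
around.

    /* sum an array of doubles; most of the inner loop is bookkeeping */
    double sum_array(double a[], int n)
    {
        double sum = 0.0;
        int    j;

        for (j = 0; j < n; j = j + 1)   /* "j = j + 1" is pure overhead */
            sum = sum + a[j];
        return sum;
    }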

> (2) Yes, an ACIA and a VAX have different cycle times.  So do a washing
>     machine and an ECL flip-flop.  But if you build a fancy interface to
>     the ECL flip-flops, the washing machine still won't be able to talk
>     to them.  Having a nice DZ-11 multiplexor on a VAX is great -- it provides
>     intelligent communications processing, so the VAX doesn't burn cycles
>     doing low-level ACIA handshaking.  Now how could a RISC talk to a DZ?
>     Letting them share a memory space might be the best thing, but gee,
>     then you might need another processor to arbitrate memory accesses, and
>     then it starts to look like plugging a RISC into a VAX again.

What?  Why couldn't you plug a DZ (which, by the way, hardly provides
"intelligent communications processing" in the conventional sense of the
phrase) into a RISC?  Are you worried about the DZ not being able to do a
bus cycle in a reasonable amount of time, and tying the machine up twiddling
its thumbs waiting for the DZ to respond so it can finish the "move" instruction
that's stuffing the character into the DZ's output buffer register?
Presumably you can build a bus interface chip that's as fast as the RISC,
and if your bus is fast enough it won't tie the RISC up for a long time
(if it isn't fast enough, it'll tie any fast machine up too long, not
just a RISC).  Equating a DZ and a bus interface to a VAX-11 in complexity
is bogus; the UBA on a VAX may be big, but then it does a hell of a lot
more than just act as a bus interface, given its multiple data paths, and
maps, and so forth and so on.

>     Comparing a RISC to a VAX in terms of performance, when the
>     RISC has two or three half- or quarter-VAXen doing its i/o, is cheating.

Who says it requires a half- or quarter-VAX to do the I/O for a RISC?

>     Hey, my Toyota can outrace the space shuttle when you strap two solid
>     rocket boosters to it.

Not a bad analogy; if your goal is met better by the Toyota+rocket boosters
than by the space shuttle, why not go with it?  If you're trying to break
the land speed record, you build a land speed record car, not a jet plane
with detachable wings and landing gear that can handle 700+ MPH.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

phil@amdcad.UUCP (Phil Ngai) (12/01/84)

> --Hal Stern
>     is that if you want a RISC to do i/o *without another processor* then
>     you have to add instructions/wait states/more complexity to get something
>     with a 50-nanosecond cycle time to talk to a device with a 1-microsecond
>     cycle time.

Give me a break. Haven't you ever heard of wait states, interrupts, or
DMA? Did you know that an 8 MHz 8086 can't talk to a 4 MHz 8530 without
wait states? No, I didn't need half a VAX to interface the 8086 to the
8530. What is your problem anyway?
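
For the record, here is roughly what the interrupt-driven version looks like:
a short handler feeds the device from a ring buffer, so the speed mismatch
costs the processor almost nothing.  The device register address and all the
names are made up for the example, and enabling/disabling the transmit
interrupt is left out.

    #define SIO_DATA  ((volatile unsigned char *) 0xFFFF0011)

    #define BUFSIZE 256
    static volatile unsigned char txbuf[BUFSIZE];
    static volatile unsigned int  head, tail;   /* head: next byte to send */

    /* called from mainline code: queue a character (dropped if full) */
    void queue_char(unsigned char c)
    {
        unsigned int next = (tail + 1) % BUFSIZE;

        if (next != head) {
            txbuf[tail] = c;
            tail = next;
        }
    }

    /* called when the device interrupts to say it can take another byte */
    void tx_interrupt(void)
    {
        if (head != tail) {
            *SIO_DATA = txbuf[head];            /* one bus cycle per byte */
            head = (head + 1) % BUFSIZE;
        }
    }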
-- 
 I'm not a programmer, I'm a hardware type.

 Phil Ngai (408) 749-5790
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.ARPA

padpowell@wateng.UUCP (PAD Powell) (12/02/84)

> (1) I said that RISCs are not suited for floating point/matrix multiplication
>     because: (a) attaching a floating point coprocessor to a RISC perverts
>     the very idea of a reduced instruction set; and (b) the original question

So what?  The concept of a RISC also violates the CISC aesthetic ideals.
If you want to argue that the addition of specialized hardware to perform
an algorithm unsuited to von Neumann-based architectures is an unsuitable
addition to a RISC architecture, present proof, either mathematical or
physical (i.e., run an experiment), but please do not appeal to "good taste."

Patrick ("I'd put an FPA on my carburetor if it gave me better milage") Powell