[net.arch] What's RISC all about ... REALLY?

gooley@uicsl.UUCP (07/05/86)

My guess is that the polynomial-evaluation instructions have some hardware
support on the bigger VAXen (my unreliable memory tells me that the VAX
FPA provides some in addition to doing the FP arithmetic), and so might be
faster than the unrolled-loop user-programmed versions.  On the 750, such
instructions are done in microcode -- but that doesn't explain why they are
so very slow.  DEC would have to provide them for compatibility with machines
on which they make more sense...
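
For reference, a minimal sketch in C of the two software shapes being
compared with the polynomial-evaluation instructions: Horner's rule as a
loop, and the same thing unrolled for a fixed degree.  The function names
and the highest-order-coefficient-first table layout are assumptions made
for illustration; nothing here is taken from DEC's microcode or from the
code that was actually timed.

```c
#include <assert.h>
#include <stddef.h>

/* Horner's rule as a loop.  coeff[0] is the highest-order coefficient
 * (an assumed layout), so the table holds degree + 1 entries. */
double poly_loop(double x, const double *coeff, size_t degree)
{
    assert(coeff != NULL);
    double result = coeff[0];
    for (size_t i = 1; i <= degree; i++)
        result = result * x + coeff[i];   /* one multiply-add per term */
    return result;
}

/* The same evaluation unrolled for a fixed degree of 3: no loop counter,
 * no compare, no branch, at the price of more code space. */
double poly_unrolled3(double x, const double *coeff)
{
    assert(coeff != NULL);
    double result = coeff[0];
    result = result * x + coeff[1];
    result = result * x + coeff[2];
    result = result * x + coeff[3];
    return result;
}
```

Both functions do the same arithmetic in the same order; the only
difference is the loop overhead, which is the part a microcoded
instruction would be expected to hide.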

!uiucdcs!uicsl!gooley

ddb@starfire.UUCP (David Dyer-Bennet) (07/24/86)

> In article <526@mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> ....................................................  Now the shock; the
> unrolled version was *faster*.  I forget how much, but it was enough  to
> offset the extra code space required by far.
> 
>      Explanation anyone?  Especially someone who knows VAX/750 microcode
> details?
> -- 
> 					der Mouse
> 
Can't claim knowledge of 750 microcode details, but I was rather interested
to discover when I looked into some things conveniently to hand that in
MOST cases a complex instruction runs slower than doing the same work on the
same processor with simple instructions.  It's also true of the INDEX
instruction on VAX, and of the LDB and string copy and edit instructions
on the PDP-10.  (Instructions that simply move adjacent storage usually run
fast, as with the VAX MOVC3 or the Intel rep/movsw.)

In the one case I could analyze at the microcode level, which may not be at
all typical, I finally decided that the problem was that all the fancy
provisions to make loops run fast, overlap things, etc., only worked above
the microcode level, so when you did complex stuff in microcode it didn't
get the assists.

Don't know if this is really a general rule; I haven't looked at this in
all that many architectures.

		-- David Dyer-Bennet
		Usenet:  ...ihnp4!umn-cs!starfire!ddb
		Fido: sysop of fido 14/341, (612) 721-8967
		Telephone: (612) 721-8800
		USmail: 4242 Minnehaha Ave S
			Mpls, MN 55406

ronc@fai.UUCP (Ronald O. Christian) (07/29/86)

In article <258@starfire.UUCP> ddb@starfire.UUCP (David Dyer-Bennet) writes:
>> In article <526@mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
>> ....................................................  Now the shock; the
>> unrolled version was *faster*. [...]
>Can't claim knowledge of 750 microcode details, but I was rather interested
>to discover when I looked into some things conveniently to hand that in
>MOST cases a complex instruction runs slower than doing the same work on the
>same processor with simple instructions.  It's also true of the INDEX
>instruction on VAX, and of the LDB and string copy and edit instructions
>on the PDP-10.

I'm usually just a "listener" on this newsgroup, but if I understand what
you're saying above, I'm a little surprised that this is surprising.  (Or
something like that.)  The first group I ever worked with was doing a lot
of Z80 hacking for a control application, and we had to work under *very*
stiff memory constraints.  Typically we wrote the code first on a development
system with lots of memory, then started to pare away at it to get it to
fit in memory.  But a funny thing happened:  The programs started running
faster.  We were typically getting rid of things like use of the index
(IX, IY) registers, add and subtract functions, and so forth in favor of
indirect addressing and boolean operations because the instructions used
fewer bytes.  We also used loops whenever possible and this of course
increased overhead, but in general "thin" code ran faster.  It ran so much
faster that we started writing thin code even if there was not a byte
advantage.  You could say we were using a RISC sub-set of the Z80 code set.

It occurs to me that even though a complex instruction on a Vax does more
per byte of opcode than a simple instruction, the Vax is "hiding" from
you the fact that the complex instruction is really sort of a macro,
telling the computer to execute lots and lots of microcode.  When the
cost of running that microcode exceeds the cost of the instruction fetches
it saves, the instruction becomes more costly than its counterpart coded
out of simple instructions.

Hmmm.  It just occurred to me that the speed of main memory is a big
factor in the opcode/microcode trade-off.  Could Vaxen have been designed
with much slower main memory in mind?  (Or faster cache?)
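
The trade-off in the last two paragraphs can be put as a back-of-envelope
model.  This is only a sketch in C: the struct, its field names, and the
idea of counting everything in uniform "cycles" are assumptions made for
illustration, and none of the numbers you feed it would be measurements of
any real Vax or Z80.

```c
#include <assert.h>
#include <stddef.h>

/* All quantities are in arbitrary "cycles" and are hypothetical. */
struct cost_model {
    unsigned fetch_cycles;        /* fetch one instruction from memory   */
    unsigned micro_steps;         /* microcode steps in the complex op   */
    unsigned micro_step_cycles;   /* cycles per microcode step           */
    unsigned simple_count;        /* simple instructions for same work   */
    unsigned simple_exec_cycles;  /* execute cost per simple instruction */
};

/* One fetch, then the whole microcode routine. */
unsigned complex_cost(const struct cost_model *m)
{
    assert(m != NULL);
    return m->fetch_cycles + m->micro_steps * m->micro_step_cycles;
}

/* Every simple instruction pays its own fetch plus its own execution. */
unsigned simple_cost(const struct cost_model *m)
{
    assert(m != NULL);
    return m->simple_count * (m->fetch_cycles + m->simple_exec_cycles);
}

/* Nonzero when the complex instruction loses to the simple sequence. */
int complex_is_slower(const struct cost_model *m)
{
    return complex_cost(m) > simple_cost(m);
}
```

With made-up numbers of 20 one-cycle microcode steps against 5 simple
instructions of one cycle each, a 10-cycle fetch makes the complex
instruction the winner (30 against 55), while a 1-cycle fetch makes it the
loser (21 against 10).  Under this model, at least, the faster instruction
fetch gets, the worse the microcoded instruction looks.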


			Ron
-- 
		Ronald O. Christian (Fujitsu America Inc., San Jose, Calif.)
		seismo!amdahl!fai!ronc  -or-   ihnp4!pesnta!fai!ronc

Oliver's law of assumed responsibility:
	"If you are seen fixing it, you will be blamed for breaking it."