[comp.arch] VAX Architecture

usenet@cps3xx.UUCP (Usenet file owner) (06/07/89)

in article <76700071@p.cs.uiuc.edu>, gillies@p.cs.uiuc.edu says:
$ 
$ History's greatest mistakes give rise to history's greatest successes.
$ When there is failure (Multics, VAX architecture, i8086) it causes
$ fanatics to go 180 degrees in the other direction, often with good
$ results (Unix, RISC, Awesome 8x86 optimizing compiler technology).

Out of curiosity, and not trying to start a new religious war, what
about the VAX architecture do you consider a failure?

John H. Lawitzke           UUCP: Work: ...rutgers!mailrus!frith!dale1!jhl
Dale Computer Corp., R&D               ...uunet!frith!dale1!jhl
2367 Science Parkway             Home: ...uunet!frith!ipecac!jhl      
Okemos, MI, 48864          Internet:   jhl@frith.egr.msu.edu

preston@titan.rice.edu (Preston Briggs) (06/08/89)

>$ History's greatest mistakes give rise to history's greatest successes.
>$ When there is failure (Multics, VAX architecture, i8086) it causes
>$ fanatics to go 180 degrees in the other direction, often with good
>$ results (Unix, RISC, Awesome 8x86 optimizing compiler technology).
>
>Out of curiosity, and not trying to start a new religious war, what
>about the VAX architecture do you consider a failure?

or ``Awesome 8x86 optimizing compiler technology'' ??
Are there any useful optimizations possible on an 8086?

I can imagine constant propagation/folding and expression
rearrangement (to minimize register pressure).
Generally, there aren't enough registers to hold any
common subexpressions or loop invariants or variables.
Perhaps the architecture motivated some of the work on
automatic generation of code generators, especially for
instruction selection; but I think the VAX was more important.
(I'm thinking of Graham and Henry).
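For example, even with no registers to spare, folding costs nothing
at run time (a toy C fragment; the names are invented):

    /* Constant propagation/folding: no registers needed.  A
       compiler can reduce this whole body to "return 40;". */
    int steps_per_block(void)
    {
        int step = 8;
        return step * 5;    /* propagate step = 8, fold 8*5 = 40 */
    }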

On the other hand, I think Turbo Pascal surprised a lot of people,
showing how fast a compiler could be on a relatively
small/slow machine.  No new optimization work though.
Nowadays, I'm impressed with the speed of the Silicon Valley
Software (SVS)-based compilers.

Regards,
Preston Briggs

davidsen@sungod.crd.ge.com (William Davidsen) (06/08/89)

In article <3474@kalliope.rice.edu> preston@titan.rice.edu (Preston Briggs) writes:

| or ``Awesome 8x86 optimizing compiler technology'' ??
| Are there any useful optimizations possible on an 8086?

  All the same stuff as any other CPU... taking things out of loops,
common subexpressions (a push and a pop are faster than recalculating
for many things, other than simple register-to-register ops).
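For example (a toy C fragment; the function and names are invented):

    /* Common subexpression elimination: x*y is computed once.
       On a register-starved 8086 the temporary may live on the
       stack (one push, one pop), which still beats redoing the
       multiply. */
    int f(int x, int y, int z)
    {
        int t = x * y;       /* computed once */
        return (t + z) * t;  /* reused, never recomputed */
    }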

  One big win is loop unrolling (the Duff device works really well on an
80x86) because of the prefetch queue.  Loops pay a penalty for restarting
the queue, while inline code runs at full speed in many cases, even on a
machine with slow memory and wait states.  The [23]86 fetch more bytes
per access and therefore make fewer memory accesses per byte of code.
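
A minimal sketch of the Duff device in C (a plain int copy rather
than Tom Duff's original memory-mapped output; the names are invented):

    /* Copy 'count' ints, unrolled 8 ways: the loop branch (and the
       prefetch-queue restart on an 8086) is paid once per 8 moves. */
    void copy_ints(int *to, int *from, int count)
    {
        int n;
        if (count <= 0)
            return;
        n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--n > 0);
        }
    }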
	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

bcase@cup.portal.com (Brian bcase Case) (06/09/89)

>$ When there is failure (Multics, VAX architecture, i8086) it causes
>
>Out of curiosity, and not trying to start a new religious war, what
>about the VAX architecture do you consider a failure?

I'm not the original poster, but ....  The VAX is certainly not a
commercial failure.  However, its instruction encodings are an
abomination because they force a serial instruction decode.  If it
takes 3 cycles just to decode an instruction, your only chance of
achieving high performance is to speed up the clock.  But if you
can speed up the clock, then a machine with easier instruction
decode will beat you....  Someday DEC will implement a parallel
instruction decoder along the lines of the one used in the 486 (but
much more complex).  Then they will be able to lower the CPI of
the VAX.  However, everyone else will use the implementation
resources for more productive features....
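
To see why the decode is serial: each operand specifier begins with
a mode byte, and the length of the specifier depends on that mode, so
specifier i+1 cannot even be located until specifier i has been parsed.
In C, grossly simplified (the mode table is abbreviated; the real VAX
encodings have many more cases):

    /* Serial VAX-style operand decode.  Lengths include the mode
       byte itself; literals, immediates, index mode, etc. omitted. */
    static int specifier_length(unsigned char spec)
    {
        switch (spec >> 4) {
        case 0x5: return 1;   /* register:           Rn      */
        case 0x6: return 1;   /* register deferred:  (Rn)    */
        case 0xA: return 2;   /* byte displacement:  b^d(Rn) */
        case 0xC: return 3;   /* word displacement:  w^d(Rn) */
        case 0xE: return 5;   /* long displacement:  l^d(Rn) */
        default:  return 1;   /* many modes omitted          */
        }
    }

    const unsigned char *decode(const unsigned char *pc, int nspecs)
    {
        int i;
        pc++;                             /* opcode byte */
        for (i = 0; i < nspecs; i++)
            pc += specifier_length(*pc);  /* depends on previous step */
        return pc;                        /* next instruction */
    }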

melvin@ji.Berkeley.EDU (Steve Melvin) (06/09/89)

In article <19255@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>Someday DEC will implement a parallel
>instruction decoder along the lines of the one used in the 486 (but
>much more complex).  Then they will be able to lower the CPI of
>the VAX.

You're ignoring the most obvious solution: a decoded instruction cache.
With this, the time to decode becomes an instruction cache miss effect
and in the case of a hit, an entire instruction (or more) can be pulled
out of the cache in a single cycle.  Decoding should still be fast, but
with a large enough cache, it's not as critical.  (Now the problem becomes
how to keep those darn REI's from flushing the cache, ... how about a
new non-flushing REI instruction?)
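
Something like this, in outline (a toy sketch; direct-mapped, one
instruction per entry, and every name and field is invented):

    /* Decoded instruction cache: pay the serial decode only on a
       miss; on a hit, pull a whole pre-decoded instruction out in
       one cycle. */
    #define DCACHE_LINES 1024

    struct decoded_insn {
        unsigned opcode;
        unsigned length;      /* bytes consumed, to find the next PC */
        /* ... pre-parsed operand specifiers would go here ... */
    };

    struct dcache_entry {
        unsigned long tag;    /* PC of the decoded instruction */
        int valid;
        struct decoded_insn insn;
    };

    static struct dcache_entry dcache[DCACHE_LINES];

    extern void slow_serial_decode(unsigned long pc,
                                   struct decoded_insn *out);

    struct decoded_insn *fetch_decoded(unsigned long pc)
    {
        struct dcache_entry *e = &dcache[pc % DCACHE_LINES];
        if (e->valid && e->tag == pc)
            return &e->insn;               /* hit: one cycle */
        slow_serial_decode(pc, &e->insn);  /* miss: multi-cycle */
        e->tag = pc;
        e->valid = 1;
        return &e->insn;
    }

    /* The REI problem: a conventional REI must assume the code
       stream changed, so it invalidates every entry. */
    void rei_flush(void)
    {
        int i;
        for (i = 0; i < DCACHE_LINES; i++)
            dcache[i].valid = 0;
    }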

----
Steve Melvin
melvin@ji.Berkeley.EDU					...!ucbvax!melvin
----

slackey@bbn.com (Stan Lackey) (06/10/89)

In article <19255@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>Out of curiosity, and not trying to start a new religious war, what
>>about the VAX architecture do you consider a failure?
>
>I'm not the original poster, but ....  The VAX is certainly not a
>commercial failure.  However, its instruction encodings are an
>abomination because they force a serial instruction decode.  If it
>takes 3 cycles just to decode an instruction, your only chance of
>achieving high performance is to speed up the clock.  But if you

Your statement is correct for the 11/730 but incorrect for all the
other VAXes.  Even the 8200 chip set parses many combinations of
opcodes and specifiers in parallel; it is true that it could be done
better, just like anything that exists could have been done better.
It most certainly does not take three cycles to parse the opcode; at
most one cycle for the opcode, and one each per operand specifier.  In
many cases, opcode and first specifier, or even opcode and first two
specifiers, are parsed in parallel.

The CPIs you may have seen for the 11/780 were probably misleading,
depending upon what the reader was "looking for".  For example, three
of the average cycles-per-instruction are due to address translation
and cache misses.  Needless to say, all computers feel this one.  Note
that the 'average' may not make much sense anyway; many instructions
(the "RISC subset") really take only a few cycles.  It's just that
enough of the multi-cycle ones are executed that, when they do happen,
they take a long time and pull the average up a lot.
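
To put invented numbers on it: if 90% of the instructions executed
take 2 cycles and the other 10% take 20 (CALLS, the character-string
instructions, etc.), the average is 0.9*2 + 0.1*20 = 3.8 CPI, nearly
double the cost of the common case.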

Early implementations also suffered from not having enough bandwidth
on writes, and that, mixed with instructions that do burst writes,
added maybe one cycle to the average CPI.  The "best" instruction set
encoding in the world does not fix this.
-Stan