usenet@cps3xx.UUCP (Usenet file owner) (06/07/89)
in article <76700071@p.cs.uiuc.edu>, gillies@p.cs.uiuc.edu says:
$ Nf-ID: #R:mipos3.intel.com:182:p.cs.uiuc.edu:76700071:000:1271
$ Nf-From: p.cs.uiuc.edu!gillies Jun 5 16:33:00 1989
$
$ History's greatest mistakes give rise to history's greatest successes.
$ When there is failure (Multics, VAX architecture, i8086) it causes
$ fanatics to go 180 degrees in the other direction, often with good
$ results (Unix, RISC, Awesome 8x86 optimizing compiler technology).
Out of curiosity, and not trying to start a new religious war, what
about the VAX architecture do you consider a failure?
John H. Lawitzke UUCP: Work: ...rutgers!mailrus!frith!dale1!jhl
Dale Computer Corp., R&D ...uunet!frith!dale1!jhl
2367 Science Parkway Home: ...uunet!frith!ipecac!jhl
Okemos, MI, 48864 Internet: jhl@frith.egr.msu.edu
preston@titan.rice.edu (Preston Briggs) (06/08/89)
>$ History's greatest mistakes give rise to history's greatest successes.
>$ When there is failure (Multics, VAX architecture, i8086) it causes
>$ fanatics to go 180 degrees in the other direction, often with good
>$ results (Unix, RISC, Awesome 8x86 optimizing compiler technology).
>
>Out of curiosity, and not trying to start a new religious war, what
>about the VAX architecture do you consider a failure?

or ``Awesome 8x86 optimizing compiler technology''??

Are there any useful optimizations possible on an 8086?  I can imagine
constant propagation/folding and expression rearrangement (to minimize
register pressure).  Generally, there aren't enough registers to hold
any common subexpressions, loop invariants, or variables.

Perhaps the architecture motivated some of the work on automatic
generation of code generators, especially for instruction selection;
but I think the VAX was more important.  (I'm thinking of Graham and
Henry.)

On the other hand, I think Turbo Pascal surprised a lot of people,
showing how fast a compiler could be on a relatively small/slow
machine.  No new optimization work, though.  Nowadays, I'm impressed
with the speed of the Silicon Valley Software (SVS) based compilers.

Regards,
Preston Briggs
davidsen@sungod.crd.ge.com (William Davidsen) (06/08/89)
In article <3474@kalliope.rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
| or ``Awesome 8x86 optimizing compiler technology''??
| Are there any useful optimizations possible on an 8086?

All the same stuff as any other CPU: taking things out of loops,
common subexpressions (push and pop are faster than recalculation for
many things, other than simple register-to-register ops), etc.

One big win is loop unrolling (the Duff device works really well on an
80x86) because of the prefetch queue.  Loops pay a penalty of
restarting the queue, while inline code runs full speed in many cases,
even on a machine with slow memory and wait states.  The [23]86 fetch
more bytes at a time and therefore make fewer memory accesses per byte.

bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
bcase@cup.portal.com (Brian bcase Case) (06/09/89)
>$ When there is failure (Multics, VAX architecture, i8086) it causes
>
>Out of curiosity, and not trying to start a new religious war, what
>about the VAX architecture do you consider a failure?

I'm not the original poster, but ....  The VAX is certainly not a
commercial failure.  However, its instruction encodings are an
abomination because they force a serial instruction decode.  If it
takes 3 cycles just to decode an instruction, your only chance of
achieving high performance is to speed up the clock.  But if you can
speed up the clock, then a machine with easier instruction decode will
beat you....

Someday DEC will implement a parallel instruction decoder along the
lines of the one used in the 486 (but much more complex).  Then they
will be able to lower the CPI of the VAX.  However, everyone else will
use the implementation resources for more productive features....
melvin@ji.Berkeley.EDU (Steve Melvin) (06/09/89)
In article <19255@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>Someday DEC will implement a parallel
>instruction decoder along the lines of the one used in the 486 (but
>much more complex).  Then they will be able to lower the CPI of
>the VAX.

You're ignoring the most obvious solution: a decoded instruction
cache.  With this, the time to decode becomes an instruction-cache
miss effect, and in the case of a hit, an entire instruction (or more)
can be pulled out of the cache in a single cycle.  Decoding should
still be fast, but with a large enough cache, it's not as critical.
(Now the problem becomes how to keep those darn REIs from flushing the
cache ... how about a new non-flushing REI instruction?)
----
Steve Melvin
melvin@ji.Berkeley.EDU		...!ucbvax!melvin
----
slackey@bbn.com (Stan Lackey) (06/10/89)
In article <19255@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>Out of curiosity, and not trying to start a new religious war, what
>>about the VAX architecture do you consider a failure?
>
>I'm not the original poster, but ....  The VAX is certainly not a
>commercial failure.  However, its instruction encodings are an
>abomination because they force a serial instruction decode.  If it
>takes 3 cycles just to decode an instruction, your only chance of
>achieving high performance is to speed up the clock.  But if you

Your statement is correct for the 11/730 but incorrect for all the
other VAXes.  Even the 8200 chip set parses many combinations of
opcodes and specifiers in parallel; it is true that it could be done
better, just like anything that exists could have been done better.
It most certainly does not take three cycles to parse the opcode; at
most one cycle for the opcode, and one each per operand specifier.  In
many cases, opcode and first specifier, or even opcode and first two
specifiers, are parsed in parallel.

The CPIs you may have seen for the 11/780 were probably misleading,
depending upon what the reader was "looking for".  For example, three
of the average cycles per instruction are due to address translation
and cache misses.  Needless to say, all computers feel this one.  Note
that the 'average' may not make much sense anyway; many instructions
(the "RISC subset") really take only a few cycles.  It's just that
enough of the multi-cycle ones are executed, and when they happen,
they take a long time and increase the average by a lot.  Early
implementations also suffered from not having enough bandwidth on
writes, and that, mixed with instructions that have burst writes,
added maybe one cycle to the average CPI.  The "best" instruction-set
encoding in the world does not fix this.
-Stan