lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (11/11/89)
From the press:

"...Digital's engineers isolated the core VAX instruction set, including 80% of the most used opcodes and optimized it to the VAX 9000 gate structure. The conversion didn't involve reducing - or RISCing - the instruction set but more accurately hardwiring it into a single-cycle instruction set. ... The other 20% of complex instructions execute with microcode as always..."

If DEC would document exactly what's in that 80%, then VAX compiler writers could FINALLY settle the subject of choosing between different instruction sequences.

--
Don D.C.Lindsay Carnegie Mellon Computer Science
hascall@atanasoff.cs.iastate.edu (John Hascall) (11/12/89)
In article <???> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
}From the press:
}"...Digital's engineers isolated the core VAX instruction set,
}including 80% of the most used opcodes and optimized it to the VAX
}9000 gate structure. The conversion didn't involve reducing - or
}RISCing - the instruction set but more accurately hardwiring it into
}a single-cycle instruction set. ... The other 20% of complex
}instructions execute with microcode as always..."

}If DEC would document exactly what's in that 80%, then VAX compiler
}writers could FINALLY settle the subject of choosing between
}different instruction sequences.

How hard can it be to figure out? Write a simple loop:

        MOVL    #BIGNUM,R0
10$:
        <i-u-t>                 ; repeat instruction under test,
          .                     ; say, 100 times
          :
        <i-u-t>
        SOBGTR  R0,10$

Anyway, looking at the VAX Arch. Handbook (Chap 10) we find:

        304 instructions (unless they've added some)
        304 * 0.80 = 243 (approx.)

We can probably assume that most of the "Kernel Instruction Set" (those instructions required in any implementation--may not be emulated) are hardcoded. That's 175 instructions (less MOVC3, MOVC5, LDPCTX, PROBER, PROBEW, REI, SVPCTX, INDEX, POPR, PUSHR, XFC, CALLS, CALLG, RET, 6 queue instructions and 7 bitfield instructions) giving 148.

Then there are 102 FP instructions (less ACBx (4), POLYx (4) and EMODx (4)) giving 90.

148 Kernel + 90 FP = 238 instructions (or 78%).

I would suspect I'm not off by more than a handful of instructions in either direction.

John Hascall
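The tally above can be checked mechanically. The following is a minimal sketch in Python that only redoes the arithmetic from the post; the variable names are illustrative, and the counts are the ones quoted above, not independently verified against the handbook.

```python
# Re-doing the opcode tally from the post. All counts come from the post itself;
# the variable names are made up for this sketch.
TOTAL_OPCODES = 304

# Kernel instructions assumed NOT to be hardwired.
kernel_total = 175
kernel_excluded_named = [
    "MOVC3", "MOVC5", "LDPCTX", "PROBER", "PROBEW", "REI", "SVPCTX",
    "INDEX", "POPR", "PUSHR", "XFC", "CALLS", "CALLG", "RET",
]
queue_instructions = 6
bitfield_instructions = 7
kernel_hardwired = (kernel_total - len(kernel_excluded_named)
                    - queue_instructions - bitfield_instructions)

# Floating-point instructions, less ACBx, POLYx and EMODx (4 variants each).
fp_total = 102
fp_excluded = 4 + 4 + 4
fp_hardwired = fp_total - fp_excluded

estimate = kernel_hardwired + fp_hardwired
share = estimate / TOTAL_OPCODES
target = round(TOTAL_OPCODES * 0.80)

print(f"kernel={kernel_hardwired} fp={fp_hardwired} total={estimate}")
print(f"share of opcodes: {share:.1%} (80% would be about {target})")
```

This treats the press figure as a fraction of the opcode list, which is the reading used in the post.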
mash@mips.COM (John Mashey) (11/12/89)
In article <1925@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
...
>}If DEC would document exactly what's in that 80%, then VAX compiler
>}writers could FINALLY settle the subject of choosing between
>}different instruction sequences.

> How hard can it be to figure out? Write a simple loop:
>
>        MOVL    #BIGNUM,R0
> 10$:
>        <i-u-t>                 ; repeat instruction under test,
>          .                     ; say, 100 times
>          :
>        <i-u-t>
>        SOBGTR  R0,10$
.......

1) Note that one must be careful with such a thing, because many of the more aggressive machines have all kinds of stalls or other pipeline effects that will NOT be revealed by such a test. It may be sufficient to reveal whether or not it's single-cycle-issue in the normal case, or it might not.

2) It is hard for compiler writers to EVER figure out the optimal code sequences, in any evolving family of computers. This is nothing new. About 20 years ago, I was torturing myself to write Really Good S/360 BAL code, having carefully studied the timings for 360/50, 360/67, 360/75, and then 370/1xxs. In any broad computer line, there is seldom code that is optimal for everything. Note that optimal code for 286, 386, 486 are all different, as was code for 68000, 68010, and 68020, at least, and the 2-cycle bus interface of the 68030 changed some of the tradeoffs.

3) It certainly is the case that a plausible strategy in a product line is to worry about the machines with:
        the longest pipelines, and usually longest latencies
        the most parallel units
in that you might put in optimizations that make little difference to the simpler machines; they won't usually hurt them much, if at all, while they noticeably help the more complex ones. Along this line, I've heard of compiler speedups on S/360 machines (like code scheduling, to spread loads and usage of the loaded data apart), which don't bother the old simple machines, but help ones with more aggressive pipelines, because some of the stalls are then eliminated.
-- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
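The code-scheduling point in 3) can be illustrated with a toy model. This is a minimal sketch in Python, assuming a hypothetical in-order, single-issue machine where a loaded value becomes usable `load_latency` cycles after the load issues; the register names, the instruction tuples, and the latency values are all invented for illustration and do not describe any real S/360 or VAX implementation.

```python
# Toy in-order pipeline: one instruction issues per cycle unless a source
# register is not ready yet, in which case issue stalls until it is.
# Everything here (latencies, registers, programs) is hypothetical.

def count_cycles(program, load_latency):
    """Return the number of cycles needed to issue every instruction.

    program is a list of (op, dest, sources) tuples; op is "load" or "alu".
    """
    ready_at = {}   # register -> first cycle in which its value can be read
    cycle = 0       # earliest cycle in which the next instruction may issue
    for op, dest, sources in program:
        issue = max([cycle] + [ready_at.get(s, 0) for s in sources])
        latency = load_latency if op == "load" else 1
        ready_at[dest] = issue + latency
        cycle = issue + 1
    return cycle

# Each load is followed immediately by the instruction that uses its result.
naive = [
    ("load", "r1", []),
    ("alu",  "r2", ["r1"]),
    ("load", "r3", []),
    ("alu",  "r4", ["r3"]),
]

# Same work, with the loads hoisted so that uses are spread away from them.
scheduled = [
    ("load", "r1", []),
    ("load", "r3", []),
    ("alu",  "r2", ["r1"]),
    ("alu",  "r4", ["r3"]),
]

for latency in (1, 3):
    print(latency, count_cycles(naive, latency), count_cycles(scheduled, latency))
```

With a latency of 1 (standing in for a simple machine) both orderings take the same number of cycles in this model; with a latency of 3 the scheduled ordering removes most of the stall cycles. The same model also shows why a loop that repeats one independent instruction would not expose such stalls.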
henry@utzoo.uucp (Henry Spencer) (11/12/89)
In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>If DEC would document exactly what's in that 80%, then VAX compiler
>writers could FINALLY settle the subject of choosing between
>different instruction sequences.

Odds are good that you wouldn't go far wrong if you treated the VAX as a RISC: use the simple instructions and addressing modes and ignore the messy ones. Actually, I'm told that many CISCs perform better with code generated that way.
--
A bit of tolerance is worth a  | Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (11/12/89)
In article <1925@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
> Anyway, looking at the VAX Arch. Handbook (Chap 10) we find:
>
>       304 instructions (unless they've added some)
>       304 * 0.80 = 243 (approx.)

I would assume that the 80% is dynamic instruction frequency, not static percentage of opcode space.
--
A bit of tolerance is worth a  | Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
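The two readings of "80%" can give very different numbers. Here is a minimal sketch in Python of the distinction: the mnemonics are real VAX opcodes, but the choice of which ones count as hardwired and the execution counts in the trace are invented purely for illustration.

```python
from collections import Counter

def static_share(hardwired, all_opcodes):
    """Fraction of the opcode list that is hardwired."""
    return len(hardwired & all_opcodes) / len(all_opcodes)

def dynamic_share(hardwired, trace):
    """Fraction of executed instructions that are hardwired."""
    counts = Counter(trace)
    executed = sum(counts.values())
    fast = sum(n for op, n in counts.items() if op in hardwired)
    return fast / executed

# Hypothetical ten-opcode machine; which opcodes are "hardwired" is an assumption.
all_opcodes = {"MOVL", "ADDL2", "CMPL", "BNEQ", "MOVC3",
               "POLYD", "CALLS", "RET", "INSQUE", "EDITPC"}
hardwired = {"MOVL", "ADDL2", "CMPL", "BNEQ"}

# Hypothetical execution trace of 100 instructions, dominated by simple opcodes.
trace = (["MOVL"] * 40 + ["ADDL2"] * 25 + ["CMPL"] * 15 + ["BNEQ"] * 15
         + ["CALLS"] * 2 + ["RET"] * 2 + ["MOVC3"] * 1)

print(f"static:  {static_share(hardwired, all_opcodes):.0%}")
print(f"dynamic: {dynamic_share(hardwired, trace):.0%}")
```

In this made-up case a minority of the opcode list covers nearly all executed instructions, which is why counting entries in the handbook and counting executed instructions need not agree.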
bzs@world.std.com (Barry Shein) (11/13/89)
In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>If DEC would document exactly what's in that 80%, then VAX compiler
>writers could FINALLY settle the subject of choosing between
>different instruction sequences.

If you want a good feel for the 80% look at the code generated by VAX/VMS Fortran. It uses far less than 80% of the instructions as far as I can tell (perhaps more in rare cases, hmmm, what %age of the instructions for what %age of generated code...)

Someone had given me some Fortran code a while back and asked me to compare it under VMS/Fortran and Unix. The Unix Fortran we had was so abysmal that we finally agreed that rewriting it into C was fair game (since the interest was in precisely this algorithm being run, and other smallish algorithms which could be rewritten if that was the best way to do it.)

Unix/C (4.2bsd) and VMS/Fortran (both Vax/780's) were compared. There was a difference of a few percent between the two, UNIX/C being slower (but not by much, still curious.)

I finally compared generated code to see why. My conclusion was that the only difference is VMS/Fortran's avoidance of the SOBGTR instruction on simple loops, preferring instead a sequence like DEC, TST, BNE which apparently ran faster.

In fact, I concluded that the only thing worth optimizing in this rather turgid code was the loop overhead. It had a lot of fancy if-then-else's but actually didn't do much other than swap elements around in a matrix (it was useful, but the looping overhead dominated the 20 minutes it took to run on these systems; you could remove the code in the loops and it made very little difference to the total run time. I wonder how common that is in physics code?)

ANYHOW...sorry...it was an interesting exercise, go take a look at generated VMS/Fortran code (it's very good) and you'll see immediately the kind of things which are fast on a Vax.
-- -Barry Shein Software Tool & Die, Purveyors to the Trade | bzs@world.std.com 1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs
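Why a small change in the loop-closing sequence can show up in total run time when loop overhead dominates can be sketched with simple arithmetic. The following Python sketch uses entirely hypothetical cycle counts; they are not measurements of SOBGTR or of any other sequence on a VAX-11/780, and the function names are invented for this illustration.

```python
# Toy cost model: every iteration pays for the loop body plus the
# loop-closing overhead. All cycle counts below are hypothetical.

def run_cycles(iterations, body_cycles, overhead_cycles):
    """Total cycles for a loop under the toy model."""
    return iterations * (body_cycles + overhead_cycles)

def overhead_fraction(body_cycles, overhead_cycles):
    """Share of each iteration spent on loop control rather than the body."""
    return overhead_cycles / (body_cycles + overhead_cycles)

ITERATIONS = 1_000_000
BODY = 5            # hypothetical cost of the work inside the loop
SINGLE_INSN = 20    # hypothetical cost of a one-instruction loop close
SEQUENCE = 19       # hypothetical cost of a multi-instruction loop close

slow = run_cycles(ITERATIONS, BODY, SINGLE_INSN)
fast = run_cycles(ITERATIONS, BODY, SEQUENCE)

print(f"overhead share: {overhead_fraction(BODY, SINGLE_INSN):.0%}")
print(f"run-time saving: {(slow - fast) / slow:.0%}")
```

Under these made-up numbers the loop control accounts for most of each iteration, so shaving a single cycle from it moves the total by a few percent, while removing the body entirely would change much less than its size in source lines suggests.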
bzs@world.std.com (Barry Shein) (11/13/89)
>If you want a good feel for the 80% look at the code generated by
>VAX/VMS Fortran. It uses far less than 80% of the instructions as far
>as I can tell (perhaps more in rare cases, hmmm, what %age of the
>instructions for what %age of generated code...)

Gak, did I write that? Replace "a good feel for the 80%" with "a good feel for what is probably in the 80%".

I have no specific information, just assuming that their Fortran code generator knows the fastest parts of the instruction set currently so I'd guess that's what people are looking for.

I shouldn't have started that at all, apologies.
--
-Barry Shein Software Tool & Die, Purveyors to the Trade | bzs@world.std.com 1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs
dricejb@drilex.UUCP (Craig Jackson drilex1) (11/15/89)
In article <1989Nov12.183132.3120@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>>If DEC would document exactly what's in that 80%, then VAX compiler
>>writers could FINALLY settle the subject of choosing between
>>different instruction sequences.

As an aside, I suspect that all of the compiler writers that DEC cares about (those that work for DEC) already have access to all the instruction timing information that they want ...

>If you want a good feel for the 80% look at the code generated by
>VAX/VMS Fortran. It uses far less than 80% of the instructions as far
>as I can tell (perhaps more in rare cases, hmmm, what %age of the
>instructions for what %age of generated code...)

>ANYHOW...sorry...it was an interesting exercise, go take a look at
>generated VMS/Fortran code (it's very good) and you'll see immediately
>the kind of things which are fast on a Vax.
>        -Barry Shein
>
>Software Tool & Die, Purveyors to the Trade | bzs@world.std.com

From what I saw of the announcement, the 9000's are targeted at 'business', not 'scientific', applications. If you really want to know that 80% set, I'd look at the output of VAX/VMS COBOL, plus any other languages they have now for transaction processing.

I went to a user's group meeting last week where a person was astounded that I didn't use COBOL. (This was for users of Unisys (formerly Burroughs) computers.) The average application on these machines does transaction processing, with 100s, if not 1000s of 'terminals' doing order-entry, ATM processing, or some such.

There's a whole different world out there, which most Usenetters would have trouble even conceiving of.

And that world is where the 9000s are targeted.
--
Craig Jackson dricejb@drilex.dri.mgh.com {bbn,ll-xn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
ktl@wag240.caltech.edu (Kian-Tat Lim) (11/16/89)
In article <6150@drilex.UUCP>, dricejb@drilex (Craig Jackson drilex1) writes:
>From what I saw of the announcement, the 9000's are targeted at 'business',
>not 'scientific', applications.

And that's why they have vector processors :-). Digital's marketroids seem to have borrowed ideas from IBM's 3090 people: they're saying "it's a mainframe" AND "it's a supercomputer."

I think I'll stick with my *KILLER MICROS*, thank you...
--
Kian-Tat Lim (ktl@wagvax.caltech.edu, KTL @ CITCHEM.BITNET, GEnie: K.LIM1)
tihor@acf4.NYU.EDU (Stephen Tihor) (11/17/89)
They footnoted the "it's a supercomputer" line at DECUS by stating that "Using IBM's Definitions..."
rod@venera.UUCP (Rodney Doyle Van Meter III) (11/17/89)
In article <6150@drilex.UUCP> dricejb@drilex.UUCP (Craig Jackson drilex1) writes:
>
>From what I saw of the announcement, the 9000's are targeted at 'business',
>not 'scientific', applications. If you really want to know that 80%
>set, I'd look at the output of VAX/VMS COBOL, plus any other languages
>they have now for transaction processing.
>
>There's a whole different world out there, which most Usenetters would
>have trouble even conceiving of.
>
>And that world is where the 9000s are targeted.

Perhaps. Perhaps not. I'm sure that's where their high-reliability and transaction-processing marketing tacks are headed.

However, they're implementing vector instructions as part of the VAX architecture. At some point, all CPUs without vector instructions will be required to emulate them. Fortunately, that's one area where DEC seems to do okay.

Do business applications use vector instructions? Doubt it. They're pushing vector Fortran, anyway, just like everybody else, not vector COBOL.

With four CPUs with vector processors, 512MB memory, and a few gig of reasonably fast disk, it is supposed to peak at around 1 Gflops. That's enough to keep a lot of supercomputer users happy, and it has the "advantages" of coming with VMS, which actually is a low-maintenance, relatively stable, multi-user OS, when compared to some supercomputer OSes.

The kicker? The price tag, of course, just like always with a VAX. Order of five million, list, for the decked-out box, plus HSCs and disks. For that amount of money, you can get higher performance boxes.

Has anybody seen the vector instructions? They are memory-to-memory, I assume? Does it include scatter-gather instructions?

I think this is a good move for the VAX guys. It may give them some life for a while yet.

--Rod