crowl@cs.rochester.edu (Lawrence Crowl) (06/15/88)
In article <28200161@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>In article <491@daver.UUCP> daver@daver.UUCP (Dave Rand) writes:
>>I am confused. How can a risc machine have a higher "vax mips" than native
>>mips? MORE (not less) risc instructions are required to do the same task,
>>when compared to a vax.
>
>Not always. Consider A=B+C, all in registers:
>  VAX:
>      mov  rB,rA
>      add  rC,rA
>  3 address RISC:
>      add  rA,rB,rC
>
>So, we have an existence proof. What characteristics of the machine actually
>let this happen?

This is incorrect.  The VAX has three-address arithmetic instructions.  So
the above example for a VAX (destinations are always on the right side) is:

    addl3   rB, rC, rA

It also takes four bytes to encode this instruction, the same as most RISC
machines.

The VAX instruction set wins (on number of instructions executed) when using
complex data structures, because of the extensive addressing modes.  For
example, the loop to add two vectors into a third on the VAX is:

    top:  addl3   (rA)+, (rB)+, (rC)+
          sobgeq  rD, top

which takes seven bytes for two instructions.  Most RISCs I know of would
have something like the following loop (again, destinations on the right):

    top:  load    rA, rM
          load    rB, rN
          add     rM, rN, rO
          store   rO, rC
          add     rA, 4, rA
          add     rB, 4, rB
          add     rC, 4, rC
          add     rD, -1, rD
          bgeq    top

which takes something on the order of thirty-six bytes and nine instructions.
I cannot think of any general computing task (such as the loop above) in
which the VAX will not execute fewer instructions.  Anyone?
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627
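For reference, both listings above implement the same source-level loop: add
two vectors of longwords into a third.  A minimal C sketch (variable names
are illustrative only):

    /* The loop that both the VAX and RISC listings above compile to. */
    void vadd(long *a, long *b, long *c, long n)
    {
        long i;
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];     /* one addl3 per element on the VAX */
    }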
chris@mimsy.UUCP (Chris Torek) (06/16/88)
In article <10595@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>For example, the loop to add two vectors into a third on the VAX is:
>
>    top:  addl3   (rA)+, (rB)+, (rC)+
>          sobgeq  rD, top
>
>which takes seven bytes for two instructions.

True.  An optimising compiler might expand the loop, however:

	extzv	$0,$3,rD,r0
	bicl2	r0,rD			# or bicl2 $7; same length
	casel	r0,$0,$7		# start the right distance in
9:	.word	0f - 9b			# 0
	.word	1f - 9b			# 1
	...
	.word	7f - 9b			# 7
7:	addl3	(rA)+,(rB)+,(rC)+
6:	addl3	(rA)+,(rB)+,(rC)+
5:	addl3	(rA)+,(rB)+,(rC)+
4:	addl3	(rA)+,(rB)+,(rC)+
3:	addl3	(rA)+,(rB)+,(rC)+
2:	addl3	(rA)+,(rB)+,(rC)+
1:	addl3	(rA)+,(rB)+,(rC)+
0:	addl3	(rA)+,(rB)+,(rC)+
	acbl	$0,$-8,rD,7b		# while (rD -= 8) >= 0

This pushes the size up to (I think) 70 bytes.  Too bad the RISC machines
are still faster anyway :-) .

Actually, you could get rid of the case and the branch table:

	extzv	$0,$3,rD,r0
	bicl2	r0,rD
	subl3	r0,$7,r0		# invert
	ashl	$2,r0,r0		# times 4, size of addl3 instr below
	jmp	(pc)[r0]		# into the breach (or is it breech?...kapow!
0:	addl3	(rA)+,(rB)+,(rC)+	# maybe an ancient muzzle loader :-) )
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	addl3	(rA)+,(rB)+,(rC)+
	acbl	$0,$-8,rD,0b

This drops off 9 bytes, down to 61 bytes.  You can get rid of 5 more bytes
by changing the acbl into

	subl2	$8,rD
	bgeq	0b

but on non-pipelined VAXen that might be slower.  Alternatively, if you have
another register free, `mnegl $8,r1'; then acbl with r1 instead of $-8; this
saves only 1 byte overall, but brings the acbl down to 6 bytes.

[nb. the sobgeq loop above runs rD+1 times, so I made the acbl loops do the
same.  rD is left in a different state (-8 vs -1), and I did need r0 for
entry calculation.]

All of this just goes to show that the VAX provides too many ways to do
things!
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
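For C readers: the casel expansion above is the same unroll-and-dispatch
idea that Duff's device expresses at the source level.  A sketch, assuming
n >= 1 and using illustrative names:

    /* Unrolled vector add with computed entry for the remainder
       (Duff's device).  Assumes n >= 1. */
    void vadd8(long *a, long *b, long *c, long n)
    {
        long k = (n + 7) / 8;            /* passes through the unrolled body */
        switch (n % 8) {
        case 0: do { *c++ = *a++ + *b++;
        case 7:      *c++ = *a++ + *b++;
        case 6:      *c++ = *a++ + *b++;
        case 5:      *c++ = *a++ + *b++;
        case 4:      *c++ = *a++ + *b++;
        case 3:      *c++ = *a++ + *b++;
        case 2:      *c++ = *a++ + *b++;
        case 1:      *c++ = *a++ + *b++;
                } while (--k > 0);
        }
    }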
mcglk@scott.stat.washington.edu (Ken McGlothlen) (06/16/88)
In article <11981@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
+----------
| In article <10595@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
| +----------
| | For example, the loop to add two vectors into a third on the VAX is:
| |
| |     top:  addl3   (rA)+, (rB)+, (rC)+
| |           sobgeq  rD, top
| |
| | which takes seven bytes for two instructions.
| +----------
| True.  An optimising compiler might expand the loop, however:
|
| [... case expansion example ...]
|
| This pushes the size up to (I think) 70 bytes.  Too bad the RISC
| machines are still faster anyway :-) .
|
| [... more examples ...]
|
| All of this just goes to show that the VAX provides too many ways to
| do things!
| --
| In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
| Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
+----------

Oh, please.

Guess we're gonna have to not only bash that ultra-complex VAX architecture,
but while we're at it, may as well bash C, too.  I mean, we've got so many
ways of adding five to an integer variable!

	i = i + 5;

	i += 5;

	i++; i++; i++; i++; i++;

	for( j = 0 ; j < 5 ; j++ )
		i++;

Yup.  Definitely too complex.  I think we oughta just keep the "++" operator.

I still haven't seen any good arguments as to why RISC is so much better or
faster.  I'm kind of fond of the VAX instruction set, and you can do a heck
of a lot more with one line of its instruction set than you can with five
or ten lines of RISC code.  Is having eighty or so registers all that much
faster?

				--Ken McGlothlen
				  mcglk@scott.biostat.washington.edu
				  mcglk@max.acs.washington.edu
brooks@maddog.llnl.gov (Eugene D. Brooks III) (06/17/88)
In article <914@entropy.ms.washington.edu> mcglk@scott.biostat.washington.edu writes:
>I still haven't seen any good arguments as to why RISC is so much better or
>faster.  I'm kind of fond of the VAX instruction set, and you can do a heck
>of a lot more with one line of its instruction set than you can with five
>or ten lines of RISC code.  Is having eighty or so registers all that much
>faster?

If main memory, and in particular shared memory in a multiprocessor, is 20
to 40 clocks away, having eighty or so registers with fully pipelined memory
access is really a whole lot faster.
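One concrete way to picture this: with loads that take tens of cycles, a
compiler can keep several independent results in registers so the memory
accesses overlap instead of serializing.  A rough C sketch of the idea (a
dot product with four accumulators; names are illustrative, and n is
assumed to be a multiple of 4):

    /* Four independent accumulators let four loads be in flight at once
       instead of every iteration waiting on the previous one. */
    double dot4(const double *x, const double *y, long n)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        long i;
        for (i = 0; i < n; i += 4) {
            s0 += x[i]   * y[i];
            s1 += x[i+1] * y[i+1];
            s2 += x[i+2] * y[i+2];
            s3 += x[i+3] * y[i+3];
        }
        return (s0 + s1) + (s2 + s3);
    }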
steve@basser.oz (Stephen Russell) (06/20/88)
In article <914@entropy.ms.washington.edu> mcglk@scott.biostat.washington.edu writes:
>I still haven't seen any good arguments as to why RISC is so much better or
>faster.

Two quick reasons:

1. CISC costs silicon.  Using up silicon to implement complex instructions
   leaves less for performance enhancements, like large register sets,
   on-board caching, hardware assistance (barrel shifters, hardware
   multipliers, etc.), and translation lookaside buffers.

2. CISC costs performance.  Lots of addressing modes, for example, result in
   many pipeline stalls while additional addressing values are fetched from
   cache/memory.  The whole CPU stops because of some extra indirection.
   Also, this stop/start behaviour must be planned for, and that costs more
   silicon -- see 1 above.

Keeping things simple allows the system to do something useful (or lots of
useful things in parallel in the pipeline) for _every_ cycle.

>I'm kind of fond of the VAX instruction set, and you can do a heck
>of a lot more with one line of its instruction set than you can with five
>or ten lines of RISC code.

But is the single VAX instruction actually faster, all else being equal?
chris@softway.oz (Chris Maltby) (06/21/88)
This debate is just a waste of time.  I don't care what the assembler code
looks like or how many instructions there are, etc.

It seems that we are going to see a lot of RISC machines from now on, not
because they are a clean or nice way to do things, but because they are
easier to design than complex instruction set machines.  This means that new
technology hits the market first with a RISC architecture, and it goes
faster (for that reason alone) than the CISC machines released alongside it,
which use older technology.

In any case, all these dodgy machine architectures have put compiler writers
back in business after the 68000 era ...
-- 
Chris Maltby - Softway Pty Ltd	(chris@softway.oz)

PHONE:	+61-2-698-2322		UUCP:		uunet!softway.oz!chris
FAX:	+61-2-699-9174		INTERNET:	chris@softway.oz.au
darin@nova.laic.uucp (Darin Johnson) (06/22/88)
In article <1277@basser.oz>, steve@basser.oz (Stephen Russell) writes:
> >In article <914@entropy.ms.washington.edu> mcglk@scott.biostat.washington.edu writes:
> >I'm kind of fond of the VAX instruction set, and you can do a heck
> >of a lot more with one line of its instruction set than you can with five
> >or ten lines of RISC code.
>
> But is the single VAX instruction actually faster, all else being equal?

Perhaps it would be possible for someone to come up with an
'assembler-compiler' that would accept a CISC instruction set and generate
RISC code.  This would allow one to write something like 'ADD mem-loc1 to
mem-loc2 and store in mem-loc3(R1)' without having to write the 5 or 10
RISC lines of code.  The biggest drawback I can see is that there would
have to be 'optimizing assemblers'.  Of course, such an assembler would
find it difficult to take advantage of some common RISC idioms, such as
register windows.

Just another naive thought from the mind of...

Darin Johnson (...pyramid.arpa!leadsv!laic!darin)
              (...ucbvax!sun!sunncal!leadsv!laic!darin)
	"All aboard the DOOMED express!"
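A sketch of what such a tool would boil down to, in C-level terms (all names
here are hypothetical): the one CISC-style operation in the example is a
load/load/add/store sequence.

    /* Hypothetical memory locations and index register. */
    long mem1, mem2, mem3[16];
    long r1;

    /* What 'ADD mem-loc1 to mem-loc2 and store in mem-loc3(R1)' expands to
       on a load/store machine, written as explicit steps. */
    void expand_add3(void)
    {
        long t1, t2, t3;
        t1 = mem1;           /* load  */
        t2 = mem2;           /* load  */
        t3 = t1 + t2;        /* add   */
        mem3[r1] = t3;       /* store, indexed by register r1 */
    }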
jesup@cbmvax.UUCP (Randell Jesup) (06/24/88)
In article <270@laic.UUCP> darin@nova.laic.uucp (Darin Johnson) writes:
>Perhaps it would be possible for someone to come up with an
>'assembler-compiler' that would accept a CISC instruction set and generate
>RISC code.  This would allow one to write something like 'ADD mem-loc1 to
>mem-loc2 and store in mem-loc3(R1)' without having to write the 5 or 10
>RISC lines of code.

	People already do this.  For example, on the Rpm40 there are several
"meta-instructions" that you can use, which actually produce a series of
real machine instructions.  Examples are MUL, CALL, DIV, FPLDD, etc.
-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|ihnp4|allegra}!cbmvax!jesup
jlg@beta.lanl.gov (Jim Giles) (06/24/88)
In article <270@laic.UUCP>, darin@nova.laic.uucp (Darin Johnson) writes:
> [...]
> Perhaps it would be possible for someone to come up with an
> 'assembler-compiler' that would accept a CISC instruction set and generate
> RISC code.  This would allow one to write something like 'ADD mem-loc1 to
> mem-loc2 and store in mem-loc3(R1)' without having to write the 5 or 10
> RISC lines of code.  The biggest drawback I can see is that there would
> have to be 'optimizing assemblers'.  Of course, such an assembler would
> find it difficult to take advantage of some common RISC idioms, such as
> register windows.

This is already possible with the macros, opdefs, and micros that many
assemblers have.  Just define one of these for each CISC instruction
mnemonic you wish to emulate.  (OK, the syntax for defining these may be
messy, but it could be made to work.)

As you pointed out, there is a need for optimizing assemblers.  This need
already exists for macro assemblers, since hand-pipelining code across
macro calls is not possible.  The Cray has needed such an optimizing
assembler for years.

J.Giles
Los Alamos
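A rough analogy in C-preprocessor terms, purely for illustration (real
assembler macro syntax differs, but the shape is the same; all names are
made up):

    /* One macro per emulated CISC mnemonic. */
    #define ADDL3(src1, src2, dst)   ((dst) = (src1) + (src2))
    #define MOVL(src, dst)           ((dst) = (src))

    /* Usage: ADDL3(b, c, a) reads as a single "instruction" in the source,
       and the underlying tool expands it into however many RISC operations
       the target actually needs. */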
mash@mips.COM (John Mashey) (06/26/88)
In article <270@laic.UUCP> darin@nova.laic.uucp (Darin Johnson) writes:
....
>The biggest drawback I can see is that there would have to be 'optimizing
>assemblers'.  Of course, such an assembler would find it difficult to
>take advantage of some common RISC idioms, such as register windows.

Optimizing assemblers have existed for years, in various forms, and in many
companies, on both CISC and RISC machines.  Many RISC machines have
optimizing assemblers, including such things as code scheduling,
addressing-style optimization, optimization of constant creation, code
selection for multiply/divide by constants, etc.  We had the first version
of the MIPSco one working BEFORE the R2000 architecture was even frozen,
for example.

At this point, optimization is moving further afield, i.e., one is even
starting to see optimizing linkers [we do some of this, and I think Moto
does also].
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
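As one small example of the kind of rewriting involved, code selection for
a multiply by a constant turns the multiply into shifts and adds; sketched
at the C level (illustrative only, not any particular assembler's output):

    /* x * 10 == x*8 + x*2, i.e. two shifts and an add -- much cheaper
       than a general multiply on machines without a fast multiplier. */
    long mul10(long x)
    {
        return (x << 3) + (x << 1);
    }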
smryan@garth.UUCP (Steven Ryan) (06/26/88)
>As you pointed out, there is a need for optimizing assemblers.
Strong disagreement--an assembler should be safe, simple, and dumb. If you
want an optimiser, use a compiler.
What is preferable is a separate layer to do the optimisation. The problem with
an assembler is that it knows very little. Consider an assembler which replaces
a long branch with a shorter one. What if it occurs in a switch?

    goto next + index
    goto a
    ...
    goto z

Also, proper scheduling requires moving code past loads and stores. How can
an assembler safely and generally determine what the target of a memory
reference is?
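The scheduling hazard is easy to state in C terms: unless the tool can prove
that two addresses differ, it cannot move a load above a store.  A sketch
with illustrative names:

    /* If p and q refer to the same word, hoisting the load above the
       store would return the stale value, so the reordering is unsafe
       without alias information. */
    int reorder_hazard(int *p, int *q)
    {
        *q = 1;              /* store */
        return *p;           /* load: must see the 1 if p == q */
    }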
chris@mimsy.UUCP (Chris Torek) (06/28/88)
>In article <11981@mimsy.UUCP> I wrote:
>>All of this just goes to show that the VAX provides too many ways to
>>do things!

In article <914@entropy.ms.washington.edu> mcglk@scott.stat.washington.edu
(Ken McGlothlen) writes:
>Oh, please.
>
>Guess we're gonna have to not only bash that ultra-complex VAX architecture,
>but while we're at it, may as well bash C, too.  I mean, we've got so many
>ways of adding five to an integer variable!

I think you missed the implicit :-) ---I was half kidding.  (But only about
half.)

>I still haven't seen any good arguments as to why RISC is so much better or
>faster.

Who cares about the arguments?  The fact is that if you have somewhere
between $10,000 and $1,000,000, and want to buy the fastest machine you can
get for that, right now that machine is probably `RISC-based'.

You can argue all you like as to why the Vax instruction set is better, or
why the 88000 instruction set is better, but the fastest Vax CPU from DEC is
slower than the fastest 88000 CPU from Motorola.  If it were the other way
around, DEC would be in fine shape.  (Maybe they just need Motorola to
design their next chip :-) .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
colwell@mfci.uunet (06/28/88)
In article <12179@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>I still haven't seen any good arguments as to why RISC is so much better or
>>faster.
>
>Who cares about the arguments?  The fact is that if you have somewhere
>between $10,000 and $1,000,000, and want to buy the fastest machine you
>can get for that, right now that machine is probably `RISC-based'.
>
>You can argue all you like as to why the Vax instruction set is better,
>or why the 88000 instruction set is better, but the fastest Vax CPU from
>DEC is slower than the fastest 88000 CPU from Motorola.  If it were the
>other way around, DEC would be in fine shape.  (Maybe they just need
>Motorola to design their next chip :-) .)

But DEC IS in fine shape.  They sell 'way more VAXen/year than everybody
else combined.  No judgment on the 88000 implied, but users don't really
care about performance per se.  They want solutions to their problems, which
almost always requires decent I/O (large and fast), acceptable reliability
and service, and lots of available software.

Something they don't tell you in your computer architecture classes --
people don't always automatically buy the machine with the highest
performance (nor should they).

Also, please cast a jaundiced eye on the phrase "RISC-based".  I think it
has almost attained the status of "content-free".

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090
mash@mips.COM (John Mashey) (06/30/88)
In article <12179@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
...
>Who cares about the arguments?  The fact is that if you have somewhere
>between $10,000 and $1,000,000, and want to buy the fastest machine you
>can get for that, right now that machine is probably `RISC-based'.
>
>You can argue all you like as to why the Vax instruction set is better,
>or why the 88000 instruction set is better, but the fastest Vax CPU from
>DEC is slower than the fastest 88000 CPU from Motorola.
--------^^^^^^
On what set of benchmarks is this assertion based?  (I believe it's true
for integer and single-precision FP, and if you'd said MIPS, it would have
been true for DP floating also :-)

So far, the only double-precision floating-point number we've seen for the
88K is Whetstone: 3 Megawhets (and I don't remember the source, sorry).
(An 8700 is about 4 Mwhets DP.)  We'd be VERY interested to see more DP
benchmarks: from an examination of the 88K's architecture and the cycle
counts, we have reason to believe that the 88K design essentially
SACRIFICED double-precision floating performance for most compiled
programs.  We'd be glad to be disabused by knowledgeable folks who can cite
useful benchmarks like: DP Livermore Loops, Spice, Doduc, etc.

Note one more time that VAX-relative performance numbers, computed the way
DEC does, include floating point.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
lackey@Alliant.COM (Stan Lackey) (07/01/88)
>In article <12179@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>...
>>Who cares about the arguments?  The fact is that if you have somewhere
>>between $10,000 and $1,000,000, and want to buy the fastest machine you
>>can get for that, right now that machine is probably `RISC-based'.
>>
>>You can argue all you like as to why the Vax instruction set is better,
>>or why the 88000 instruction set is better, but the fastest Vax CPU from
>>DEC is slower than the fastest 88000 CPU from Motorola.

This isn't much of an argument.  The Alliant single CPU (released in 1985)
also beats the VAX 8700 on Whetstones, Livermore Loops, Linpack, etc., and
is anything but a RISC -- 68020 instruction set, plus floating point,
vector, and concurrency instruction sets.

Not to say that RISC is "bad" -- I would rather have implemented a RISC than
the 68020, and performance could very well have been better, design time
would probably have been less, cost would have been less, etc.  But then
again we wouldn't have been able to offer Pascal, Ada, or C in the
timeframe.  There is much more than the classic RISC arguments to consider
when making a business decision.
-Stan
greg@vertical.oz (Greg Bond) (07/05/88)
In article <810@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>As you pointed out, there is a need for optimizing assemblers.
>Strong disagreement--an assembler should be safe, simple, and dumb.  If you
>want an optimiser, use a compiler.
>What is preferable is a separate layer to do the optimisation.  The problem with

In fact, use the "optimisation" pass from your favourite C compiler.  This
is tough to organise on most Unix boxes, but goes great on the 8086
X-compiler we have here.  The optimiser is a separate assembler-to-assembler
processor.  I can't see where its orientation as a C optimiser would kill
the semantics of assembler code.  But then, it may not be a real clever
optimiser either (for the general case).
-- 
Gregory Bond, Vertical Software, Melbourne (greg@vertical.oz)
I used to be a pessimist.  Now I am a realist.
ge@hobbit.sci.kun.nl (Ge' Weijers) (07/07/88)
In article <140@vertical.oz>, greg@vertical.oz (Greg Bond) writes:
) In article <810@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
) >>As you pointed out, there is a need for optimizing assemblers.
) >Strong disagreement--an assembler should be safe, simple, and dumb.  If you
) >want an optimiser, use a compiler.
) >What is preferable is a separate layer to do the optimisation.  The problem with
)
) In fact, use the "optimisation" pass from your favourite C compiler.
) This is tough to organise on most Unix boxes, but goes great on the
) 8086 X-compiler we have here.  The optimiser is a separate
) assembler-to-assembler processor.  I can't see where its orientation as a
) C optimiser would kill the semantics of assembler code.  But then, it may
) not be a real clever optimiser either (for the general case).

Watch out for C optimisers.  They usually assume things about their input
(register usage, Rx = frame pointer, etc.) that are just NOT true for
hand-written assembly language.  A case in point: see the manual of the
assembler for the Sun-4.  It has an optimiser, but you are strongly advised
not to use it.

Ge' Weijers,  mcvax!kunivv1!hobbit!ge
-- 
Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands
UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge